In brief: Whether you love them or hate them, generative AI tools like ChatGPT and Stable Diffusion are here to stay and evolving at a rapid pace. Researchers have been working on new implementations that are slowly coming into focus, such as a new tool called DragGAN that looks like Photoshop's Warp tool on steroids.
By now even the most casual followers of tech news are familiar with generative AI tools like ChatGPT, Stable Diffusion, Midjourney, and DALL-E. Big Tech is racing to develop the best large language models and bake them into every piece of software or web service we use, and a flurry of startups are working on specialized AI tools for a wide variety of niche use cases.
Many of these tools can generate useful images or text using simple prompts that describe what the user wants to find out or the kind of work they're trying to achieve. When it works, this makes services like ChatGPT and DALL-E seem like magic. When it doesn't, we are reminded of how far we are from AI replacing human creativity, if it ever does. In fact, many of these tools are "trained" on works authored by people and require human supervision to improve their output to a meaningful level.
"Have you thought about interactively 'dragging' objects in the image? Our #SIGGRAPH2023 work #DragGAN makes this come true! 🥳 Project page: https://t.co/ZqAEPHNMNF https://t.co/UQXarwl481 pic.twitter.com/LrWjEsIVHs" – Xingang Pan (@XingangP), May 19, 2023
That said, new AI research shows that progress is still being made at a rapid pace, particularly in the area of image manipulation. A group of scientists from Google, MIT, the University of Pennsylvania, and the Max Planck Institute for Informatics in Germany has published a paper detailing an experimental tool that could make image editing easier and more accessible for regular people.
To give an idea of what is possible, the new tool lets you significantly change the appearance of a person or an object by simply clicking and dragging on a particular feature. You can also alter the expression on someone's face, modify the clothing of a fashion model, or rotate the subject in a photo as if it were a 3D model. The video demos are certainly impressive, though the tool isn't available to the public as of this writing.
This may just look like Photoshop on steroids, but it has generated enough interest to send the research team's website crashing. After all, text prompts may sound simple in theory, but they often need a lot of tweaking when you want something very specific, and it can take multiple steps to generate the desired output.
This problem has given rise to a new profession – that of the "AI prompt engineer." Depending on the company and the specifics of the project in question, this kind of job can pay up to $335,000 per year, and it doesn't require a degree.
By contrast, the user interface presented in the demo videos suggests it will soon be possible for the average person to do some of what an AI prompt engineer can do by just clicking and dragging on the first output of any image generation tool. The researchers explain that DragGAN can "hallucinate" occluded content, deform an object, or modify a landscape.
The researchers note that DragGAN can morph the content of an image in just a few seconds when running on Nvidia's GeForce RTX 3090 graphics card, as their implementation doesn't need to use multiple neural networks to achieve the desired results. The next step will be to develop a similar model for point-based editing of 3D models. Those of you who want to find out more about DragGAN can read the team's paper. The research will also be presented at SIGGRAPH in August.