I have to pinch myself that I am saying this, but here goes. Today we are going to use Google’s amazing new AI to create not an image, and not even a short image sequence, but we are going to make an entire film. An AI-made film, and all this within seconds.
Yes, that’s right. Less than a year ago, in April 2022, an amazing text to image AI called DALL-E 2 was published by scientists at OpenAI. This took a piece of text, and painted an image for us. It ushered in a new era of AI-assisted tools.
And then, something amazing happened. Not 6 years later, but only about 6 months later, in October 2022, a follow-up paper appeared that could generate not just images, but short videos of our choosing. However, even if it did that really well, it was still made to create only that…short sequences. A true miracle of AI research.
However, here is another one. And hold on to your papers, because now, this new technique called Phenaki promises no less than entire films made by an AI. Entire films! Now I am super excited for this, so let’s have a look at 5 of my favorite examples together.
One. The new work’s superpower is that we can write several text prompts, and the AI finds a way to chain them together in a way that makes sense. That sounds…that sounds exactly like the script for a movie. Amazing. If it works, of course. So, first, we can say “A photorealistic teddy bear is swimming in the ocean in San Francisco.” Then, we say “The teddy bear goes underwater.” Then, “The teddy bear keeps swimming under the water with colorful fishes.” And finally, “A panda bear is swimming underwater.”
And the AI finds out by itself how this magical transformation from teddy bear to a panda should take place. I love it. Now, this was really cool, how about another one?
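To make the idea of a chain of prompts a little more concrete, here is a minimal sketch of how such a script could be written down as data. This is not Phenaki's actual interface: the `Scene` class, the `build_timeline` and `prompt_at` helpers, and the 2-second durations are all hypothetical and only illustrate the idea of a prompt that changes over time.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    prompt: str     # text prompt for this part of the story
    seconds: float  # hypothetical duration, not a value from the paper


def build_timeline(scenes):
    """Return (start, end, prompt) tuples, one per scene, in story order."""
    timeline = []
    start = 0.0
    for scene in scenes:
        end = start + scene.seconds
        timeline.append((start, end, scene.prompt))
        start = end
    return timeline


def prompt_at(timeline, t):
    """Return the prompt that is active at time t (in seconds)."""
    for start, end, prompt in timeline:
        if start <= t < end:
            return prompt
    raise ValueError(f"time {t} is outside the story")


# The four prompts from the teddy bear example, with assumed durations.
story = [
    Scene("A photorealistic teddy bear is swimming in the ocean in San Francisco.", 2.0),
    Scene("The teddy bear goes underwater.", 2.0),
    Scene("The teddy bear keeps swimming under the water with colorful fishes.", 2.0),
    Scene("A panda bear is swimming underwater.", 2.0),
]
timeline = build_timeline(story)
```

With this layout, `prompt_at(timeline, 7.5)` returns the panda prompt. The interesting part, the smooth transition between neighboring prompts, is exactly what the AI has to work out by itself, and it is not modeled here.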
Two. We start out with a side view of an astronaut who is walking through a puddle on Mars, and who then starts dancing. I am particularly interested in these transitional moments, and this one checks out. Really cool. Then, he walks his dog on Mars. And look at that! The dog does not seem to just grow out of nothing, but it gradually enters the frame. Once again, a graceful transition. And then, the astronaut and this good boy watch fireworks together. Well, kind of.
Three. I also loved that we can give the AI somewhat more explicit instructions. For instance, we can ask it to zoom out at the end of the video when the campfire appears. And it did exactly that. Good job, little AI!
Four. The paper also showcases a two-and-a-half-minute-long video that starts out with a scenic motorcycle ride, followed by running through the woods and finding beautiful things, then, of course, robots, and more.
And five, while we look through the beautiful story of this penguin, let’s talk about pixels. These are 128×128 videos. That is not a lot of pixels. Now, of course, these are not super high-quality videos yet, and they do not compare to the super high-quality images that text to image AIs are capable of. So why be so excited for this? Well, we are experienced Fellow Scholars here, so what do we do? We apply the First Law of Papers. The First Law of Papers says that research is a process. Do not look at where we are, look at where we will be two more papers down the line.
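To put the 128×128 figure in numbers, here is a small worked example. The comparison against a Full HD frame (1920×1080) is an assumption added only for scale and is not a number from the paper. The `pixels_per_frame` helper is likewise just an illustrative name.

```python
def pixels_per_frame(width, height):
    """Number of pixels in a single video frame."""
    return width * height


phenaki_frame = pixels_per_frame(128, 128)    # resolution quoted above
full_hd_frame = pixels_per_frame(1920, 1080)  # assumed reference, for scale only

print(phenaki_frame)                             # 16384
print(full_hd_frame)                             # 2073600
print(round(full_hd_frame / phenaki_frame, 1))   # 126.6
```

So a single Full HD frame holds roughly 127 times as many pixels as one of these frames, which is why the results look coarse for now.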
So, where will we be a couple more papers down the line? Well, for reference, this is what DALL-E 1 was capable of, and just one more paper down the line, this was possible with DALL-E 2. And looking at the previous results in image generation, we are likely having a DALL-E 1 moment for AI video generation. Yes, this means we are likely on the cusp of a revolution. Just look at how much progress can be made in just one paper. Wow. So, high-quality AI movies are likely not impossible anymore. And not even decades away. They might very well be just months away from now. What a time to be alive!
And, if you wish to be part of this amazing progress in AI research, consider subscribing and becoming a Fellow Scholar, and even better, hitting the bell icon to not miss it.
Now, plus one, this has become one of my favorite parts of the paper. Get this, it can even animate an already existing image of ours given a text prompt. So, we can not only create entire movies, but we can even tell the AI what kind of movie we wish to create. Wow. My goodness.
So, what do you think? What would you use this for? Let me know in the comments below!
Thanks for watching and for your generous support, and I’ll see you next time!