Today we are going to look at NVIDIA’s incredible new AI that can create images, and more.
Now, wait a second.
Stop right there.
Every Fellow Scholar knows that today, there are plenty of text-to-image AIs out there, where in goes a piece of text, and out comes an image.
They come in all kinds of flavors these days.
Everyone knows.
So our question today is why publish this paper? Do we really need more of these? Well, this new paper is called StyleGAN-T.
Keep your eyes on this part, because this means that this is a GAN-based technique.
A GAN is a Generative Adversarial Network.
This roughly means that we have two neural networks competing against each other, and as they compete, they get better together.
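If you are curious what that competition looks like in practice, here is a minimal, hypothetical sketch of a GAN training loop in PyTorch. The tiny generator and discriminator and every hyperparameter here are illustrative stand-ins, not StyleGAN-T’s actual architecture.

```python
# Minimal GAN training loop sketch (illustrative stand-in, not StyleGAN-T's architecture).
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g. flattened 28x28 images

# Generator: maps random latent vectors to fake samples.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())
# Discriminator: scores how "real" a sample looks.
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    b = real_batch.size(0)

    # 1) The discriminator tries to tell real samples from generated ones.
    fake = G(torch.randn(b, latent_dim)).detach()
    loss_d = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) The generator tries to fool the discriminator.
    fake = G(torch.randn(b, latent_dim))
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    return loss_d.item(), loss_g.item()
```

As the two networks trade blows like this, the generator’s outputs gradually become harder and harder to tell apart from real data.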
Okay, that all sounds great, but I am still not convinced.
What does this give us? Why would we even use this? Well, there are two excellent reasons.
Reason number one, GANs are excellent at latent-space interpolation.
What does that mean? It means that we can create these interesting 2D spaces and choose a point on this plane, which, in this case, corresponds to a font.
And the points nearby hide other fonts that are similar to this one.
So as we start exploring nearby, we get a beautiful, smooth morphing animation between these fonts.
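To make that morphing idea concrete, here is a small, hypothetical sketch of latent-space interpolation: walk along the straight line between two latent codes and decode every intermediate point. The `generator` callable here is an assumed stand-in, not StyleGAN-T’s real API.

```python
# Latent-space interpolation sketch: blend smoothly between two latent codes.
import torch

def interpolate_latents(generator, z_start, z_end, steps=30):
    """Linearly interpolate between two latent vectors and decode each point.

    `generator` is any model mapping a latent vector to an image tensor
    (a hypothetical stand-in, not StyleGAN-T's actual interface).
    """
    frames = []
    for i in range(steps):
        t = i / (steps - 1)                  # interpolation weight in [0, 1]
        z = (1.0 - t) * z_start + t * z_end  # point on the line between the two codes
        with torch.no_grad():
            frames.append(generator(z))      # decode the intermediate latent
    return frames                            # play these back-to-back for a morph animation
```

If the latent space is well behaved, neighboring points decode to similar images, which is exactly what gives you that smooth morphing animation.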
In our earlier paper, we did something similar with photorealistic material models, so artists can find, or even better, adjust a material so that it best fits their virtual worlds.
So this new technique supposedly can do proper latent-space exploration for text-to-image.
Supposedly.
Now, let’s see if it is true in practice here too.
Here is a previous technique, the crowd favorite, Stable Diffusion.
This can make an interesting video, but as you see, the results are quite jumpy.
It doesn’t feel like one result morphs into the next one.
And now, let’s see the new technique.
Oh yes, now that’s what I am talking about! With this, we get more continuous results and can explore these latent spaces as much as we desire, and that is going to be super useful.
You see, what we can do with this is that we write a prompt, for instance, “A corgi’s head depicted as an explosion of a nebula.”
And we don’t just get an image anymore.
No-no, due to its amazing interpolation capabilities, we get an opportunity to not only witness the birth of the universe, but to choose the good boy that we find to be the most adorable.
I choose this one.
Right before it morphs into a cat.
Yes, this one will do.
Which one is your favorite? Let me know in the comments below.
So its latent-space exploration capability is not only an afterthought here, it is one of the new technique’s key features.
Now, remember, I mentioned that this is reason number one of why we should use it.
So what is reason number two? Well, two, it is fast.
Real fast.
But to know how fast exactly, let’s pop the hood and have a look.
Now hold on to your papers, Fellow Scholars and… what? 0.1 seconds per image?
Is that really possible? Wow.
These animations can be made practically in real time! The age of real-time AI image and even video synthesis is here.
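If you want to sanity-check a claim like that on your own machine, a tiny, hypothetical timing harness could look like this; `generate_image` is an assumed stand-in for whichever text-to-image pipeline you are measuring.

```python
# Rough latency check for an image generator (illustrative harness only).
import time

def average_latency(generate_image, prompt, runs=10):
    """Time repeated calls to a text-to-image function and report seconds per image."""
    start = time.perf_counter()
    for _ in range(runs):
        generate_image(prompt)    # stand-in for the actual model call
    elapsed = time.perf_counter() - start
    return elapsed / runs         # ~0.1 s/image would mean near-real-time use
```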
My goodness! It did not take decades, it didn’t even take years.
Less than a year after OpenAI’s DALL-E 2, which asked for approximately 10-15 seconds per image, we are here.
Real time.
I can’t believe it.
Wow! This is truly incredible.
However, not even this technique is perfect.
Let’s see a failure case.
A sign that says “deep learning”.
Come on, this one again? Remember our moment with DALL-E 2? It had the same issue.
There are techniques out there that do much better on text; for instance, Imagen Video handles this better. However, it is not nearly as fast as this one.
Yes, that one is about a hundred times slower per image.
So, the perfect text-to-image AI still doesn’t exist, every technique offers its own little tradeoff, but man, are they all getting better and better at an insane pace.
Amazing new papers are popping up every week.
So, what do you think? What would you use this for? Let me know in the comments below! Thanks for watching and for your generous support, and I’ll see you next time!