How to develop small games with DeepMind’s artificial intelligence
Today we are going to torment DeepMind’s new AI called Ada with tasks that are impossible. Kind of. And then, see if it is smart enough to solve them.
Now, you see, we talked about previous AI techniques that were able to learn over time. For instance, NVIDIA’s little knights were able to learn to fight by themselves. But this took 10 years of training. Not 10 years in our lives, 10 years in their lives as they live inside a simulation, which, with a quick computer, will only take a few days to simulate.
In a different paper, AI agents started out like this. And over time, they learned to play football and some really advanced techniques. And there was no referee, so they also learned to be not too kind to each other. Ouch. So, how long did this take? Well, these folks also trained for years, in simulation time that is.
And now, with DeepMind’s new AI, this agent will hopefully be able to learn cool new things, and hopefully it will not take years. Yes, we are going to build a virtual playhouse, and play a little game. Throughout the game, this little window to the world is what the AI sees. Only we see the whole level.
So now, little AI, you have one job. And that is, to hold the black cube. But, there is a problem. What is the problem? Well, of course, the fact that there is no black cube! Not one in sight. But we have a secret. The secret is that if we touch the black pyramid with the yellow sphere, out comes a black cube. However, psst, this is a secret. The AI does not know about this rule and has to find out by itself. And you know what, let’s make it even worse. If it touches the yellow sphere to the purple pyramid, both get destroyed, making this task impossible to finish. And you know what, let’s make it even worse. Give it a strict 20-second time limit. So, good luck with that, little AI! I can’t wait to see this. This is going to be a lot of fun.
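To make the setup concrete, here is a minimal sketch of the hidden rules as a tiny lookup table. Everything in it is an illustration, not the actual environment: the names `RULES`, `touch`, and `solvable` are made up, and the sketch assumes that the black pyramid and the yellow sphere are used up when the black cube appears, which the description above does not say.

```python
# Hypothetical encoding of the hidden rules described above.
# Keys are unordered pairs of touching objects; values are what they turn into.
RULES = {
    # Assumption: both inputs are consumed when the black cube appears.
    frozenset({"black pyramid", "yellow sphere"}): ["black cube"],
    # Both objects get destroyed and nothing comes out.
    frozenset({"yellow sphere", "purple pyramid"}): [],
}
GOAL = "black cube"
TIME_LIMIT_SECONDS = 20  # the strict time limit per round


def touch(objects, a, b):
    """Return the objects left in the world after a touches b."""
    if a not in objects or b not in objects:
        return set(objects)
    products = RULES.get(frozenset({a, b}))
    if products is None:
        return set(objects)  # no rule for this pair, nothing changes
    return (set(objects) - {a, b}) | set(products)


def solvable(objects):
    """True if the goal is already present or one rule application away."""
    if GOAL in objects:
        return True
    return any(
        pair <= set(objects) and GOAL in products
        for pair, products in RULES.items()
    )


start = {"black pyramid", "yellow sphere", "purple pyramid"}

# The unlucky move: the yellow sphere is gone, so the cube can never appear.
round_1 = touch(start, "yellow sphere", "purple pyramid")

# The winning move: the black cube comes out.
round_2 = touch(start, "black pyramid", "yellow sphere")
```

Note that the AI never gets to read `RULES`. Only we do. It has to work them out from what happens when objects touch.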
So, let’s see. Round 1. It starts exploring, likely to find the black cube, picks up the yellow sphere, dashes away, and…oh boy! Bad news. Real bad. It proceeds to touch the purple pyramid, you know what that means, right? Oh yes, both of them get destroyed. Little does the AI know, the task is now impossible.
It still tries to combine some objects together, maybe something good happens, but we already know. Nothing good happens here. And then, the time runs out.
Now comes the interesting part. Did it learn anything from this? I am so excited! Let’s start again! Hmm…it says let’s not do what we did earlier. That’s a good start. It takes the black pyramid, and, there we go! Good job! We got the black cube, and now it is time…for a victory dance! Fantastic.
So, is there any point in running round 3? Oh yes. Yes there is! The goal is that we wish to see what it learned from the previous success. Did it do it just by chance? Does it really understand what just happened? Let’s see. Oh boy! It is going straight for the correct answer. It really knows what just happened. It truly uncovered the rules of this game, and now it is busy optimizing its route to solve it even quicker.
I love it. And when running a similar task for two of these Ada agents, they learn the rules of the game independently, but that is expected. However, what is not expected is…look at that! Wow. They learned to throw to get this task done quicker, and later on, they even learned to work together to be even more efficient. So now, yes, it is time to hold on to your papers, Fellow Scholars, because what you are seeing here is learning happening not in years, but in a matter of seconds. Learning is so quick here, it is happening right before our eyes. I can’t believe it.
So if we can’t believe this, what do we do? Of course, we make the task even more difficult. The previous task could only be solved by lifting things. So how about creating a level where lifting things makes you lose immediately? That sounds fun, right? This new level can only be solved by pushing. Of course, the AI does not know that, so in round 1, it starts out lifting with predictable results. This was not a success. So let’s see, what has it learned? Round 2. Look, it starts pushing instead. And when pushing the two cubes together, got ‘em! Good job. That was super quick learning. Once again, the learning happened right before our eyes.
So, what do we do now? Of course, we make it even harder. Let’s add a bunch of unnecessary rules and a ton of objects to distract the AI from finding the yellow pyramid. Which, by the way, doesn’t even exist. Yet. See if it can see through our little tricks. First, some exploration happens, and transmutation also happens. A lot of it. Something touches something, and some other something appears. This doesn’t even seem to make any sense. Note once again that only we see these rules and the whole map. The AI does not know anything about the rules and is playing this game for the first time, and only sees this tiny window to the world. And later, my goodness, look at that! With a stylish move, it chucks away the yellow box that is in the way. Fabulous! Then it holds the purple box, so what happens then? Look at the rules, yes, it makes the highly coveted yellow pyramid appear. Great! Then, it goes straight for the goal. Learning is happening here too…and super quickly!
And in a different cooperative level, each player has to touch their corresponding sphere to get their pyramids, and when these new pyramids touch, we are finished. On the first try, they explore and eventually succeed. But do they really know how they succeeded? Do they know which actions were the ones responsible for their success? Wow. On only the second try, they absolutely smashed it. That is just about the quickest and most effective solution that I can imagine. Holy mother of papers. This little AI is learning incredibly quickly.
And I have to be honest, when reading the paper, I was a little worried about whether it could learn at all. Why? Well, because it is not getting any intermediate rewards. What does that mean? It means that this game is a cruel teacher that does not tell the AI during the game how well it is doing. Only when the AI wins the level does the game tell it so. Before that, the AI gets no information about whether it is doing well or poorly. This is especially difficult if we need to perform a chain of actions to win the level, like you see here. And, my goodness, I am absolutely stunned that the AI can still do it at all, let alone this quickly.
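Here is a small sketch of what "no intermediate rewards" looks like in practice. It is an illustration only: the function name `sparse_rewards` and the reward values 1.0 and 0.0 are assumptions, not taken from the paper.

```python
def sparse_rewards(holding_goal_at_each_step):
    """Give 0 at every step, and 1 only at the step where the level is won.

    The input is one boolean per time step: is the agent holding the goal
    object at that step? The episode ends as soon as the level is won.
    """
    rewards = []
    for holding in holding_goal_at_each_step:
        rewards.append(1.0 if holding else 0.0)
        if holding:
            break  # level won, episode over
    return rewards


# A winning round: three steps of silence, then the reward at the very end.
won = sparse_rewards([False, False, False, True])

# A round where time runs out: the agent never hears anything at all.
lost = sparse_rewards([False, False, False, False])
```

Every step before the win looks the same to the agent, whether it was a brilliant move or a blunder. That is why a chain of actions is so hard to learn under this kind of reward.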
So, in just one paper, we went from learning in a matter of years to a matter of seconds. Wow. This truly feels like seeing history in the making in artificial intelligence. What a time to be alive!
So, what do you think? What would you use this for? Let me know in the comments below!
Thanks for watching and for your generous support, and I’ll see you next time!