“With Genie, our future AI agents can be trained in a never-ending curriculum of new, generated worlds,” says Google.

The following report is from a page on Google’s corporate website:

The last few years have seen an emergence of generative AI, with models capable of generating novel and creative content via language, images, and even videos. Today, we introduce a new paradigm for generative AI, generative interactive environments (Genie), whereby interactive, playable environments can be generated from a single image prompt. 

Genie can be prompted with images it has never seen before, such as real-world photographs or sketches, enabling people to interact with their imagined virtual worlds, essentially acting as a foundation world model. This is possible despite training without any action labels. Instead, Genie is trained from a large dataset of publicly available Internet videos. We focus on videos of 2D platformer games and robotics, but our method is general and should work for any type of domain, and is scalable to ever larger Internet datasets.

https://twitter.com/_rockt/status/1762026090262872161

What makes Genie unique is its ability to learn fine-grained controls exclusively from Internet videos. This is a challenge because Internet videos do not typically have labels regarding which action is being performed, or even which part of the image should be controlled. Remarkably, Genie learns not only which parts of an observation are generally controllable, but also infers diverse latent actions that are consistent across the generated environments.

Amazingly, it only takes a single image to create an entire new interactive environment. This opens the door to a variety of new ways to generate and step into virtual worlds. For instance, we can take a state-of-the-art text-to-image generation model and use it to produce starting frames that we can then bring to life with Genie. Here we generate images with Imagen2 and bring them to life with Genie.

But it doesn’t stop there. We can even step into human-designed creations such as sketches! 🧑‍🎨 Or real-world images 🤯.

Genie also has implications for training generalist agents. Previous works have shown that game environments can be an effective testbed for developing AI agents, but we are often limited by the number of games available. With Genie, our future AI agents can be trained in a never-ending curriculum of new, generated worlds. In our paper we have a proof of concept that the latent actions learned by Genie can transfer to real human-designed environments, but this is just scratching the surface of what may be possible in the future.

Finally, while we have focused on results from Platformers on this website, Genie is a general method and can be applied to a multitude of domains without requiring any additional domain knowledge. 

We trained a smaller 2.5B model on action-free videos from RT1. As was the case for Platformers, trajectories with the same latent action sequence typically display similar behaviors. This indicates Genie is able to learn a consistent action space which may be amenable to training embodied generalist agents.

Genie can also simulate deformable objects 👕, a challenging task for human-designed simulators that can instead be learned from data.

Genie introduces the era of being able to generate entire interactive worlds from images or text. We also believe it will be a catalyst for training the generalist AI agents of the future. 🤖

https://www.youtube.com/watch?v=mQzrN4tSgUI

AUTHOR COMMENTARY

But thou, O Daniel, shut up the words, and seal the book, even to the time of the end: many shall run to and fro, and knowledge shall be increased.

Daniel 12:4

Though the end product right now looks like fresh paint in the rain, give it some more time and the technology released to the public will probably be able to recreate very detailed videogames and movie reels just from a collection of still images.


