Learning Generative Interactive Environments by Trained Agent Exploration

INSAIT, Sofia University
Under Review

*Indicates Equal Contribution

We present GenieRedux and its variant GenieRedux-G, models based on Genie that use reinforcement learning agents for data generation. Our evaluation shows that GenieRedux-G achieves superior visual fidelity and controllability through trained agent exploration.

Abstract

World models are increasingly pivotal in interpreting and simulating the rules and actions of complex environments. Genie, a recent model, excels at learning from visually diverse environments but relies on costly human-collected data. We observe that its alternative of using random agents is too limited to explore the environment effectively. We propose to improve the model by employing reinforcement learning agents for data generation. This approach produces diverse datasets that enhance the model's ability to adapt and perform well across various scenarios and realistic actions within the environment. In this paper, we first release GenieRedux, an implementation based on Genie. We additionally introduce GenieRedux-G, a variant that uses the agent's readily available actions to factor out action-prediction uncertainty during validation. Our evaluation, including a replication of the Coinrun case study, shows that GenieRedux-G achieves superior visual fidelity and controllability through trained agent exploration. The proposed approach is reproducible, scalable and adaptable to new types of environments.
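To make the data-generation step concrete, below is a minimal sketch of recording (frame, action) pairs by rolling out a trained agent in a Gym-style environment. The env and agent objects and the act(obs) method are illustrative assumptions, not the authors' exact pipeline.

        # Minimal sketch (assumptions): `env` has a Gym-like reset()/step() interface,
        # `agent` is a trained policy exposing act(obs). Names are hypothetical.
        import numpy as np

        def collect_episode(env, agent, max_steps=1000):
            """Roll out a trained agent and record (frame, action) pairs for world-model training."""
            frames, actions = [], []
            obs = env.reset()
            for _ in range(max_steps):
                action = agent.act(obs)          # trained-agent exploration (vs. a random agent)
                frames.append(np.asarray(obs))   # store the observed frame
                actions.append(action)           # store the action taken at this frame
                obs, reward, done, info = env.step(action)
                if done:
                    break
            return np.stack(frames), np.asarray(actions)

Repeating this over many episodes yields a dataset of frame sequences paired with the actions that produced them, which is what the world model is trained on.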

Environment control with GenieRedux-G

We base our models on Genie, a model that excels at learning to apply frame-by-frame motion control across visually diverse environments. In contrast, our best model, GenieRedux-G-TA, is trained on data generated by RL agent exploration rather than by a random agent or costly human demonstrations. In addition, it is guided by the agent's readily available actions.

We demonstrate that GenieRedux-G-TA can represent all actions of the Coinrun case study environment, independently of visual appearance.
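As a rough illustration of how the agent's actions guide generation, here is a minimal sketch of an action-conditioned autoregressive rollout. The model object and its predict_next_frame method are hypothetical placeholders for a GenieRedux-G-style dynamics model, not the released interface.

        # Minimal sketch (assumptions): `model` predicts the next frame from past frames
        # and the agent's action; predict_next_frame is a hypothetical method name.
        import numpy as np

        def rollout(model, context_frames, actions):
            """Autoregressively generate frames, conditioning each step on the given action."""
            frames = list(context_frames)
            for action in actions:               # readily available agent actions guide generation
                next_frame = model.predict_next_frame(frames, action)
                frames.append(next_frame)
            return np.stack(frames)

Conditioning on the recorded actions in this way factors out action-prediction uncertainty when validating the generated environment.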


BibTeX

@misc{2409.06445,
  author = {Naser Kazemi and Nedko Savov and Danda Paudel and Luc Van Gool},
  title  = {Learning Generative Interactive Environments By Trained Agent Exploration},
  year   = {2024},
  eprint = {arXiv:2409.06445},
}