World models are increasingly important for interpreting and simulating the rules and actions of complex environments. Genie, a recent model, excels at learning from visually diverse environments but relies on costly human-collected data. We observe that its alternative of using random agents is too limited to explore the environment effectively. We propose to improve the model by employing reinforcement-learning-based agents for data generation. This approach produces diverse datasets that enhance the model's ability to adapt and perform well across various scenarios and realistic actions within the environment. In this paper, we first build, evaluate and release GenieRedux, a complete reproduction of Genie. Additionally, we introduce GenieRedux-G, a variant that uses the agent's readily available actions to factor out action-prediction uncertainty during validation. Our evaluation, including a replication of the CoinRun case study, shows that GenieRedux-G achieves superior visual fidelity and controllability when using trained-agent exploration. The proposed approach is reproducible, scalable and adaptable to new types of environments. Our codebase is available at https://github.com/insait-institute/GenieRedux.
GenieRedux-G not only surpasses GenieRedux in visual fidelity and controllability; we also show that it represents the environment's actions well and can replicate a ground-truth sequence of observations, given the actions.
We demonstrate that GenieRedux-G-TA is able to represent all actions from the Coinrun case study environment, independent of visual appearance.
In addition, we show that GenieRedux-G-TA is able to reconstruct a ground truth sequence of observations, given the actions and the first observation.
In this work, GenieRedux is our baseline. We implemented all three components - the Tokenizer, the Latent Action Model and the Dynamics Module - and evaluated the model on the CoinRun case study from the original paper. We validate the correctness of our implementation by demonstrating its performance.
Genie's approach is to collect human demonstration videos, which is time-consuming and expensive. In their case study, the authors instead use a random agent, whose exploration abilities are limited. We propose using reinforcement learning agents for data generation. This approach produces diverse datasets at little cost, enhancing the model's ability to adapt and perform well across various scenarios and actions within the environment.
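The collection loop itself is simple: roll out a policy in the environment and record observation-action pairs. Below is a minimal, hedged sketch; `ToyEnv` is a hypothetical stand-in for the CoinRun environment interface (in practice one would wrap procgen's CoinRun), and the policy would be a trained RL agent rather than the random baseline shown here.

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a CoinRun-like environment."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return (self.t,)  # observation placeholder (a frame in practice)

    def step(self, action):
        self.t += 1
        obs, reward, done = (self.t,), 0.0, self.t >= 16
        return obs, reward, done

def collect_episode(env, policy, max_steps=16):
    """Roll out one episode and record (observation, action) pairs."""
    obs = env.reset()
    frames, actions = [obs], []
    for _ in range(max_steps):
        action = policy(obs)  # a trained agent here yields broader coverage
        obs, _, done = env.step(action)
        frames.append(obs)
        actions.append(action)
        if done:
            break
    return frames, actions

# A random policy is the baseline; swapping in a trained policy is the
# only change needed to produce the -TA datasets.
random_policy = lambda obs: random.randrange(7)
frames, actions = collect_episode(ToyEnv(), random_policy)
```

Because the recorded actions are stored alongside the frames, they are later available to condition GenieRedux-G directly.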
We perform quantitative evaluation of visual fidelity (FID, PSNR, SSIM) and controllability (ΔPSNR).
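PSNR and the derived ΔPSNR controllability score are straightforward to compute; a minimal NumPy sketch follows. Here ΔPSNR is taken as the drop in PSNR when ground-truth actions are replaced by random ones, so a larger value indicates stronger action controllability (the function names are ours, for illustration).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def delta_psnr(psnr_true_actions, psnr_random_actions):
    """Controllability: PSNR drop when true actions are replaced by
    random ones. Larger means actions matter more to the prediction."""
    return psnr_true_actions - psnr_random_actions

# Uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
target = np.zeros((8, 8))
pred = np.full((8, 8), 0.1)
print(round(psnr(pred, target), 2))  # 20.0
```

FID and SSIM involve a pretrained Inception network and windowed statistics respectively, so in practice they are taken from standard library implementations rather than reimplemented.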
We show that GenieRedux-G achieves higher visual fidelity than GenieRedux. Because GenieRedux relies on actions predicted by the LAM, it is affected by uncertainty in the action prediction. GenieRedux-G instead uses the agent's readily available actions, factoring out action-prediction uncertainty during validation.
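The difference between the two validation regimes reduces to where the conditioning action comes from at each rollout step. The toy sketch below illustrates this with hypothetical interfaces: `dynamics` stands in for the Dynamics Module and `lam` for the Latent Action Model's per-step inference; neither reflects the actual model signatures.

```python
def validate_rollout(tokens0, recorded_actions, dynamics, lam=None):
    """Roll out the dynamics model for validation.

    GenieRedux-G: condition on the agent's recorded actions (lam=None),
    so no action-prediction uncertainty enters the rollout.
    GenieRedux: condition on LAM-inferred actions (pass a lam callable).
    """
    tokens = tokens0
    preds = []
    for t, a_true in enumerate(recorded_actions):
        action = a_true if lam is None else lam(preds, t)
        tokens = dynamics(tokens, action)
        preds.append(tokens)
    return preds
```

With recorded actions available from data collection, the -G variant is a strictly simpler validation path: the LAM is bypassed entirely, so any remaining error is attributable to the dynamics model alone.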
In addition, we show that the models utilizing trained-agent actions (-TA models) significantly outperform those using random-agent actions (-Base), both in terms of visual fidelity and controllability. This shows that trained agents help our models perform better when subjected to more diverse situations in an environment.
@inproceedings{kazemi2024learning,
  title={Learning Generative Interactive Environments By Trained Agent Exploration},
  author={Kazemi, Naser and Savov, Nedko and Paudel, Danda Pani and Van Gool, Luc},
  booktitle={NeurIPS 2024 Workshop on Data-driven and Differentiable Simulations, Surrogates, and Solvers},
  year={2024}
}