Diffusion Models Revolutionize Real-Time Game Engines

BY Mark Howell 28 August 20243 MINS READ
article cover

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.

Full Gameplay Videos Architecture

Data Collection via Agent Play

Since we cannot collect human gameplay at scale, as a first stage we train an automatic RL-agent to play the game, persisting its training episodes of actions and observations, which become the training data for our generative model.

Training the Generative Diffusion Model

We re-purpose a small diffusion model, Stable Diffusion v1.4, and condition it on a sequence of previous actions and observations (frames). To mitigate auto-regressive drift during inference, we corrupt context frames by adding Gaussian noise to encoded frames during training. This allows the network to correct information sampled in previous frames, and we found it to be critical for preserving visual stability over long time periods.

Latent Decoder Fine-Tuning

The pre-trained auto-encoder of Stable Diffusion v1.4, which compresses 8x8 pixel patches into 4 latent channels, results in meaningful artifacts when predicting game frames, which affect small details and particularly the bottom bar HUD. To leverage the pre-trained knowledge while improving image quality, we train just the decoder of the latent auto-encoder using an MSE loss computed against the target frame pixels.

BibTeX Acknowledgements

Image Description: A side-by-side comparison of the classic game DOOM and its simulation by GameNGen.
Remember these 3 key ideas for your startup:

  1. Harnessing AI for Real-Time Interaction: The development of GameNGen showcases the potential of neural models to enable real-time interaction with complex environments. Startups can leverage similar AI technologies to create more interactive and responsive applications, enhancing user experience and engagement. For more on how AI can be used in project management, check out task automation.

  2. Efficient Data Collection and Training: By using an RL-agent to play and record game sessions, GameNGen efficiently collects training data. This approach can be adopted by startups to gather large datasets without the need for extensive human labor, thus reducing costs and speeding up development. Learn more about how to brand yourself as a remote company.

  3. Improving Visual Stability with Diffusion Models: The use of conditioning augmentations and fine-tuning latent decoders in GameNGen highlights the importance of maintaining visual stability over long periods. Startups can apply these techniques to ensure their AI-driven applications deliver consistent and high-quality visual outputs. For more insights, see the best productivity hacks to get your work done.
    Edworking is the best and smartest decision for SMEs and startups to be more productive. Edworking is a FREE superapp of productivity that includes all you need for work powered by AI in the same superapp, connecting Task Management, Docs, Chat, Videocall, and File Management. Save money today by not paying for Slack, Trello, Dropbox, Zoom, and Notion.
    For more details, see the original source.

article cover
About the Author: Mark Howell Linkedin

Mark Howell is a talented content writer for Edworking's blog, consistently producing high-quality articles on a daily basis. As a Sales Representative, he brings a unique perspective to his writing, providing valuable insights and actionable advice for readers in the education industry. With a keen eye for detail and a passion for sharing knowledge, Mark is an indispensable member of the Edworking team. His expertise in task management ensures that he is always on top of his assignments and meets strict deadlines. Furthermore, Mark's skills in project management enable him to collaborate effectively with colleagues, contributing to the team's overall success and growth. As a reliable and diligent professional, Mark Howell continues to elevate Edworking's blog and brand with his well-researched and engaging content.

Trendy NewsSee All Articles
CoverEdit PDFs Securely & Freely: Breeze PDF In-Browser SolutionBreeze PDF is a free, offline browser-based PDF editor ensuring privacy. It offers text, image, and signature additions, form fields, merging, page deletion, and password protection without uploads.
BY Mark Howell 16 days ago
CoverDecoding R1: The Future of AI Reasoning ModelsR1 is an affordable, open-source AI model emphasizing reasoning, enabling innovation and efficiency, while influencing AI advancements and geopolitical dynamics.
BY Mark Howell 26 January 2025
CoverSteam Brick: A Minimalist Gaming Console Redefines PortabilitySteam Brick: A modified, screenless Steam Deck for travel, focusing on portability by using external displays and inputs. A creative yet impractical DIY project with potential risks.
BY Mark Howell 26 January 2025
CoverVisual Prompt Injections: Essential Guide for StartupsThe Beginner's Guide to Visual Prompt Injections explores vulnerabilities in AI models like GPT-4V, highlighting security risks for startups and offering strategies to mitigate potential data compromises.
BY Mark Howell 13 November 2024
CoverGraph-Based AI: Pioneering Future Innovation PathwaysGraph-based AI, developed by MIT's Markus J. Buehler, bridges unrelated fields, revealing shared complexity patterns, accelerating innovation by uncovering novel ideas and designs, fostering unprecedented growth opportunities.
BY Mark Howell 13 November 2024
CoverRevolutionary Image Protection: Watermark Anything with Localized MessagesWatermark Anything enables embedding multiple localized watermarks in images, balancing imperceptibility and robustness. It uses Python, PyTorch, and CUDA, with COCO dataset, under CC-BY-NC license.
BY Mark Howell 13 November 2024
CoverJungle Music's Role in Shaping 90s Video Game SoundtracksJungle music in the 90s revolutionized video game soundtracks, enhancing fast-paced gameplay on PlayStation and Nintendo 64, and fostering a cultural revolution through its energetic beats and immersive experiences.
BY Mark Howell 13 November 2024
CoverMastering Probability-Generating Functions: A Guide for EntrepreneursProbability-generating functions (pgfs) are mathematical tools used in probability theory for data analysis, risk management, and predictive modeling, crucial for startups and SMEs in strategic decision-making.
BY Mark Howell 31 October 2024
Try EdworkingA new way to work from  anywhere, for everyone for Free!
Sign up Now