CHIP-8 in ONNX
A complete CHIP-8 emulator implemented as a pure ONNX computation graph. No custom operators, no execution-provider extensions, no Python in the hot loop: the entire CPU lives inside the model. The standard ONNX Runtime 1.26 CPU EP runs it unmodified.
This is not a machine-learning model. There are no weights, no training,
no inference in the statistical sense. It is a CPU expressed as a
computation graph, because it turns out ONNX has all the primitives a
CPU needs: bitwise ops, indexed memory access, conditional dispatch, and
a Loop operator that makes the op set Turing-complete.
The image above is the output of Run() on chip8_snake_demo.onnx:
a uint8[90, 32, 64] tensor returned in one call, with no inputs.
Models
| File | Inputs | Outputs | Notes |
|---|---|---|---|
| chip8_cpu.onnx | RAM + register state + key state + trip count | Updated RAM + register state | Load any CHIP-8 ROM into RAM, call once per game tick |
| chip8_snake_demo.onnx | (none; fully baked) | uint8[90, 32, 64] frame stack | Single Run() returns a 90-frame movie of the Snake title screen |
The two models share the same inner CPU. The demo wraps that CPU in an
outer Loop whose body executes 30 instructions per frame and whose
scan output is the framebuffer. That's how ONNX naturally accumulates
"one frame per outer iteration" into a single tensor.
How it works
State
CHIP-8 has 4 KB of RAM, sixteen 8-bit registers, a 12-bit program counter,
a 12-bit index register, a tiny stack, two 8-bit timers, and a 64×32
monochrome display. All of it lives in three tensors that flow through the
Loop as carried dependencies:
| Tensor | Shape | Dtype | Holds |
|---|---|---|---|
| regs | [40] | int32 | V0..VF, I, PC, SP, DT, ST, RNG seed, tick counter, stack[16] |
| ram | [4096] | uint8 | Program code, font, sprite data, working memory |
| display | [2048] | uint8 | 64×32 framebuffer, one byte per pixel |
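As a rough aid for reading the regs row, the layout can be sketched as named indices. Only PC at index 17 and the RNG seed at index 21 are confirmed by the usage example in this README. Every other offset below is an assumption inferred from the order in which the fields are listed, and all of the constant names and the fresh_regs helper are hypothetical.

```python
# Hypothetical index names for the regs tensor. Only PC (17) and RNG_SEED (21)
# appear elsewhere in this README; the other offsets are ASSUMED from the
# order in which the table lists the fields.
V0 = 0            # V0..VF occupy indices 0..15
I_REG = 16        # index register
PC = 17           # program counter
SP = 18           # stack pointer
DT = 19           # delay timer
ST = 20           # sound timer
RNG_SEED = 21     # random-number generator state
TICK = 22         # tick counter
STACK_BASE = 23   # 16 return addresses, indices 23..38
REGS_LEN = 40     # shape stated in the table; index 39 is not described


def fresh_regs(seed: int = 0xAB) -> list:
    """Return a zeroed register file with PC at the usual ROM entry point."""
    regs = [0] * REGS_LEN
    regs[PC] = 0x200
    regs[RNG_SEED] = seed
    return regs
```

Named indices like these mainly help when inspecting the state between Run() calls, for example reading regs[PC] to see where execution stopped.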
The Loop body: one CHIP-8 instruction per iteration
Each iteration of the inner Loop fetches, decodes, and executes one
CHIP-8 instruction.
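For readers less familiar with CHIP-8, the fetch and decode steps can be sketched in plain Python. This is a reference illustration of the standard instruction encoding, not a transcription of the graph; presumably the model expresses the same arithmetic with Gather, BitShift, and BitwiseAnd nodes. The function names fetch and decode are made up for this sketch.

```python
def fetch(ram, pc):
    """Read the big-endian 16-bit opcode stored at ram[pc] and ram[pc + 1]."""
    # int() avoids 8-bit overflow if ram is a NumPy uint8 array.
    return (int(ram[pc]) << 8) | int(ram[pc + 1])


def decode(opcode):
    """Split an opcode into the standard CHIP-8 fields."""
    return {
        "op":  (opcode >> 12) & 0xF,  # top nibble selects the instruction family
        "x":   (opcode >> 8) & 0xF,   # first register index
        "y":   (opcode >> 4) & 0xF,   # second register index
        "n":   opcode & 0xF,          # 4-bit immediate
        "kk":  opcode & 0xFF,         # 8-bit immediate
        "nnn": opcode & 0xFFF,        # 12-bit address
    }


ram = [0] * 4096
ram[0x200], ram[0x201] = 0x6A, 0x2B   # 6xkk: load 0x2B into VA
fields = decode(fetch(ram, 0x200))
```

All fields are extracted for every opcode, whether or not the instruction uses them, which fits the branchless style described next.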
The dispatch is branchless: every opcode subgraph runs every iteration,
and a chain of Where ops at the end picks the one whose pattern matches.
This trades wasted work for a flat, regular graph that's much easier to
read than a 35-deep nested If ladder, and it doesn't actually cost more
in practice, because the per-node overhead of ONNX Runtime's Loop is the
dominant cost anyway.
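The compute-everything-then-select pattern can be illustrated with a scalar sketch covering just three opcodes. The where helper stands in for the ONNX Where op, and step_vx is a hypothetical name; the real graph handles every opcode and operates on tensors, so treat this only as a picture of the idea.

```python
def where(cond, a, b):
    """Scalar stand-in for ONNX Where: pick a when cond is true, else b."""
    return a if cond else b


def step_vx(opcode, v):
    """Return (x, new Vx) for three opcodes; every candidate is always computed."""
    op = (opcode >> 12) & 0xF
    x = (opcode >> 8) & 0xF
    y = (opcode >> 4) & 0xF
    n = opcode & 0xF
    kk = opcode & 0xFF

    # Every candidate result is computed unconditionally.
    r_load = kk                    # 6xkk: Vx = kk
    r_add = (v[x] + kk) & 0xFF     # 7xkk: Vx = Vx + kk (wraps at 8 bits)
    r_mov = v[y]                   # 8xy0: Vx = Vy

    # A chain of selects keeps whichever candidate matches the opcode pattern;
    # the starting value leaves Vx unchanged for any other opcode.
    out = v[x]
    out = where(op == 0x8 and n == 0x0, r_mov, out)
    out = where(op == 0x7, r_add, out)
    out = where(op == 0x6, r_load, out)
    return x, out


v = [0] * 16
v[1], v[2] = 250, 7
added = step_vx(0x710A, v)      # 7xkk on V1
loaded = step_vx(0x6133, v)     # 6xkk on V1
moved = step_vx(0x8120, v)      # 8xy0: V1 = V2
untouched = step_vx(0x1200, v)  # a jump: none of the three patterns match
```

Because the opcode patterns are mutually exclusive, the order of the selects does not change the result.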
The outer structure: wrapping the CPU into a movie
Loop in ONNX has two output kinds:
- Carried outputs: values threaded between iterations (here: regs, ram, display).
- Scan outputs: values emitted per iteration and concatenated along a new leading axis (here: the framebuffer).
The movie model exploits scan outputs: one outer iteration = one frame
emitted = one row of the final frames tensor. There is no Python loop
anywhere in this pipeline; the entire 90-frame animation is produced
inside a single sess.run() call.
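The carried/scan distinction and the nested structure can be mimicked in a few lines of plain Python. This is a toy model of the semantics only: onnx_style_loop, cpu_step, and frame_body are hypothetical names, the CPU state is reduced to a single counter, and the real Loop op also takes a termination condition that is omitted here.

```python
def onnx_style_loop(trip_count, state, body):
    """Thread `state` through `body` and collect one scan value per iteration."""
    scans = []
    for i in range(trip_count):
        state, scan = body(i, state)
        scans.append(scan)
    return state, scans


def cpu_step(i, state):
    """Toy inner body: 'execute' one instruction by bumping a counter."""
    return state + 1, None  # the scan value is ignored by the caller


def frame_body(i, state):
    """Toy outer body: run 30 inner steps, then emit the state as this frame."""
    state, _ = onnx_style_loop(30, state, cpu_step)
    return state, state


final_state, frames = onnx_style_loop(90, 0, frame_body)
```

Here frames plays the role of the uint8[90, 32, 64] output: one entry per outer iteration, stacked in order.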
What's in the model file
A Loop operator wrapping a single GraphProto body. The body has
~600 nodes, mostly Gather, ScatterND, BitShift, BitwiseAnd,
Equal, and Where. No node is a custom op. The whole chip8_cpu.onnx
file is ~40 KB.
Usage
Run the bundled demo
```python
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("chip8_snake_demo.onnx",
                            providers=["CPUExecutionProvider"])
frames, = sess.run(None, {})  # no inputs!
print(frames.shape, frames.dtype)
# (90, 32, 64) uint8

# Save the final frame: lit pixels become white (255), the rest black (0).
final = (frames[-1] > 0).astype(np.uint8) * 255
# A 2-D uint8 array is interpreted as 8-bit grayscale.
Image.fromarray(final).resize((512, 256)).save("snake_frame.png")
```
That's the entire usage. No tokenizer, no preprocessing, no postprocessing:
Run() returns pixels.
Load any CHIP-8 ROM into the generic CPU
```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("chip8_cpu.onnx",
                            providers=["CPUExecutionProvider"])

def initial_ram(rom: bytes) -> np.ndarray:
    """Build a 4 KB RAM image with the font at 0x50 and the ROM at 0x200."""
    FONT = bytes.fromhex("F0909090F02060202070F010F080F0F010F010F0"
                         "9090F01010F080F010F0F080F090F0F010204040"
                         "F090F090F0F090F010F0F090F09090E090E090E0"
                         "F0808080F0E0909090E0F080F080F0F080F08080")
    ram = np.zeros(4096, dtype=np.uint8)
    ram[0x50:0x50+80] = np.frombuffer(FONT, dtype=np.uint8)
    ram[0x200:0x200+len(rom)] = np.frombuffer(rom, dtype=np.uint8)
    return ram

# Initial state
regs = np.zeros(40, dtype=np.int32)
regs[17] = 0x200  # PC
regs[21] = 0xAB   # RNG seed
with open("snake.ch8", "rb") as f:
    ram = initial_ram(f.read())
display = np.zeros(2048, dtype=np.uint8)
keys = np.zeros(16, dtype=np.uint8)

# Run 30 CHIP-8 instructions per tick
for tick in range(60):
    regs, ram, display = sess.run(None, {
        "regs_in": regs,
        "ram_in": ram,
        "display_in": display,
        "keys": keys,
        "trip_count": np.array(30, dtype=np.int64),
    })

# `display` is now a uint8[2048] framebuffer; reshape to (32, 64) to view.
```
A bundled ROM (snake.ch8, public domain) is included so you can try this
straight away.
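To look at the framebuffer without an image library, a small text renderer is enough. The sketch below assumes the flat display is row-major with 32 rows of 64 pixels (matching the reshape to (32, 64) mentioned above) and that any nonzero byte is a lit pixel, as in the demo's frames > 0 test. render_ascii is a hypothetical helper, not part of the repository.

```python
def render_ascii(display, width=64, height=32):
    """Turn a flat row-major framebuffer into `height` lines of `width` characters."""
    rows = []
    for y in range(height):
        row = display[y * width:(y + 1) * width]
        # Nonzero bytes are treated as lit pixels.
        rows.append("".join("#" if px else "." for px in row))
    return "\n".join(rows)


# Works on a plain list or on the uint8[2048] array returned by the model.
demo = [0] * 2048
demo[1 * 64 + 2] = 1  # light the pixel at row 1, column 2
text = render_ascii(demo)
```

Calling print(render_ascii(display)) after each tick gives a crude terminal view of the game.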
Why this exists
It's a question about what ONNX is. The ONNX operator set, once it grew
Loop, If, the Bitwise* family (opset 18) and ScatterND with
reduction modes, became Turing-complete in any reasonable sense of the
phrase. This model demonstrates the consequence: ONNX Runtime, designed
for evaluating neural networks, can also evaluate arbitrary
computations, including a working game console, without modification.
Concretely the project exists to:
- Probe how far the standard ONNX op set actually goes as a general computation target.
- Demonstrate that Loop + scan outputs give you a clean way to express "run a program for N steps, return one tensor per step" in a single Run() call.
- Provide a tiny, complete, self-contained reference for anyone who wants to do non-ML things with ONNX.
If you want to play CHIP-8 games, there are a hundred better emulators. If you want to see what happens when you treat ONNX as a programming language, you're in the right place.
Performance
Measured on a Windows ARM64 laptop with ONNX Runtime 1.26 CPU EP, opset 21:
| Metric | Value |
|---|---|
| CHIP-8 instructions per second | ~2,500 |
| Full snake-title demo (90 frames × 30 instructions per frame = 2,700 instructions) | ~1.1 s |
| Inner Loop body | ~600 ONNX nodes |
| Generic CPU model file size | ~40 KB |
| Snake demo model file size | ~48 KB (includes ROM) |
This is plenty fast for CHIP-8: most CHIP-8 games target a 500 to 1000 Hz
CPU, and the model handily exceeds that. ONNX-as-a-CPU is not, however,
going to be competitive with a purpose-built real-time emulator;
per-node overhead in Loop bodies dominates everything.
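As a quick consistency check, the demo's wall time and the per-second figure in the table can be related with a few lines of arithmetic. The inputs are the approximate numbers quoted above, so the result is only a ballpark.

```python
frames = 90
instructions_per_frame = 30
wall_time_s = 1.1  # approximate figure from the table

total_instructions = frames * instructions_per_frame
instructions_per_second = total_instructions / wall_time_s

# Ratio to the upper end of the typical 500 to 1000 Hz CHIP-8 target.
headroom_vs_1000hz = instructions_per_second / 1000
```

This works out to roughly 2,450 instructions per second, in line with the ~2,500 quoted, and a bit under 2.5 times the 1000 Hz upper target.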
What's inside the box
```
.
├── chip8_cpu.onnx          # Generic CHIP-8 CPU (40 KB)
├── chip8_snake_demo.onnx   # Self-contained Snake-title movie (48 KB)
├── snake.ch8               # Public-domain Snake ROM (1.4 KB)
├── example_output.gif      # What you get when you Run() the demo
└── README.md               # This file
```
License
- Code & model files: MIT.
- Bundled snake.ch8 ROM: CC0 (from JohnEarnest/chip8Archive).
Credits
- CHIP-8 was created by Joseph Weisbecker in 1977 for the COSMAC VIP.
- snake.ch8 is by John Earnest, CC0.
- Built with the standard ONNX op set (opset 21) and tested with ONNX Runtime 1.26.

