LLM Inference — Interactive Simulator

Prompt: "The cat sat", traced through every stage of the inference pipeline.

Pipeline stages: 📥 Input → 🔤 Tokenize → ⚡ Prefill → 🔄 Decode → 📤 Output
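The stages above can be sketched as a toy Python program. Everything model-specific is hypothetical: a seven-entry vocabulary with greedy longest-match tokenization stands in for a real subword tokenizer, and a fixed next-token table stands in for the transformer. The point is the control flow. Prefill runs the whole prompt once to fill the KV cache, then decode emits one token per loop iteration and appends one cache entry each time.

```python
# Hypothetical toy vocabulary; real tokenizers use learned subword units (e.g. BPE).
VOCAB = ["<eos>", "The", " cat", " sat", " on", " the", " mat"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}
EOS = TOKEN_ID["<eos>"]

# Hypothetical stand-in for the model: a fixed "most likely next token" table.
NEXT = {TOKEN_ID[" sat"]: TOKEN_ID[" on"],
        TOKEN_ID[" on"]: TOKEN_ID[" the"],
        TOKEN_ID[" the"]: TOKEN_ID[" mat"]}


def tokenize(text):
    """Greedy longest-match tokenization against VOCAB (CPU-side work)."""
    ids, i = [], 0
    while i < len(text):
        candidates = [t for t in VOCAB[1:] if text.startswith(t, i)]
        if not candidates:
            raise ValueError(f"no token matches at position {i}")
        best = max(candidates, key=len)
        ids.append(TOKEN_ID[best])
        i += len(best)
    return ids


def forward(token_id, kv_cache):
    """One toy model step: cache this position's key/value, then predict the next token."""
    pos = len(kv_cache)
    kv_cache.append((f"K{pos}", f"V{pos}"))  # placeholders for real K/V tensors in VRAM
    return NEXT.get(token_id, EOS)


def generate(prompt, max_new_tokens=3):
    ids = tokenize(prompt)
    kv_cache = []
    next_id = EOS
    # Prefill: run every prompt token through the model to populate the KV cache.
    for tok in ids:
        next_id = forward(tok, kv_cache)
    # Decode: one new token per iteration, reusing the cache instead of recomputing the prompt.
    out = []
    for _ in range(max_new_tokens):
        if next_id == EOS:
            break
        out.append(next_id)
        next_id = forward(next_id, kv_cache)
    # Output: detokenize by concatenating the token strings.
    return "".join(VOCAB[i] for i in out), len(kv_cache)


print(generate("The cat sat"))  # (' on the mat', 6)
```

Prefill here walks the prompt token by token only for readability; on a GPU the prompt tokens go through the model in a single parallel pass, which is what makes prefill compute-heavy.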
Resources tracked: CPU, RAM, GPU Compute, and VRAM / KV Cache, with live readouts of the generated tokens and the KV cache entries.
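A rough way to see why prefill tends to load GPU compute while decode loads memory bandwidth is arithmetic intensity: FLOPs performed per byte moved from memory. For an n-token × d by d × d matmul with fp16 weights, the work is about 2·n·d² FLOPs while the weights alone are 2·d² bytes, so intensity is roughly n. The sketch below assumes weight traffic dominates (activations and KV reads ignored) and uses hypothetical round numbers for the accelerator (300 TFLOP/s, 2 TB/s); the function names are illustrative.

```python
# Hypothetical round-number accelerator specs (not any specific GPU).
PEAK_FLOPS = 300e12  # FLOP/s
PEAK_BW = 2e12       # bytes/s of memory bandwidth
RIDGE = PEAK_FLOPS / PEAK_BW  # FLOPs per byte needed to saturate compute


def weight_matmul_intensity(n_tokens, d_model, bytes_per_param=2):
    """FLOPs per weight byte for an (n_tokens x d_model) @ (d_model x d_model) matmul.

    Assumes weight reads dominate memory traffic (activations and KV cache ignored).
    """
    flops = 2 * n_tokens * d_model * d_model  # multiply + add per weight, per token
    weight_bytes = bytes_per_param * d_model * d_model
    return flops / weight_bytes


def bound(n_tokens, d_model=4096):
    intensity = weight_matmul_intensity(n_tokens, d_model)
    return "compute-bound" if intensity >= RIDGE else "memory-bound"


# 1 token (single-user decode), 4 tokens (4 users batched), 512 tokens (long-prompt prefill)
for n in (1, 4, 512):
    print(n, bound(n))  # 1 memory-bound / 4 memory-bound / 512 compute-bound
```

Batching b decoding sequences lets one read of the weights serve b tokens, raising intensity to roughly b, which is one reason inference servers batch concurrent decode requests.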

4 concurrent users sharing one GPU: batching, request queuing, and KV cache pressure.
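KV cache pressure comes from every active sequence holding its own keys and values for every layer. The usual per-token estimate is 2 (K and V) × layers × KV heads × head dimension × bytes per element. The sketch below assumes a hypothetical 7B-class shape (32 layers, 32 KV heads, head dimension 128, fp16); `ModelShape`, `kv_bytes_per_token`, and `kv_cache_gib` are illustrative names, and allocator overhead in a real server is ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModelShape:
    # Hypothetical 7B-class dimensions; substitute your model's config values.
    n_layers: int = 32
    n_kv_heads: int = 32
    head_dim: int = 128
    bytes_per_elem: int = 2  # fp16 / bf16


def kv_bytes_per_token(m: ModelShape) -> int:
    # Factor 2: one key vector and one value vector per KV head, per layer.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * m.bytes_per_elem


def kv_cache_gib(m: ModelShape, tokens_per_user: Iterable[int]) -> float:
    # Each active sequence keeps its own cache, so memory scales with total live tokens.
    return kv_bytes_per_token(m) * sum(tokens_per_user) / 2**30


shape = ModelShape()
print(kv_bytes_per_token(shape))                          # 524288 bytes (0.5 MiB) per token
print(kv_cache_gib(shape, [2048] * 4))                    # 4.0 GiB for 4 users x 2,048 tokens
print(kv_cache_gib(ModelShape(n_kv_heads=8), [2048] * 4))  # 1.0 GiB with 8 KV heads
```

Under these assumptions, four users with 2,048-token contexts need about 4 GiB of cache on top of the model weights. Grouped-query attention, which shares each KV head across several query heads, shrinks that in proportion to the KV-head count.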

Meters: GPU Util (%) and VRAM / KV Cache usage (GB).
Legend: Prefill (compute-bound), Decode (memory-bound), Tokenize, Queued.
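The request states in the legend (Queued, Tokenize, Prefill, Decode) can be modeled with a toy scheduler. This is a minimal sketch under simplifying assumptions: each request advances one stage per scheduler step, every decoding request emits one token per step in a shared batch, admission is FIFO, and each request reserves its worst-case KV cache (prompt plus maximum new tokens) up front, counted in token slots. `Request`, `simulate`, and `kv_budget_tokens` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_tokens: int
    max_new_tokens: int
    state: str = "Queued"
    generated: int = 0

    def kv_reservation(self) -> int:
        # Simplifying assumption: reserve cache for the prompt plus every token it may generate.
        return self.prompt_tokens + self.max_new_tokens


def simulate(requests, kv_budget_tokens):
    """Return one {name: state} snapshot per scheduler step."""
    for req in requests:
        if req.kv_reservation() > kv_budget_tokens:
            raise ValueError(f"{req.name} can never fit in the KV budget")
    queue = deque(requests)
    running, used, timeline = [], 0, []
    while queue or running:
        # 1. Advance every admitted request by one stage.
        for req in list(running):
            if req.state == "Tokenize":
                req.state = "Prefill"
            elif req.state == "Prefill":
                req.state = "Decode"
            else:
                # Decode: all decoding requests share one batched step, one token each.
                req.generated += 1
                if req.generated >= req.max_new_tokens:
                    req.state = "Done"
                    running.remove(req)
                    used -= req.kv_reservation()  # free its KV cache
        # 2. Admit queued requests in FIFO order while their reservation fits.
        while queue and used + queue[0].kv_reservation() <= kv_budget_tokens:
            req = queue.popleft()
            req.state = "Tokenize"
            used += req.kv_reservation()
            running.append(req)
        timeline.append({r.name: r.state for r in requests})
    return timeline


users = [Request(name, prompt_tokens=3, max_new_tokens=3) for name in "ABCD"]
for step, snapshot in enumerate(simulate(users, kv_budget_tokens=12), start=1):
    print(step, snapshot)
```

With a budget of 12 token slots only two reservations fit, so C and D stay Queued until A and B finish and release their cache. Reserving the worst case avoids running out of memory mid-generation but wastes space on requests that stop early; many production servers instead allocate cache in fixed-size blocks on demand and preempt requests when memory runs short.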