LLM Inference — Interactive Simulator

Prompt: "The cat sat" — watch how it flows through the system

📥 Input → 🔤 Tokenize → ⚡ Prefill → 🔄 Decode → 📤 Output

Loop: 0 / 3

CPU: idle
RAM: idle
GPU Compute: idle
VRAM / KV Cache: idle

Tokens: waiting for input...
KV Cache entries: empty

Click Auto to run the full pipeline, or use Forward / Back to walk through one stage at a time.
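The five stages can be traced in a toy Python sketch. Everything model-specific here is hypothetical: a whitespace `tokenize`, a seven-word `VOCAB`, and a `fake_next_token` rule standing in for a real forward pass. What the sketch is meant to show is only the control flow: one prefill pass caches an entry for every prompt token and yields the first new token, then each decode iteration feeds back the last token and grows the KV cache by exactly one entry.

```python
# Toy trace of Input -> Tokenize -> Prefill -> Decode -> Output.
# VOCAB, tokenize, and fake_next_token are hypothetical stand-ins, not a real model.

EOS_ID = 0
VOCAB = ["<eos>", "The", "cat", "sat", "on", "the", "mat"]
TOKEN_ID = {word: i for i, word in enumerate(VOCAB)}


def tokenize(text: str) -> list[int]:
    # Hypothetical whitespace tokenizer; real LLMs typically use subword schemes such as BPE.
    return [TOKEN_ID[word] for word in text.split()]


def fake_next_token(last_id: int) -> int:
    # Stand-in for a forward pass: emit the next vocabulary entry, then <eos>.
    return last_id + 1 if last_id + 1 < len(VOCAB) else EOS_ID


def generate(prompt: str, max_new_tokens: int = 3) -> tuple[str, list[int]]:
    token_ids = tokenize(prompt)

    # Prefill: all prompt tokens are processed together, so the cache gets one
    # entry per prompt token in a single pass, and the first new token falls out.
    kv_cache = list(token_ids)
    output_ids = [fake_next_token(kv_cache[-1])]

    # Decode: one iteration per additional token. Each step feeds back the last
    # token, appends its entry to the cache, and predicts the next token.
    for _ in range(max_new_tokens - 1):
        last_id = output_ids[-1]
        if last_id == EOS_ID:
            break
        kv_cache.append(last_id)
        output_ids.append(fake_next_token(last_id))

    # Output: detokenize, dropping the end-of-sequence marker.
    text = " ".join(VOCAB[i] for i in output_ids if i != EOS_ID)
    return text, kv_cache


text, cache = generate("The cat sat")
print(text)
print(len(cache))
```

For "The cat sat" this produces "on the mat" with five cache entries: three from prefill and one from each of the two decode iterations. The token generated last has not been fed back yet, so it has no cache entry.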

Four concurrent users share one GPU: watch batching, queuing, and KV cache pressure.

GPU Util

VRAM / KV Cache: 0 GB

Legend: Prefill (compute-bound), Decode (memory-bound), Tokenize, Queued

GPU Scheduler: idle
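The VRAM / KV Cache gauge grows with every cached token. A common back-of-envelope estimate for a standard transformer is: bytes per token = 2 (one key and one value) × layers × KV heads × head dimension × bytes per element. The sketch below applies that estimate. The function name `kv_cache_bytes`, the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte fp16 entries), and the 4 × 2,048-token workload are hypothetical values chosen for illustration, not the simulator's settings.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Factor of 2: every cached token stores one key and one value vector
    # per layer and per KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * num_tokens


# Hypothetical model shape and workload, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
four_users = kv_cache_bytes(4 * 2048, LAYERS, KV_HEADS, HEAD_DIM)
print(per_token // 1024, "KiB per token")
print(four_users / 2**30, "GiB for 4 x 2048 tokens")
```

With these numbers each token costs 128 KiB, so four 2,048-token conversations already hold 1 GiB of cache. This is why cache capacity, rather than raw compute, can limit how many users one GPU serves at once.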

Click Start to watch 4 users stream through the pipeline concurrently.
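The multi-user view can be approximated with a toy continuous-batching loop. The names (`Request`, `simulate`, `kv_budget_tokens`) and the policy are assumptions for illustration only. Each request reserves KV space for its prompt plus its maximum output up front. Waiting requests are admitted in FIFO order only while that reservation fits a fixed token budget. A newly admitted request spends one step in prefill, and each running request then emits one token per decode step. Real schedulers are typically more elaborate, for example allocating cache in pages on demand instead of reserving the worst case.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    prompt_len: int
    max_new_tokens: int
    generated: int = 0

    @property
    def reserved_tokens(self) -> int:
        # Worst-case KV footprint (hypothetical policy): prompt plus every token it may generate.
        return self.prompt_len + self.max_new_tokens


def simulate(requests: list[Request], kv_budget_tokens: int):
    """Return one (prefilling, decoding, queued) tuple of user names per step."""
    requests = list(requests)
    if any(r.reserved_tokens > kv_budget_tokens for r in requests):
        raise ValueError("a request can never fit in the KV budget")

    waiting = deque(requests)
    running: list[Request] = []
    reserved = 0
    log = []
    while waiting or running:
        # Decode: every request admitted in an earlier step emits one token now.
        decoding = [r.user for r in running]
        for r in running:
            r.generated += 1

        # Finished requests release their reserved KV cache space.
        for r in running:
            if r.generated >= r.max_new_tokens:
                reserved -= r.reserved_tokens
        running = [r for r in running if r.generated < r.max_new_tokens]

        # Admit waiting requests in FIFO order while their reservation fits;
        # each one spends this step in prefill.
        prefilling = []
        while waiting and reserved + waiting[0].reserved_tokens <= kv_budget_tokens:
            r = waiting.popleft()
            reserved += r.reserved_tokens
            prefilling.append(r.user)
            running.append(r)

        log.append((prefilling, decoding, [r.user for r in waiting]))
    return log


users = [Request(name, prompt_len=3, max_new_tokens=2) for name in "ABCD"]
for step, (prefill, decode, queued) in enumerate(simulate(users, kv_budget_tokens=10), 1):
    print(f"step {step}: prefill={prefill} decode={decode} queued={queued}")
```

With a 10-token budget, users A and B are admitted first while C and D wait in the queue. C and D start prefill only in the step where A and B finish decoding and release their cache. Because admission is strictly FIFO, a large request at the head of the queue would also block smaller ones behind it.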