LLM Inference — Interactive Simulator
Prompt: "The cat sat" — watch how it flows through the system
Pipeline: 📥 Input → 🔤 Tokenize → ⚡ Prefill → 🔄 Decode (loop) → 📤 Output
0 / 3
Tokens
Waiting for input...
KV Cache entries: empty
Click Auto to run the full pipeline, or use Forward / Back to walk through one stage at a time.
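For readers without the interactive view, the stages above can be approximated in a few lines of Python. This is a minimal sketch under loud assumptions, not the simulator's actual code: `tokenize` is a whitespace splitter rather than a real subword tokenizer, `fake_model` is a hypothetical stand-in that returns a canned continuation instead of running a model, and the KV cache is modeled as a plain list with one entry per processed token.

```python
from typing import List, Tuple

CANNED = ["on", "the", "mat"]  # hypothetical continuation, for illustration only


def tokenize(prompt: str) -> List[str]:
    # Toy tokenizer: split on whitespace (real systems use subword tokenizers).
    return prompt.split()


def fake_model(kv_cache: List[str]) -> str:
    # Stand-in for a forward pass: picks a canned token from the cache length.
    return CANNED[len(kv_cache) % len(CANNED)]


def generate(prompt: str, max_new_tokens: int = 3) -> Tuple[List[str], List[str], List[str]]:
    prompt_tokens = tokenize(prompt)
    if max_new_tokens <= 0:
        return prompt_tokens, [], []

    # Prefill: every prompt token is processed in one pass and cached.
    kv_cache = list(prompt_tokens)
    # The prefill pass also yields the first new token.
    output = [fake_model(kv_cache)]

    # Decode loop: feed the last token, grow the cache by one entry,
    # and produce exactly one new token per step.
    while len(output) < max_new_tokens:
        kv_cache.append(output[-1])
        output.append(fake_model(kv_cache))

    return prompt_tokens, output, kv_cache


if __name__ == "__main__":
    prompt_tokens, output, kv_cache = generate("The cat sat")
    print("prompt tokens:", prompt_tokens)
    print("generated:", output)
    print("KV cache entries:", len(kv_cache))
```

The point of the sketch is the shape of the work: prefill touches all prompt tokens at once, while decode adds a single cache entry per step and reuses everything already cached.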
4 concurrent users sharing one GPU — watch batching, queuing & KV cache pressure
GPU Scheduler: idle
Click Start to watch 4 users stream through the pipeline concurrently.
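The batching, queuing, and KV cache pressure mentioned above can be illustrated with a toy scheduler. This is a rough sketch under stated assumptions, not how the simulator or any particular serving engine works: `Request`, `simulate`, and `kv_budget` are hypothetical names, prefill is not modeled as a separate step, and a request is admitted only if its worst-case KV footprint (prompt length plus maximum new tokens) fits in the remaining budget.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    user: str
    prompt_len: int
    max_new: int
    generated: int = 0

    @property
    def worst_case_kv(self) -> int:
        # Upper bound on KV cache entries this request can ever hold.
        return self.prompt_len + self.max_new


def simulate(requests: List[Request], kv_budget: int) -> Tuple[int, List[str]]:
    queue = deque(requests)        # users waiting for a slot
    running: List[Request] = []    # users in the current batch
    finished: List[str] = []
    steps = 0

    while queue or running:
        # Admission: pull from the queue while the KV budget allows it.
        while queue:
            reserved = sum(r.worst_case_kv for r in running)
            if reserved + queue[0].worst_case_kv > kv_budget:
                break  # cache pressure: the next user has to wait
            running.append(queue.popleft())
        if not running:
            raise ValueError("a request does not fit in the KV budget")

        # One batched decode step: every running user gets one token.
        for r in running:
            r.generated += 1
        steps += 1

        # Finished users leave the batch, freeing KV space for the queue.
        finished.extend(r.user for r in running if r.generated >= r.max_new)
        running = [r for r in running if r.generated < r.max_new]

    return steps, finished


if __name__ == "__main__":
    users = [Request(u, prompt_len=3, max_new=3) for u in "ABCD"]
    print(simulate(users, kv_budget=12))
```

With a budget that fits only two users at a time, the other two queue until space frees up; raising `kv_budget` lets all four share each decode step.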