After all transformer blocks, the model outputs 50,000+ scores (logits), one for each token in the vocabulary.

Now pick one. This choice shapes everything.

Pick the highest every time? Boring, repetitive text.

Pick randomly? Incoherent nonsense.

Sampling strategies find the balance.

Temperature controls randomness by scaling the logits before softmax. Low temperature = focused and predictable. High temperature = creative and diverse.
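A minimal sketch of temperature scaling in plain Python, using toy logits for a four-token vocabulary (not a real model's output):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # Dividing logits by T < 1 sharpens the distribution;
    # T > 1 flattens it toward uniform.
    return softmax([x / temperature for x in logits])

logits = [2.0, 1.0, 0.5, 0.1]          # toy scores, 4-token vocabulary
cold = apply_temperature(logits, 0.5)  # top token dominates
hot = apply_temperature(logits, 2.0)   # probability spreads out
```

At T = 0.5 the top token takes a larger share of the probability mass than at T = 2.0; as T grows, the distribution approaches uniform.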

Top-p (nucleus sampling) adapts to the model’s confidence: keep the smallest set of tokens whose cumulative probability reaches p, and sample from that. When the model is sure, that set is tiny. When it’s uncertain, the set grows.
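A sketch of the top-p filter over an already-computed probability list (toy numbers, assumed for illustration):

```python
def top_p_filter(probs, p):
    # Keep the smallest set of highest-probability tokens whose
    # cumulative probability reaches p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

confident = top_p_filter([0.90, 0.05, 0.03, 0.02], p=0.9)  # 1 token survives
uncertain = top_p_filter([0.25, 0.25, 0.25, 0.25], p=0.9)  # 4 tokens survive
```

Same p, different candidate pools: that is the adaptivity a fixed top-k cutoff lacks.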

Combine them: Apply temperature, filter with top-p, sample from what remains.
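The combined pipeline can be sketched end to end; the logits and default parameter values here are illustrative assumptions, not fixed recommendations:

```python
import math
import random

def sample_token(logits, temperature=0.8, p=0.9, rng=None):
    # 1) Scale logits by temperature.
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    # 2) Softmax to probabilities (max-subtraction for stability).
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 3) Top-p filter: keep the nucleus of highest-probability tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    # 4) Sample a token id from what remains.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

token_id = sample_token([3.0, 1.0, 0.2, -1.0], rng=random.Random(0))
```

Turn temperature down and p down for a terse, deterministic agent; turn both up for a freewheeling one.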

Same architecture, different sampling = different personality.

This is how you tune an agent’s voice.