Top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability exceeds p.
If p=0.9, keep the highest-probability tokens until their probabilities sum to at least 90%. Renormalize and sample from that set.
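Here's a minimal sketch of that selection rule in NumPy (the top_p_sample name and helper are illustrative, not any particular library's API):

```python
import numpy as np

def top_p_sample(probs: np.ndarray, p: float = 0.9) -> int:
    """Sample a token id from the smallest set whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]              # token ids, highest probability first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # keep up to and including the token that crosses p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize over the kept tokens
    return int(np.random.choice(nucleus, p=nucleus_probs))
```

The renormalization step matters: once the tail is dropped, the remaining probabilities have to sum to 1 again before sampling.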
Why it’s adaptive:
When one token dominates (say 90% probability), the nucleus is tiny: maybe 1-2 tokens. The model is confident.
When probability is spread across many tokens (each at 5-10%), the nucleus is large: maybe 20+ tokens. The model is uncertain.
This adapts to the model’s confidence. Confident? Be focused. Uncertain? Explore options.
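A toy illustration of that adaptivity (the numbers are invented; nucleus_size just counts how many tokens survive the p=0.9 cut):

```python
import numpy as np

def nucleus_size(probs: np.ndarray, p: float = 0.9) -> int:
    cumulative = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cumulative, p) + 1)

confident = np.array([0.90, 0.05, 0.03, 0.01, 0.01])
spread    = np.array([0.12, 0.11, 0.11, 0.10, 0.10, 0.10, 0.09, 0.09, 0.09, 0.09])

print(nucleus_size(confident))  # 1  -- the dominant token alone clears 90%
print(nucleus_size(spread))     # 9  -- almost every candidate stays in play
```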
Compared to Top-k:
Top-k uses a fixed window. Always keep exactly k tokens.
Problem: if the top token already has 99% probability, the other k-1 slots are filled with near-zero-probability options that can still get sampled. And if the distribution is flat, a small k cuts off perfectly reasonable choices.
Top-p adjusts. When the model is sure, it narrows the choices. When it’s not, it widens them.
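For contrast, a fixed-window sketch under the same assumptions (top_k_sample is a hypothetical helper, not a specific library call):

```python
import numpy as np

def top_k_sample(probs: np.ndarray, k: int = 40) -> int:
    """Sample from the k highest-probability tokens, however peaked the distribution is."""
    top = np.argsort(probs)[::-1][:k]          # always (up to) exactly k candidates
    top_probs = probs[top] / probs[top].sum()  # renormalize over them
    return int(np.random.choice(top, p=top_probs))
```

With k=40 and a 99%-confident model, 39 of those slots are near-zero-probability filler that can still get sampled; top-p would have trimmed them.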
Most modern LLMs use this. p=0.9 or p=0.95 typically.
Combine it with temperature for fine control: temperature reshapes the whole distribution (sharper or flatter), and top-p then cuts off the low-probability tail.
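A common way to combine them, sketched here assuming the top_p_sample helper above: divide the logits by temperature, softmax, then apply the top-p cut.

```python
import numpy as np

def sample(logits: np.ndarray, temperature: float = 0.8, p: float = 0.95) -> int:
    scaled = logits / temperature        # <1 sharpens the distribution, >1 flattens it
    exp = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs = exp / exp.sum()
    return top_p_sample(probs, p)        # then truncate the tail as above
```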