SPHINX Interactive Demo
Explore a balanced 200-example subset from the SPHINX evaluation split.
Compare cached GPT-5 and GPT-5 Mini responses side by side, or answer the questions yourself and track your score against the same subset.