SPHINX Interactive Demo

Explore a balanced 200-example subset from the SPHINX evaluation split.

Compare cached GPT-5 and GPT-5 Mini responses side by side, or answer the questions yourself and track your score on the same subset.

Example 1