Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations.
VLM-Fix is a controlled benchmark over Tic-Tac-Toe, Reversi, Connect Four, and Dots and Boxes. Each of the 300 terminal states per game is reused under paired standard and inverse rules, so the visual evidence stays fixed while the semantic interpretation changes. The benchmark also varies rendering and prompt framing separately, allowing the experiments to isolate semantic remapping failures from ordinary perception errors.
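To make the pairing concrete, here is a minimal sketch of how one decisive (non-draw) Tic-Tac-Toe terminal state could be reused under both rule formulations. The schema, field names, and rule texts are illustrative, not the benchmark's actual format.

```python
RULES = {
    "standard": "A player who completes three in a row WINS.",
    "inverse": "A player who completes three in a row LOSES.",
}

def paired_items(board: str, winner_standard: str) -> list[dict]:
    """Reuse one decisive terminal board under both semantic mappings.

    Under the inverse rule the ground-truth label flips, while the
    visual evidence (the board) stays identical.
    """
    flipped = {"X": "O", "O": "X"}[winner_standard]
    return [
        {"rule_text": RULES["standard"], "board": board, "label": winner_standard},
        {"rule_text": RULES["inverse"], "board": board, "label": flipped},
    ]

board = "X X X\nO O .\n. . ."
for item in paired_items(board, winner_standard="X"):
    print(item["rule_text"], "->", item["label"])
```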
The main semantic-fixation result is a large standard-versus-inverse gap on identical terminal boards. Averaged across games and models, accuracy is 67.1% under standard rules but 52.5% under inverse rules, a 14.6-point gap. The drop appears in all four games and is largest for Dots and Boxes (73.8% versus 50.0%); 13 of the 14 evaluated models perform worse under inverse rules.
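The headline number is plain arithmetic over the two macro-averaged accuracies:

```python
# Semantic-fixation gap: macro-averaged standard accuracy minus
# macro-averaged inverse accuracy, in percentage points.
standard_acc, inverse_acc = 67.1, 52.5
gap = standard_acc - inverse_acc
print(f"semantic-fixation gap: {gap:.1f} points")  # 14.6
```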
Semantic framing matters more than visual perturbation. Neutral Alias prompts raise inverse-rule accuracy to 63.1% and shrink the average standard-inverse gap to 2.3 points, while the semantically loaded SemAlias prompts drop inverse-rule accuracy back to 53.5% and reopen the gap. The rendering perturbations, Glyph and Checkerboard, yield only modest changes, which supports the view that the dominant failures stem from semantic priors rather than weak perception.
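A sketch of how the two alias interventions might be implemented as prompt rewrites; the alias vocabulary below is invented for illustration and does not match the benchmark's exact text.

```python
import re

# Hypothetical alias tables: the neutral table maps loaded game words to
# arbitrary tokens, while the semantic table swaps in words that carry
# their own conflicting priors.
NEUTRAL_ALIAS = {"wins": "zorps", "loses": "blicks"}
SEMANTIC_ALIAS = {"wins": "loses", "loses": "wins"}

def apply_alias(text: str, alias: dict[str, str]) -> str:
    # Single-pass, word-boundary substitution so swapped pairs such as
    # wins<->loses do not re-replace each other's output.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alias)) + r")\b")
    return pattern.sub(lambda m: alias[m.group(0)], text)

prompt = "Under the inverse rule, the player who completes a line loses. Who wins?"
print(apply_alias(prompt, NEUTRAL_ALIAS))
print(apply_alias(prompt, SEMANTIC_ALIAS))
```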
The same qualitative pattern transfers to VLMBias. Across the four counting subsets, Flip and Alias each help on their own, and Flip+Alias is strongest overall, improving accuracy from 11.6% to 20.7% and reducing bias from 76.7% to 58.9%. The largest task-level gain appears on Animals, where accuracy rises from 3.6% to 22.2%.
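An illustrative composition of the two interventions on a counting item. The horizontal mirror for Flip and the neutral category token for Alias are assumptions made for illustration; VLMBias's exact transformations may differ.

```python
from PIL import Image, ImageOps

def defamiliarize(image: Image.Image, question: str, category: str):
    """Compose both defamiliarization interventions on one counting query."""
    flipped = ImageOps.mirror(image)                       # Flip: weaken the canonical-appearance prior
    aliased = question.replace(category, "this creature")  # Alias: remove the loaded category word
    return flipped, aliased

img = Image.new("RGB", (64, 64))
_, q = defamiliarize(img, "How many legs does the dog have?", "dog")
print(q)  # "How many legs does this creature have?"
```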
We study two post-training approaches, supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR), in two transfer settings. Within VLM-Fix, the D1–D3 splits probe rule-conditional and cross-game transfer under both methods. For VLMBias, we additionally train on a synthetic leg-count dataset of procedurally rendered birds and quadrupeds and then evaluate transfer to the Animals subset, which offers a more natural counting testbed than the synthetic games alone.
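As one concrete piece, here is a minimal sketch of what a rule-conditioned verifiable reward for RLVR could look like on VLM-Fix-style items; the item schema and answer parsing are assumptions, not the paper's training code.

```python
def verifiable_reward(response: str, item: dict) -> float:
    """Rule-conditioned verifiable reward for a VLM-Fix-style item.

    item = {"rule": "standard" or "inverse", "winner_standard": "X" or "O"}.
    The reward is tied to the rule stated in the prompt, not to the board
    alone, so standard- and inverse-rule training give opposing signals
    on identical visual evidence.
    """
    flipped = {"X": "O", "O": "X"}
    if item["rule"] == "standard":
        target = item["winner_standard"]
    else:
        target = flipped[item["winner_standard"]]
    predicted = response.strip().upper()[:1]  # crude parse of the model's final answer
    return 1.0 if predicted == target else 0.0

print(verifiable_reward("X", {"rule": "inverse", "winner_standard": "X"}))  # 0.0
```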
The post-training results are strongly conditional on the training signal. On the VLM-Fix D1–D3 splits, training on a single rule improves same-rule transfer but hurts opposite-rule transfer, whereas joint-rule training, which exposes the model to both mappings, improves held-out cross-game transfer. On VLMBias, synthetic leg-count training yields the clearest gains on the difficult Animals slice, where the targeted counting behavior improves substantially more than in the easier non-animal categories.
Activation steering provides a complementary editability result without retraining. On VLM-Fix, late-layer donor steering improves patched accuracy when routing is reliable, with the strongest gains in Reversi and Dots and Boxes. On VLMBias Animals, steering from synthetic-leg SFT donors into the corresponding base model also improves the targeted counting behavior, showing that the relevant representations remain partly editable in later layers.
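A minimal sketch of the general mechanism, using a toy network rather than a real VLM: a donor-derived steering vector is added to a late layer's output at inference time via a forward hook. The layer choice, scale, and direction here are placeholders, and the paper's donor routing is not shown.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Stand-in for a VLM's transformer stack."""
    def __init__(self, d: int = 16, n_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

def add_steering_hook(model: TinyModel, layer_idx: int,
                      direction: torch.Tensor, scale: float = 1.0):
    # Returning a value from a forward hook replaces that module's output,
    # shifting the late representation along the donor-derived direction.
    def hook(_module, _inputs, output):
        return output + scale * direction
    return model.layers[layer_idx].register_forward_hook(hook)

model = TinyModel()
direction = torch.randn(16) * 0.1  # stand-in for a donor-minus-base activation mean
handle = add_steering_hook(model, layer_idx=-2, direction=direction)
_ = model(torch.randn(1, 16))
handle.remove()  # the edit is removable at any time; no weights change
```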
```bibtex
@article{alam2026beyond,
  title  = {Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models},
  author = {Alam, Md Tanvirul},
  year   = {2026},
}
```