DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay

Published in ACL 2026, 2025

Multimodal abductive reasoning, the generation and selection of explanatory hypotheses from partial observations, is a cornerstone of intelligence. Current evaluations of this ability in vision-language models (VLMs) are largely confined to static, single-agent tasks. Inspired by the card game Dixit, we introduce DixitWorld, a comprehensive evaluation suite designed to deconstruct this challenge. DixitWorld features two core components: DixitArena, a dynamic, multi-agent environment that evaluates both hypothesis generation and hypothesis selection under imperfect information; and DixitBench, a static QA benchmark that isolates the listener’s task for efficient, controlled evaluation.