MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise Alignment
Abstract
Vision-Language Models (VLMs) continue to struggle to make morally salient judgments in multimodal and socially ambiguous contexts. Prior works typically rely on binary or pairwise supervision, which often fail to capture the continuous and pluralistic nature of human moral reasoning. We present MM-SCALE (Multimodal Moral Scale), a large-scale dataset for aligning VLMs with human moral preferences through 5-point scalar ratings and explicit modality grounding. Each image-scenario pair is annotated with moral acceptability scores and grounded reasoning labels by humans using an interface we tailored for data collection, enabling listwise preference optimization over ranked scenario sets. By moving from discrete to scalar supervision, our framework provides richer alignment signals and finer calibration of multimodal moral reasoning. Experiments show that VLMs fine-tuned on MM-SCALE achieve higher ranking fidelity and more stable safety calibration than those trained with binary signals.
Growth and citations
This paper is currently showing No growth state computed yet..
Citation metrics and growth state from academic sources (e.g. Semantic Scholar). See About for details.
Cited by (0)
No citing papers yet
Papers that cite this one will appear here once data is available.
View citations page →References (0)
No references in DB yet
References for this paper will appear here once ingested.
Related papers in Human-Computer Interaction
- PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization0 citations
- Investigating the Influence of Spatial Ability in Augmented Reality-assisted Robot Programming0 citations
- EarResp-ANS : Audio-Based On-Device Respiration Rate Estimation on Earphones with Adaptive Noise Suppression0 citations
Growth transitions
No transitions recorded yet
Growth state transitions will appear here once computed.