The flexibility to accurately interpret complex visual information is a vital focus of multimodal large language models (MLLMs). Recent work shows that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks,...