Two months after releasing our first batch of Holo2 models, H Company is back with our largest UI localization model yet: Holo2-235B-A22B Preview. This model sets a new state-of-the-art (SOTA) record of 78.5% on ScreenSpot-Pro and 79.0% on OSWorld-G.
Available on Hugging Face, Holo2-235B-A22B Preview is a research release focused on UI element localization.
Agentic Localization
High-resolution 4K interfaces are challenging for localization models: small UI elements are hard to pinpoint on a large display. With agentic localization, however, Holo2 can iteratively refine its predictions, improving accuracy with each step and unlocking 10-20% relative gains across all Holo2 model sizes.
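To make the idea of iterative refinement concrete, here is a minimal sketch of one way such a loop could work. It assumes a crop-and-zoom strategy, which this post does not confirm as Holo2's actual mechanism. The `predict` callable, `refine_click`, and `make_toy_predictor` are hypothetical names introduced for illustration only, and the toy predictor is a stand-in for a real model call.

```python
from typing import Callable, Tuple

Region = Tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels
Point = Tuple[float, float]                 # (x, y)


def refine_click(predict: Callable[[Region], Point],
                 width: int, height: int,
                 steps: int = 3, zoom: float = 0.5) -> Point:
    """Iteratively refine a click location by zooming in on the previous guess.

    `predict` is a hypothetical model wrapper: given a screen region, it returns
    the target position as normalized (0..1) coordinates inside that region.
    """
    region: Region = (0.0, 0.0, float(width), float(height))
    point: Point = (width / 2.0, height / 2.0)
    for _ in range(steps):
        left, top, right, bottom = region
        nx, ny = predict(region)
        # Map the normalized prediction back to absolute screen pixels.
        point = (left + nx * (right - left), top + ny * (bottom - top))
        # Next region: a smaller window centered on the guess, clamped to the screen.
        w = (right - left) * zoom
        h = (bottom - top) * zoom
        new_left = min(max(point[0] - w / 2, 0.0), width - w)
        new_top = min(max(point[1] - h / 2, 0.0), height - h)
        region = (new_left, new_top, new_left + w, new_top + h)
    return point


def make_toy_predictor(target: Point, grid: int = 20) -> Callable[[Region], Point]:
    """Stand-in for a model: locates `target` only up to a coarse grid cell."""
    def predict(region: Region) -> Point:
        left, top, right, bottom = region
        nx = min(max((target[0] - left) / (right - left), 0.0), 1.0 - 1e-9)
        ny = min(max((target[1] - top) / (bottom - top), 0.0), 1.0 - 1e-9)
        # Snap to the center of the grid cell that contains the target.
        return ((int(nx * grid) + 0.5) / grid, (int(ny * grid) + 0.5) / grid)
    return predict


toy = make_toy_predictor((3000.0, 1700.0))
one_step = refine_click(toy, 3840, 2160, steps=1)
three_steps = refine_click(toy, 3840, 2160, steps=3)
```

In this toy setup the predictor's resolution is fixed relative to the region it sees, so each zoom halves the worst-case pixel error: at most 96 px horizontally after one step on a 3840-pixel-wide screen, and at most 24 px after three. A real model's behavior will differ; the sketch only shows why repeated looks at a smaller region can help with small targets.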
Holo2-235B-A22B’s Performance on ScreenSpot-Pro
Holo2-235B-A22B Preview reaches 70.6% accuracy on ScreenSpot-Pro in a single step. In agent mode, it achieves 78.5% within 3 steps, setting a new state of the art on the most challenging GUI grounding benchmark.
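As a quick check on how these two scores relate to the 10-20% relative gains quoted for agentic localization, the arithmetic can be worked out directly. The variable names below are illustrative; the two scores are the ones reported above.

```python
single_step = 70.6  # ScreenSpot-Pro accuracy (%) in a single step
agent_mode = 78.5   # ScreenSpot-Pro accuracy (%) within 3 steps

absolute_gain = agent_mode - single_step             # percentage points
relative_gain = absolute_gain / single_step * 100.0  # percent of the single-step score

print(f"{absolute_gain:.1f} points absolute, {relative_gain:.1f}% relative")
# prints "7.9 points absolute, 11.2% relative"
```

The 7.9-point improvement corresponds to a relative gain of about 11.2%, which falls within the quoted 10-20% range.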


