In the world of web development, turning designs into functional websites usually involves a lot of coding and careful testing. What if we could simplify this process, making it possible to convert web designs into working websites more easily and quickly? WebSight is a new dataset that aims at building AI systems capable of transforming screenshots into HTML code.
The challenge
Turning a website design or screenshot into HTML code usually requires an experienced developer. But what if this could be more efficient? Motivated by this question, we investigated how vision-language models (VLMs) could be used in web development to create low-code solutions that improve efficiency.
Today, the main challenge towards that goal is the lack of high-quality datasets tailored for this task. WebSight aims to fill that gap.
WebSight: A large synthetic dataset of screenshot/HTML code pairs
In January 2024, we introduced WebSight-v0.1, a synthetic dataset that consists of 823,000 pairs of HTML code and corresponding screenshots. This dataset is designed to train AI models to process and translate visual web designs into functional HTML code. By focusing on synthetic data, we’ve managed to bypass the noise and complexity often found in real-world HTML, allowing AI models to learn efficiently.
Following our initial release and building on the community’s feedback, we’ve updated our dataset to WebSight-v0.2, introducing significant improvements. These improvements include using real images in the screenshots and switching to Tailwind CSS (instead of traditional CSS). We further scaled the dataset to 2 million examples.
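To make the move from traditional CSS to Tailwind CSS more concrete, here is a minimal sketch of what a screenshot/HTML pair could look like in code. The `ScreenshotHtmlPair` class, its field names, the file path, and both HTML snippets are hypothetical illustrations, not the actual WebSight schema or samples from the dataset. The sketch only shows how the same styling can live either in a separate stylesheet or directly in utility classes on the element.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class ScreenshotHtmlPair:
    """One hypothetical example: a rendered screenshot and its source HTML."""
    screenshot_path: str  # hypothetical field name, not the real schema
    html: str             # the HTML that would produce the screenshot


class _ClassCollector(HTMLParser):
    """Collects every value found in a class="..." attribute."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(value.split())


def css_classes(html: str) -> list[str]:
    """Return all CSS class names used in the document, in order."""
    parser = _ClassCollector()
    parser.feed(html)
    return parser.classes


# Traditional CSS: the styling lives in a separate <style> block.
TRADITIONAL = """<html><head><style>
.hero { background: #1e3a8a; color: white; padding: 2rem; }
</style></head><body><div class="hero">Welcome</div></body></html>"""

# Tailwind CSS: similar styling expressed as utility classes on the element.
TAILWIND = """<html><body>
<div class="bg-blue-900 text-white p-8">Welcome</div>
</body></html>"""

# Illustrative pair; the path is made up for this sketch.
pair = ScreenshotHtmlPair(screenshot_path="screenshots/0001.png", html=TAILWIND)

print(css_classes(TRADITIONAL))
print(css_classes(pair.html))
```

In the Tailwind version, everything needed to reproduce the look of the element sits on the element itself, so the HTML is self-describing without a separate stylesheet.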

Sightseer: A model fine-tuned on WebSight
Leveraging the WebSight dataset, we’ve fine-tuned our forthcoming foundation vision-language model to obtain Sightseer, a model capable of converting webpage screenshots into functional HTML code. Sightseer additionally demonstrates the capability to incorporate images into the generated HTML that closely resemble those in the original screenshots.

Towards more powerful tools unlocked by visual language models
By iterating over WebSight, our goal is to build more capable AI systems that simplify the process of turning UI designs into functional code. This could reduce iteration time for developers by rapidly transforming a paper UI sketch into functional code, while making this process more accessible for non-developers. This is one of the many real applications of visual language models. By open-sourcing WebSight, we encourage the community to work with us toward building more powerful tools for UI development.
