In the heart of every modern electronic device lies a silicon chip, built through a manufacturing process so precise that even a microscopic defect can determine success or failure. As semiconductor devices grow more complex, reliably detecting and classifying defects has become a critical bottleneck.
Historically, chipmakers have relied on convolutional neural networks (CNNs) to automate defect classification (ADC). But as manufacturing scales and diversifies, CNN-based approaches are hitting their limits: they require large labeled datasets and frequent retraining, and still struggle to generalize to new defect types.
In this post, we show how generative AI-powered ADC can overcome these challenges.
The workflows below leverage NVIDIA Metropolis vision language models (VLMs), vision foundation models (VFMs), and the NVIDIA TAO fine-tuning toolkit to modernize defect classification. We outline the constraints of traditional CNN-based systems, detail how VLMs and VFMs address them, and highlight specific approaches and the manufacturing challenges they help solve.
The limits of CNNs in semiconductor defect classification
CNNs have long been the backbone of defect detection in semiconductor fabs, supporting optical and e-beam inspection, lithographic evaluation, and more. They excel at extracting visual features from large datasets, but manufacturers face persistent challenges related to data requirements, semantic understanding, and retraining.
High data requirements
Achieving high accuracy often requires hundreds of labeled images per defect class. Rare or emerging defects frequently lack sufficient examples for effective training.
Limited semantic understanding
While CNNs capture visual features, they cannot interpret context, perform root-cause analysis, or integrate multimodal data. They also struggle to distinguish visually similar yet operationally distinct defect patterns, such as center vs. local defects.
Frequent retraining
Real-world manufacturing is dynamic. Process variations, new tools, and evolving product lines require models to be retrained continuously to recognize new defect types and imaging conditions.
These limitations force fabs to depend on manual inspection, which is expensive, inconsistent, and unable to scale with today’s manufacturing throughput.
Modernizing ADC with VLMs and VFMs
To address these challenges, NVIDIA applies VLMs, VFMs, and self-supervised learning across multiple stages of semiconductor manufacturing. Figure 1 illustrates how these models are deployed across front-end-of-line (FEOL) and back-end packaging processes.
In this post, we demonstrate how VLMs classify wafer map images and how VFMs classify die-level images, including optical, e-beam, and back-end optical microscopy (OM) inspection data. With further training, VLMs also show strong potential for die-level inspection.


Wafer-level intelligence with VLMs
Wafer maps provide a spatial view of defect distributions across an entire wafer. VLMs combine advanced image understanding with natural language reasoning. After fine-tuning, NVIDIA reasoning VLMs, such as Cosmos Reason, can interpret wafer map images to identify macro defects, generate natural language explanations, perform interactive Q&A, and compare test images against “golden” references for preliminary root-cause analysis.


Using this approach offers several benefits:
- Few-shot learning: VLMs can be fine-tuned with only a small number of labeled examples, enabling rapid adaptation to new defect patterns, process changes, or product variations.
- Explainability: As shown in Figure 2, Cosmos Reason produces interpretable results that engineers can interact with using natural language (see the sketch after this list). For instance, asking “What is the primary defect pattern on this wafer map?” might return “Center ring defect detected, likely due to chemical contamination.” This semantic reasoning ability goes beyond CNNs, helping engineers quickly identify potential root causes, speed up corrective actions, and reduce the number of manual reviews.
- Automated data labeling: VLMs can generate high-quality labels for downstream ADC tasks, reducing the time and cost of model development. In practice, this approach can cut model build times by up to 2x compared with manual labeling workflows.
- Time-series and lot-level analysis: VLMs can process both still images and video sequences, enabling them to proactively monitor process anomalies over time and mitigate errors before they lead to critical failures. In one study, VLMs achieved high accuracy across both OK and NG cases, outperforming traditional CNN-based methods.
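To make the interactive Q&A concrete, here is a minimal sketch, assuming the fine-tuned VLM is served behind an OpenAI-compatible chat completions endpoint; the endpoint URL, model name, and image path are placeholders rather than part of any official workflow:
# Minimal sketch: ask a deployed VLM about a wafer map image.
# The endpoint URL and model name below are hypothetical placeholders.
import base64
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical local deployment
MODEL = "cosmos-reason-wafer-sft"                        # hypothetical fine-tuned model name

def ask_about_wafer_map(image_path: str, question: str) -> str:
    # Encode the wafer map image as a base64 data URL for the multimodal message
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }
        ],
        "max_tokens": 256,
    }
    response = requests.post(ENDPOINT, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_about_wafer_map("wafer_map_001.png",
                              "What is the primary defect pattern on this wafer map?"))
The same request pattern extends to comparing a test wafer map against a golden reference by attaching both images to the message.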


Getting started with Cosmos Reason
Here’s a sample workflow to fine-tune Cosmos Reason 1—from data preparation to supervised fine-tuning and evaluation on a prepared dataset of wafer map defects.
- Go to the Cosmos Cookbook Wafer Map Anomaly Classification
- Create a sample training dataset: Download the open WM-811K wafer map dataset produced by MIR Lab, which is available for public use. Generate a sample dataset and corresponding annotations with the scripts provided in the cookbook (a rough illustration follows this list).
- Post-train with supervised fine-tuning (SFT): Follow the installation instructions in the cosmos-reason1 GitHub repository and install the cosmos-rl package to enable fine-tuning with the curated training dataset.
- Deploy
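As a rough illustration of the dataset-creation step, the sketch below renders WM-811K wafer maps to images and writes a simple label file. It assumes the downloaded pickle holds a DataFrame with waferMap arrays and failureType labels; verify the column names against your copy, and use the cookbook scripts for the annotation format actually expected by SFT:
# Illustrative only: convert WM-811K wafer map arrays to images plus labels.
import json
import numpy as np
import pandas as pd
from PIL import Image

df = pd.read_pickle("LSWMD.pkl")  # path to the downloaded WM-811K pickle (assumed file name)
records = []

for idx, row in df.head(100).iterrows():            # small sample for illustration
    wafer = np.asarray(row["waferMap"], dtype=np.uint8)
    # Map {0: background, 1: normal die, 2: defect die} to grayscale so defects stand out
    img = Image.fromarray((wafer * 127).astype(np.uint8), mode="L")
    img_path = f"wafer_{idx}.png"
    img.save(img_path)
    # failureType may be stored as a nested array; flatten or clean as needed
    records.append({"image": img_path, "label": str(row["failureType"])})

with open("annotations.json", "w") as f:
    json.dump(records, f, indent=2)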
Result: Fine-tuning Cosmos Reason on wafer map defect classification data boosts accuracy from zero-shot levels to over 96% on defect classification tasks.
Die-level precision with VFMs and self-supervised learning
The semiconductor industry continues to push the boundaries of physics as device features shrink to microscopic scales. At this level, manufacturing complexity rises dramatically. Even the slightest anomaly (a stray particle, pattern deviation, or material defect) can render a chip unusable, directly affecting yield and profitability. In this high-stakes environment, the biggest bottleneck is the ability to rapidly and accurately detect and classify defects. CNNs have supported this workflow for years, but they struggle to keep pace with the growing complexity and data demands of modern fabs.
A core challenge in training AI models for manufacturing is the dependence on large, meticulously labeled datasets. Dynamic processes, evolving product lines, and the continual emergence of new defect types make it impractical to maintain a perfectly labeled dataset. Compounding the issue, datasets are often highly imbalanced, with normal samples vastly outnumbering defective ones.
Using a leading VFM such as NV-DINOv2 provides benefits, including:
- Self-supervised learning (SSL): NV-DINOv2 is trained on millions of unlabeled images, enabling it to generalize to new defect types and process conditions with minimal retraining when labeled data is scarce.
- Robust feature extraction: The model captures both fine-grained visual details and high-level semantic information, improving classification accuracy across diverse manufacturing scenarios.
- Operational efficiency: By reducing dependence on labeling and frequent retraining, NV-DINOv2 streamlines the deployment and maintenance of defect-inspection systems in fast-moving fab environments.
However, general foundation models like NV-DINOv2 lack the domain-specific detail required for industrial tasks such as e-beam and optical microscopy inspection. To achieve maximum accuracy, the model must be specialized through domain adaptation.
This is a multi-stage workflow:
- General VFM: Begin with the powerful, pre-trained NV-DINOv2 model that has broad visual understanding learned from large, diverse datasets.
- Domain adaptation: Fine-tune the model using a large, unlabeled, domain-specific dataset, such as millions of images from semiconductor fabs, to align it with industrial imaging characteristics.
- Downstream task fine-tuning: Apply a small set of labeled images to fine-tune the model for a specific classification task, a step often called linear probing (a conceptual sketch follows this list).
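Conceptually, linear probing freezes the adapted backbone and trains only a small classification head on its embeddings. The sketch below illustrates the idea in PyTorch using the public DINOv2 ViT-S/14 checkpoint as a stand-in for a domain-adapted NV-DINOv2; the class count, input resolution, and hyperparameters are assumptions based on the PCB example later in this post, and TAO performs this step for you in the workflow described in the next section:
# Illustrative linear probing: frozen backbone, trainable linear head.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen backbone: only the linear head below receives gradient updates
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(384, 6).to(device)  # 384-dim ViT-S embeddings, 6 assumed defect classes

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("/data/train_images", transform=transform)
loader = DataLoader(train_ds, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        with torch.no_grad():
            features = backbone(images)   # class-token embeddings from the frozen backbone
        loss = criterion(head(features), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")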


The effectiveness of this process depends heavily on the size and quality of the unlabeled domain dataset. These datasets can range from fewer than a million images to hundreds of millions, but quantity alone is not enough. A meticulous data-cleaning pipeline is essential to remove redundant, blurry, or irrelevant images before training begins.
This domain-adaptation approach delivers significant performance gains. In one study by a leading semiconductor manufacturer, the NVIDIA TAO Toolkit was used to apply self-supervised learning (SSL) to NV-DINOv2 using unlabeled images collected across multiple layers of the chip-production process. Incorporating SSL consistently improved performance, boosting accuracy by up to 8.9% compared with a model trained without SSL, which led to productivity gains of up to 9.9%.
Getting started with NV-DINOv2 and SSL
The following is an end-to-end workflow to fine-tune NV-DINOv2 using SSL, from data preparation and domain adaptation to downstream task fine-tuning and deployment. In this example, we use the NVIDIA TAO Toolkit to perform SSL on unlabeled PCB images for defect classification.
The NV-DINOv2 workflow follows a progressive, three-phase approach that maximizes the value of large unlabeled datasets while reducing the need for manual annotation to just a few hundred labeled samples.
1. Set up your environment: Download the NVIDIA TAO Toolkit 6.0 container from NVIDIA NGC, which has all dependencies pre-installed:
# Pull the TAO Toolkit 6.0 container from NGC
docker pull nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt
# Run the container with GPU support
docker run --gpus all -it -v /path/to/data:/data \
  nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt /bin/bash
2. Prepare your dataset: NV-DINOv2 accepts RGB images in standard formats (JPG, PNG, BMP, TIFF, WebP) stored in a single directory. For SSL domain adaptation, you only need unlabeled images; no annotations are required.
In our PCB inspection example, we used:
- ~400 labeled test samples for evaluation
- ~1 million unlabeled PCB images for domain adaptation
- ~600 labeled training samples for downstream fine-tuning
Organize your data as follows:
/data/
├── unlabeled_images/ # For SSL domain adaptation
├── train_images/ # For downstream fine-tuning
│ ├── OK/
│ ├── missing/
│ ├── shift/
│ ├── upside_down/
│ ├── poor_soldering/
│ └── foreign_object/
└── test_images/ # For evaluation
Data cleaning best practice: Before training, perform a meticulous data cleaning pass to remove redundant, blurry, or irrelevant images. The effectiveness of domain adaptation depends heavily on the quality of your unlabeled dataset; a minimal cleaning sketch follows.
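The sketch below drops exact duplicates via file hashing and flags blurry images with a variance-of-Laplacian check; the blur threshold is an assumption and should be tuned per inspection tool and magnification:
# Minimal cleaning sketch: remove duplicate, unreadable, and blurry images in place.
import hashlib
from pathlib import Path

import cv2

BLUR_THRESHOLD = 100.0   # assumed cutoff; tune for your imaging conditions
seen_hashes = set()

for path in Path("/data/unlabeled_images").glob("*"):
    if not path.is_file():
        continue
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    if digest in seen_hashes:              # exact duplicate of an earlier file
        path.unlink()
        continue
    seen_hashes.add(digest)

    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if img is None:                        # unreadable or corrupt file
        path.unlink()
        continue
    if cv2.Laplacian(img, cv2.CV_64F).var() < BLUR_THRESHOLD:
        path.unlink()                      # likely too blurry to help domain adaptation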
3. Configure the training specification: Create a YAML specification file that defines your model architecture, dataset paths, and training parameters:
model:
  backbone:
    teacher_type: "vit_l"
    student_type: "vit_l"
    patch_size: 14
    img_size: 518
    drop_path_rate: 0.4
  head:
    num_layers: 3
    hidden_dim: 2048
    bottleneck_dim: 384
dataset:
  train_dataset:
    images_dir: /data/unlabeled_images
  test_dataset:
    images_dir: /data/test_images
  batch_size: 16
  workers: 10
  transform:
    n_global_crops: 2
    global_crops_scale: [0.32, 1.0]
    global_crops_size: 224
    n_local_crops: 8
    local_crops_scale: [0.05, 0.32]
    local_crops_size: 98
train:
  num_gpus: 8
  num_epochs: 100
  checkpoint_interval: 10
  precision: "16-mixed"
  optim:
    optim: "adamw"
    clip_grad_norm: 3.0
4. Run SSL training for domain adaptation: Execute the training using the TAO Launcher to adapt the general NV-DINOv2 model to your domain-specific images:
tao model nvdinov2 train \
  -e /path/to/experiment_spec.yaml \
  results_dir=/output/ssl_training \
  train.num_gpus=8 \
  train.num_epochs=100
5. Perform downstream task fine-tuning: After SSL domain adaptation, fine-tune the model on your specific classification task using a small labeled dataset. This step, often called linear probing, requires only a few hundred labeled samples:
tao model nvdinov2 train \
  -e /path/to/finetune_spec.yaml \
  train.pretrained_model_path=/output/ssl_training/model.pth \
  dataset.train_dataset.images_dir=/data/train_images \
  train.num_epochs=50
6. Run inference: Evaluate your domain-adapted model on test images:
tao model nvdinov2 inference \
  -e /path/to/experiment_spec.yaml \
  inference.checkpoint=/output/ssl_training/model.pth \
  inference.gpu_ids=[0] \
  inference.batch_size=32
7. Export to ONNX for deployment: Export your trained model to ONNX format for production deployment:
tao model nvdinov2 export \
  -e /path/to/experiment_spec.yaml \
  export.checkpoint=/output/ssl_training/model.pth \
  export.onnx_file=/output/nvdinov2_domain_adapted.onnx \
  export.opset_version=12 \
  export.batch_size=-1
The exported ONNX model can be deployed with NVIDIA TensorRT for optimized inference or integrated into an NVIDIA DeepStream pipeline for real-time visual inspection.
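As a quick sanity check before full deployment, the exported model can also be run with ONNX Runtime. The sketch below assumes ImageNet-style preprocessing at 224x224, which must be adjusted to match the preprocessing used during training:
# Minimal sketch: run the exported ONNX model on a single test image.
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession(
    "/output/nvdinov2_domain_adapted.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

# Assumed preprocessing: resize, scale to [0, 1], ImageNet normalization, NCHW layout
img = Image.open("/data/test_images/sample.jpg").convert("RGB").resize((224, 224))
x = np.asarray(img, dtype=np.float32) / 255.0
x = (x - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
x = x.transpose(2, 0, 1)[np.newaxis].astype(np.float32)

outputs = session.run(None, {input_name: x})
print(outputs[0].shape)   # embedding or class scores, depending on the exported head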
Results: NVIDIA TAO was used to fine-tune NV-DINOv2 with SSL for PCB inspection. Using a dataset of roughly 1 million unlabeled images for industrial domain adaptation and 600 training and 400 test samples for downstream task fine-tuning, defect detection accuracy jumped from 93.84% with the general model to 98.51%. By eliminating the need for labeling and frequent retraining, NV-DINOv2 streamlines the deployment of defect inspection solutions in fast-moving fab environments.
Paving the way to a smart fab
These applications of vision models deliver immediate accuracy gains and lay the foundation for agentic AI systems in the fab. By combining accelerated computing with generative AI, NVIDIA and leading foundries are introducing new ADC workflows that have the potential to redefine yield improvement and process control in advanced manufacturing.
By streamlining defect analysis across the semiconductor production flow, generative AI significantly reduces model deployment time. Its few-shot learning capabilities simplify ongoing model maintenance, improve robustness, and make it easy to fine-tune models for different fab environments.
With fabs generating millions of high-resolution images each day from a wide range of inspection tools, automated ADC systems are expected to further improve classification accuracy, reduce human workload, and elevate overall productivity.
Beyond defect inspection, semiconductor manufacturers are starting to adopt video analytics AI agents built using the NVIDIA Blueprint for Video Search and Summarization (VSS). These agents help monitor plant operations, enhance employee safety, and improve compliance with PPE and safety protocols across manufacturing sites.
Next steps
To learn more, try NV-DINOv2 and state-of-the-art NVIDIA VLMs like Cosmos Reason. For technical questions, please visit the forum.
Watch the SEMICON West keynote from Tim Costa, General Manager of Industrial and Computational Engineering at NVIDIA, and attend sessions at the show, which runs through December 19.
Stay up to date by subscribing to our newsletter and following NVIDIA AI on LinkedIn, Instagram, X, and Facebook. Explore our YouTube channel, and join the NVIDIA Developer vision AI forum.
