- Introduction: Why grayscale images might affect anomaly detection.
- Anomaly detection, grayscale images: A quick recap of the two fundamental topics discussed in this article.
- Experiment setting: What and how we compare.
- Performance results: How grayscale images affect model performance.
- Speed results: How grayscale images affect inference speed.
- Conclusion
1. Introduction
In this article, we’ll explore how grayscale images affect the performance of anomaly detection models and examine how this choice influences inference speed.
In computer vision, it’s well established that fine-tuning pre-trained classification models on grayscale images can result in degraded performance. But what about anomaly detection models? These models don’t require fine-tuning, but they use pre-trained classification models such as WideResNet or EfficientNet as feature extractors. This raises an important question: do these feature extractors produce less relevant features when applied to a grayscale image?
This question is not just academic; it has real-world implications for anyone working on automating industrial visual inspection in manufacturing. For instance, you may wonder whether a color camera is needed or whether a cheaper grayscale one would suffice. Or you may be concerned about inference speed and want to use every opportunity to increase it.
2. Anomaly detection, grayscale images
If you are already familiar with both anomaly detection in computer vision and the basics of digital image representation, feel free to skip this section. Otherwise, it provides a brief overview and links for further exploration.
Anomaly detection
In computer vision, anomaly detection is a fast-evolving field within deep learning that focuses on identifying unusual patterns in images. Typically, these models are trained using only images without defects, allowing the model to learn what “normal” looks like. During inference, the model can flag images that deviate from this learned representation as abnormal. Such anomalies often correspond to various defects that may appear in a production environment but weren’t seen during training. For a more detailed introduction, see this link.
Grayscale images
For humans, color and grayscale images look quite similar (apart from the lack of color). But for computers, an image is an array of numbers, so it becomes a little more complicated. A grayscale image is a two-dimensional array of numbers, typically ranging from 0 to 255, where each value represents the intensity of a pixel, with 0 being black and 255 being white.
In contrast, color images are typically composed of three such separate grayscale images (called channels) stacked together to form a three-dimensional array. Each channel (red, green, and blue) describes the intensity of the respective color, and their combination creates a color image. You can learn more about this here.
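To make this concrete, here is a minimal sketch using NumPy and Pillow (the file name is a placeholder) that shows the array shapes of the two representations:

```python
import numpy as np
from PIL import Image

img = Image.open("sample.png")  # placeholder path

gray = np.array(img.convert("L"))   # 2-D array: (height, width)
print(gray.shape, gray.min(), gray.max())  # e.g. (256, 256) 0 255

rgb = np.array(img.convert("RGB"))  # 3-D array: (height, width, 3)
print(rgb.shape)                    # e.g. (256, 256, 3)
```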
3. Experiment setting
Models
We’ll use four state-of-the-art anomaly detection models: PatchCore, Reverse Distillation, FastFlow, and GLASS. These models represent different types of anomaly detection algorithms and, at the same time, are widely used in practical applications thanks to their fast training and inference speed. The first three models use the implementation from the Anomalib library; for GLASS, we use the official implementation.
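As an illustration, the three Anomalib models can be instantiated roughly like this (a sketch assuming the Anomalib v1-style API with default hyperparameters; GLASS comes from its own repository):

```python
from anomalib.models import Fastflow, Patchcore, ReverseDistillation

# Default configurations; the actual experiments may use tuned settings.
models = {
    "PatchCore": Patchcore(),
    "Reverse Distillation": ReverseDistillation(),
    "FastFlow": Fastflow(),
}
```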

Dataset
For our experiments, we use the VisA dataset with 12 categories of objects, which provides a wide variety of images and has no color-dependent defects.
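Anomalib ships a datamodule for VisA, so loading one category looks roughly like this (the root path and category name are placeholders):

```python
from anomalib.data import Visa

# "candle" is one of the 12 VisA object categories.
datamodule = Visa(root="./datasets/visa", category="candle")
```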

Metrics
We’ll use image-level AUROC to see whether the whole image was classified correctly without the need to select a specific threshold, and pixel-level AUPRO, which shows how well we localize defective areas within the image. Speed will be evaluated using the frames-per-second (FPS) metric. For all metrics, higher values correspond to better results.
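For intuition, image-level AUROC needs only one anomaly score per image and a binary ground-truth label; here is a toy example with scikit-learn (illustrative numbers, not our results):

```python
from sklearn.metrics import roc_auc_score

labels = [0, 0, 0, 1, 1, 1]                    # 0 = normal, 1 = defective
scores = [0.12, 0.30, 0.22, 0.85, 0.64, 0.91]  # per-image anomaly scores
print(roc_auc_score(labels, scores))  # 1.0: every defect outranks every normal image
```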
Grayscale conversion
To make an image grayscale, we’ll use torchvision transforms.
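Both grayscale variants come from the same transform; only the number of output channels differs:

```python
from torchvision import transforms

# Grayscale replicated into three identical channels: the image still fits
# a standard RGB feature extractor.
to_gray_3ch = transforms.Grayscale(num_output_channels=3)

# True single-channel grayscale: requires a feature extractor that accepts
# one input channel (see below).
to_gray_1ch = transforms.Grayscale(num_output_channels=1)
```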

For one channel, we also modify the feature extractors using the in_chans parameter of the timm library.
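For example, a backbone can be created with a single input channel like this (timm adapts the pretrained first convolution by summing its RGB kernel weights, so the checkpoint remains usable; the backbone name is just an example):

```python
import timm

extractor = timm.create_model(
    "wide_resnet50_2",   # example backbone used as a feature extractor
    pretrained=True,
    in_chans=1,          # adapt the first conv layer to one input channel
    features_only=True,  # return intermediate feature maps
)
```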

The code for adapting Anomalib to use one channel is available here.
4. Performance results
RGB
These are regular images with red, green, and blue channels.

Grayscale, three channels
Images were converted to grayscale using the torchvision Grayscale transform with three channels.

Grayscale, one channel
Images were converted to grayscale using the same torchvision Grayscale transform, but with one channel.

Comparison
We can see that PatchCore and Reverse Distillation have close results across all three experiments for both image- and pixel-level metrics. FastFlow becomes slightly worse, and GLASS becomes noticeably worse. Results are averaged across the 12 object categories of the VisA dataset.
What about results per object category? Perhaps some of them perform worse and others better, causing the averaged results to appear the same? Here is the visualization of results for PatchCore across all three experiments, showing that results are quite stable within categories as well.

The same visualization for GLASS shows that some categories can be slightly better while others can be much worse. However, this is not necessarily caused by the grayscale transformation alone; some of it can be normal result fluctuation due to how the model is trained. Averaged results show a clear tendency: for this model, RGB images produce the best results, grayscale with three channels somewhat worse, and grayscale with one channel the worst.

Bonus
How do results change per category? It is possible that some categories are simply better suited to RGB or grayscale images, even when there are no color-dependent defects.
Here is the visualization of the difference between RGB and one-channel grayscale for all the models. We can see that only the pipe_fryum category becomes slightly (or strongly) worse for every model. The rest of the categories become worse or better, depending on the model.

Extra bonus
If you are curious about how this pipe_fryum looks, here are a couple of examples with GLASS model predictions.

5. Speed results
The number of channels affects only the first layer of the model; the rest remains unchanged. The speed improvement appears to be negligible, highlighting that first-layer feature extraction is only a small part of the computation performed by the models. GLASS shows a somewhat noticeable improvement, but at the same time, it shows the worst decline in metrics, so caution is required if you want to speed it up by switching to one channel.
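For reference, FPS numbers of this kind are typically obtained with a simple timing loop like the following (a sketch, not the exact benchmark used here):

```python
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, batch: torch.Tensor,
                n_iters: int = 100, warmup: int = 10) -> float:
    """Average images per second over n_iters forward passes after a warm-up."""
    model.eval()
    for _ in range(warmup):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure queued GPU work is finished
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return n_iters * batch.shape[0] / elapsed
```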

6. Conclusion
So how does using grayscale images affect visual anomaly detection? It depends, but RGB appears to be the safer bet. The impact varies depending on the model and data. PatchCore and Reverse Distillation generally handle grayscale inputs well, but you need to be more careful with FastFlow and especially GLASS, which shows some speed improvement but also the most significant drop in performance metrics. If you want to use grayscale input, you need to test it and compare it with RGB on your specific data.
The Jupyter notebook with the Anomalib code: link.
Follow the author on LinkedIn for more on industrial visual anomaly detection.
References
1. C. Hughes, Transfer Learning on Greyscale Images: How to Fine-Tune Pretrained Models (2022), towardsdatascience.com
2. S. Wehkamp, A practical guide to image-based anomaly detection using Anomalib (2022), blog.ml6.eu
3. A. Baitieva, Y. Bouaouni, A. Briot, D. Ameln, S. Khalfaoui, and S. Akcay, Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection (2025), CVPR Workshop on Visual Anomaly and Novelty Detection (VAND)
4. Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation (2022), ECCV
5. S. Akcay, D. Ameln, A. Vaidya, B. Lakshmanan, N. Ahuja, and U. Genc, Anomalib (2022), ICIP