Enhancing GPU-Accelerated Vector Search in Faiss with NVIDIA cuVS

Construct time: IVF-PQ and IVF-Flat were built as much as 4.7x faster using cuVS (Figure 1)
Latency: Search latency was as much as 8x lower for IVF-PQ, and 90% lower for IVF-Flat (Figure 1)
Throughput: cuVS improved large-batch search throughput as much as 3x for IVF-PQ across each datasets (Figure 2), while maintaining comparable performance for IVF-Flat. This makes it well-suited for high-volume and enormous offline search workloads.

As corporations collect more unstructured data and increasingly use large language models (LLMs), they need faster and more scalable systems. Advanced tools for locating information, equivalent to retrieval-augmented generation (RAG), can take hours and even days to process massive amounts of information—sometimes at the size of terabytes or petabytes.

Meanwhile, online search applications like ad suggestion systems struggle to deliver fast results on CPUs. Hundreds of CPUs could be required to fulfill real-time speed requirements, increasing infrastructure costs.

This post explores the best way to solve these challenges using NVIDIA cuVS with the Meta Faiss library for efficient similarity search and clustering of dense vectors. cuVS uses GPU acceleration to dramatically speed up each the creation of search indexes and the actual search process. The result is way faster, lower-cost, and more efficient performance, all while maintaining seamless compatibility between CPUs and GPUs.

Specifically, the post covers:

The advantages of integrating cuVS and Faiss
How and where cuVS improves vector search performance
Performance with GPU-accelerated inverted file index (IVF) and graph-based indexes
Benchmarks and Python code examples demonstrating the best way to construct and search cuVS-powered indexes with Faiss