Not All HNSW Indices Are Made Equally

-

Photo by Robin Jonathan Deutsch on Unsplash

Rebuilding an HNSW index is some of the resource-intensive features of using HNSW in production workloads. Unlike traditional databases, where data deletions might be handled by simply deleting a row in a table, using HNSW in a vector database often requires a whole rebuild to keep up optimal performance and accuracy.

Why is Rebuilding Vital?

Due to its layered graph structure, HNSW will not be inherently designed for dynamic datasets that change incessantly. Adding recent data or deleting existing data is crucial for maintaining updated data, especially to be used cases like RAG, which goals to enhance search relevence.

Most databases work on an idea called “hard” and “soft” deletes. Hard deletes permanently remove data, while soft deletes flag data as ‘to-be-deleted’ and take away it later. The problem with soft deletes is that the to-be-deleted data still uses significant memory until it’s permanently removed. This is especially problematic in vector databases that use HNSW, where memory consumption is already a major issue.

HNSW creates a graph where nodes (vectors) are connected based on their proximity within the vector space, and traversing on an HNSW graph is finished like a skip-list. So as to support that, the layers of the graph are designed in order that some layers have only a few nodes. When vectors are deleted, especially those on layers which have only a few nodes that function critical connectors within the graph, the entire HNSW structure can grow to be fragmented. This fragmentation may result in nodes (or layers) which might be disconnected from the major graph, which require rebuilding of the whole graph, or on the very least will end in a degradation within the efficiency of searches.

HNSW then uses a soft-delete technique, which marks vectors for deletion but doesn’t immediately remove them. This approach lowers the expense of frequent complete rebuilds, although periodic reconstruction continues to be needed to keep up the graph’s optimal state.

ASK DUKE

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x