Similarity

RAG Explained: Understanding Embeddings, Similarity, and Retrieval

In previous posts, I walked through constructing a simple RAG pipeline using OpenAI’s API, LangChain, and local files, as well as effectively chunking large text files. These posts cover the fundamentals of setting up a RAG pipeline...
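
The chunking step those posts cover can be illustrated with a minimal sketch, assuming simple fixed-size character chunks with overlap (the chunk_size and overlap values below are arbitrary choices, not the posts' settings):

def chunk_text(text, chunk_size=500, overlap=50):
    # Split text into fixed-size character chunks that overlap by `overlap` characters
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Example: split a large document into overlapping chunks before embedding them
document = "some long document text " * 500  # placeholder stand-in for a large text file
pieces = chunk_text(document)
print(len(pieces), "chunks")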

Demystifying Cosine Similarity

Cosine similarity is a commonly used metric for tasks such as semantic search and document comparison in the field of natural language processing (NLP). Introductory NLP courses often provide only a high-level justification for...
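
For reference, cosine similarity is the dot product of two vectors divided by the product of their norms, i.e. the cosine of the angle between them. A minimal NumPy sketch (the example vectors are arbitrary):

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, so similarity is 1.0
print(cosine_similarity(a, b))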

Similarity Search, Part 5: Locality Sensitive Hashing (LSH)
Introduction · Shingling · MinHashing · LSH Function · Error rate · Conclusion · Resources

Explore how similarity information can be incorporated into hash functions. Similarity search is a problem where, given a query, the goal is to find the most similar documents to it among all the...
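
As a rough illustration of the shingling and MinHashing steps in the post's outline, here is a minimal sketch (not the article's code; the shingle length and number of hash functions are arbitrary choices):

import random

def shingles(text, k=3):
    # All k-character shingles of a string
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, num_hashes=100, seed=0):
    # One minimum hash value per salted hash function; each salt simulates a random permutation
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, s)) for s in shingle_set) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    # The fraction of hash functions on which two signatures agree estimates Jaccard similarity
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = shingles("locality sensitive hashing")
b = shingles("locality sensitive caching")
print("exact Jaccard:   ", len(a & b) / len(a | b))
print("MinHash estimate:", estimated_jaccard(minhash_signature(a), minhash_signature(b)))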

Cosine Similarity for 1 Trillion Pairs of Vectors
Motivation · ChunkDot · Chunk size calculation · Memory and speed · Usage · Conclusion

Introducing ChunkDot:

pip install -U chunkdot

Calculate the 50 most similar and dissimilar items for 100K items.

import numpy as np
from chunkdot import cosine_similarity_top_k

embeddings = np.random.randn(100000, 256)

# using all your system's memory
cosine_similarity_top_k(embeddings, top_k=50)

# most dissimilar items using...
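
The memory problem ChunkDot targets is the full pairwise similarity matrix: for 100K items that is a 100,000 x 100,000 array, roughly 40-80 GB in 32/64-bit floats. Below is a rough sketch of the chunked idea, computing similarities one block of rows at a time and keeping only the top-k per row; it illustrates the approach under my own assumptions (arbitrary chunk size) and is not ChunkDot's implementation:

import numpy as np

def top_k_cosine_by_chunks(embeddings, top_k=50, chunk_rows=1000):
    # Normalize rows once so a plain dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = normed.shape[0]
    indices = np.empty((n, top_k), dtype=np.int64)
    values = np.empty((n, top_k), dtype=normed.dtype)
    for start in range(0, n, chunk_rows):
        end = min(start + chunk_rows, n)
        # Only a (chunk_rows, n) block is materialized instead of the full (n, n) matrix
        sims = normed[start:end] @ normed.T
        # argpartition finds the top-k columns per row without a full sort
        top = np.argpartition(sims, -top_k, axis=1)[:, -top_k:]
        indices[start:end] = top
        values[start:end] = np.take_along_axis(sims, top, axis=1)
    return indices, values

# Smaller than the 100K example above so it runs quickly; note that each row's
# top-k still includes the item itself (similarity 1.0).
embeddings = np.random.randn(10000, 256)
idx, vals = top_k_cosine_by_chunks(embeddings, top_k=50)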
