Similarity

How Convolutional Neural Networks Learn Musical Similarity

audio embeddings for music recommendation? Streaming platforms (Spotify, Apple Music, etc.) need to be able to recommend new songs to their users. The better the recommendations, the better the listening experience. There are various ways...

How Deep Feature Embeddings and Euclidean Similarity Power Automatic Plant Leaf Recognition

Automatic plant leaf detection is a remarkable innovation in computer vision and machine learning, enabling the identification of plant species by examining a photograph of the leaves. Deep learning is applied to extract meaningful...

RAG Explained: Understanding Embeddings, Similarity, and Retrieval

, I walked through building a simple RAG pipeline using OpenAI’s API, LangChain, and local files, as well as effectively chunking large text files. These posts cover the fundamentals of setting up a RAG pipeline...

Demystifying Cosine Similarity

Cosine similarity is a commonly used metric for operationalizing tasks such as semantic search and document comparison in the field of natural language processing (NLP). Introductory NLP courses often provide only a high-level justification for...
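
To make the metric concrete, here is a minimal sketch of cosine similarity between two vectors, using only the Python standard library. The function name `cosine_similarity` and the choice to raise an error on zero vectors are illustrative assumptions, not taken from the linked post.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat the undefined case as an error rather than returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the result depends only on the angle between the vectors, scaling either vector by a positive constant leaves the score unchanged, which is one reason the metric is popular for comparing embeddings of documents with different lengths.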

Similarity Search, Part 5: Locality Sensitive Hashing (LSH)

Explore how similarity information can be incorporated into hash functions. Similarity search is a problem where, given a query, the goal is to find the most similar documents to it among all of...
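
As a rough illustration of the idea of folding similarity into hashes, the sketch below covers shingling and MinHash signatures only; the banding step of LSH is left out. All names (`shingles`, `minhash_signature`, `estimated_jaccard`) and the use of salted BLAKE2b as a family of hash functions are assumptions made for this example, not the linked article's implementation.

```python
import hashlib


def shingles(text, k=3):
    """Set of overlapping character k-grams of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(seed, shingle):
    # Each seed acts as a separate 64-bit hash function (assumed scheme).
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash over the set.

    The set must be non-empty, i.e. the text must have at least k characters.
    """
    return [
        min(_seeded_hash(seed, s) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

The probability that two sets share the same minimum under a random hash function equals their Jaccard similarity, so the fraction of matching positions approximates it, with accuracy improving as `num_hashes` grows.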

Cosine Similarity for 1 Trillion Pairs of Vectors

Introducing ChunkDot

    pip install -U chunkdot

Calculate the 50 most similar and dissimilar items for 100K items.

    import numpy as np
    from chunkdot import cosine_similarity_top_k

    embeddings = np.random.randn(100000, 256)
    # using all your system's memory
    cosine_similarity_top_k(embeddings, top_k=50)
    # most dissimilar items using...
