The Map of Meaning: How Embedding Models “Understand” Human Language


If you work with Artificial Intelligence development, or if you are studying or planning to work with that technology, you have certainly stumbled upon embedding models along your journey.

At its heart, an embedding model is a neural network trained to map items like words or sentences into a continuous vector space, with the goal of placing objects that are contextually or conceptually similar close together mathematically.

Putting it in simpler words, imagine a library where the books are not categorized only by author and title, but by many other dimensions, such as genre, subject, or tone.

Another good analogy is a map itself. Consider a map and two cities you don’t know. Let’s say you are not that good with geography and don’t know where Tokyo and New York City are on the map. If I tell you that we should have breakfast in NYC and lunch in Tokyo, you might say: “Let’s do it”.

However, once I give you the coordinates to locate the cities on the map, you will see they are very far away from one another. That’s what embeddings are to a model: the coordinates!

Building the Map

Even before you ever ask a question, the embedding model has been trained. It has read millions of sentences and noted patterns. For instance, it sees that “cat” and “kitten” often appear in the same sorts of sentences, while “cat” and “refrigerator” rarely do.

With those patterns, the model assigns every word a set of coordinates in a mathematical space, like an invisible map.

  • Concepts that are similar (like “cat” and “kitten”) get placed right next to one another on the map.
  • Concepts that are somewhat related (like “cat” and “dog”) are placed near one another, but not right on top of each other.
  • Concepts that are totally unrelated (like “cat” and “quantum physics”) are placed in completely different corners of the map, like NYC and Tokyo.
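This placement idea can be sketched with toy 2-D coordinates. The numbers below are invented purely for illustration (real models use hundreds of dimensions), but they show how "distance on the map" works:

```python
import numpy as np

# Hypothetical 2-D "map" coordinates -- invented for illustration only
coords = {
    "cat":             np.array([1.0, 1.0]),
    "kitten":          np.array([1.1, 0.9]),
    "dog":             np.array([2.0, 1.5]),
    "quantum physics": np.array([-8.0, 9.0]),
}

def distance(a: str, b: str) -> float:
    """Euclidean distance between two points on the map."""
    return float(np.linalg.norm(coords[a] - coords[b]))

print(distance("cat", "kitten"))           # very close
print(distance("cat", "dog"))              # nearby, but not on top of each other
print(distance("cat", "quantum physics"))  # far away, like NYC and Tokyo
```

The exact numbers don’t matter; what matters is the ordering of the distances, which mirrors the three bullets above.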

The Digital Fingerprint

Nice. Now we know how the map was created. What comes next?

Now we’ll work with this trained embedding model. When we give the model a sentence like “The fluffy kitten is sleeping”:

  1. It doesn’t look at the letters. Instead, it looks up the coordinates on its map for every word.
  2. It calculates the center point (the average) of all those locations. That single center point becomes the “fingerprint” for the entire sentence.
  3. When you search, it puts a pin on the map where your query’s fingerprint is.
  4. It looks around in a circle to see which other fingerprints are nearby.

Any documents that “live” near your query on this map are considered a match, because they share the same “vibe” or topic, even if they don’t share the exact same words.
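That averaging step can be sketched with made-up word vectors. Real models use hundreds of dimensions and context-aware vectors; this 3-D toy version only illustrates the mechanics:

```python
import numpy as np

# Hypothetical 3-D vectors for each word -- invented for illustration
word_vectors = {
    "the":      np.array([0.1, 0.0, 0.2]),
    "fluffy":   np.array([0.7, 0.9, 0.1]),
    "kitten":   np.array([0.9, 0.8, 0.3]),
    "is":       np.array([0.0, 0.1, 0.1]),
    "sleeping": np.array([0.2, 0.6, 0.9]),
}

# The sentence "fingerprint" is the center point (mean) of its word vectors
sentence = ["the", "fluffy", "kitten", "is", "sleeping"]
fingerprint = np.mean([word_vectors[w] for w in sentence], axis=0)
print(fingerprint)  # one single vector representing the whole sentence
```

This is mean pooling, one common way to turn word vectors into a sentence vector; modern sentence encoders learn more sophisticated pooling, but the intuition is the same.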

Embeddings: the invisible map. | Image generated by AI. Google Gemini, 2026.

It’s like looking for a book not by searching for a particular keyword, but by pointing to a spot on a map that says “these are all books about kittens,” and letting the model fetch everything in that neighborhood.

Embedding Model Steps

Next, let’s see how an embedding model works, step by step, after receiving a request.

  1. Input: The computer takes in a text.
  2. Tokenization: It breaks the text down into tokens, the smallest pieces of a phrase that carry meaning.
  3. Chunking: The input text is split into manageable chunks (often around 512 tokens), so the model doesn’t get overwhelmed by too much information at once.
  4. Embedding: It transforms each chunk into a long list of numbers (a vector) that acts like a unique fingerprint representing the meaning of that text.
  5. Vector Search: When you ask a question, the model turns your query into a “fingerprint” too and quickly calculates which stored chunks have the most mathematically similar numbers.
  6. Retrieval: The model returns the most similar vectors, which are linked back to text chunks.
  7. Generation: If you are performing Retrieval-Augmented Generation (RAG), those few “winning” chunks are handed to an AI (like an LLM), which reads them and writes a natural-sounding answer based only on that specific information.
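Step 3 (chunking) can be sketched as a simple fixed-size splitter. This toy version splits by characters rather than tokens; production pipelines typically split by tokens (often ~512) and add overlap between chunks:

```python
def chunk_text(text: str, chunk_size: int = 40) -> list[str]:
    """Split text into fixed-size character chunks.
    A simplification: real pipelines chunk by tokens and overlap chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

doc = ("Embedding models map text into a continuous vector space "
       "where similar meanings end up close together.")
for chunk in chunk_text(doc):
    print(repr(chunk))
```

Each chunk then goes through step 4 on its own, so one document produces several fingerprints.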

Coding

Great. We did a lot of talking. Now, let’s code a little and make those concepts more practical.

We’ll start with a simple BERT (Bidirectional Encoder Representations from Transformers) example. BERT was created by Google and uses the Transformer architecture and its attention mechanism: the vector for a word changes based on the words surrounding it.

# Imports
from transformers import BertTokenizer

# Load pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Sample text for tokenization
text = "Embedding models are so cool!"

# Step 1: Tokenize the text
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# View
tokens
{'input_ids': tensor([[ 101, 7861, 8270, 4667, 4275, 2024, 2061, 4658,  999,  102]]),
 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

Notice how each token was transformed into an ID. Our sentence has only five words plus punctuation, yet there are eight non-special IDs, so some words must have been broken down into subwords.

  • The ID 101 corresponds to the token [CLS]. That token’s vector is believed to capture the overall meaning of the entire sentence or sequence of sentences. It is like a stamp that indicates to the model the meaning of that chunk. [2]
  • The ID 102 corresponds to the token [SEP], used to separate sentences. [2]

Next, let’s apply an embedding model to some data.

Embedding

Here is another simple snippet where we take some text and encode it with the versatile, all-purpose embedding model all-MiniLM-L6-v2.

from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

# 1. Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')

# 2. Initialize Qdrant client
client = QdrantClient(":memory:")

# 3. Create embeddings
docs = ["refund policy", "pricing details", "account cancellation"]
vectors = model.encode(docs).tolist()

# 4. Store vectors: create a collection (DB)
client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(size=384,
                                       distance=models.Distance.COSINE)
)

# Upload embedded docs (vectors)
client.upload_collection(collection_name="my_collection",
                         vectors=vectors,
                         payload=[{"source": docs[i]} for i in range(len(docs))])

# 5. Search
query_vector = model.encode("How do I cancel my subscription")

# Result
result = client.query_points(collection_name="my_collection",
                             query=query_vector,
                             limit=2,
                             with_payload=True)

print("\n\n ======= RESULTS =========")
result.points

The results are as expected: they point to the account cancellation topic!

 ======= RESULTS =========
[ScoredPoint(id='b9f4aa86-4817-4f85-b26f-0149306f24eb', version=0, score=0.6616353073200185, payload={'source': 'account cancellation'}, vector=None, shard_key=None, order_value=None),
 ScoredPoint(id='190eaac1-b890-427b-bb4d-17d46eaffb25', version=0, score=0.2760082702501182, payload={'source': 'refund policy'}, vector=None, shard_key=None, order_value=None)]

What just happened?

  1. We imported a pre-trained embedding model.
  2. Instantiated a vector database of our choice: Qdrant [3].
  3. Embedded the text and uploaded it to the vector DB in a new collection.
  4. We submitted a query.
  5. The results are the documents whose mathematical “fingerprint”, or meaning, is closest to the query’s embedding.

This is really nice.

To finish this article, let’s see whether we can fine-tune an embedding model.

Fine-Tuning an Embedding Model

Fine-tuning an embedding model is different from fine-tuning an LLM. Instead of teaching the model to “talk,” you’re teaching it to reorganize its internal map so that specific concepts in your domain are pushed further apart or pulled closer together.

The most common and effective way to do this is Contrastive Learning with a library like Sentence-Transformers.

First, we teach the model what closeness looks like using three data points:

  • Anchor: The reference item (e.g., “Brand A Cola Soda”).
  • Positive: A similar item (e.g., “Brand B Cola Soda”) that the model should pull closer.
  • Negative: A different item (e.g., “Brand A Cola Soda Zero Sugar”) that the model should push away.

Next, we choose a Loss Function to tell the model how much to change when it makes a mistake. You can choose from:

  • MultipleNegativesRankingLoss: Great if you only have (Anchor, Positive) pairs. It assumes every other positive in the batch is a “negative” for the current anchor.
  • TripletLoss: Best if you have explicit (Anchor, Positive, Negative) sets. It forces the distance between Anchor and Positive to be smaller than the distance between Anchor and Negative by a specific margin.

These are the model’s similarity results out of the box.

from sentence_transformers import SentenceTransformer, util

# 1. Load a pre-trained base model
model = SentenceTransformer('all-MiniLM-L6-v2')

# 2. Define your test cases
query = "Brand A Cola Soda"
choices = [
    "Brand B Cola Soda",             # The 'Positive' (should be closer)
    "Brand A Cola Soda Zero Sugar"   # The 'Negative' (should be further away)
]

# 3. Encode the text into vectors
query_vec = model.encode(query)
choice_vecs = model.encode(choices)

# 4. Compute cosine similarity
# util.cos_sim returns a matrix, so we convert to a list for readability
cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()

print(f"\n\n ======= Results for: {query} ===============")
for i, score in enumerate(cos_scores):
    print(f"-> {choices[i]}: {score:.5f}")
 ======= Results for: Brand A Cola Soda ===============
-> Brand B Cola Soda: 0.86003
-> Brand A Cola Soda Zero Sugar: 0.81907

And when we fine-tune it, showing the model that the Cola Sodas should be closer than the Zero Sugar version, this is what happens.

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
from sentence_transformers import util

# 1. Load a pre-trained base model
fine_tuned_model = SentenceTransformer('all-MiniLM-L6-v2')

# 2. Define your training data (Anchor, Positive, Negative triplets)
train_examples = [
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand C Cola Zero Sugar"]),
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand A Cola Zero Sugar"]),
    InputExample(texts=["Brand A Cola Soda", "Cola Soda", "Brand B Cola Zero Sugar"])
]

# 3. Create a DataLoader and select a Loss Function
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=fine_tuned_model)

# 4. Tune the model
fine_tuned_model.fit(train_objectives=[(train_dataloader, train_loss)],
                     optimizer_params={'lr': 9e-5},
                     epochs=40)

# 5. Define your test cases
query = "Brand A Cola Soda"
choices = [
    "Brand B Cola Soda",        # The 'Positive' (should be closer now)
    "Brand A Cola Zero Sugar"   # The 'Negative' (should be further away now)
]

# 6. Encode the text into vectors
query_vec = fine_tuned_model.encode(query)
choice_vecs = fine_tuned_model.encode(choices)

# 7. Compute cosine similarity
cos_scores = util.cos_sim(query_vec, choice_vecs)[0].tolist()

print(f"\n\n ======== Results for: {query} ====================")
for i, score in enumerate(cos_scores):
    print(f"-> {choices[i]}: {score:.5f}")
 ======== Results for: Brand A Cola Soda ====================
-> Brand B Cola Soda: 0.86247
-> Brand A Cola Zero Sugar: 0.75732

Here, we didn’t get a significantly better result. The base model was trained on a very large amount of data, so fine-tuning with such a small example set was not enough to change its behavior the way we expected.

But still, this is a great lesson. We were able to pull the two Cola Soda examples slightly closer together and push the Zero Sugar version slightly further away.

Alignment and Uniformity

One way of checking how the model was updated is to look at these metrics:

  • Alignment: Imagine you have a group of related items, like “Brand A Cola Soda” and “Cola Soda”. Alignment measures how close their embeddings end up.
    • A high alignment score means that your model is good at placing similar things close together, which is usually what you want for tasks like searching for similar products.
  • Uniformity: Now imagine all of your different items, from “refund policy” to “quantum computing”. Uniformity measures how spread out these items are in the embedding space. You want them spread out evenly rather than all clumped together in a single corner.
    • Good uniformity means your model can distinguish between different concepts effectively and avoids mapping everything to a small, dense region.

A good embedding model must be balanced. It must bring similar items close together (good alignment) while simultaneously pushing dissimilar items far apart and making sure the whole space is well utilized (good uniformity). This allows the model to capture meaningful relationships without sacrificing its ability to distinguish between distinct concepts.

Ultimately, the right balance often depends on your specific application. For some tasks, like semantic search, you might prioritize very strong alignment, while for others, like anomaly detection, a higher degree of uniformity might be more critical.

This is the code for the alignment calculation, which is the average of the cosine similarities between anchors and their positive matches.

from sentence_transformers import SentenceTransformer, util
import numpy as np
import torch

# --- Alignment Metric for Base Model ---
base_alignment_scores = []

# Assuming 'train_examples' was defined in a previous cell and contains (anchor, positive, negative) triplets
for example in train_examples:
    # Encode the anchor and positive texts using the base model
    anchor_embedding_base = model.encode(example.texts[0], convert_to_tensor=True)
    positive_embedding_base = model.encode(example.texts[1], convert_to_tensor=True)

    # Calculate cosine similarity between anchor and positive
    score_base = util.cos_sim(anchor_embedding_base, positive_embedding_base).item()
    base_alignment_scores.append(score_base)

average_base_alignment = np.mean(base_alignment_scores)

And this is the code for the uniformity calculation. It is computed by taking a diverse set of embeddings, computing the cosine similarity between every possible pair, and finally averaging all those pairwise similarity scores.

# --- Uniformity Metric for Base Model ---
# Use the same diverse set of texts (assuming 'uniformity_texts' was defined earlier)
uniformity_embeddings_base = model.encode(uniformity_texts, convert_to_tensor=True)

# Calculate all pairwise cosine similarities
pairwise_cos_sim_base = util.cos_sim(uniformity_embeddings_base, uniformity_embeddings_base)

# Extract unique pairwise similarities (excluding self-similarity and duplicates)
upper_triangle_indices_base = torch.triu_indices(pairwise_cos_sim_base.shape[0], pairwise_cos_sim_base.shape[1], offset=1)
uniformity_similarity_scores_base = pairwise_cos_sim_base[upper_triangle_indices_base[0], upper_triangle_indices_base[1]].cpu().numpy()

# Calculate the average of the pairwise similarities
average_uniformity_similarity_base = np.mean(uniformity_similarity_scores_base)

And the results: given the very limited training data used for fine-tuning (only 3 examples), it’s not surprising that the fine-tuned model doesn’t show a clear improvement over the base model on these metrics.

The base model kept related items slightly closer together than the fine-tuned model did (higher alignment score), and also kept different, unrelated things slightly more spread out, or less cluttered, than the fine-tuned model (lower uniformity score).

* Base Model:
Base Model Alignment Score (Avg Cosine Similarity of Positive Pairs): 0.8451
Base Model Uniformity Score (Avg Pairwise Cos Sim. of Diverse Embeddings): 0.0754

* Fine-Tuned Model:
Alignment Score (Average Cosine Similarity of Positive Pairs): 0.8270
Uniformity Score (Average Pairwise Cosine Similarity of Diverse Embeddings): 0.0777

Before You Go

In this article, we learned about embedding models and how they work under the hood, in a practical way.

These models gained a lot of importance with the surge of AI, being a great engine for RAG applications and fast search.

Computers need a way to understand text, and embeddings are the key. They encode text into vectors of numbers, making it easy for models to calculate distances and find the best matches.

If you liked this content, you can find me on my website.

https://gustavorsantos.me

GitHub Code

https://github.com/gurezende/Studying/tree/master/Python/NLP/Embedding_Models

References

[1. Modern NLP: Tokenization, Embedding, and Text Classification](https://medium.com/data-science-collective/modern-nlp-tokenization-embedding-and-text-classification-448826f489bf?sk=6e5d94086f9636e451717dfd0bf1c03a)

[2. A Visual Guide to Using BERT for the First Time](https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/)

[3. Qdrant Docs](https://qdrant.tech/documentation/)
