Retrieval for Time-Series: How Looking Back Improves Forecasts

How Retrieval Helps in Time Series Forecasting

We all know how it goes: time-series data is hard.

Traditional forecasting models are unprepared for incidents like sudden market crashes, black swan events, or rare weather patterns.

Even large, fancy models like Chronos sometimes struggle, because they haven’t encountered that sort of pattern before.

We can mitigate this with retrieval. With retrieval, we can ask “when has something like this happened before?” and then use that past example to guide the forecast.

In natural language processing (NLP), this concept is known as Retrieval-Augmented Generation (RAG), and it is becoming popular in the time-series forecasting world too.

The model then considers past situations that look similar to the current one, and from there it can make more reliable predictions.

How is this retrieval-augmented forecasting (RAF) different from traditional time-series forecasting? Retrieval forecasting adds an explicit memory-access step.

Instead of:

input → model → forecast

With retrieval we have:

input → similarity search → concrete past episodes → model → forecast

Retrieval-Augmented Forecasting Cycle. Image by Author | Napkin AI.

Instead of just using what the model learned during training, the idea is to give it access to a range of similar situations.

It’s like letting a weather model check: “What did past winters like this one look like?”


Hey there, I’m Sara Nóbrega, an AI Engineer. If you’re working on similar problems or want feedback on applying these ideas, I collect my writing, resources, and mentoring links here.


In this article, I explore retrieval-augmented forecasting from first principles and show, with concrete examples and code, how retrieval can be used in real forecasting pipelines.

What Is Retrieval-Augmented Forecasting (RAF)?

What is RAF? At a very high level, instead of leaning only on what a model learned in training, RAF lets the model actively look up concrete past situations similar to the current one and use their outcomes to guide its prediction.

Let’s look at it in more detail:

  • You convert the current situation (e.g., the last few weeks of a stock time series) into a query.
  • This query is then used to search a database of historical time-series segments to find the most similar patterns.
  • These matches don’t need to come from the same stock; the system can also surface similar movements from other stocks or financial products.

It retrieves those patterns along with what happened afterwards.

This information is then fed to the forecasting model to help it make better predictions; the sketch below shows the whole loop in miniature.
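
Put together, the loop looks roughly like this. It is a deliberately tiny sketch: the embed, search, and forecast functions below are toy stand-ins invented for illustration, not a specific library.

import numpy as np

# Toy end-to-end retrieval-augmented forecast, just to show the shape of the loop.

def embed(window: np.ndarray) -> np.ndarray:
    # Stand-in embedding: summarize a window by its level and volatility
    return np.array([window.mean(), window.std()])

def search(history, query, k=2):
    # history: list of (embedding, outcome) pairs.
    # Return the k entries whose embeddings are closest to the query.
    dists = [np.linalg.norm(query - emb) for emb, _ in history]
    return [history[i] for i in np.argsort(dists)[:k]]

def forecast(window, retrieved):
    # Stand-in "model": average the outcomes of the retrieved neighbors
    return float(np.mean([outcome for _, outcome in retrieved]))

# Historical windows and the value that followed each of them
history = [
    (embed(np.array([10.0, 12.0, 18.0])), 25.0),
    (embed(np.array([2.0, 2.0, 2.0])), 2.0),
]

current = np.array([11.0, 13.0, 19.0])
print(forecast(current, search(history, embed(current), k=1)))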

This approach is powerful in:

  • Zero-shot scenarios: When the model faces something it wasn’t trained on.
  • Rare or anomalous events: Like COVID, sudden financial crashes, etc.
  • Evolving seasonal trends: Where past data contains helpful patterns, but they shift over time.

RAF doesn’t replace your forecasting model; instead, it augments it by giving it extra hints and grounding it in relevant historical examples.

Another example: let’s say you want to forecast energy consumption during an unusually hot week.

Instead of hoping your model recalls how heatwaves affect usage, retrieval finds similar past heatwaves and lets the model consider what happened then.

What Do These Models Actually Retrieve?

The retrieved “knowledge” isn’t just raw data. It’s context that gives the model clues.

Here are some common examples:

Examples of Data Retrieval. Image by Author | Napkin AI.

As you can see, retrieval focuses on meaningful historical situations, like rare shocks, seasonal effects, and patterns with similar structure. These give actionable context for the current forecast.

How Do These Models Retrieve?

To find relevant patterns from the past, these models use structured mechanisms that represent the current situation in a way that makes it easy to search large databases and find the closest matches.

The code snippets in this section are simplified illustrations meant to build intuition; they don’t represent production code.

Retrieval methods for time series forecasting. Image by Author | Napkin AI.

Some of these methods are:

Embedding-Based Similarity

This approach converts time series (or patches/windows of a series) into compact vectors, then compares them with distance metrics like Euclidean distance or cosine similarity.

In simple words: the model turns chunks of time-series data into short summaries and then checks which past summaries look most similar to what’s happening now.

Some retrieval-augmented forecasters (e.g., RAFT) retrieve the most similar historical patches from the training data (or the entire series) and then aggregate the retrieved values with attention-like weights.

In simple words: it finds similar situations from the past and averages them, paying more attention to the best matches.

import numpy as np

# Example: embedding-based retrieval for time-series patches
# This is a toy example to show the *idea* behind retrieval.
# In practice:
# - embeddings are learned by neural networks
# - similarity search runs over millions of vectors
# - this logic lives inside a larger forecasting pipeline


def embed_patch(patch: np.ndarray) -> np.ndarray:
    """
    Convert a short time-series window ("patch") into a compact vector.

    Here we use simple statistics (mean, std, min, max) purely for illustration.
    Real-world systems might use:
      - a trained encoder network
      - shape-based representations
      - frequency-domain features
      - latent vectors from a forecasting backbone
    """
    return np.array([
        patch.mean(),   # average level
        patch.std(),    # volatility
        patch.min(),    # lowest point
        patch.max()     # highest point
    ])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """
    Measure how similar two vectors are.
    Cosine similarity focuses on *direction* rather than magnitude,
    which is often useful for comparing patterns or shapes.
    """
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)


# Step 1: Represent the current situation

# A short window representing the current time-series behavior
query_patch = np.array([10, 12, 18, 25, 14, 11])

# Turn it into an embedding
query_embedding = embed_patch(query_patch)


# Step 2: Represent historical situations

# Past windows extracted from historical data
historical_patches = [
    np.array([9, 11, 17, 24, 13, 10]),   # looks similar
    np.array([2, 2, 2, 2, 2, 2]),        # flat, unrelated
    np.array([10, 13, 19, 26, 15, 12])   # very similar
]

# Convert all historical patches into embeddings
historical_embeddings = [
    embed_patch(patch) for patch in historical_patches
]

# Step 3: Compare and retrieve the most similar past cases

# Compute similarity scores between the current situation
# and each historical example
similarities = [
    cosine_similarity(query_embedding, hist_emb)
    for hist_emb in historical_embeddings
]

# Rank historical patches by similarity
top_k_indices = np.argsort(similarities)[::-1][:2]

print("Most similar historical patches:", top_k_indices)

# Step 4 (conceptual):
# In a retrieval-augmented forecaster, the model would now:
# - retrieve the *future outcomes* of those similar patches
# - weight them by similarity (attention-like weighting)
# - use them to guide the final forecast
# This integration step is model-specific and not shown here.
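
To make Step 4 a little more concrete, here is a hedged continuation of the example above: assume we had also stored what happened after each historical patch (the historical_futures values below are invented), and average those futures with softmax weights over the similarity scores.

# Hypothetical continuation: assume we stored what happened *after* each patch.
historical_futures = [
    np.array([13.0, 15.0, 16.0]),  # what followed patch 0
    np.array([2.0, 2.0, 2.0]),     # what followed patch 1
    np.array([14.0, 16.0, 18.0]),  # what followed patch 2
]

# Softmax over the top-k similarity scores -> attention-like weights
top_sims = np.array([similarities[i] for i in top_k_indices])
weights = np.exp(top_sims) / np.exp(top_sims).sum()

# Similarity-weighted average of the retrieved futures:
# a simple retrieval-based forecast, or a hint fed into a larger model
retrieval_hint = sum(w * historical_futures[i] for w, i in zip(weights, top_k_indices))
print("Retrieval-weighted future:", retrieval_hint)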

Retrieval Tools and Libraries

1. FAISS
FAISS is a super fast, GPU-friendly library for similarity search over dense vectors. It is best suited to large, in-memory datasets, though its structure makes real-time updates harder to implement.

import faiss
import numpy as np

# Suppose we already have embeddings for historical windows
d = 128  # embedding dimension
xb = np.random.randn(100_000, d).astype("float32")  # historical embeddings
xq = np.random.randn(1, d).astype("float32")        # query embedding

index = faiss.IndexFlatIP(d)   # inner product (often used with normalized vectors for cosine-like behavior)
index.add(xb)

k = 5
scores, ids = index.search(xq, k)
print("Nearest neighbors (ids):", ids)
print("Similarity scores:", scores)

# Some FAISS indexes/algorithms can run on GPU.
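
One caveat worth making explicit: IndexFlatIP behaves like cosine similarity only if the vectors are L2-normalized first. FAISS has a helper for that; continuing the example above:

# Normalize in place so inner product becomes cosine similarity
faiss.normalize_L2(xb)
faiss.normalize_L2(xq)

index = faiss.IndexFlatIP(d)
index.add(xb)
scores, ids = index.search(xq, k)  # scores are now cosine similarities in [-1, 1]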

2. Annoy
The Annoy library is relatively lightweight and easy to work with.

It is best suited to historical datasets that remain mostly static, since any modification to the dataset requires rebuilding the index.

from annoy import AnnoyIndex
import numpy as np

# Number of values in each embedding vector.
# The "length" of each fingerprint.
f = 64

# Create an Annoy index.
# This object will store many past embeddings and help us quickly find the most similar ones.
ann = AnnoyIndex(f, "angular")
# "angular" distance is usually used to check patterns
# and behaves similarly to cosine similarity.

# Add historical embeddings (past situations).
# Each item represents a compressed version of a past time-series window.
# Here we use random numbers just as an example.
for i in range(10000):
    ann.add_item(i, np.random.randn(f).tolist())

# Build the search structure (here, with 10 trees).
# This step organizes the data so similarity searches are fast.
# After this, the index becomes read-only.
ann.build(10)

# Save the index to disk.
# This allows us to load it later without rebuilding everything.
ann.save("hist.ann")

# Create a question embedding.
# This represents the current situation we want to compare
# against past situations.
q = np.random.randn(f).tolist()

# Find the 5 most similar past embeddings.
# Annoy returns the IDs of the closest matches.
neighbors = ann.get_nns_by_vector(q, 5)

print("Nearest neighbors:", neighbors)

# Important note:
# Once the index is built, you cannot add new items.
# If new historical data appears, the index must be rebuilt.
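
One upside of saving to disk: Annoy memory-maps the file on load, so another process can reload the index cheaply instead of rebuilding it. Continuing the example above:

# Reload the saved index later (e.g., in a serving process).
# The new AnnoyIndex must be created with the same dimension and metric.
ann2 = AnnoyIndex(f, "angular")
ann2.load("hist.ann")  # memory-maps the file, so this is fast
print("Neighbors after reload:", ann2.get_nns_by_vector(q, 5))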

3. Qdrant / Pinecone

Qdrant and Pinecone are like Google for embeddings.

You store lots of vector “fingerprints” (plus extra tags like city/season), and when you have a new fingerprint, you ask: “find me the most similar past fingerprints, but only the ones matching these tags.”

That’s what makes them easier than rolling your own retrieval: they handle fast search and filtering together!

Qdrant calls its metadata “payload”, and you can filter search results using conditions on it.

# Example only (for intuition). Real code needs a running Qdrant instance + real embeddings.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

collection = "time_series_windows"

# Pretend this is the embedding of the *current* time-series window
query_vector = [0.12, -0.03, 0.98, 0.44]  # shortened for readability

# Filter = "only consider past windows from New York in summer"
# Qdrant filters are built from FieldCondition + MatchValue.
query_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="city",
            match=models.MatchValue(value="New York"),
        ),
        models.FieldCondition(
            key="season",
            match=models.MatchValue(value="summer"),
        ),
    ]
)

# In real usage, you’d call search/query and get back the closest matches
# plus their payload (metadata) if you request it.
results = client.search(
    collection_name=collection,
    query_vector=query_vector,
    query_filter=query_filter,
    limit=5,
    with_payload=True,   # return metadata so you can inspect what you retrieved
)

print(results)

# What you'd do next (conceptually):
# - take the matched IDs
# - load the actual historical windows behind them
# - feed those windows (or their outcomes) into your forecasting model

Pinecone stores metadata key-value pairs alongside vectors, lets you filter at query time (including with $eq), and returns metadata with results.

# Example only (for intuition). Real code needs an API key + an index host.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")

# Pretend this is the embedding of the current time-series window
query_vector = [0.12, -0.03, 0.98, 0.44]  # shortened for readability

# Ask for the most similar past windows, but only where:
# city == "New York" AND season == "summer"
# Pinecone supports query-time filtering with operators like `$eq`.
res = index.query(
    namespace="windows",
    vector=query_vector,
    top_k=5,
    filter={
        "city": {"$eq": "Latest York"},
        "season": {"$eq": "summer"},
    },
    include_metadata=True,  # return tags so you can sanity-check matches
    include_values=False
)

print(res)

# Conceptually next:
# - use the returned IDs to fetch the underlying historical windows/outcomes
# - condition your forecast on those retrieved examples

Why do vector DBs help? They let you do similarity search + “SQL-like WHERE filters” in a single step, which is hard to do cleanly with a DIY setup (both Qdrant payload filtering and Pinecone metadata filtering are first-class features in their docs).

Each tool has its trade-offs. For example, FAISS is great for performance but isn’t suited to frequent updates. Qdrant gives flexibility and real-time filtering. Pinecone is easy to set up but SaaS-only.

Retrieval + Forecasting: How to Combine Them

Once you know what to retrieve, the next step is to combine that information with the current input.

How to do this varies with the architecture and the task. There are several strategies (see image below).

Strategies for Combining Retrieval and Forecasting. Image by Author | Napkin AI.

A. Concatenation
Idea:
treat retrieved context as “more input” by appending it to the existing sequence (very common in retrieval-augmented generation setups).

Works well with transformer-based models like Chronos and doesn’t require architecture changes.

import torch

# x_current: the model's usual input sequence (e.g., last N timesteps or tokens)
# shape: [batch, time, d_model]   (or [batch, time] if you think in tokens)
x_current = torch.randn(8, 128, 256)

# x_retrieved: retrieved context encoded within the SAME representation space
# e.g., embeddings for similar past windows (or their summaries)
# shape: [batch, retrieved_time, d_model]
x_retrieved = torch.randn(8, 32, 256)

# Simple fusion: just append retrieved context to the end of the input sequence
# Now the model sees: [current history ... + retrieved context ...]
x_fused = torch.cat([x_current, x_retrieved], dim=1)

# In practice, you'd also add:
# - an attention mask (so the model knows what’s real vs padded)
# - segment/type embeddings (so the model knows which part is retrieved context)
# Then feed x_fused to your transformer.
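
As a sketch of the segment/type embeddings mentioned in the comments above (the embedding table and sizes are assumptions reusing the example’s shapes, not a fixed recipe):

import torch

d_model = 256
seg_emb = torch.nn.Embedding(2, d_model)  # 0 = real history, 1 = retrieved context

# One segment id per position: 128 current steps, then 32 retrieved ones
seg_ids = torch.cat([
    torch.zeros(8, 128, dtype=torch.long),
    torch.ones(8, 32, dtype=torch.long),
], dim=1)

# Stand-in for x_fused from the snippet above: [batch, 160, d_model]
x_fused = torch.randn(8, 160, d_model)

# Adding the segment embedding tags each position with its origin,
# so the transformer can treat retrieved context differently if useful
x_fused = x_fused + seg_emb(seg_ids)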

B. Cross-Attention Fusion
Idea:
keep the “current input” and “retrieved context” separate, and let the model attend to the retrieved context when it needs it. This is the core “fusion in the decoder via cross-attention” pattern used by retrieval-augmented architectures like FiD.

import torch

# current_repr: representation of the current time-series window
# shape: [batch, time, d_model]
current_repr = torch.randn(8, 128, 256)

# retrieved_repr: representation of retrieved windows (can be concatenated)
# shape: [batch, retrieved_time, d_model]
retrieved_repr = torch.randn(8, 64, 256)

# Think of cross-attention like this:
# - Query (Q) comes from the current sequence
# - Keys/Values (K/V) come from retrieved context
Q = current_repr
K = retrieved_repr
V = retrieved_repr

# Attention scores: "How much should each current timestep look at each retrieved timestep?"
scores = torch.matmul(Q, K.transpose(-1, -2)) / (Q.size(-1) ** 0.5)

# Turn scores into weights (so they sum to 1 across retrieved positions)
weights = torch.softmax(scores, dim=-1)

# Weighted sum of retrieved information (this is the "fused" retrieved signal)
retrieval_signal = torch.matmul(weights, V)

# Final fused representation: current info + retrieved info
# (Some models add, some concatenate, some use a learned projection)
fused = current_repr + retrieval_signal

# Then the forecasting head reads from `fused` to predict the future.
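
In a real model you’d usually reach for a built-in attention layer instead of raw matmuls. A minimal equivalent sketch with PyTorch’s nn.MultiheadAttention (shapes match the example above):

import torch

mha = torch.nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

current_repr = torch.randn(8, 128, 256)
retrieved_repr = torch.randn(8, 64, 256)

# Query = current sequence; Key/Value = retrieved context
retrieval_signal, attn_weights = mha(
    query=current_repr,
    key=retrieved_repr,
    value=retrieved_repr,
)

fused = current_repr + retrieval_signal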

C. Mixture-of-Experts (MoE)
Idea: combine two “experts”:

  • the retrieval-based forecaster (non-parametric, case-based)
  • the base forecaster (parametric knowledge)

A “gate” decides which one to trust more at every time step.

import torch

# base_pred: forecast from the base model (what it "learned in weights")
# shape: [batch, horizon]
base_pred = torch.randn(8, 24)

# retrieval_pred: forecast suggested by retrieved similar cases
# shape: [batch, horizon]
retrieval_pred = torch.randn(8, 24)

# context_for_gate: summary of the current situation (could be the last hidden state)
# shape: [batch, d_model]
context_for_gate = torch.randn(8, 256)

# gate: a number between 0 and 1 saying "how much to trust retrieval"
# (In real models, this is a tiny neural net reading the current context.)
gate_net = torch.nn.Linear(256, 1)
gate = torch.sigmoid(gate_net(context_for_gate))  # shape: [batch, 1]

# Mixture: convex combination
# - if gate ~ 1 -> trust retrieval more
# - if gate ~ 0 -> trust the base model more
final_pred = gate * retrieval_pred + (1 - gate) * base_pred

# In practice:
# - the gate might be timestep-dependent: shape [batch, horizon, 1]
# - you might also add training losses to stabilize routing/usage (common in MoE)
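
For the timestep-dependent gate mentioned in the comments, a minimal sketch (the per-step context here is random, standing in for per-horizon decoder states):

import torch

base_pred = torch.randn(8, 24)
retrieval_pred = torch.randn(8, 24)

# Hypothetical per-horizon-step context (e.g., decoder hidden states)
step_context = torch.randn(8, 24, 256)

# One gate value per forecast step instead of one per series
step_gate_net = torch.nn.Linear(256, 1)
step_gate = torch.sigmoid(step_gate_net(step_context)).squeeze(-1)  # [batch, horizon]

final_pred = step_gate * retrieval_pred + (1 - step_gate) * base_pred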

D. Channel Prompting
Idea:
treat retrieved series as extra input channels/features (especially natural in multivariate time series, where each variable is a “channel”).

import torch

# x: multivariate time series input
# shape: [batch, time, channels]
# Example: channels might be [sales, price, promo_flag, temperature, ...]
x = torch.randn(8, 128, 5)

# retrieved_series_aligned: retrieved signal aligned to the same time grid
# Example: average of the top-k similar past windows (or one representative neighbor)
# shape: [batch, time, retrieved_channels]
retrieved_series_aligned = torch.randn(8, 128, 2)

# Channel prompting = append retrieved channels as extra features
# Now the model gets "normal channels + retrieved channels"
x_prompted = torch.cat([x, retrieved_series_aligned], dim=-1)

# In practice you’d likely also include:
# - a mask or confidence score for retrieved channels
# - normalization so retrieved signals are on a comparable scale
# Then feed x_prompted into the forecaster.
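
And a minimal sketch of the normalization mentioned above (per-window z-scoring is one simple choice, not the only one):

import torch

x = torch.randn(8, 128, 5)
retrieved_series_aligned = torch.randn(8, 128, 2)

# Z-score the retrieved channels along time so their scale
# is comparable to the model's normal inputs
mean = retrieved_series_aligned.mean(dim=1, keepdim=True)
std = retrieved_series_aligned.std(dim=1, keepdim=True) + 1e-6
retrieved_norm = (retrieved_series_aligned - mean) / std

x_prompted = torch.cat([x, retrieved_norm], dim=-1)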

Some models even combine multiple methods.

A common approach is to retrieve similar series, merge them using attention so the model can focus on the most relevant parts, and then feed them to an expert.

Wrap-up

Retrieval-Augmented Forecasting (RAF) lets your model learn from the past in a way that traditional time-series modeling doesn’t.

It acts like an external memory that helps the model navigate unfamiliar situations with more confidence.

It’s easy to experiment with and can deliver meaningful improvements in forecasting tasks.

Retrieval is no longer just academic hype; it’s already delivering results in real-world systems.

Thanks for reading!


References

[1] J. Liu, Y. Zhang, Z. Wang, Retrieval-Augmented Time Series Forecasting (2025),
Source: https://arxiv.org/html/2505.04163v1

[2] UConn DSIS, TS-RAG: Time-Series Retrieval-Augmented Generation (n.d.),
Source: https://github.com/UConn-DSIS/TS-RAG

[3] Y. Zhang, H. Xu, X. Chen, Memory-Augmented Forecasting for Time Series with Rare Events (2024),
Source: https://arxiv.org/abs/2412.20810
