generate customer journeys that appear smooth and engaging, but evaluating whether these journeys are structurally sound remains difficult for current methods.
This article introduces Continuity, Deepening, and Progression (CDP) — three deterministic, content-structure-based metrics for evaluating...
an article about overengineering a RAG system, adding fancy things like query optimization, detailed chunking with neighbors and keys, as well as context expansion.
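The "chunking with neighbors" idea mentioned above — returning a matched chunk together with the chunks adjacent to it in the source document, so the model sees more surrounding context — can be sketched as follows. The function and variable names are illustrative, not taken from the article:

```python
# Hypothetical sketch of "chunking with neighbors": after retrieval,
# each hit is expanded with the chunks immediately before and after it
# in the source document, then duplicates are removed.

def expand_with_neighbors(hit_indices, chunks, window=1):
    """Return deduplicated chunk texts, each hit padded with `window` neighbors."""
    selected = set()
    for i in hit_indices:
        lo = max(0, i - window)
        hi = min(len(chunks) - 1, i + window)
        selected.update(range(lo, hi + 1))
    # Sorting preserves the original document order of the chunks.
    return [chunks[i] for i in sorted(selected)]

chunks = ["c0", "c1", "c2", "c3", "c4", "c5"]
print(expand_with_neighbors([3], chunks))  # ['c2', 'c3', 'c4']
```

Overlapping windows merge naturally because the indices are collected into a set before sorting.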
The argument against this type of work is that for a...
a decade working in analytics, I firmly believe that observability and evaluation are essential for any LLM application running in production. Monitoring and metrics aren’t just nice-to-haves. They ensure your product is functioning...
to Building an Overengineered Retrieval System. That one was about building the whole system. This one is about doing the evals for it.
In the previous article, I went through the different parts of a RAG...
at IBM TechXchange, I spent a lot of time around teams who were already running LLM systems in production. One conversation that stayed with me came from LangSmith, the folks who build tooling...
about the idea of using AI to judge AI, also known as “LLM-as-a-Judge,” my response was:
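For readers unfamiliar with the pattern, "LLM-as-a-Judge" simply means sending a model's output to another LLM along with a grading prompt and parsing a score out of the reply. A minimal, library-agnostic sketch — the `judge` callable stands in for whatever LLM client you use, and the prompt wording is an assumption, not from the post:

```python
# Minimal LLM-as-a-Judge sketch. `judge` is any callable that takes a
# prompt string and returns the model's text reply -- swap in your own
# LLM client here (the name and interface are illustrative).

JUDGE_PROMPT = """You are grading an answer to a question.
Question: {question}
Answer: {answer}
Reply with a single integer score from 1 (poor) to 5 (excellent)."""

def llm_as_judge(question, answer, judge):
    reply = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else None  # first digit in the reply is the score

# Usage with a stub judge that always answers "Score: 4":
score = llm_as_judge("What is RAG?", "Retrieval-augmented generation.",
                     lambda prompt: "Score: 4")
print(score)  # 4
```

In practice you would constrain the judge's output format more tightly (e.g. JSON) and average over multiple calls, since judge models are themselves noisy.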
We live in a world where even toilet paper is marketed as “AI-powered.” I thought this was just...
of my post series on retrieval evaluation metrics for RAG pipelines, we took an in-depth look at the binary retrieval evaluation metrics. More specifically, in Part 1, we went...
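As a refresher on what "binary" means here: each retrieved document is either relevant or not, with no graded relevance. Precision@K and Recall@K can then be computed as below — a standard formulation, not code taken from the series:

```python
# Standard binary retrieval metrics: each retrieved doc is either
# relevant (1) or not (0), so the metrics reduce to simple counting.

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    top_k = retrieved[:k]
    return sum(1 for d in top_k if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k."""
    top_k = retrieved[:k]
    return sum(1 for d in top_k if d in relevant) / len(relevant)

retrieved = ["d1", "d7", "d3", "d9"]   # ranked retrieval results
relevant = {"d3", "d7", "d8"}          # ground-truth relevant set
print(precision_at_k(retrieved, relevant, 3))  # 2 of top 3 are relevant
print(recall_at_k(retrieved, relevant, 3))     # 2 of 3 relevant docs found
```

Both metrics depend on the cutoff K, which is why they are usually reported at several values (e.g. @5, @10).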
, one could argue that the majority of the work resembles traditional software development more than ML or Data Science, considering we often use off-the-shelf foundation models instead of training them ourselves....