Evaluating

Evaluating Where to Implement Agentic AI in Your Business

Agentic AI has the potential to reshape several industries by enabling autonomous decision-making, real-time adaptability, and proactive problem-solving. As businesses strive to improve operational efficiency, they face the challenge of deciding how and where...

Select the Right One: Evaluating Topic Models for Business Intelligence

Topic models are used in businesses to categorise brand-related text datasets (such as product and site reviews, surveys, and social media comments) and to track how customer satisfaction metrics change over time. There's a myriad of...

LLM-as-a-Judge: A Scalable Solution for Evaluating Language Models Using Language Models

The LLM-as-a-Judge framework is a scalable, automated alternative to human evaluations, which are often costly, slow, and limited by the number of responses they can feasibly assess. By utilizing an LLM to evaluate the...

Evaluating Model Retraining Strategies

How do data drift and concept drift affect the choice of retraining strategy? The black swan event occurred at step 39; the errors of all models suddenly increased at this point. Nevertheless, after retraining...

Evaluating Edge Detection? Don’t Use RMSE, PSNR or SSIM

Empirical and theoretical evidence for why Figure of Merit (FOM) is the best edge-detection evaluation metric. Image segmentation and edge detection are closely related tasks. Take this output from a coastal segmentation model for...

Evaluating RAG Pipelines with Ragas

Leveraging the Ragas framework to determine the performance of your retrieval-augmented generation (RAG) pipeline

The Language of Locations: Evaluating Generative AI’s Geocoding Proficiency

Case Study: Unstructured Location Descriptions of Automobile Accidents. Data Collection and Preparation. To test and quantify the geocoding capabilities of LLMs, a list of 100 unstructured location descriptions of car accidents in Minnesota was randomly...

Everything You Should Know About Evaluating Large Language Models

Open Language Models. From perplexity to measuring general intelligence. As open source language models become more available, it is easy to get lost in all the choices. How can we determine their performance and compare them? And the...
