Evaluation

The Decontaminated Evaluation of GPT-4

Contents: Decontamination of the evaluation data; It’s contaminated; Is GPT-4 good at these exams?; Conclusion

GPT-4 won’t be your lawyer anytime soon. The main points about contamination for each exam are given on page 30 of the report. Among the 49 exams used for evaluation, 12 were found completely absent...

Traditional Versus Neural Metrics for Machine Translation Evaluation

100+ metrics since 2010. COMET and BLEURT rank highest, while BLEU appears at the bottom. Interestingly, you can also notice in this table that there are some metrics that I didn’t write...

Model Evaluation in Time Series Forecasting

Introducing backtesting for time series using the Skforecast library. Below are the three backtesting methods described, each using a random forest regressor as the autoregressive model. Looking at the implementation, the difference between the...

ASK ANA