Understanding Evaluation Metrics for Large Language Models by Anupam Tiwari
As large language models (LLMs) become more capable, evaluating their outputs becomes increasingly important. This presentation provides a concise overview of the most commonly used LLM evaluation metrics, ranging from traditional n-gram-based measures like BLEU and ROUGE to modern semantic and human-preference-based approaches. It is intended as a quick reference for anyone looking to understand how LLM performance is measured in practice.
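To make the n-gram family of metrics concrete, here is a rough, self-contained sketch of BLEU-style scoring (clipped n-gram precision combined with a brevity penalty). This is an illustration only, not code from the presentation, and it simplifies real BLEU (single reference, no smoothing):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Minimal BLEU sketch: geometric mean of clipped n-gram
    precisions, scaled by a brevity penalty. Single reference,
    no smoothing -- for illustration only."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidates.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

# A candidate identical to the reference scores 1.0; a partial
# match scores somewhere between 0 and 1.
ref = "the cat sat on the mat".split()
print(bleu(ref, ref))                                  # 1.0
print(bleu("the cat is on the mat".split(), ref))      # between 0 and 1
```

In practice, one would use an established implementation (e.g. NLTK's `sentence_bleu` or the `sacrebleu` package) rather than hand-rolling the metric, since production BLEU handles multiple references, smoothing, and tokenization details.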
https://orcid.org/0000-0002-9097-2246
