RAG, retrieval, and factuality papers

BEIR, FActScore, RAGAS, Self-RAG, ARES, SAFE, SimpleQA, HalluLens.

Eight papers covering retrieval, RAG evaluation, and factuality scoring. BEIR is the retrieval benchmark that is still the baseline to report against. FActScore introduced the atomic-fact pattern that the later factuality papers borrow. RAGAS, Self-RAG, and ARES wire those metrics into pipelines. SAFE is the search-augmented evaluator that makes large-scale factuality scoring cheap. SimpleQA and HalluLens are the 2024 to 2025 work to read before claiming a hallucination rate.
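
The atomic-fact pattern mentioned above can be sketched in a few lines: split a generation into small claims, check each claim against a knowledge source, and report the fraction supported. This is a toy illustration, not the FActScore implementation. The function names (split_into_atomic_facts, is_supported, atomic_fact_precision) are hypothetical, and the sentence splitter and word-overlap check are stand-ins for the model-based splitting and verification a real evaluator would use.

```python
from typing import Callable, Sequence


def split_into_atomic_facts(text: str) -> list[str]:
    # Assumption: one sentence = one atomic fact. A real system would use
    # a model to break sentences into smaller independent claims.
    return [s.strip() for s in text.split(".") if s.strip()]


def is_supported(fact: str, knowledge: Sequence[str]) -> bool:
    # Assumption: a fact is "supported" if all of its words appear in a
    # single knowledge passage. A real verifier would use retrieval plus
    # an entailment or LLM judgment instead of word overlap.
    fact_tokens = set(fact.lower().split())
    return any(fact_tokens <= set(p.lower().split()) for p in knowledge)


def atomic_fact_precision(
    text: str,
    knowledge: Sequence[str],
    splitter: Callable[[str], list[str]] = split_into_atomic_facts,
    verifier: Callable[[str, Sequence[str]], bool] = is_supported,
) -> float:
    # Fraction of extracted facts that the verifier marks as supported.
    facts = splitter(text)
    if not facts:
        return 0.0
    supported = sum(1 for f in facts if verifier(f, knowledge))
    return supported / len(facts)


if __name__ == "__main__":
    knowledge = [
        "marie curie won two nobel prizes",
        "she was born in warsaw",
    ]
    generation = "Marie Curie won two Nobel prizes. She was born in Paris."
    print(atomic_fact_precision(generation, knowledge))  # 0.5
```

The splitter and verifier are passed in as parameters so that either stand-in can be swapped for a stronger component without touching the scoring logic.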