Long-context papers

Why '1M tokens' rarely means a million tokens. Lost in the Middle, LongBench, RULER, BABILong, NoCha, Loong, NoLiMa.

Nine papers on the gap between the advertised context window and the context a model can actually use. Lost in the Middle set the agenda with its U-shaped position curve: accuracy is highest when the relevant information sits at the start or end of the context and drops when it sits in the middle. LongBench, InfBench, RULER, BABILong, NoCha, and Loong each push the evaluation harness in a different direction: broader suites, longer contexts, position robustness, reasoning, books, and multi-doc synthesis. LongBench v2 and NoLiMa are the 2025 updates that close the loop. NoLiMa in particular is the paper to cite when a vendor claims a context length you do not believe.