Papers & Reports

New! Epidemiology of Large Language Models

Our Layer 1 benchmark paper introducing the observational distribution evaluation framework.

Read the Technical Report

New! Benchmark Code (Layer 1)

Explore the full codebase used to run our benchmark evaluations.

Go to GitHub

New! Hugging Face Workspace

Browse benchmark datasets, predictions, and evaluation artifacts on Hugging Face.

Visit Hugging Face