Writing · Escritos

Field notes on AI, agents, and the science of measuring them.

Apuntes sobre IA, agentes y la ciencia de medirlos.

Long-form essays for the people who build, buy, or govern AI systems. Rigorous, source-backed, and written to be useful rather than hyped.

Ensayos de fondo para quienes construyen, compran o gobiernan sistemas de IA. Con rigor, fuentes y la intención de ser útiles antes que ruidosos.

Essays

1 published

EN · AI Evaluation · June 2026 · 18 min read

The Stopwatch and the Exam

Static benchmarks did not die. They stopped being enough. The move from capability to agency, the harness problem underneath it, and where attention belongs now. Backed by a public catalog of 69 agentic benchmarks.

Read the essay →