Writing · Escritos
Field notes on AI, agents, and the science of measuring them.
Long-form essays for the people who build, buy, or govern AI systems. Rigorous, source-backed, and written to be useful rather than hyped.
Essays
1 published
The Stopwatch and the Exam
Static benchmarks did not die. They stopped being enough. The move from capability to agency, the harness problem underneath it, and where attention belongs now. Backed by a public catalog of 61 agentic benchmarks.
Read the essay →