Series: Proving Agent Quality With Data
A series of experiments testing whether specialized AI agents on local models can match cloud API quality for personal task management.
- Building a Production Eval System for AI Agents
What we learned building a quality measurement system for a multi-agent AI setup, drawing on practitioner wisdom from Hamel Husain, Eugene Yan, Braintrust, and applied-llms.org.