AI Agent Benchmark: A 6-Axis Reliability Rubric for Production Agents
Why "agent accuracy" is useless, the six sub-metrics we actually score (completion, trajectory, tool use, recovery, refusal calibration, cost), and the methodology behind our 2026-Q3 agent reliability benchmark.