Asapi

Benchmark report

Judgment is the benchmark.
The score just reveals it.

A condensed read on where ML-style validation and modeling outperform generic agent heuristics.

6 benchmark games

Casual surfaces, hidden ML structure.

100% ML win rate

Modeling beat heuristics in every game.

5 core methods

Validation, planning, graph learning, causality, backtesting.

1 benchmark signal

The decisive edge is knowing what to trust.

Benchmark summary

Score table

Final outcomes, condensed.

Game | Metric | Generic | ML | Winner | ML advantage

Across all six games, the ML agent wins by identifying hidden structure before acting on it.

Detailed benchmark

The games and replays

Each entry shows the hidden problem, both approaches, and the replay.

Exact logs and reconstructed turn-by-turn series are preserved from the source report.