Most "AI trading" is one bot, one config, one set of opinions. Trench runs four paper-trading variants in parallel. Same news pipeline, same knowledge graph, same Claude analyzer. They disagree on when to actually trade. Each variant tests a different decision rule: tighter confidence floor, looser threshold, wider brackets, smaller bets. Three are policy hypotheses; the fourth (TrenchV2) was picked from a 3,600-cell walk-forward sweep of the first 22 days of the corpus. Whichever earns the highest ROI over enough closed trades defines the next config.
The aggregator that builds this leaderboard is open source. See trench_core.tournament. The same code reads each variant's trades_log.csv and position_store.json; the summary numbers below come from build_leaderboard().
| # | Variant | ROI | Closed | | | |
|---|---|---|---|---|---|---|
| Loading tournament… | | | | | | |
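The selection rule stated above (highest ROI over enough closed trades) can be sketched in a few lines. This is not the code in trench_core.tournament: the `VariantSummary` type, the `rank_variants` function, the `min_closed` threshold, and the variant names and figures in the example are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantSummary:
    # Hypothetical per-variant summary; field names are assumptions.
    name: str
    roi: float    # closed-trade ROI as a fraction (0.05 == 5%)
    closed: int   # number of closed trades


def rank_variants(summaries: List[VariantSummary], min_closed: int) -> List[VariantSummary]:
    """Rank variants by ROI, ignoring those with too few closed trades.

    `min_closed` stands in for "enough closed trades"; the real
    threshold is not stated in the source.
    """
    eligible = [s for s in summaries if s.closed >= min_closed]
    return sorted(eligible, key=lambda s: s.roi, reverse=True)


# Made-up numbers, not real tournament results.
example = [
    VariantSummary("variant_a", roi=0.04, closed=40),
    VariantSummary("variant_b", roi=0.09, closed=6),   # best ROI, too few trades
    VariantSummary("variant_c", roi=0.06, closed=35),
]
ranked = rank_variants(example, min_closed=30)
print([s.name for s in ranked])
```

With a threshold of 30, `variant_b` is excluded despite its higher ROI, which is the point of requiring enough closed trades before a variant can define the next config.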
A single bot with a single config can run for months and never tell you whether a different threshold would have done better. A tournament forces the question: given the same intel, does tightening confidence by 4 percentage points earn or cost expectancy? Each variant is a falsifiable bet. Whichever variant accumulates the strongest evidence over enough closed trades defines the next iteration of the live config.
ROI is computed from closed-trade P&L over starting bankroll, not from the current paper_balance. Using the latter would count capital tied up in open positions as a loss, even though those positions are still in flight. Closed P&L is the honest measure. See summarise_variant() for the exact formula.
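To make the difference concrete, here is a minimal sketch of the two calculations. It is not the body of summarise_variant(): the `Trade` type and both function names are hypothetical, and it assumes the paper balance is the starting bankroll plus realised P&L minus the stakes of open positions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trade:
    # Hypothetical trade record; field names are assumptions.
    stake: float                 # capital committed when the position opened
    pnl: Optional[float] = None  # realised P&L; None while the position is open


def closed_pnl_roi(trades: List[Trade], starting_bankroll: float) -> float:
    """ROI from realised P&L only; open positions contribute nothing."""
    realised = sum(t.pnl for t in trades if t.pnl is not None)
    return realised / starting_bankroll


def balance_roi(trades: List[Trade], starting_bankroll: float) -> float:
    """Naive ROI from a cash balance that has open stakes deducted."""
    realised = sum(t.pnl for t in trades if t.pnl is not None)
    open_stake = sum(t.stake for t in trades if t.pnl is None)
    paper_balance = starting_bankroll + realised - open_stake  # assumed definition
    return (paper_balance - starting_bankroll) / starting_bankroll


# Made-up example: two closed trades (+80, -30) and one open 200 stake.
trades = [
    Trade(stake=100.0, pnl=80.0),
    Trade(stake=100.0, pnl=-30.0),
    Trade(stake=200.0),
]
print(closed_pnl_roi(trades, 1000.0))  # 0.05
print(balance_roi(trades, 1000.0))     # -0.15
```

In this example the variant is up 5% on closed trades, but the balance-based figure reports a 15% loss solely because 200 of capital is sitting in an open position.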