What Trench did
Between April 21 and May 12, four paper-trading variants of Trench (baseline, high-conviction, wide-net, and the new TrenchV2) each ran the bot's full intelligence pipeline every ten minutes and made their own decisions on the same signals. Same news, same Claude analysis, different gates.
Across the four variants combined: +$285 in wins, −$218 in losses, net ~$67 on $4,000 of paper bankroll. That's barely above zero. We're not pretending otherwise.
The interesting part isn't the P&L. The interesting part is which kinds of trades worked, which kinds didn't, and what that tells us about how to size and time the next 28.
The single biggest pattern: the bot was right on the Iran deal
By a wide margin, the most-traded thesis this period was NO on a US-Iran nuclear deal within the year. Three of the four variants concentrated heavily here. As of today, the bot's read on Iran escalation is 45% (0.45 on a 0-1 scale); the volume-weighted Manifold consensus is 79%. A 34-point gap.
The crowd thinks something dramatic happens. Trench thinks nothing dramatic happens. One of us is wrong. We'll know which over the next 90 days as the June, July, August, and September contracts settle. The bot's per-contract probabilities are all auditable at /audit?trade_id= for any individual trade in the tape.
What went wrong, specifically
Of the 28 closed losses, our lesson classifier (lexical patterns + Haiku for the long tail) labels them like this:
- Thesis invalidated (19 losses, −$151): the market moved against the bot after a new headline contradicted the entry thesis. The bot recognized this and exited, often cleanly. These are the cheapest losses to take — small, fast, well-reasoned exits.
- Bracket overshoot (4 losses, −$58): the stop-loss fired on routine volatility, not on a thesis change. These are the worst losses: the bot exited a position that was probably going to recover. They cluster on the tightest-bracket variant (Baseline, SL=10%).
- Mispriced entry (4 losses, −$7): the bot admitted in the exit reasoning that it shouldn't have entered. Lottery-ticket markets, expiry-too-near positions. The amount is small because these were caught fast.
- Wrong side of consensus (1 loss, −$2): whale flow pointed one way, Trench went the other, the whales were right. We weight whale signals as one input among many; this loss says we should weight them slightly higher.
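The category totals above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the rounded dollar figures from the list; the dictionary keys are our own labels for illustration, not identifiers from the Trench codebase.

```python
# Loss categories as reported above: (number of losses, total loss in dollars).
# Dollar amounts are the rounded figures from the post, so averages are approximate.
losses = {
    "thesis_invalidated": (19, -151),
    "bracket_overshoot": (4, -58),
    "mispriced_entry": (4, -7),
    "wrong_side_of_consensus": (1, -2),
}

total_count = sum(count for count, _ in losses.values())
total_loss = sum(amount for _, amount in losses.values())

# Average loss per trade in each category.
avg_loss = {name: amount / count for name, (count, amount) in losses.items()}

# The category with the most negative average loss per trade.
worst = min(avg_loss, key=avg_loss.get)

print(f"{total_count} losses, {total_loss} dollars total")
for name, avg in sorted(avg_loss.items(), key=lambda kv: kv[1]):
    print(f"{name}: {avg:.2f} per trade")
```

On these figures the counts sum to 28 and the dollars to −$218, matching the headline numbers, and bracket overshoot has the largest average loss per trade (about $14.50, against roughly $7.95 for thesis invalidated).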
Three example loss cards from the tape, all linked to their full audit and chain anchor:
Every loss in the tape is one click away on the Wall of Transparency, filterable by lesson type and variant.
What we changed because of this
Two structural changes shipped this period:
1. Backtest infrastructure with walk-forward validation
We built a replay engine that walks every historical signal forward under counterfactual parameter sets, then validates against a held-out time window. First run found a wide-bracket config that looked great on in-sample data (+42% ROI). Walk-forward killed it: zero of the top-10 in-sample winners produced any closed trades on the test fold. Open-at-end selection bias. The config we eventually deployed (TrenchV2) was picked by bootstrap-ranking under a close-rate constraint, not by raw ROI. Full write-up at /backtest.
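To make the open-at-end problem concrete, here is a toy sketch of a walk-forward check. It is not the replay engine itself: `Signal`, `Config`, `replay`, `walk_forward`, the 0.5 close-rate threshold, and the price paths are all hypothetical, chosen only to show how a wide bracket can rank first in-sample on unrealized gains while closing zero trades on the test fold.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    ts: int        # signal timestamp (any sortable unit)
    prices: list   # entry price followed by later observed prices

@dataclass(frozen=True)
class Config:
    tp: float      # take-profit, as a fraction of entry price
    sl: float      # stop-loss, as a fraction of entry price

def replay(cfg, signals):
    """Replay signals under cfg, separating closed trades from open-at-end ones."""
    closed, open_marks = [], []
    for s in signals:
        entry = s.prices[0]
        for p in s.prices[1:]:
            ret = (p - entry) / entry
            if ret >= cfg.tp or ret <= -cfg.sl:
                closed.append(ret)  # bracket hit: the trade closes here
                break
        else:
            # Bracket never hit: position is still open when the window ends.
            open_marks.append((s.prices[-1] - entry) / entry)
    n = len(signals)
    return {
        "closed": len(closed),
        "close_rate": len(closed) / n if n else 0.0,
        # Marked ROI counts unrealized P&L, which is where the bias enters.
        "marked_roi": (sum(closed) + sum(open_marks)) / n if n else 0.0,
    }

def walk_forward(configs, signals, split_ts, min_close_rate=0.5):
    """Rank configs in-sample, then check each one on the held-out later window."""
    train = [s for s in signals if s.ts < split_ts]
    test = [s for s in signals if s.ts >= split_ts]
    report = []
    for cfg in configs:
        ins, oos = replay(cfg, train), replay(cfg, test)
        report.append({"cfg": cfg, "in_sample": ins, "out_of_sample": oos,
                       "passes": oos["close_rate"] >= min_close_rate})
    # Naive ranking: best in-sample marked ROI first.
    report.sort(key=lambda r: r["in_sample"]["marked_roi"], reverse=True)
    return report

# Toy data: two signals in the training window, two in the test window.
signals = [
    Signal(1, [0.50, 0.56, 0.60]),
    Signal(2, [0.40, 0.45, 0.46]),
    Signal(3, [0.50, 0.52, 0.44]),
    Signal(4, [0.60, 0.63, 0.69]),
]
configs = [Config(tp=0.10, sl=0.10), Config(tp=0.50, sl=0.50)]
report = walk_forward(configs, signals, split_ts=3)
for row in report:
    print(row["cfg"], row["out_of_sample"]["closed"], row["passes"])
```

In this toy run the 50%/50% bracket tops the in-sample ranking purely on marked-to-market gains, then closes nothing out-of-sample and fails the close-rate check, while the 10%/10% bracket closes both test trades.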
2. TrenchV2 — first config picked from data, not intuition
Result: TP=30%, SL=30%, confidence=0.70, edge=0.03, size=$30. Symmetric brackets (the asymmetric 20%/10% bracket the live bot was using was responsible for most of the bracket-overshoot losses above). Looser confidence floor (the sharpest band of confidence in the data was 0.70-0.74, not the 0.74+ we'd been using). Bootstrap P(profitable) on this config: 57.3%. The same metric on the existing baseline: 0.2%. A roughly 290× lift (57.3 / 0.2 ≈ 287) in our self-assessed probability of turning a profit. Falsifiable in 4 weeks per tasks/trenchv2-hypothesis.md.
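For readers unfamiliar with the metric, a bootstrap P(profitable) can be estimated along these lines. This is a minimal sketch under our own assumptions, not the code behind the numbers above: the function name, the resample count, and the example P&L values are hypothetical, and the real pipeline may resample differently (for example by signal or by day).

```python
import random

def bootstrap_p_profitable(pnls, n_resamples=10_000, seed=0):
    """Fraction of bootstrap resamples whose total P&L is strictly positive."""
    if not pnls:
        raise ValueError("need at least one closed trade")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    k = len(pnls)
    positive = 0
    for _ in range(n_resamples):
        # Resample k trades with replacement from the observed closed trades.
        sample = rng.choices(pnls, k=k)
        if sum(sample) > 0:
            positive += 1
    return positive / n_resamples

# Hypothetical per-trade P&L in dollars, for illustration only.
example_pnls = [9.0, -9.0, 9.0, 9.0, -9.0, 9.0, -9.0, 9.0]
p = bootstrap_p_profitable(example_pnls)
print(f"P(profitable) ~ {p:.3f}")
```

The estimate depends heavily on how many closed trades a config produces, which is one reason to rank under a close-rate constraint: a config with very few closed trades gives the bootstrap little to resample.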
What's on the table for next month
Where to verify any of this
Every claim in this post traces to a public surface:
- The 46 closed trades, win rate, P&L: /api/tournament + /v1/lessons/stats
- The 34pp Trench-vs-crowd gap: /shadow-api/api/forecast-comparison
- Every individual loss with reasoning: /lessons
- Per-trade audit with inputs/processing/hash/result: /audit
- Backtest methodology and live runs: /backtest
- Hash chain pinning every signal's timestamp: /registry
If anything in this post doesn't match what those pages say, the pages are the source of truth. Tell us at hello@trenchsignals.io and we'll fix the post.
Trench Monthly is a recurring write-up of what the bot did in the last 30 days. Issue #2 lands the second week of June, after the first batch of June Iran contracts settles. To get it in your inbox, subscribe at trenchsignals.io.