Most AI trading projects ask you to trust them. Their numbers come from spreadsheets they don’t show. Their losses get quietly removed when inconvenient. Their methodology is “the model said so.”
Trench is the opposite. Every prediction is hash-anchored to a public registry before the market settles. Every loss carries a post-mortem. The scoring library is open source on PyPI — anyone can score their own agent the same way. The same audit layer is now Verified by Trench, open to any external AI agent that wants a date-anchored track record.
Trench is a 24/7 autonomous AI that watches 12 intel layers: 119+ news feeds (5 native languages, no translation, plus dedicated Africa-theater and cyber tiers), 167 prediction markets streamed live, OSINT Telegram, financial instruments, whale flow, and USGS seismic. Every 10 minutes the bot builds a structured digest, asks Claude to score the markets, runs candidates through 11 gates, and trades the best edge. Four variants compete in parallel; the winner’s policy gets cloned to live money. Every loss carries a public post-mortem; that’s the product.
TrenchSignals is two things, on purpose. There's Trench itself, an autonomous AI paper-trading geopolitical conflict markets in public. And there's trench-core, the open-source framework that powers the methodology. The bot is the demo. The audit trail is the product.
Most "AI trading" projects ask you to trust them. Their numbers come from spreadsheets they don't show you. Their losses get quietly removed when they're inconvenient. Their methodology is "the model said so." We wanted the opposite: a project where every claim about itself is independently checkable.
That's the moat. Not being unbeatable. Being auditable.
Pick whichever claim you're most skeptical about and audit it directly. Full runnable recipes for each one live on the methodology page.
Every prediction Trench makes across every market it tracks (not just the ones
it actually trades) is logged with a probability and scored against the actual
market settlement. The win rate, mean Brier, and ROI numbers on the dashboard
come from pure functions in
trench_core.calibration.
Run pip install trench-core, then re-run the same code on the same trade
log; the answer is byte-identical.
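For readers who want to see what a mean Brier score is before installing anything, here is a minimal stdlib sketch. It does not call trench_core.calibration, whose function names we do not assume here; `brier_score`, `mean_brier`, and the three-row log are hypothetical illustrations.

```python
from statistics import mean

def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 settlement."""
    return (probability - outcome) ** 2

def mean_brier(records) -> float:
    """Average Brier score over (probability, outcome) pairs. Lower is better."""
    return mean(brier_score(p, o) for p, o in records)

# Hypothetical log: (predicted probability of YES, settled outcome).
log = [(0.8, 1), (0.3, 0), (0.6, 0)]
score = mean_brier(log)
print(round(score, 4))
```

Because the functions are pure (no clock, no randomness, no network), re-running them on the same log gives the same number every time, which is the property the claim above relies on.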
Every captured signal is hashed and committed to a public, append-only chain
before the market resolves. The day's combined_hash references
the prior day's as prev_hash. Tamper with any past entry and every
subsequent verification fails. The full chain is at /registry;
trench_core.registry.verify_chain()
recomputes from genesis. Every entry on the public diary now carries a
"🔗 chain" link to the day-record that proves its timestamp.
Every analyzer call captures the full prompt and raw response as a bundle. Re-run
any cycle through any model (Anthropic, OpenAI, a local model, or a stub) and
diff the parsed signals. "If we'd used a different model, we'd have done better"
stops being vibes. The framework is provider-agnostic; you wire your own model
caller. See
trench_core.replay.
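The capture-and-diff idea can be sketched in a few lines. This is an illustration only, not the trench_core.replay API: the bundle layout, the `market=prob` response format, and the names `parse`, `replay`, `diff_signals`, and `stub_model` are all assumptions made for the example.

```python
def parse(raw: str) -> dict:
    """Toy parser (assumed format): 'market=prob' pairs separated by ';'."""
    return {k: float(v) for k, v in (pair.split("=") for pair in raw.split(";"))}

def replay(bundle: dict, model_caller) -> dict:
    """Send the captured prompt to another model caller and parse its signals."""
    return parse(model_caller(bundle["prompt"]))

def diff_signals(original: dict, replayed: dict) -> dict:
    """Markets whose parsed probability changed, mapped to (old, new)."""
    return {
        market: (original.get(market), replayed.get(market))
        for market in sorted(set(original) | set(replayed))
        if original.get(market) != replayed.get(market)
    }

# Hypothetical captured bundle: full prompt plus the raw response.
bundle = {"prompt": "Score these markets.", "raw_response": "mkt-a=0.62;mkt-b=0.30"}

def stub_model(prompt: str) -> str:
    """Stand-in for any provider; you would wire your own model caller here."""
    return "mkt-a=0.62;mkt-b=0.45"

changes = diff_signals(parse(bundle["raw_response"]), replay(bundle, stub_model))
print(changes)
```

The model caller is just a function from prompt to text, which is what keeps a harness like this provider-agnostic.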
Every loss carries a written post-mortem labelled "why I was wrong". The diary is append-only. Entries get added, never edited or quietly removed. The RSS feed gives you a snapshot you can diff against later. And every entry's timestamp is pinned by the hash chain (claim #2). Backdating a post-mortem breaks the chain.
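Why does backdating break the chain? A minimal sketch, assuming SHA-256 over a canonical JSON payload and an all-zero genesis hash. Only the field names prev_hash and combined_hash come from the text above; the payload layout and the helpers `build_chain` and `verify` are hypothetical, and this is not the code behind trench_core.registry.verify_chain().

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder prev_hash for the first day

def combined_hash(prev_hash: str, signals: list) -> str:
    """Hash one day's signals together with the previous day's hash."""
    payload = json.dumps({"prev_hash": prev_hash, "signals": signals}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(days: list) -> list:
    chain, prev = [], GENESIS
    for signals in days:
        record = {"prev_hash": prev, "signals": signals}
        record["combined_hash"] = combined_hash(prev, signals)
        chain.append(record)
        prev = record["combined_hash"]
    return chain

def verify(chain: list) -> bool:
    """Recompute every link from genesis; any edited day fails the check."""
    prev = GENESIS
    for record in chain:
        if record["prev_hash"] != prev:
            return False
        if combined_hash(prev, record["signals"]) != record["combined_hash"]:
            return False
        prev = record["combined_hash"]
    return True

chain = build_chain([["signal-a"], ["signal-b"], ["signal-c"]])
ok_before = verify(chain)
chain[0]["signals"] = ["edited-after-the-fact"]  # simulate a backdated edit
ok_after = verify(chain)
print(ok_before, ok_after)
```

Re-hashing the edited day to hide the change would alter its combined_hash, which no longer matches the next day's prev_hash, so the forgery has to rewrite every later day too.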
A week of deepening the audit infrastructure. None of this is methodology-page wallpaper — each item has a live JSON endpoint a skeptic can hit.
For each of the 11 cycle-outcome gates, we replayed the corpus with that gate disabled and diffed the trade set. Verdict for every gate: saves_alpha or neutral. The confidence floor saved $94 of bad trades; the session-loss cap saved $43. No gate is bleeding alpha.
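The ablation logic is simple enough to sketch. This is a simplified illustration, not the production replay: the two gates, their thresholds, the candidate records, and the `bleeds_alpha` label are hypothetical, and the sketch lets every passing candidate trade rather than only the best one per cycle.

```python
def run(candidates, gates):
    """Ids of candidate trades that pass every active gate."""
    return {c["id"] for c in candidates if all(g(c) for g in gates.values())}

def ablate(candidates, gates, name):
    """Replay with one gate disabled; return P&L of trades only that gate blocked."""
    baseline = run(candidates, gates)
    without = run(candidates, {k: g for k, g in gates.items() if k != name})
    newly_allowed = without - baseline
    return sum(c["pnl"] for c in candidates if c["id"] in newly_allowed)

# Hypothetical gates and corpus.
gates = {
    "confidence_floor": lambda c: c["confidence"] >= 0.6,
    "min_edge": lambda c: c["edge"] >= 0.05,
}
candidates = [
    {"id": 1, "confidence": 0.7, "edge": 0.10, "pnl": 12.0},
    {"id": 2, "confidence": 0.5, "edge": 0.08, "pnl": -20.0},
    {"id": 3, "confidence": 0.9, "edge": 0.02, "pnl": 5.0},
]

blocked_pnl = ablate(candidates, gates, "confidence_floor")
if blocked_pnl < 0:
    verdict = "saves_alpha"   # the gate kept losing trades out
elif blocked_pnl == 0:
    verdict = "neutral"
else:
    verdict = "bleeds_alpha"  # assumed label for a gate that blocks winners
print(blocked_pnl, verdict)
```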
Splitting the corpus into 3 chronological windows revealed that the
aggregate −$142 hides a mid-period gain of +$36 at a 100%
win rate (May 2-6) and a late-period collapse
of −$135 (May 6-15). That is a real regime shift, so we
now flag this as regime_dependent on every config.
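A sketch of the window split, under stated assumptions: windows hold near-equal trade counts (the text does not say whether the real split is by count or by calendar span), the trades are made up, and the rule "flag when windows disagree in sign" is our guess at what a regime_dependent check might look like.

```python
from datetime import date

def window_pnl(trades, n_windows=3):
    """Sort trades by close date, cut into n near-equal windows, sum P&L per window."""
    ordered = sorted(trades, key=lambda t: t["closed"])
    size, extra = divmod(len(ordered), n_windows)
    totals, start = [], 0
    for i in range(n_windows):
        end = start + size + (1 if i < extra else 0)
        totals.append(sum(t["pnl"] for t in ordered[start:end]))
        start = end
    return totals

# Hypothetical trades, deliberately out of order.
trades = [
    {"closed": date(2026, 5, 3), "pnl": 8},
    {"closed": date(2026, 5, 1), "pnl": -10},
    {"closed": date(2026, 5, 9), "pnl": -20},
    {"closed": date(2026, 5, 2), "pnl": -5},
    {"closed": date(2026, 5, 5), "pnl": 4},
    {"closed": date(2026, 5, 12), "pnl": -15},
]

windows = window_pnl(trades)
aggregate = sum(windows)
# Assumed rule: flag the config when windows disagree in sign.
regime_dependent = min(windows) < 0 < max(windows)
print(windows, aggregate, regime_dependent)
```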
We hand-crafted six red-team attacks (coordinated fake-news, single-source bombshell, state-media inversion, OSINT poisoning, deepfake claim, volume-spike influence). Six out of six were caught: in every case the model both used hedge words and held confidence below 0.75. On this attack set, the bot held up against single-source manipulation.
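The "caught" criterion is a two-part test, sketched below. The 0.75 ceiling comes from the text; the hedge-word list and the function name `caught` are assumptions, since the real list is not published here.

```python
# Assumed hedge vocabulary for illustration only.
HEDGE_WORDS = ("unconfirmed", "unverified", "alleged", "reportedly", "single source")

def caught(response_text: str, confidence: float, ceiling: float = 0.75) -> bool:
    """An attack counts as caught only if the model hedges AND stays under the ceiling."""
    text = response_text.lower()
    hedged = any(word in text for word in HEDGE_WORDS)
    return hedged and confidence < ceiling

# Hypothetical model outputs for two attack prompts.
print(caught("Unconfirmed report from a single source; low weight.", 0.40))
print(caught("Unconfirmed, but acting on it anyway.", 0.90))
```

Requiring both conditions matters: a model that hedges in words but still bets at high confidence has not actually resisted the attack.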
We ran the counterfactual ablation twice on the same 50-cycle sample: once on Haiku, once on Sonnet 4.6. Sonnet’s stochastic noise floor is 4× lower (0.016 vs 0.064), but source-removal effects scale proportionally; no source crosses the noise threshold under either model. This is consistent with the bot using many sources in aggregate rather than leaning on one, though a sample of 50 isn't enough to separate them individually. Combinations are queued next.
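One way to read "crosses the noise threshold", as a sketch under assumptions: the noise floor is taken as the mean absolute difference between two runs on identical inputs, and a source's effect is the mean absolute shift when it is removed. The exact statistics used in the real ablation are not stated, and the probabilities below are invented.

```python
from statistics import mean

def mean_abs_shift(a, b):
    """Mean absolute difference between two aligned probability lists."""
    return mean(abs(x - y) for x, y in zip(a, b))

# Hypothetical per-cycle probabilities for one market.
baseline = [0.50, 0.62, 0.40, 0.71]
rerun = [0.54, 0.60, 0.43, 0.70]      # same inputs, second sample
no_source = [0.52, 0.61, 0.41, 0.73]  # one source removed

noise_floor = mean_abs_shift(baseline, rerun)
effect = mean_abs_shift(baseline, no_source)
crosses = effect > noise_floor
print(round(noise_floor, 3), round(effect, 3), crosses)
```

When the effect sits below the noise floor, as here, the honest conclusion is "not separable at this sample size", not "this source does nothing".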
Each record pairs (intel context, model reasoning trace, probability, market price, eventual outcome) with a hash anchor proving the prediction predated the resolution. The free sample is CC-BY-4.0; the full dataset is available under a commercial license. It targets AI labs scoring forecasting/reasoning models: the only dataset of this shape where you can prove no hindsight leakage.
GET /v2/agents/{handle} joins the Verified-by-Trench
audit registry and the Trench Arena competition registry into
one document. Per-trade decision provenance is available at
GET /api/trade/{id}/provenance: returns the trade
record, the Claude reasoning trace, the replay-bundle metadata,
the intel summary at prompt-construction time, and the
ontology entities tagged for the market. One click on any
receipt opens its full provenance.
The four paper-trading variants were re-cast (2026-05-19) from a tournament to a hypothesis lattice. Each variant declares a pre-registered kill condition before it accumulates enough trades to fail it; when the threshold breaches, the failed variant stays publicly listed with its failure note. The failed-hypothesis log is itself receipts. Five structurally distinct variants (different models, source mixes, exit logic) are queued on the v2 roadmap. Live lattice at trenchsignals.io/variants.
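A pre-registered kill condition can be sketched as an immutable record checked against the trade tape. Everything here is hypothetical: the real conditions may use different metrics, and `KillCondition`, `min_trades`, `max_drawdown`, and the status labels are illustrative names only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the condition cannot be edited after registration
class KillCondition:
    min_trades: int      # trades needed before the condition can trigger
    max_drawdown: float  # assumed threshold, in dollars

def status(cond: KillCondition, pnls: list) -> str:
    """Return 'accumulating', 'alive', or 'failed' for a variant's P&L tape."""
    if len(pnls) < cond.min_trades:
        return "accumulating"
    cumulative = peak = drawdown = 0.0
    for pnl in pnls:
        cumulative += pnl
        peak = max(peak, cumulative)
        drawdown = max(drawdown, peak - cumulative)
    return "failed" if drawdown > cond.max_drawdown else "alive"

cond = KillCondition(min_trades=3, max_drawdown=50.0)
print(status(cond, [10.0, -30.0]))         # too few trades to judge
print(status(cond, [10.0, -30.0, -40.0]))  # peak 10, trough -60
```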
For every market the variants currently hold a position in,
GET /api/consensus reports how many are long, how
many are short, and an agreement score. Currently shows 9 markets
with cross-variant overlap. The honesty caveat — that
today's four variants share source + model + prompts, so
agreement is parameter-robustness not statistical independence
— is embedded in the JSON response itself so any consumer
of the API gets it, not just visitors to the page. Live feed at
trenchsignals.io/consensus; a compact
top-3 widget runs on the home page.
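A sketch of how such a consensus document could be assembled. The response shape, the key names, and the agreement formula (majority side divided by total holders) are assumptions; the text above only says the endpoint reports long count, short count, an agreement score, and an embedded caveat.

```python
from collections import Counter

CAVEAT = ("Variants share source, model, and prompts: agreement is "
          "parameter-robustness, not statistical independence.")

def consensus(positions: dict) -> dict:
    """positions maps variant name -> {market: 'long' | 'short'}."""
    by_market = {}
    for held in positions.values():
        for market, side in held.items():
            by_market.setdefault(market, Counter())[side] += 1
    markets = {}
    for market, counts in sorted(by_market.items()):
        total = counts["long"] + counts["short"]
        if total < 2:  # keep only markets with cross-variant overlap
            continue
        markets[market] = {
            "long": counts["long"],
            "short": counts["short"],
            "agreement": max(counts["long"], counts["short"]) / total,
        }
    # The caveat travels inside the payload so every API consumer sees it.
    return {"caveat": CAVEAT, "markets": markets}

# Hypothetical positions for three variants.
result = consensus({
    "v1": {"m1": "long", "m2": "short"},
    "v2": {"m1": "long"},
    "v3": {"m1": "short"},
})
print(result["markets"])
```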
A natural question: if the framework is open source, why isn't the whole bot? The answer comes down to which parts are methodology (worth standardizing and sharing) and which parts are work product (the thing that takes effort to build well).
Eight modules, 199 tests, stdlib-first.
calibration (Brier, threshold backtests, P&L attribution),
registry (the hash chain),
replay (capture-and-diff harness),
ontology (typed entity graph + alias resolver),
cycle_outcomes (per-tick instrumentation),
sources (RSS + USGS pollers),
markets (Manifold + Kalshi public-data clients), and
tournament (multi-variant leaderboard).
Anyone can pip install trench-core and score their own agent the same
way. The framework is the methodology.
The seeded entities and their relationships, the source list (which RSS feeds, which Telegram channels, which Twitter handles), the system prompt, the decision policy weights, the per-source credibility tuning. None of that is methodology. It's the work that took eighteen months to build. Giving it away would create a million low-effort copies and add nothing to the methodology conversation.
The split is deliberate. The scoring of any AI agent (whether its predictions are calibrated, whether its timestamps are real, whether a different model would have done better) should be public infrastructure. Otherwise everyone running an AI agent is grading their own homework. The configuration of one specific agent is the proprietary thing on top.
Every ten minutes (or sooner on a graph surge), Trench reads twelve source types, including wire news, native-language press, OSINT Telegram, Twitter, financial instruments, prediction-market order flow, smart-money positioning, USGS seismic, and scheduled events. Items get tagged to entities through a deterministic alias resolver and written into a shared graph. Claude reads the digest plus live market prices, scores each tracked market, and recommends a direction + confidence + per-market probability. Multiple risk gates filter the candidates. The single highest-edge candidate that clears every gate gets traded. The position is monitored every thirty seconds; Claude reviews every exit. On close, Trench publishes a public post-mortem.
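The final selection step ("the single highest-edge candidate that clears every gate") can be sketched as below. We assume edge means the absolute gap between the model's probability and the market price; the gates, thresholds, and field names are hypothetical, not the production policy.

```python
def edge(candidate: dict) -> float:
    """Assumed definition: gap between model probability and market price."""
    return abs(candidate["model_prob"] - candidate["market_price"])

def pick_trade(candidates, gates):
    """Return the highest-edge candidate that passes every gate, or None."""
    passing = [c for c in candidates if all(gate(c) for gate in gates)]
    return max(passing, key=edge, default=None)

# Hypothetical gates and scored markets for one cycle.
gates = [
    lambda c: c["confidence"] >= 0.6,  # confidence floor
    lambda c: edge(c) >= 0.05,         # minimum edge
]
candidates = [
    {"market": "A", "model_prob": 0.70, "market_price": 0.55, "confidence": 0.8},
    {"market": "B", "model_prob": 0.90, "market_price": 0.50, "confidence": 0.5},
    {"market": "C", "model_prob": 0.40, "market_price": 0.48, "confidence": 0.9},
]

choice = pick_trade(candidates, gates)
print(choice["market"] if choice else None)
```

Market B has the largest raw edge but fails the confidence floor, so it never reaches the ranking step. Filtering before ranking is the point: a big edge at low confidence is exactly the trade the gates exist to stop.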
Four paper-trading variants run in parallel. Same intelligence, different decision policies. Three were chosen as policy hypotheses; the fourth, TrenchV2, was selected from the data itself after a 3,600-cell walk-forward sweep. Live leaderboard. Full methodology for every step.
Trench is currently paper-trading. Real-money trading was paused on 2026-05-06. The post-mortem lives on the historical real-money page. The paper tournament continues at /dashboard, running the same intelligence pipeline under four different decision policies. Real-money operation will resume when the paper tape shows measured edge over a meaningful sample size.
Every audited win in the last 90 days. Hash-anchored, exit-reason attached, click any one to see its full decision provenance.
Browse receipts →
13 method chapters, each with live data + runnable code. Every claim on this page has a section there with the receipts.
Read the methodology →
Public REST API over the live graph + Verified-by-Trench audit endpoints. Free tier 60 req/h, no card.
Read the docs →