About · trading in public since 2026-04-19

Meet Trench.
We bet on geopolitical markets in public — with receipts.

Most AI trading projects ask you to trust them. Their numbers come from spreadsheets they don’t show. Their losses get quietly removed when inconvenient. Their methodology is “the model said so.”

Trench is the opposite. Every prediction is hash-anchored to a public registry before the market settles. Every loss carries a post-mortem. The scoring library is open source on PyPI — anyone can score their own agent the same way. The same audit layer is now Verified by Trench, open to any external AI agent that wants a date-anchored track record.

Currently · paper-trading (real-money paused 2026-05-06)
Trench

A 24/7 autonomous AI watching 12 intel layers across 119+ news feeds (5 native languages, no translation, plus dedicated Africa-theater and cyber tiers), 167 prediction markets streamed live, OSINT Telegram, financial instruments, whale flow, and USGS seismic. Every 10 minutes the bot builds a structured digest, asks Claude to score the markets, runs candidates through 11 gates, and trades the best edge. Four variants compete in parallel; the winner’s policy gets cloned to live money. Every loss carries a public post-mortem — that’s the product.

242 named entities (including 10 events)
71 hash-anchored audited wins
6/6 adversarial attacks detected
86.5% resolution skill (Murphy)
The project

Two products on a shared methodology.

TrenchSignals is two things, on purpose. There's Trench itself, an autonomous AI paper-trading geopolitical conflict markets in public. And there's trench-core, the open-source framework that powers the methodology. The bot is the demo. The audit trail is the product.

Most "AI trading" projects ask you to trust them. Their numbers come from spreadsheets they don't show you. Their losses get quietly removed when they're inconvenient. Their methodology is "the model said so." We wanted the opposite: a project where every claim about itself is independently checkable.

That's the moat. Not being unbeatable. Being auditable.

What you can actually check

Four pillars. Each one runnable.

Pick whichever claim you're most skeptical about and audit it directly. Full runnable recipes for each one live on the methodology page.

1. Brier-scored predictions

Every prediction Trench makes across every market it tracks (not just the ones it actually trades) is logged with a probability and scored against the actual market settlement. The win rate, mean Brier, and ROI numbers on the dashboard come from pure functions in trench_core.calibration. pip install trench-core and re-run the same code on the same trade log; the answer is byte-identical.
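To make the scoring idea concrete, here is a minimal standalone sketch of a mean Brier score as a pure function. It is not the trench_core.calibration API, whose function names are not given on this page; the name mean_brier and the sample log are illustrative assumptions.

```python
from typing import Iterable, Tuple


def mean_brier(log: Iterable[Tuple[float, int]]) -> float:
    """Mean squared error between forecast probability and the 0/1 outcome."""
    rows = list(log)
    if not rows:
        raise ValueError("empty prediction log")
    return sum((p - outcome) ** 2 for p, outcome in rows) / len(rows)


# Hypothetical log: (forecast probability of YES, settled outcome).
sample_log = [(0.5, 1), (0.5, 0), (1.0, 1), (0.0, 0)]
score = mean_brier(sample_log)  # 0.125; lower is better
```

Because the function has no state and no randomness, the same log always produces the same number, which is the property that makes an independent re-run meaningful. Always forecasting 0.5 scores exactly 0.25, so that is the usual coin-flip baseline.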

2. Hash-anchored signals

Every captured signal is hashed and committed to a public, append-only chain before the market resolves. The day's combined_hash references the prior day's as prev_hash. Tamper with any past entry and every subsequent verification fails. The full chain is at /registry; trench_core.registry.verify_chain() recomputes from genesis. Every entry on the public diary now carries a "🔗 chain" link to the day-record that proves its timestamp.
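The chaining described above can be sketched with the standard library alone. This is an illustration of the general technique, not the trench_core.registry implementation: the record layout, the JSON serialization, the SHA-256 choice, and the all-zero genesis value are assumptions. Only the field names prev_hash and combined_hash come from the text.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed placeholder for the first day's prev_hash


def day_hash(prev_hash: str, signals: List[str]) -> str:
    """Hash a day's signals together with the previous day's hash."""
    payload = json.dumps({"prev_hash": prev_hash, "signals": signals}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_day(chain: List[Dict], signals: List[str]) -> None:
    prev = chain[-1]["combined_hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "signals": signals,
                  "combined_hash": day_hash(prev, signals)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every link from genesis; an edit to any past record fails."""
    prev = GENESIS
    for record in chain:
        if record["prev_hash"] != prev:
            return False
        if record["combined_hash"] != day_hash(prev, record["signals"]):
            return False
        prev = record["combined_hash"]
    return True


chain: List[Dict] = []
append_day(chain, ["signal-a", "signal-b"])
append_day(chain, ["signal-c"])
ok_before = verify(chain)         # True: untouched chain verifies
chain[0]["signals"][0] = "tampered"
ok_after = verify(chain)          # False: day 0 no longer matches its hash
```

Re-hashing the tampered day to hide the edit would change its combined_hash, which then no longer matches the next day's prev_hash, so the break propagates forward.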

3. Replayable decisions

Every analyzer call captures the full prompt and raw response as a bundle. Re-run any cycle through any model (Anthropic, OpenAI, a local model, or a stub) and diff the parsed signals. "If we'd used a different model, we'd have done better" stops being vibes. The framework is provider-agnostic; you wire your own model caller. See trench_core.replay.
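A capture-and-diff loop of this kind might look like the sketch below. It does not use trench_core.replay; the bundle shape, the JSON response format, and the function names are assumptions made for illustration. The model caller is a plain function from prompt to raw text, which is what keeps the harness provider-agnostic.

```python
import json
from typing import Callable, Dict

Bundle = Dict[str, str]  # assumed shape: {"prompt": ..., "raw_response": ...}


def parse_signals(raw_response: str) -> Dict[str, float]:
    """Assumes the model replies with a JSON object of market -> probability."""
    return {k: float(v) for k, v in json.loads(raw_response).items()}


def replay_diff(bundle: Bundle, caller: Callable[[str], str]) -> Dict[str, float]:
    """Re-run the captured prompt and return per-market probability deltas."""
    original = parse_signals(bundle["raw_response"])
    replayed = parse_signals(caller(bundle["prompt"]))
    return {m: replayed[m] - original[m] for m in original if m in replayed}


def stub_caller(prompt: str) -> str:
    # Stand-in for any provider; returns a fixed response.
    return json.dumps({"market-1": 0.75, "market-2": 0.25})


bundle = {"prompt": "score these markets",
          "raw_response": json.dumps({"market-1": 0.5, "market-2": 0.25})}
delta = replay_diff(bundle, stub_caller)  # {"market-1": 0.25, "market-2": 0.0}
```

Swapping stub_caller for a function that calls a real model is the only change needed to compare providers on the same captured prompt.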

4. Public, append-only losses

Every loss carries a written post-mortem labelled "why I was wrong". The diary is append-only. Entries get added, never edited or quietly removed. The RSS feed gives you a snapshot you can diff against later. And every entry's timestamp is pinned by the hash chain (claim #2). Backdating a post-mortem breaks the chain.

What we shipped this week (2026-05-19)

Eight findings, all checkable.

A week of deepening the audit infrastructure. None of this is methodology-page wallpaper — each item has a live JSON endpoint a skeptic can hit.

1. Every shipping gate is saving alpha or neutral.  Method 04 →

For each of the 11 cycle-outcome gates, we replayed the corpus with that gate disabled and diffed the trade set. Verdict for every gate: saves_alpha or neutral. The confidence floor saved $94 of bad trades; the session-loss cap saved $43. No gate is bleeding alpha.
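The ablation logic can be sketched as follows: replay the candidates with one gate switched off, take the trades that only appear in the ablated run, and sum their P&L. The gate names, thresholds, and sample corpus are hypothetical, and the bleeds_alpha label is an assumed name for the third outcome; only saves_alpha and neutral appear in the text.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, float]
Gate = Callable[[Candidate], bool]

# Hypothetical gates and thresholds, for illustration only.
GATES: Dict[str, Gate] = {
    "confidence_floor": lambda c: c["confidence"] >= 0.6,
    "min_edge": lambda c: c["edge"] >= 0.05,
}


def replay(candidates: List[Candidate], disabled: str = "") -> List[Candidate]:
    """Return the trades taken when every gate except `disabled` is active."""
    active = [g for name, g in GATES.items() if name != disabled]
    return [c for c in candidates if all(g(c) for g in active)]


def gate_verdict(candidates: List[Candidate], gate: str) -> str:
    baseline = replay(candidates)
    ablated = replay(candidates, disabled=gate)
    # Trades that only happen with the gate off are the ones it blocked.
    blocked_pnl = sum(c["pnl"] for c in ablated if c not in baseline)
    if blocked_pnl < 0:
        return "saves_alpha"
    if blocked_pnl > 0:
        return "bleeds_alpha"  # assumed label, not from the source
    return "neutral"


corpus = [
    {"confidence": 0.9, "edge": 0.10, "pnl": 12.0},
    {"confidence": 0.4, "edge": 0.10, "pnl": -30.0},  # stopped by the confidence floor
]
verdicts = {name: gate_verdict(corpus, name) for name in GATES}
```

In this toy corpus the confidence floor blocks one losing trade, so it saves alpha, while the edge gate blocks nothing and is neutral.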

2. Bot is regime-dependent.  Method 06 →

Splitting the corpus into 3 chronological windows showed that the aggregate −$142 masks a real shift: a mid-period +$36 at a 100% win rate (May 2-6), followed by a late-period collapse of −$135 (May 6-15). We now flag this as regime_dependent on every config.
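A windowed breakdown like this takes only a few lines. The sketch below assumes windows of near-equal trade count, which the text does not specify (they could equally be cut by calendar date), and the trades are made-up numbers, not Trench's tape.

```python
from typing import Dict, List, Tuple

Trade = Tuple[str, float]  # (ISO close date, P&L in dollars)


def split_windows(trades: List[Trade], n: int = 3) -> List[List[Trade]]:
    """Sort by date and cut into n contiguous, near-equal windows."""
    ordered = sorted(trades)
    size, extra = divmod(len(ordered), n)
    windows, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        windows.append(ordered[start:end])
        start = end
    return windows


def window_report(trades: List[Trade]) -> List[Dict[str, float]]:
    """Per-window P&L and win rate, in chronological order."""
    report = []
    for window in split_windows(trades):
        pnl = sum(p for _, p in window)
        wins = sum(1 for _, p in window if p > 0)
        report.append({"pnl": pnl, "win_rate": wins / len(window) if window else 0.0})
    return report


# Hypothetical trades for illustration.
trades = [("2026-05-01", -10.0), ("2026-05-02", -5.0), ("2026-05-03", 8.0),
          ("2026-05-05", 4.0), ("2026-05-09", -20.0), ("2026-05-12", -6.0)]
report = window_report(trades)
```

Here the aggregate is −$29, but the middle window is +$12 with every trade a win, which is the kind of pattern a single aggregate number hides.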

3. Six of six adversarial attacks detected.  Method 11 →

We hand-crafted six red-team attacks (coordinated fake-news, single-source bombshell, state-media inversion, OSINT poisoning, deepfake claim, volume-spike influence). All six were caught: the model used hedge words and held confidence below 0.75 in every case. On this test set, the bot is robust to single-source manipulation.
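The detection rule stated above (hedge words and confidence under 0.75) can be written as a two-condition check. The hedge vocabulary below is an assumption for illustration, since the actual word list is not given here; only the 0.75 ceiling comes from the text.

```python
import re

# Illustrative hedge vocabulary; the real list is not published on this page.
HEDGE_WORDS = ("unverified", "unconfirmed", "alleged", "reportedly", "single source")
CONFIDENCE_CEILING = 0.75  # threshold quoted above


def attack_caught(reasoning: str, confidence: float) -> bool:
    """Caught only if the trace hedges AND confidence stays under the ceiling."""
    text = reasoning.lower()
    hedged = any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in HEDGE_WORDS)
    return hedged and confidence < CONFIDENCE_CEILING


caught = attack_caught("Claim is unverified and rests on a single source.", 0.55)
```

Requiring both conditions means a model that hedges in words but still reports high confidence does not count as having caught the attack.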

4. Sonnet ablation: 4× tighter noise, same verdict.  Method 08 →

We ran the counterfactual ablation twice on the same 50-cycle sample: once on Haiku, once on Sonnet 4.6. Sonnet’s stochastic noise floor is 4× lower (0.016 vs 0.064), but source-removal effects scale proportionally; no source crosses the noise threshold under either model. The bot is using many sources in aggregate, not one. A sample of 50 isn't enough to separate them individually; combinations are queued next.

5. Trench Eval Dataset shipped.  Dataset →

Paired records of (intel context, model reasoning trace, probability, market price, eventual outcome) with a hash anchor proving the prediction predated the resolution. Free sample is CC-BY-4.0; full dataset is a commercial license. Targets AI labs scoring forecasting/reasoning models — the only dataset of this shape where you can prove no hindsight leakage.

6. Every prediction can be looked up via a unified endpoint.  API →

GET /v2/agents/{handle} joins the Verified-by-Trench audit registry and the Trench Arena competition registry into one document. Per-trade decision provenance is available at GET /api/trade/{id}/provenance, which returns the trade record, the Claude reasoning trace, the replay-bundle metadata, the intel summary at prompt-construction time, and the ontology entities tagged for the market. One click on any receipt opens its full provenance.

7. Variants are now sensors, not contestants.  Method 18 →

The four paper-trading variants were re-cast (2026-05-19) from a tournament to a hypothesis lattice. Each variant declares a pre-registered kill condition before it accumulates enough trades to fail it; when the threshold is breached, the failed variant stays publicly listed with its failure note. The failed-hypothesis log is itself a receipt. Five structurally distinct variants (different models, source mixes, exit logic) are queued on the v2 roadmap. Live lattice at trenchsignals.io/variants.
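One way to model a pre-registered kill condition is an immutable record declared up front, plus an evaluation step that marks a variant failed without removing it. The class names, the mean-Brier metric, and the thresholds below are hypothetical; the text does not say which metric each variant's kill condition uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KillCondition:
    """Declared before trading starts; frozen so it cannot be edited later."""
    min_trades: int        # sample size required before the test applies
    max_mean_brier: float  # hypothetical threshold metric


@dataclass
class Variant:
    name: str
    kill: KillCondition
    brier_scores: List[float]
    failure_note: str = ""

    def evaluate(self) -> str:
        if len(self.brier_scores) < self.kill.min_trades:
            return "accumulating"
        mean = sum(self.brier_scores) / len(self.brier_scores)
        if mean > self.kill.max_mean_brier:
            # A failed variant stays listed; only the note is added.
            self.failure_note = f"mean Brier {mean:.3f} breached {self.kill.max_mean_brier}"
            return "failed"
        return "alive"


rule = KillCondition(min_trades=3, max_mean_brier=0.25)
early = Variant("demo-early", rule, [0.5, 0.5]).evaluate()
late = Variant("demo-late", rule, [0.5, 0.25, 0.5])
late_status = late.evaluate()
```

The frozen dataclass raises an error on any attempt to change the threshold after the fact, which mirrors the point of pre-registration.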

8. Cross-variant consensus signal goes live.  Method 19 →

For every market the variants currently hold a position in, GET /api/consensus reports how many are long, how many are short, and an agreement score. Currently shows 9 markets with cross-variant overlap. The honesty caveat — that today's four variants share source + model + prompts, so agreement is parameter-robustness not statistical independence — is embedded in the JSON response itself so any consumer of the API gets it, not just visitors to the page. Live feed at trenchsignals.io/consensus; a compact top-3 widget runs on the home page.
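A consensus computation of this shape can be sketched as below. The response layout and the agreement formula (share of holders on the majority side) are assumptions, not the documented schema of GET /api/consensus; the caveat string paraphrases the one described above and travels with every row.

```python
from collections import Counter
from typing import Dict, List

CAVEAT = ("Variants share source, model, and prompts: agreement reflects "
          "parameter robustness, not statistical independence.")


def consensus(positions: Dict[str, Dict[str, str]]) -> List[Dict]:
    """positions maps variant name -> {market: "long" | "short"}."""
    by_market: Dict[str, Counter] = {}
    for held in positions.values():
        for market, side in held.items():
            by_market.setdefault(market, Counter())[side] += 1
    rows = []
    for market, counts in sorted(by_market.items()):
        total = counts["long"] + counts["short"]
        if total < 2:
            continue  # overlap needs at least two variants in the market
        rows.append({
            "market": market,
            "long": counts["long"],
            "short": counts["short"],
            # Assumed definition: share of holders on the majority side.
            "agreement": max(counts["long"], counts["short"]) / total,
            "caveat": CAVEAT,
        })
    return rows


# Hypothetical positions for four variants.
positions = {"v1": {"m1": "long", "m2": "short"}, "v2": {"m1": "long"},
             "v3": {"m1": "short"}, "v4": {"m3": "long"}}
rows = consensus(positions)
```

Putting the caveat inside each row, rather than in separate documentation, is what ensures a downstream consumer cannot receive the score without the warning.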

What's open, what's private

The methodology is public. The work product isn't.

A natural question: if the framework is open source, why isn't the whole bot? The answer comes down to which parts are methodology (worth standardizing and sharing) and which parts are work product (the thing that takes effort to build well).

Open source: trench-core on PyPI (MIT)

Eight modules, 199 tests, stdlib-first. calibration (Brier, threshold backtests, P&L attribution), registry (the hash chain), replay (capture-and-diff harness), ontology (typed entity graph + alias resolver), cycle_outcomes (per-tick instrumentation), sources (RSS + USGS pollers), markets (Manifold + Kalshi public-data clients), and tournament (multi-variant leaderboard). Anyone can pip install trench-core and score their own agent the same way. The framework is the methodology.

Stays private: Trench's specific configuration

The seeded entities and their relationships, the source list (which RSS feeds, which Telegram channels, which Twitter handles), the system prompt, the decision policy weights, the per-source credibility tuning. None of that is methodology. It's the work that took eighteen months to build. Giving it away would create a million low-effort copies and add nothing to the methodology conversation.

The split is deliberate. The scoring of any AI agent (whether its predictions are calibrated, whether its timestamps are real, whether a different model would have done better) should be public infrastructure. Otherwise everyone running an AI agent is grading their own homework. The configuration of one specific agent is the proprietary thing on top.

How it works at a glance

Read, ground, score, decide.

Every ten minutes (or sooner on a graph surge), Trench reads twelve source types, including wire news, native-language press, OSINT Telegram, Twitter, financial instruments, prediction-market order flow, smart-money positioning, USGS seismic, and scheduled events. Items get tagged to entities through a deterministic alias resolver and written into a shared graph. Claude reads the digest plus live market prices, scores each tracked market, and recommends a direction + confidence + per-market probability. Multiple risk gates filter the candidates. The single highest-edge candidate that clears every gate gets traded. The position is monitored every thirty seconds; Claude reviews every exit. On close, a public post-mortem.
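The final selection step (filter through every gate, then trade the single best edge) can be sketched in a few lines. The edge definition, the two example gates, and the candidate fields are assumptions for illustration; Trench's actual gates and policy weights are private.

```python
from typing import Any, Callable, Dict, List, Optional

Candidate = Dict[str, Any]
Gate = Callable[[Candidate], bool]


def edge(c: Candidate) -> float:
    """Assumed definition: model probability minus market price, signed by direction."""
    diff = c["model_prob"] - c["market_price"]
    return diff if c["direction"] == "long" else -diff


def pick_trade(candidates: List[Candidate], gates: List[Gate]) -> Optional[Candidate]:
    """Return the highest-edge candidate that clears every gate, or None."""
    cleared = [c for c in candidates if all(g(c) for g in gates)]
    return max(cleared, key=edge, default=None)


# Hypothetical gates and candidates.
gates: List[Gate] = [lambda c: c["confidence"] >= 0.6, lambda c: edge(c) >= 0.03]
candidates = [
    {"market": "a", "direction": "long", "model_prob": 0.70, "market_price": 0.50, "confidence": 0.8},
    {"market": "b", "direction": "short", "model_prob": 0.20, "market_price": 0.60, "confidence": 0.5},
    {"market": "c", "direction": "long", "model_prob": 0.55, "market_price": 0.50, "confidence": 0.9},
]
chosen = pick_trade(candidates, gates)
```

Market "b" has the largest raw edge but fails the confidence gate, so "a" is traded. Returning None when nothing clears makes "no trade this cycle" an explicit outcome rather than an error.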

Four paper-trading variants run in parallel. Same intelligence, different decision policies. Three were chosen as policy hypotheses; the fourth, TrenchV2, was selected from the data itself after a 3,600-cell walk-forward sweep. Live leaderboard. Full methodology for every step.

Status

Paper-trading. Real-money paused.

Trench is currently paper-trading. Real-money trading was paused on 2026-05-06. The post-mortem lives on the historical real-money page. The paper tournament continues at /dashboard, running the same intelligence pipeline under four different decision policies. Real-money operation will resume when the paper tape shows measured edge over a meaningful sample size.

Trench is paper-trading. All signals are real, all positions are simulated, all P&L is hypothetical. Real-money operation is gated on measured edge over a meaningful sample size.
Not financial advice. Prediction-market trading involves significant risk of loss. Past performance does not guarantee future results.