The scoring stack is open source.

The methodology behind TrenchSignals — calibration, hash-chain pre-registration, replay harness, typed entity graph, multi-variant tournament — is now an MIT-licensed Python framework called trench-core, shipping today as 0.8.0 on PyPI. Eight modules, 243 tests, stdlib-first. Anyone can pip install trench-core and audit our claims, or use the same plumbing to score their own agent.

Why open-source the methodology and not the bot?

Most "AI trading" projects ask you to trust them. Their numbers come from spreadsheets they don't show you. Their losses get quietly removed when they're inconvenient. Their methodology is "Claude said so." We wanted the opposite: a project where every claim about itself is independently checkable.

The bot's specific configuration — the prompts, the seeded entities, the source list, the decision-policy weights — is the part that takes work to build well. That stays private, because giving away the work would just create a million low-effort copies. But the scoring of any AI agent — whether its predictions are calibrated, whether its timestamps are real, whether a different model would have done better — that should be public infrastructure. Otherwise everyone running an AI agent is grading their own homework.

The moat isn't the bot's performance. The moat is that the methodology is checkable.

What's in trench-core

Eight modules, each independently importable. Every module except sources runs on the standard library alone; sources is the one module with an optional runtime dependency (feedparser, for RSS polling).

trench_core.calibration
Brier scoring, calibration curves, threshold backtests, P&L attribution. Pure functions: list[dict] in, dict out.
trench_core.registry
Append-only SHA-256 hash chain over agent bundle JSONLs. verify_chain() raises ChainBroken on tamper.
trench_core.replay
Capture every analyzer call. Re-run any cycle through any model. Diff parsed signals. Provider-agnostic.
trench_core.ontology
Typed entity graph + alias resolver, SQLite-backed. The largest extraction. Subclass-friendly base class.
trench_core.cycle_outcomes
"Exactly one outcome line per loop iteration" instrumentation. Pipe-delimited, category-validated, parseable.
trench_core.sources
Generic RSS poller (parallel, dedup) + USGS seismic poller (zero-dep stdlib only).
trench_core.markets
Read-only Manifold + Kalshi public-data clients. No requests dep. Trading auth deliberately out of scope.
trench_core.tournament
Multi-variant leaderboard aggregator. Reads per-variant trade logs and position stores; ranks by ROI.
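The cycle_outcomes contract ("exactly one outcome line per loop iteration", pipe-delimited, category-validated) can be sketched in a few functions. This is a minimal illustration under stated assumptions: the `OUTCOME|cycle_id|category|detail` layout, the `CATEGORIES` set, and the helpers `format_outcome`, `parse_outcome`, and `run_cycle` are all hypothetical. They are not trench-core's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical category vocabulary; trench-core's real categories may differ.
CATEGORIES = {"signal", "no_signal", "skipped", "error"}


@dataclass(frozen=True)
class Outcome:
    cycle_id: str
    category: str
    detail: str


def format_outcome(outcome: Outcome) -> str:
    """Render one outcome as a single pipe-delimited line (hypothetical layout)."""
    if outcome.category not in CATEGORIES:
        raise ValueError(f"unknown category: {outcome.category!r}")
    fields = [outcome.cycle_id, outcome.category, outcome.detail]
    for field in fields:
        # A pipe or newline inside a field would make the line unparseable.
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a delimiter: {field!r}")
    return "OUTCOME|" + "|".join(fields)


def parse_outcome(line: str) -> Outcome:
    """Inverse of format_outcome: checks prefix, field count, and category."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 4 or parts[0] != "OUTCOME":
        raise ValueError(f"not an outcome line: {line!r}")
    _, cycle_id, category, detail = parts
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return Outcome(cycle_id, category, detail)


def run_cycle(cycle_id: str, body: Callable[[], Tuple[str, str]],
              emit: Callable[[str], None] = print) -> str:
    """Run one loop iteration and emit exactly one outcome line, even on failure."""
    try:
        category, detail = body()
        line = format_outcome(Outcome(cycle_id, category, detail))
    except Exception as exc:
        # A crash or an invalid category still produces its one line, as "error".
        line = format_outcome(Outcome(cycle_id, "error", type(exc).__name__))
    emit(line)
    return line
```

Because every iteration goes through one wrapper, the log gets one line per iteration. A silent failure then shows up as an "error" line instead of a gap, and a gap can only mean the process itself died mid-cycle.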

Install

pip install trench-core              # core + most modules (stdlib only)
pip install 'trench-core[sources]'   # adds feedparser for RSS polling
pip install 'trench-core[all]'       # everything

The four claims you can actually check

Pick the one you're most skeptical about and audit it directly.

1. Brier-scored predictions

trench_core.calibration is the same code that produces the win-rate, mean Brier, and ROI numbers on the dashboard. The same inputs return the same outputs every time — pure functions, no globals, no I/O. Compare what we publish against what you compute yourself:

curl https://trenchsignals.io/api/calibration > calib.json
# Then run calibration_report() locally on the same trade log;
# the output is byte-identical.
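To see what "mean Brier" measures, here is a minimal sketch in the same list[dict]-in, dict-out shape. The Brier score of one prediction is (p - o)^2, where p is the stated probability and o is 1 if the prediction came true and 0 otherwise. Lower is better, and always answering 0.5 scores 0.25. The field names `confidence` and `won` and the function `brier_summary` are hypothetical, not calibration_report()'s real schema.

```python
import json


def brier_summary(trades: list[dict]) -> dict:
    """Mean Brier score plus a coarse calibration table.

    Assumes each trade has a 'confidence' in [0, 1] and a boolean 'won'
    (hypothetical field names). Pure: no globals, no I/O, no mutation.
    """
    if not trades:
        return {"n": 0, "mean_brier": None, "buckets": []}
    squared_errors = [(t["confidence"] - (1.0 if t["won"] else 0.0)) ** 2 for t in trades]
    # Group by confidence decile to compare stated confidence with realized win rate.
    buckets = {}
    for t in trades:
        key = min(int(t["confidence"] * 10), 9)  # confidence 1.0 joins the top bucket
        buckets.setdefault(key, []).append(t)
    table = [
        {
            "range": [k / 10, (k + 1) / 10],
            "n": len(group),
            "mean_confidence": sum(t["confidence"] for t in group) / len(group),
            "win_rate": sum(1 for t in group if t["won"]) / len(group),
        }
        for k, group in sorted(buckets.items())
    ]
    return {"n": len(trades), "mean_brier": sum(squared_errors) / len(trades), "buckets": table}


trades = [
    {"confidence": 0.8, "won": True},
    {"confidence": 0.8, "won": False},
    {"confidence": 0.6, "won": True},
]
report = brier_summary(trades)
# Serializing with sorted keys makes a byte-for-byte comparison meaningful.
assert json.dumps(report, sort_keys=True) == json.dumps(brier_summary(trades), sort_keys=True)
print(report["mean_brier"])
```

Calibration is read from the table: a well-calibrated agent's 0.8 bucket should win about 80% of the time.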

2. Hash-anchored signals

Every captured signal is hashed and committed to a public, append-only chain (/registry) before the market resolves. Each day-record stores the previous day's combined_hash as its prev_hash, so tampering with any past entry makes every subsequent verification fail. Recompute from genesis:

from trench_core.registry import verify_chain
verify_chain("path/to/registry/")    # raises ChainBroken if anything moved
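The chaining rule is small enough to sketch. This version assumes a simplified day-record (`date`, `entries`, `prev_hash`, `combined_hash`) and a hypothetical hashing recipe. trench-core's real record format and verify_chain() may hash different bytes, so read this as an illustration of why one edited entry breaks the chain, not as a reimplementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical sentinel prev_hash for the first day


class ChainBrokenError(Exception):
    """Local stand-in for trench-core's ChainBroken."""


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def day_hash(prev_hash: str, entries: list[dict]) -> str:
    # Hash each entry canonically, then bind the day's hashes to the prior day.
    # Every piece is fixed-length hex, so plain concatenation is unambiguous.
    entry_hashes = [sha256_hex(json.dumps(e, sort_keys=True)) for e in entries]
    return sha256_hex(prev_hash + "".join(entry_hashes))


def append_day(chain: list, date: str, entries: list[dict]) -> None:
    prev = chain[-1]["combined_hash"] if chain else GENESIS
    chain.append({"date": date, "entries": entries, "prev_hash": prev,
                  "combined_hash": day_hash(prev, entries)})


def verify(chain: list) -> None:
    """Recompute every day from genesis; raise on the first mismatch."""
    prev = GENESIS
    for day in chain:
        if day["prev_hash"] != prev:
            raise ChainBrokenError(f"{day['date']}: prev_hash does not match prior day")
        if day_hash(prev, day["entries"]) != day["combined_hash"]:
            raise ChainBrokenError(f"{day['date']}: combined_hash mismatch")
        prev = day["combined_hash"]


chain = []
append_day(chain, "2025-01-01", [{"signal": "A", "confidence": 0.7}])
append_day(chain, "2025-01-02", [{"signal": "B", "confidence": 0.4}])
verify(chain)  # an untouched chain passes

chain[0]["entries"][0]["confidence"] = 0.9  # quietly "improve" a past call
try:
    verify(chain)
except ChainBrokenError as err:
    print("tamper detected:", err)
```

Re-sealing the edited day by recomputing its combined_hash does not help. The next day's prev_hash still points at the old value, so the failure simply moves forward one day.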

Every entry on /log now carries a "🔗 chain" link to the registry day-record that proves its timestamp.

3. Replayable decisions

The framework is provider-agnostic: you supply the model caller, so the same code works against Anthropic, OpenAI, a local model, or a stub for tests. That turns "would a different model have done better?" from vibes into a diff. Re-run any captured cycle through another model and compare the parsed output:

from trench_core.replay import load_bundles, replay_bundle, diff_signals

bundles = load_bundles("bundles.jsonl")
result  = replay_bundle(
    bundle=bundles[-1],
    model="claude-haiku-4-5",
    model_caller=your_llm_call,
)
print(diff_signals(result.bundle.parsed, result.candidate_parsed))
# {'direction': {...}, 'confidence': {..., 'delta': -0.07}, ...}
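The shape of that output is easiest to understand from a field-by-field diff. `diff_parsed` below is a hypothetical stand-in written for illustration, not trench-core's diff_signals() implementation. It reports only the fields that differ and adds a `delta` for numeric fields, in the spirit of the example output above.

```python
from numbers import Real


def diff_parsed(baseline: dict, candidate: dict) -> dict:
    """Compare two parsed signals field by field; return only fields that differ."""
    diff = {}
    for key in sorted(baseline.keys() | candidate.keys()):
        old, new = baseline.get(key), candidate.get(key)
        if old == new:
            continue
        entry = {"baseline": old, "candidate": new}
        # bool is a subclass of int; exclude it so True/False never get a delta.
        if (isinstance(old, Real) and isinstance(new, Real)
                and not isinstance(old, bool) and not isinstance(new, bool)):
            entry["delta"] = round(new - old, 6)  # rounding hides float noise
        diff[key] = entry
    return diff


baseline = {"direction": "YES", "confidence": 0.72, "market": "rain-nyc"}
candidate = {"direction": "NO", "confidence": 0.65, "market": "rain-nyc"}
print(diff_parsed(baseline, candidate))
```

A field missing from one side shows up with None on that side. An empty dict means the candidate model reproduced the original decision exactly.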

4. Public, append-only losses

Every loss carries a written post-mortem. The diary is append-only — entries get added, never edited, never quietly removed. The RSS feed at /feed.xml gives you a snapshot you can diff against later. And every entry's timestamp is pinned by the hash chain (claim #2).
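Diffing feed snapshots is easy to automate. This minimal sketch assumes /feed.xml is standard RSS 2.0 with a stable `<guid>` per item and that it carries the full diary rather than a rolling window. Both are assumptions, so check them against the live feed. The helpers `snapshot` and `audit` are hypothetical. If the diary is append-only, a later snapshot should only ever add guids.

```python
import hashlib
import xml.etree.ElementTree as ET


def snapshot(feed_xml: str) -> dict[str, str]:
    """Map each item's guid (or link, as a fallback) to a hash of its title + description."""
    root = ET.fromstring(feed_xml)
    items = {}
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link")
        if key is None:
            continue  # no stable identifier, so it cannot be tracked across snapshots
        body = (item.findtext("title") or "") + "\n" + (item.findtext("description") or "")
        items[key] = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return items


def audit(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Append-only check: additions are fine; removals and edits are violations."""
    return {
        "removed": sorted(g for g in old if g not in new),
        "edited": sorted(g for g in old if g in new and old[g] != new[g]),
        "added": sorted(g for g in new if g not in old),
    }


OLD = """<rss version="2.0"><channel>
<item><guid>entry-1</guid><title>Loss: rain call</title><description>Post-mortem v1</description></item>
<item><guid>entry-2</guid><title>Loss: CPI call</title><description>Post-mortem</description></item>
</channel></rss>"""
NEW = """<rss version="2.0"><channel>
<item><guid>entry-1</guid><title>Loss: rain call</title><description>Post-mortem v2</description></item>
<item><guid>entry-3</guid><title>Loss: quake call</title><description>Notes</description></item>
</channel></rss>"""
print(audit(snapshot(OLD), snapshot(NEW)))
```

In practice, save today's feed.xml to disk, fetch it again later with any HTTP client, and run audit() on the two texts. Any entry under "removed" or "edited" would contradict the append-only claim.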

What's deliberately not in the framework

Some things stay private because they're domain-specific to TrenchSignals; some stay private because shipping them would require building deploy-target glue that doesn't generalize. Specifically:

Status and what's next

The package is alpha (0.x): expect breaking changes between minor releases until 1.0.0, so pin an exact version if you depend on it. The bot itself ran on this exact code before it was extracted into a package; the changelog describes each phase of the extraction.

Next: a public bundle export so anyone can replay TrenchSignals' own captured cycles, not just their own. Watch the changelog for 0.9.x.

If you build something on top of trench-core — or find a bug, or want a feature — open an issue on GitHub. The framework is the same code that powers Trench in production; PRs that improve it improve us too.

Acknowledgements

trench-core exists because a handful of agentic projects in adjacent spaces are also building toward verifiable AI — pre-registration cultures in forecasting, replay-bundle conventions in evals, calibration norms in superforecasting. We didn't invent these. We assembled them into a stack that works for an agent that trades. If you're shipping in any of these threads, say hi — we'd rather build interop than fork.