The scoring stack is open source.
The methodology behind TrenchSignals — calibration, hash-chain
pre-registration, replay harness, typed entity graph, multi-variant tournament — is now
an MIT-licensed Python framework called
trench-core,
shipping today as 0.8.0 on
PyPI. Eight modules, 243 tests,
stdlib-first. Anyone can pip install trench-core and audit our claims, or
use the same plumbing to score their own agent.
Why open-source the methodology and not the bot?
Most "AI trading" projects ask you to trust them. Their numbers come from spreadsheets they don't show you. Their losses get quietly removed when they're inconvenient. Their methodology is "Claude said so." We wanted the opposite: a project where every claim about itself is independently checkable.
The bot's specific configuration — the prompts, the seeded entities, the source list, the decision-policy weights — is the part that takes work to build well. That stays private, because giving away the work would just create a million low-effort copies. But the scoring of any AI agent — whether its predictions are calibrated, whether its timestamps are real, whether a different model would have done better — that should be public infrastructure. Otherwise everyone running an AI agent is grading their own homework.
The moat isn't the bot's performance. The moat is that the methodology is checkable.
What's in trench-core
Eight modules, each independently importable. Six of the eight need only the standard
library; only sources declares an optional runtime dependency
(feedparser, for RSS polling).
list[dict] in, dict out.verify_chain() raises ChainBroken on tamper.requests dep. Trading auth deliberately out of scope.Install
pip install trench-core # core + most modules (stdlib only)
pip install 'trench-core[sources]' # adds feedparser for RSS polling
pip install 'trench-core[all]' # everything
The four claims you can actually check
Pick the one you're most skeptical about and audit it directly.
1. Brier-scored predictions
trench_core.calibration is the same code that produces the win-rate, mean
Brier, and ROI numbers on the dashboard. The same inputs return
the same outputs every time — pure functions, no globals, no I/O. Compare what we
publish against what you compute yourself:
curl https://trenchsignals.io/api/calibration > calib.json
# Then run calibration_report() locally on the same trade log;
# the output is byte-identical.
2. Hash-anchored signals
Every captured signal is hashed and committed to a public, append-only chain
(/registry) before the market resolves. The day's
combined_hash references the prior day's as prev_hash; tamper
with any past entry and every subsequent verification fails. Recompute from genesis:
from trench_core.registry import verify_chain
verify_chain("path/to/registry/") # raises ChainBroken if anything moved
Every entry on /log now carries a "🔗 chain" link to the registry day-record that proves its timestamp.
3. Replayable decisions
The framework is provider-agnostic — you supply the model caller, so the same code works against Anthropic, OpenAI, a local model, or a stub for tests. Re-running any captured cycle through a different model and diffing the parsed output stops being vibes:
from trench_core.replay import load_bundles, replay_bundle, diff_signals
bundles = load_bundles("bundles.jsonl")
result = replay_bundle(
bundle=bundles[-1],
model="claude-haiku-4-5",
model_caller=your_llm_call,
)
print(diff_signals(result.bundle.parsed, result.candidate_parsed))
# {'direction': {...}, 'confidence': {..., 'delta': -0.07}, ...}
4. Public, append-only losses
Every loss carries a written post-mortem. The diary is append-only — entries get added, never edited, never quietly removed. The RSS feed at /feed.xml gives you a snapshot you can diff against later. And every entry's timestamp is pinned by the hash chain (claim #2).
What's deliberately not in the framework
Some things stay private because they're domain-specific to TrenchSignals; some stay private because shipping them would require building deploy-target glue that doesn't generalize. Specifically:
- The seeded entities and source list — TrenchSignals' specific ontology of conflict markets is the work product, not the methodology. Bring your own.
- The decision-policy weights — the gates that turn a signal into a trade are tuned on private data and don't generalize.
- Trading APIs — Kalshi auth, Polymarket's EIP-712 / Builder /
Relayer stack. Both are too coupled to specific deployments. Subclass
KalshiPublicClientand add the auth layer if you need to trade. - Multi-provider AI abstraction — Anthropic-first by design. Wrap
your own analyzer for OpenAI; the
model_callersignature is intentionally small ((prompt, model) -> str) so any LLM works.
Status and what's next
The package is alpha — 0.x. Versions break between minor releases until
1.0.0. Pin exactly if you depend on it. The bot itself ran on this exact
framework before the extraction; the
changelog
describes each phase of the work.
Next: a public bundle export so anyone can replay TrenchSignals' own captured cycles,
not just their own. Watch the changelog for 0.9.x.
If you build something on top of trench-core — or find a bug, or want a
feature — open an issue on GitHub.
The framework is the same code that powers Trench in production; PRs that improve it
improve us too.
Acknowledgements
trench-core exists because a handful of agentic projects in adjacent spaces
are also building toward verifiable AI — pre-registration cultures in forecasting,
replay-bundle conventions in evals, calibration norms in superforecasting. We didn't
invent these. We assembled them into a stack that works for an agent that trades. If
you're shipping in any of these threads, say
hi — we'd rather build interop than fork.