Changelog · append-only

Every notable change to Trench.

Reverse-chronological log of what shipped, what changed, what we got wrong and fixed. The diary at /log tracks every trade and signal; this tracks every meaningful change to how Trench works. Subscribe for the weekly summary in your inbox via the homepage form.

2026-05-12

trench-bot: pip-installable agent, plus IP privatization pass

The bot now ships as a local-install Python package. One command to install, one to configure, one to run a single analysis cycle:

pip install -e .
trench-bot init
trench-bot run --tick-once

Defaults are conservative and cheap: paper-trade=true, $1,000 paper bankroll, Claude Haiku for first-touch (under $0.05 per cycle). State lives under ~/.trench/. Real trading requires --live plus venue credentials. See /api for the quick-start. PyPI publish itself is queued behind the bot repo carving a public mirror; the local install path works today.

Same day: the privatization pass tightened up which curation values leak through the public surface. The per-source numeric credibility weights (previously visible as a 21-row table on the live dashboard) are now bucketed into categorical tiers (Primary, Secondary, Tertiary) on both /api/ontology/source-credibility and /v1/ontology/source-credibility, and on the dashboard renderers. Marketing surfaces still show the structure ("we curate per-source credibility"); the actionable numeric values are private. The /audit replay endpoint was already architected with the system prompt private, so no change needed there. The frame used: a competitor should be able to verify claims without being able to clone the system. Proof artifacts stay public; decision artifacts can be private.
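A minimal sketch of the bucketing idea, assuming weights on a 0 to 1 scale. The cut-offs (0.8 and 0.5), the function names, and the sample sources are hypothetical; the real thresholds are private by design.

```python
def credibility_tier(weight: float,
                     primary_min: float = 0.8,
                     secondary_min: float = 0.5) -> str:
    """Map a private numeric weight to a public categorical tier.

    The thresholds are illustrative assumptions, not Trench's real values.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if weight >= primary_min:
        return "Primary"
    if weight >= secondary_min:
        return "Secondary"
    return "Tertiary"


def public_view(private_weights: dict) -> dict:
    """Return only the tier per source; the numeric weight never leaves."""
    return {source: credibility_tier(w) for source, w in private_weights.items()}
```

A public endpoint built this way still shows the structure of the curation while withholding the actionable numbers.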

Also: fractional-Kelly sizing is live on TrenchV2 since the backtest confirmed +0.25pp ROI on both train and test folds; pinned-market AI-vs-Manifold-crowd comparison (one curated question per theater) replaced the loose keyword aggregate on /api/forecast-comparison; home page restructured with a proof-fold above the moat row that live-fetches the most recent loss card and three-cell tournament state from /api/tournament.
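For readers unfamiliar with fractional-Kelly sizing, here is a minimal sketch for a binary contract that pays $1 on YES. The function names, the 0.5 fraction, and the optional stake cap are illustrative assumptions, not TrenchV2's actual parameters.

```python
from typing import Optional


def kelly_fraction(prob_yes: float, price: float) -> float:
    """Full-Kelly share of bankroll for buying YES at `price` (pays $1).

    With net odds b = (1 - price) / price, the Kelly formula
    f* = (b*q - (1 - q)) / b simplifies to (q - price) / (1 - price).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    edge = prob_yes - price
    # No positive edge means no bet.
    return max(0.0, edge / (1.0 - price))


def fractional_kelly_stake(bankroll: float, prob_yes: float, price: float,
                           fraction: float = 0.5,
                           max_stake: Optional[float] = None) -> float:
    """Scale the full-Kelly stake down by `fraction` (hypothetical default)."""
    stake = bankroll * fraction * kelly_fraction(prob_yes, price)
    if max_stake is not None:
        stake = min(stake, max_stake)
    return stake
```

Betting a fraction of full Kelly gives up some expected growth in exchange for lower variance, which matters when the probability estimate itself is noisy.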

2026-05-11

Backtest suite + TrenchV2 (4th tournament variant)

Shipped a counterfactual replay engine that walks every historical signal under arbitrary parameter sets, plus walk-forward validation, Monte Carlo bootstrap on the closed-trade tape, sensitivity analysis, per-theater and per-agent slices, and a weekly systemd timer that re-runs the whole pipeline every Sunday at 06:00 UTC. Public output at /backtest and /v1/backtest/latest.
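As a rough illustration of the Monte Carlo bootstrap step, the sketch below resamples closed trades with replacement and estimates the probability that ROI is positive. The (stake, pnl) tuple layout, the function name, and the resample count are assumptions, not the shipped engine.

```python
import random


def bootstrap_prob_positive_roi(trades, n_resamples: int = 2000, seed: int = 0) -> float:
    """Estimate P(ROI > 0) by resampling the closed-trade tape.

    `trades` is a list of (stake, pnl) pairs; this layout is hypothetical.
    """
    if not trades:
        raise ValueError("need at least one closed trade")
    rng = random.Random(seed)  # seeded so reruns are reproducible
    positive = 0
    for _ in range(n_resamples):
        # Draw a synthetic tape of the same length, with replacement.
        sample = rng.choices(trades, k=len(trades))
        staked = sum(stake for stake, _ in sample)
        pnl = sum(p for _, p in sample)
        if staked > 0 and pnl / staked > 0:
            positive += 1
    return positive / n_resamples
```

Ranking parameter sets by this probability, rather than by point-estimate ROI, penalizes configurations whose profit depends on one or two lucky trades.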

The first deliverable was a 4th tournament variant, TrenchV2. Its parameters (TP=0.30, SL=0.30, conf=0.70, edge=0.03, size=$30) were picked from a 3,600-cell walk-forward sweep ranked by P(ROI>0) under a close-rate filter. First config in project history selected from data instead of intuition. Falsifiable in four weeks: if it doesn't outperform baseline by 2026-06-08, the service stops and a post-mortem ships.

Honest interim finding: the bot loses money on the current 22-day corpus with 99.8% confidence after fees. The backtest framework rejected several intuitive improvements (theater diversification cap, thesis-flip exits) by showing they made things worse on real data. The framework paid for itself before any new code shipped.

2026-05-07

Public framework launch + site repositioned around the moat

Full launch post: /posts/trench-core-launch.html · Short URL: /launch

The scoring stack is now open source as trench-core on GitHub and trench-core 0.8.0 on PyPI (MIT licence). Eight modules, 199 tests, stdlib-first:

  • calibration: Brier scoring, calibration curves, threshold backtests, P&L attribution
  • registry: public, append-only SHA-256 hash chain over agent bundles
  • replay: capture-then-replay harness; re-run any cycle through any model
  • ontology: typed entity graph + alias resolver, SQLite-backed (largest extraction)
  • cycle_outcomes: one structured outcome line per loop iteration
  • sources: generic RSS poller + USGS seismic poller
  • markets: Manifold + Kalshi public-data clients, no requests dep
  • tournament: multi-variant leaderboard aggregator
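To make the registry bullet concrete, here is a minimal sketch of an append-only SHA-256 hash chain. The row layout, the all-zero genesis value, and the function names are assumptions for illustration and are not taken from trench-core's actual API.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first row


def row_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous row's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_row(chain: list, payload: dict) -> dict:
    """Append a new row that commits to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    row = {"prev": prev, "payload": payload, "hash": row_hash(prev, payload)}
    chain.append(row)
    return row


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered row breaks the chain."""
    prev = GENESIS
    for row in chain:
        if row["prev"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return False
        prev = row["hash"]
    return True
```

Because each hash covers the previous one, publishing the latest hash is enough for an outsider to detect any later rewrite of earlier rows.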

Site repositioned around verifiability. New /methodology audit guide with four runnable recipes. Home page hero rewritten: "Verifiable AI trading. The methodology is open source." Four-pillar moat row added under the hero. /log entries now carry a "🔗 chain" link to the registry row that proves the timestamp; the registry page has per-date row anchors (#row-YYYY-MM-DD) with a cyan-flash highlight on arrival. New /tournament page with a real leaderboard view. /livestrategy got a prominent read-only banner explaining the 2026-05-06 real-money pause. /pricing repositioned from "premium tiers waitlist" (vapor product) to "It's free. Here's how to follow it." with three concrete CTAs (newsletter, RSS/API, GitHub).

The shape of the moat: open-source scoring + cryptographic pre-registration + replayable decisions + public losses. Most "AI trading" projects can't do any of these. That's the differentiation.

2026-05-06

Production hardening pass

fix honesty

Ran a five-agent code review against the full sprint diff (13.4k inserted lines). Shipped 12 fixes:

  • Security: closed a static-fallthrough bypass on the admin tweet queue, removed a hardcoded fallback secret on the waitlist endpoint, escaped two XSS surfaces on the knowledge-graph view, and locked the event-source URL renderer to an http(s) allowlist.
  • Reliability: killed two socket.setdefaulttimeout races (RSS + Metaculus) that could silently cap unrelated network calls, and held strong refs to four fire-and-forget asyncio tasks (fill-confirm, exit-confirm, sizeup-confirm, ws-resubscribe) so CPython's GC can't drop them mid-flight.
  • Risk: session-loss circuit-breaker state now persists to the position store. A restart used to silently re-enable entries past the configured cap; latched state now survives across crashes until human intervention.
  • Signal quality: Polymarket flow baseline divisor was using hours-with-activity instead of the full 22-hour lookback, inflating averages 5–10× for bursty markets and silencing the spike detector exactly when it mattered. Fixed.
  • Embed widget: third-party embed.js now stops polling when the host element detaches, throttles when the tab is backgrounded, jitters the 60s tick across embedders, and applies exponential backoff on persistent failures.
  • Infra: upsized the DigitalOcean droplet from 1GB to 2GB. Memory pressure was OOM-killing paper bots ~3×/day. Headroom went from 63Mi free / 338Mi swap to 1.2Gi free / 0 swap. Total downtime ~7 minutes; zero positions or balance lost in the resize.
  • Frontend hygiene: public-page links to /live (paused real bot) rewritten to /dashboard (active paper tournament).

25 medium/low-severity items remain logged for future passes (HSTS/CSP headers, log rotation, RSS endpoint cache, WAL bloat on the widenet sqlite). Tracked but not load-bearing.

2026-05-06

Distribution-readiness pass

ship dashboard

Shipped the surfaces a serious project should have before pushing distribution: a methodology page for skeptics, a press kit, this changelog, an embeddable status widget at /embed.js, sitemap.xml + robots.txt with explicit AI-crawler rules, refreshed OG / Twitter Card meta on every public page, and a refreshed home with the new tagline ("An AI that admits when it's wrong"). Sitemap submitted to Google Search Console.

2026-05-06

Live-money bot paused

cost strategy honesty

Shut down the live-money bot. It was costing ~$190/mo in Anthropic API spend to manage a $6.39 Kalshi balance and had generated ~$2.30 P&L over 23 trades: a net structural monthly loss. Found and killed a zombie process that had been running outside the service manager since April 18. Paper tournament covers the same strategy at $1,000 paper bankroll. Cleanly reversible.

See /livestrategy for the paused notice and full reasoning.

2026-05-06

Strategy tournament: 3 paper variants competing

ship strategy

Spun up two new paper-trading variants alongside the existing Baseline, each with isolated data dirs and config:

  • Baseline: conf=0.74, $50 bets (status quo)
  • High Conviction: conf=0.78, $75 bets (tighter threshold + larger size)
  • Wide Net: conf=0.70, $30 bets (looser threshold = more data per unit time)

New /api/tournament endpoint and a leaderboard card on the paper-trading dashboard. Whichever variant earns the highest ROI over 30+ days defines the next config. Calibration finds alpha; tournament proves it.

2026-05-06

Front-end refresh: site catches up to the product

ship dashboard

The site had drifted ~3 weeks behind reality. Refreshed the home page with a 5-state live machine, a tournament section pulling live data, an honesty rail showing the 3 most recent losses with their lessons, and a footer subscribe form. Updated about.html with a new "How we measure ourselves" section. Created /methodology, a 9-chapter deep dive for skeptics. New tagline: "An AI that admits when it's wrong."

2026-05-06

Source-attribution ablation infrastructure

ship strategy

New source-ablation analysis: it joins signal cycles with closed trades per variant and computes win-rate-with vs win-rate-without for every intel layer, with a Wilson CI overlap test. Daily cron at 03:15 UTC writes the report. Returns insufficient_data gracefully until n≥30 paired trades, by design. Once the data crosses that threshold, the report will tell us which 2–3 of 9 layers are doing real work vs. noise.
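A minimal sketch of the Wilson-interval comparison, under stated assumptions: the function names and return shape are hypothetical, z=1.96 gives a 95% interval, and requiring n≥30 in both arms is one reading of the "paired trades" threshold.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial win rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def compare_layer(wins_with: int, n_with: int,
                  wins_without: int, n_without: int,
                  min_n: int = 30) -> dict:
    """Compare win rates with and without one intel layer."""
    if min(n_with, n_without) < min_n:
        # Refuse to draw conclusions from a thin sample.
        return {"status": "insufficient_data"}
    lo_w, hi_w = wilson_interval(wins_with, n_with)
    lo_o, hi_o = wilson_interval(wins_without, n_without)
    return {
        "status": "ok",
        "win_rate_with": wins_with / n_with,
        "win_rate_without": wins_without / n_without,
        # Overlapping intervals mean the layer's effect is not yet separable from noise.
        "ci_overlap": lo_w <= hi_o and lo_o <= hi_w,
    }
```

The Wilson interval behaves better than the normal approximation at small n and at win rates near 0 or 1, which is the regime a young trade log lives in.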

2026-05-06

Brier-scored predictions unlocked

fix honesty

Found three blockers on the prediction-validation pipeline: the resolution-sync job crashed on void/refund markets, the daily cron wasn't installed, and the scorer only handled predictions where the bot had picked a side. Patched all three. Now derives implicit side from our_prob_yes on skipped evals so every prediction is Brier-scoreable. 45 resolutions populated on first run; 10 on shadow already scoreable. Daily sync at 03:00 UTC keeps it fresh.
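A minimal sketch of the scoring idea. The our_prob_yes field comes from the entry above; the other field names, the 0.5 tie-break for the implicit side, and the function names are assumptions.

```python
def implicit_side(our_prob_yes: float) -> str:
    """Side the bot would have taken on a skipped eval (0.5 tie-break is assumed)."""
    return "YES" if our_prob_yes >= 0.5 else "NO"


def brier_score(our_prob_yes: float, resolved_yes: bool) -> float:
    """Squared error between the forecast and the 0/1 outcome; lower is better."""
    outcome = 1.0 if resolved_yes else 0.0
    return (our_prob_yes - outcome) ** 2


def score_predictions(predictions: list) -> dict:
    """Score every resolved prediction, filling in a side where none was picked."""
    rows = []
    for pred in predictions:
        side = pred.get("side") or implicit_side(pred["our_prob_yes"])
        rows.append({
            "side": side,
            "brier": brier_score(pred["our_prob_yes"], pred["resolved_yes"]),
        })
    mean = sum(r["brier"] for r in rows) / len(rows) if rows else None
    return {"rows": rows, "mean_brier": mean}
```

The Brier score itself only needs the probability and the outcome; the derived side matters for reporting hit rates alongside it.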

2026-05-05

Real-time Polymarket whale flow, 216 markets streamed

ship dashboard

New WebSocket subscription to the Polymarket CLOB API for every conflict market the bot tracks. Trades ≥$5K append to big_trades.jsonl and feed into the analyzer prompt as "REALTIME LARGE TRADES" context. Live whale-flow widget on the dashboard renders the feed as it lands. Closes the latency gap on the existing 24h whale snapshot.

2026-05-05

Native-language news feeds (Farsi, Hebrew, Russian, Arabic, Chinese)

ship strategy

Added 10 RSS feeds in 5 native languages: IRNA Farsi, Mehr Farsi, Ynet Hebrew, Walla Hebrew, Haaretz Hebrew, Kommersant, TASS Russian, Al Jazeera Arabic, Asharq Arabic, and Xinhua Chinese. They are ingested directly, with no translation layer; Claude reads them natively. New native source-type buckets so the diversity gate counts them as genuinely distinct from English mirrors. Native sources often publish hours before the English equivalents.

2026-05-05

Honesty rails on every public surface

ship honesty

Loss cards in the diary, weekly digest, and tweet templates now lead with "Why I was wrong": Claude's actual exit reasoning, plus a structural failure-mode label (high-conviction-miss / thin-signal / late-entry / thesis-invalidated). Weekly digest puts "What I got wrong this week" above "What worked." The point: anyone can claim wins; the moat is owning specific, classifiable losses.

2026-05-05

Strategy tuning: confidence threshold 0.72 → 0.74

strategy

Calibration backtest showed the 0.76–0.80 confidence bucket winning 75% with +42% ROI vs the 0.72–0.76 bucket bleeding −7%. Sample is small (n=13 signal trades), so we bumped to 0.74 rather than 0.76. This is a provisional change; we will revisit it at n=30+. Documented in the Strategy Decisions Log on the calibration tab.

2026-05-05

Cycle-outcome instrumentation + funnel visualization

ship dashboard

Every entry-decision tick now emits exactly one structured outcome line: Cycle outcome: <category>. Categories cover every early-return path (too_few_new_items, source_diversity_gate, analyzer_timeout, signal_skip, confidence_too_low, cap_reached, tier_gate_75, tier_gate_80, no_candidates_above_min_edge, insufficient_balance, session_loss_cap, traded). New funnel widget on the dashboard shows where each cycle dies. The 16.7% signal→trade conversion rate is now legible.
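Because each tick emits exactly one outcome line, a funnel can be rebuilt from the logs with a simple counter. This is a minimal sketch: the regular expression, the function names, and the sample log format are assumptions based only on the line shape quoted above.

```python
import re
from collections import Counter

# Matches the structured line described above, e.g. "Cycle outcome: traded".
OUTCOME_RE = re.compile(r"Cycle outcome: (\w+)")


def funnel_counts(log_lines) -> Counter:
    """Count how many cycles ended in each outcome category."""
    counts = Counter()
    for line in log_lines:
        match = OUTCOME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def outcome_shares(counts: Counter) -> dict:
    """Share of all cycles that ended in each category."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

One line per tick is what makes the shares sum to 1; a tick that logged zero or two outcomes would silently skew the funnel.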

2026-05-05

Email digest: daily wrap + weekly digest via Resend

ship

Built and shipped: SubscriberStore (SQLite), send_via_resend wrapper, daily/weekly digest sender with idempotency, branded HTML email shell, one-click unsubscribe at /u/{token}, friends-only CLI subscriber management, and cron entries (daily 13:00 UTC, weekly Mon 14:00 UTC). Domain trenchsignals.io verified in Resend with SPF + DKIM.

2026-05-04

Phase B Half 1: public diary + RSS feed + avatar refinement

ship honesty

Created /log, Trench's public chronological diary. Every closed trade and every directional signal in one append-only feed, deep-link anchored, mobile-friendly. Day-grouped, search/filter (keyword + kind pills), subscribe form. RSS feed at /feed.xml. Updated avatar SVG with eye-character iris.

2026-05-04

Phase A: rebrand to "Trench" the character

ship

Repositioned the entire site around Trench as a protagonist (the AI bot reading the world and placing bets) inside TrenchSignals (the platform). Rewrote home, about, and OG image. Hero now reads "Meet Trench" with a clear character voice across every public surface.

That's everything notable. Older changes live in the git history from before this changelog was started. Follow the RSS feed or subscribe to get future entries by email.