Does the engine know anything the market doesn’t?
Only a handful of our markets have resolved, so Brier-vs-market skill is not yet measurable — saying otherwise would be marketing. But there is an honest interim test: every ~20 minutes the engine logs its probability next to the live market price. When the engine disagrees with the market by 10¢ or more, does the market subsequently move toward the engine? The methodology was frozen before the first run, the result publishes either way, and the study re-runs weekly as data accumulates.
Primary result — 72-hour horizon
Mean signed market move over the horizon after a ≥10¢ divergence onset, signed so that positive means the market moved toward the engine’s number. The 95% CI comes from a bootstrap that resamples whole markets (ticker-clustered); the permutation p-value randomly flips episode signs market by market.
| horizon | episodes | markets | mean move | 95% CI | perm-p | share > 0 |
|---|---|---|---|---|---|---|
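The two statistics in the table can be sketched in plain Python. This is a minimal illustration, not the page’s actual code: the function names are hypothetical, the CI is assumed to be a simple percentile interval, and the permutation test is assumed to be two-sided with a +1 correction. Both procedures resample or flip whole markets rather than single episodes, so several episodes from one market cannot pose as independent evidence.

```python
import random
from collections import defaultdict


def _group_by_ticker(moves, tickers):
    """Collect each market's signed moves (cents) into one cluster."""
    clusters = defaultdict(list)
    for move, ticker in zip(moves, tickers):
        clusters[ticker].append(move)
    return list(clusters.values())


def clustered_bootstrap_ci(moves, tickers, draws=10_000, seed=0, alpha=0.05):
    """Percentile CI for the mean signed move, resampling whole markets."""
    clusters = _group_by_ticker(moves, tickers)
    rng = random.Random(seed)  # fixed seed, so reruns are reproducible
    means = []
    for _ in range(draws):
        # Draw markets with replacement; a drawn market brings all its episodes.
        sample = [m for c in rng.choices(clusters, k=len(clusters)) for m in c]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * draws)]
    hi = means[int((1 - alpha / 2) * draws) - 1]
    return lo, hi


def sign_permutation_p(moves, tickers, draws=1_000, seed=0):
    """Two-sided p-value (assumed): flip the sign of a market's episodes together."""
    clusters = _group_by_ticker(moves, tickers)
    n = len(moves)
    observed = abs(sum(moves) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        flipped = sum(sum(c) * rng.choice((1, -1)) for c in clusters)
        if abs(flipped / n) >= observed:
            hits += 1
    return (hits + 1) / (draws + 1)  # +1 keeps p strictly positive
```

Under the null that the engine carries no directional information, each market’s episodes are equally likely to point either way, which is why the sign flip is applied per market rather than per episode.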
Pre-specified secondary cuts
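Listed before the first run; reported uncorrected; never promoted to a claim. With ~11 cuts tested at p<0.05, about 0.55 nominal hits are expected by chance alone, and the odds of at least one are roughly 43% (1 − 0.95¹¹, if the cuts were independent).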
| cut | n | mean move | 95% CI | perm-p |
|---|---|---|---|---|
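The multiple-comparisons figure is simple arithmetic, sketched here under the simplifying assumption that the cuts are independent. Real cuts share episodes and are correlated, so treat the result as a rough guide; the function name is hypothetical.

```python
def multiple_cut_risk(cuts: int, alpha: float = 0.05):
    """Expected false positives and P(at least one) across independent null cuts."""
    expected = cuts * alpha
    at_least_one = 1 - (1 - alpha) ** cuts
    return expected, at_least_one


expected, at_least_one = multiple_cut_risk(11)
print(f"expected nominal hits: {expected:.2f}")   # 0.55
print(f"P(at least one): {at_least_one:.2f}")      # 0.43
```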
Spec v2 — the literature-grade upgrade (frozen 2026-06-10, runs alongside v1)
After a methodology review of the event-study literature, v2 adds matched-control abnormal convergence (subtracting the drift of comparable non-episode snapshots, the standard control for deadline grind and favorite-longshot effects), a persistence rule at onset, a 14-day refractory period, and a mechanical strawman: the same pipeline run with a Rothschild-debiased lagged market price as the “forecaster.” A claim requires beating both zero and the strawman. v2 carries a hard power gate: under 150 episodes it reports descriptive statistics only, and its thresholds cannot be loosened post hoc.
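The two v2 additions that carry most of the weight, abnormal convergence and the strawman, can be sketched as follows. Everything here is an illustrative assumption rather than the frozen v2 spec: the snapshot fields, the matching tolerances, all function names, and the probit-stretch form Φ(k·Φ⁻¹(p)) with a placeholder k = 1.64 for the debiasing step. A k above 1 pushes prices away from 50¢, the direction that counters favorite-longshot bias.

```python
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()  # standard normal, supplies Phi and Phi^-1


def debias(price: float, k: float = 1.64) -> float:
    """Assumed probit stretch Phi(k * Phi^-1(p)); requires 0 < price < 1."""
    return _STD_NORMAL.cdf(k * _STD_NORMAL.inv_cdf(price))


def strawman_forecast(lagged_price: float) -> float:
    """The mechanical 'forecaster': a debiased, lagged market price."""
    return debias(lagged_price)


def matched_control_moves(snapshots, onset_price, onset_days_left,
                          price_tol=0.05, days_tol=2.0):
    """Raw price moves of non-episode snapshots that resemble the onset.

    Each snapshot is a dict with hypothetical keys 'price', 'days_left', 'move'.
    """
    return [s["move"] for s in snapshots
            if abs(s["price"] - onset_price) <= price_tol
            and abs(s["days_left"] - onset_days_left) <= days_tol]


def abnormal_move(direction: int, episode_move: float, control_moves) -> float:
    """Signed move in excess of the matched controls' average drift.

    direction is +1 if the engine sat above the market at onset, else -1.
    Needs at least one control; mean() raises on an empty list.
    """
    return direction * (episode_move - mean(control_moves))
```

For example, if the engine sat below the market and the price fell 5¢ while matched controls fell 3¢ on average, only 2¢ counts as abnormal convergence.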
Technique backtest — would the literature’s fixes help? (resolution-free)
Candidate forecast transforms scored against the debiased market price 7 days later (a proxy scoring rule from the forecasting literature, usable while resolutions are scarce). Caveat by construction: market-anchored candidates are mechanically favored by a market-derived proxy — those rows are shown for transparency, not as findings. The clean test is hazard-decay vs holding a stale forecast (both equally market-blind); it activates automatically once deadline-stamped rows age past 7 days (~2026-06-14).
| candidate | mean proxy-Brier (lower = closer to the future market) |
|---|---|
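A minimal sketch of the proxy score and of the two market-blind candidates in the clean test. The constant-hazard form of hazard-decay is an assumption (one common way to age a “happens by the deadline” forecast), and all function names are hypothetical; the page does not state its exact decay rule.

```python
from statistics import mean


def stale_forecast(p0: float, days_left_then: float, days_left_now: float) -> float:
    """Baseline: keep the old probability unchanged (same signature for swapping)."""
    return p0


def hazard_decay(p0: float, days_left_then: float, days_left_now: float) -> float:
    """Age a deadline forecast under an assumed constant hazard.

    If P(event within R0 days) = p0 and nothing has happened yet, a constant
    hazard gives P(event within the remaining R1 days) = 1 - (1 - p0) ** (R1 / R0).
    """
    if days_left_now <= 0:
        return 0.0
    return 1 - (1 - p0) ** (days_left_now / days_left_then)


def proxy_brier(forecasts, future_debiased_prices) -> float:
    """Mean squared gap to the debiased market price 7 days later (lower = closer)."""
    return mean((f - m) ** 2 for f, m in zip(forecasts, future_debiased_prices))
```

Both candidates see only the engine’s old number and the calendar, never the market, which is why comparing them is not tilted by the market-derived proxy.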
How to read this honestly
- A null is informative. The primary CI bounds any true 72h convergence effect; if it brackets zero, the engine’s short-horizon divergences are noise at current power and the engine’s market-anchoring (shrinking toward the price) is the right stance.
- A positive must clear costs. The decision rule requires the mean move to exceed the round-trip cost at onset (Kalshi fee 2·0.07·p(1−p) at onset price p, Polymarket ~2¢ spread); statistical significance alone is not an edge.
- Market movement is a proxy, not truth. Convergence can mean the market caught up to the engine, or that both followed the same news at different speeds. Resolution-based grading remains the real scoreboard; it just needs markets to resolve.
- Known limitation: the 6h horizon is unmeasurable under the frozen v1 coverage rule. At 6h, the ≥ max(6h, half the horizon) requirement means an episode needs a price observation a full 6h after onset, i.e. at the very edge of the 6h window. This is disclosed rather than patched mid-study.
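The cost hurdle and the pre-declared decision rule reduce to a few lines. This sketch takes the Kalshi fee formula and the ~2¢ Polymarket figure exactly as stated on this page, applies no fee rounding, and assumes the hurdle is the average round-trip cost across episodes at their onset prices; all function names are hypothetical.

```python
def round_trip_cost_cents(venue: str, onset_price: float) -> float:
    """Round-trip cost at onset, in cents, using the figures stated on this page."""
    if venue == "kalshi":
        # 2 * 0.07 * p * (1 - p) dollars per contract, converted to cents; no rounding.
        return 100 * 2 * 0.07 * onset_price * (1 - onset_price)
    if venue == "polymarket":
        return 2.0  # ~2 cent spread, taken here as the whole round-trip cost
    raise ValueError(f"unknown venue: {venue}")


def cost_hurdle(episodes) -> float:
    """Assumed aggregation: average cost over (venue, onset_price) pairs."""
    costs = [round_trip_cost_cents(v, p) for v, p in episodes]
    return sum(costs) / len(costs)


def convergence_alpha(mean_move: float, ci: tuple, hurdle: float) -> bool:
    """Pre-declared rule: the CI excludes zero AND the mean clears the hurdle."""
    lo, hi = ci
    excludes_zero = lo > 0 or hi < 0
    return excludes_zero and mean_move > hurdle
```

At a 50¢ onset price the Kalshi term is 3.5¢, so a statistically significant 3¢ mean move on Kalshi alone would still fail the rule.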
Frozen methodology (v1)
- Onset: first observation where |engine − market| ≥ 10¢ (previous observation < 10¢), market price within [5¢, 95¢], ≥3 days to resolution where known, one onset per market per 7 days.
- Outcome: change from the onset market price to the last market price within the horizon window, signed by divergence direction (positive = toward the engine); an episode needs ≥ max(6h, half the horizon) of coverage to count at that horizon.
- Primary metric: mean signed move at 72h, 10,000-draw ticker bootstrap CI, 1,000-draw per-ticker sign-permutation p, fixed seeds.
- Decision rule (pre-declared): “convergence alpha” only if the CI excludes zero and the mean clears the cost hurdle.
- Data: the same append-only evaluation log behind the calibration page; the raw artifact this page renders is public JSON.
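The v1 onset and outcome rules translate almost line for line into code. In this sketch the `Obs` record, `find_onsets`, and `signed_move` are hypothetical names, and several details are assumptions: time is in hours, a market’s first observation is never an onset (there is no previous observation to compare), a small epsilon guards the 10¢ comparison against floating-point error, and each market’s observations are sorted by time.

```python
from dataclasses import dataclass
from typing import List, Optional

THRESH = 0.10          # 10 cent divergence
EPS = 1e-9             # guards cases like 0.6 - 0.5 < 0.1 in floating point
REFRACTORY_H = 7 * 24  # one onset per market per 7 days


@dataclass
class Obs:
    t: float                           # hours since an arbitrary origin
    engine: float                      # engine probability, 0..1
    market: float                      # market price, 0..1
    days_left: Optional[float] = None  # days to resolution, None if unknown


def _diverged(o: Obs) -> bool:
    return abs(o.engine - o.market) >= THRESH - EPS


def find_onsets(obs: List[Obs]) -> List[Obs]:
    """Onsets for one market's time-sorted observations under the v1 rules."""
    onsets: List[Obs] = []
    last_onset_t: Optional[float] = None
    for prev, cur in zip(obs, obs[1:]):
        if not _diverged(cur) or _diverged(prev):
            continue  # needs a fresh crossing: now >= 10c, previously < 10c
        if not 0.05 <= cur.market <= 0.95:
            continue  # market price outside [5c, 95c]
        if cur.days_left is not None and cur.days_left < 3:
            continue  # fewer than 3 days to resolution, where known
        if last_onset_t is not None and cur.t - last_onset_t < REFRACTORY_H:
            continue  # still within 7 days of this market's previous onset
        onsets.append(cur)
        last_onset_t = cur.t
    return onsets


def signed_move(obs: List[Obs], onset: Obs, horizon_h: float) -> Optional[float]:
    """Signed move to the last price in (onset, onset + horizon]; None if uncovered."""
    direction = 1 if onset.engine > onset.market else -1
    window = [o for o in obs if onset.t < o.t <= onset.t + horizon_h]
    if not window:
        return None
    last = window[-1]
    if last.t - onset.t < max(6.0, horizon_h / 2):
        return None  # coverage rule: >= max(6h, half the horizon)
    return direction * (last.market - onset.market)
```

Run at the 6h horizon, `signed_move` only passes the coverage check when an observation lands exactly 6h after onset, which is the limitation disclosed above.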