Trench Eval Dataset schema v0.1.0
Field-by-field reference for the forecasting-reasoning records published
at /dataset. JSONL format, one record per resolved
prediction. Canonical machine-readable schema:
JSON Schema 2020-12.
CSV mirror: /api/dataset/sample.csv.
Top-level shape
Five required objects per record plus a nullable anchor block. The
reasoning trace lives inside prediction; the
cryptographic-anchor object is the only top-level field that may be
null, and only for records made before
2026-05-07 (when the replay-bundle pipeline launched).
{
  "sample_id": "trench-eval-<made_at>-<sha8>",
  "schema_version": "0.1.0",
  "question":   { ... },   // the market that was traded
  "prediction": { ... },   // what the model said + why
  "context":    { ... },   // market price + intel snapshot
  "resolution": { ... },   // realized outcome + Brier
  "anchor":     { ... }    // hash-anchor proof; nullable, see below
}
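As a rough illustration, the shape above can be checked before any field-level validation. This is a minimal sketch that assumes the keys shown in the example are present on every record and that only anchor may be null; check_top_level is a hypothetical helper name, not part of any published package, and the JSON Schema remains the authoritative check.

```python
import json

# Top-level keys as shown in the shape above.
REQUIRED_KEYS = ("sample_id", "schema_version", "question",
                 "prediction", "context", "resolution")
OBJECT_KEYS = ("question", "prediction", "context", "resolution")


def check_top_level(record: dict) -> list[str]:
    """Return a list of top-level shape problems (empty when the shape looks right)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in record:
            problems.append(f"missing key: {key}")
    for key in OBJECT_KEYS:
        if key in record and not isinstance(record[key], dict):
            problems.append(f"{key} should be an object")
    # Assumption: the anchor key is always present, but its value may be null.
    if "anchor" not in record:
        problems.append("missing key: anchor")
    elif record["anchor"] is not None and not isinstance(record["anchor"], dict):
        problems.append("anchor should be an object or null")
    return problems


if __name__ == "__main__":
    # Illustrative record with empty sub-objects and a null anchor.
    line = ('{"sample_id": "trench-eval-x", "schema_version": "0.1.0", '
            '"question": {}, "prediction": {}, "context": {}, '
            '"resolution": {}, "anchor": null}')
    print(check_top_level(json.loads(line)))
```

A pre-check like this is only a fast filter for malformed lines; it says nothing about the contents of each object.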
Machine-readable JSON Schema
Every field documented below is also published as a strict
JSON Schema 2020-12 at
GET /api/dataset/schema.json. The schema and
this HTML reference are generated from the same Python
FIELDS_SPEC data structure — they cannot drift.
# Fetch the schema
curl -s https://trenchsignals.io/api/dataset/schema.json | jq .

# Or load it directly in Python
import json, urllib.request
schema = json.loads(urllib.request.urlopen(
    "https://trenchsignals.io/api/dataset/schema.json").read())
print(schema["title"], schema["$id"])
Stability key
Each field carries a stability marker so downstream eval harnesses can plan around future changes:
stable Will not change semantics or types within v0.x. Field rename or removal requires a major-version bump.
experimental May gain values (new enum entries, backfilled tags) within v0.x. Semantics stable; coverage may improve.
nullable-pre-anchor
Present only on records made on/after 2026-05-07. Earlier records have
null for this field (and the whole
anchor block).
Field reference
Live examples are computed from the current public sample
(/api/dataset/fields) and refresh nightly
with the dataset itself.
Verifying a record
1. Fetch the replay bundle.
   curl https://trenchsignals.io/api/replay-bundle/{replay_bundle_id}
   The response contains the bundle plus a computed sha256. If the bundle ID predates 2026-05-07, the bundle does not exist: the record's anchor will be null and steps 2–4 are skipped.
2. Recompute the hash locally.
   sha256sum <canonical_json(bundle)> should equal anchor.bundle_sha256.
   Canonical = JSON with sorted keys, no whitespace, ASCII-safe escaping. trench-core on PyPI ships a reference implementation; pure Python json.dumps(obj, sort_keys=True, separators=(",",":"), ensure_ascii=True) works.
3. Confirm the hash is in the registry chain.
   curl {anchor.registry_url} and grep for the hash.
   The registry is a daily append-only chain; each day's file references the previous day's hash. Tampering with one record forces every subsequent day-file to be rewritten.
4. Confirm the registry was captured before resolution.
   curl {anchor.wayback_url} and read the Internet Archive timestamp.
   IA captures the registry day-file shortly after publication. The capture timestamp is independent third-party evidence and must be strictly less than resolution.resolved_at; that is what proves the model did not see the answer.
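The hash and timestamp checks can be sketched in Python. This is an illustration under stated assumptions, not the trench-core reference implementation: it assumes the bundle has already been extracted from the API response as a parsed JSON value, that anchor.wayback_url uses the standard Wayback Machine form (.../web/<YYYYMMDDhhmmss>/<original-url>, timestamp in UTC), and that resolution.resolved_at is an ISO 8601 string. The helper names are hypothetical.

```python
import hashlib
import json
import re
from datetime import datetime, timezone


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys, no whitespace, ASCII-safe escaping."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True)
    return text.encode("ascii")


def bundle_sha256(bundle) -> str:
    """Hex SHA-256 of the canonical JSON form of the bundle."""
    return hashlib.sha256(canonical_json(bundle)).hexdigest()


def hash_matches(bundle, anchor: dict) -> bool:
    """Compare the recomputed hash with anchor.bundle_sha256."""
    return bundle_sha256(bundle) == anchor["bundle_sha256"].lower()


def wayback_capture_time(wayback_url: str) -> datetime:
    """Read the 14-digit capture timestamp out of a Wayback Machine URL.

    Assumption: the URL follows the usual /web/<YYYYMMDDhhmmss>/ pattern.
    """
    match = re.search(r"/web/(\d{14})", wayback_url)
    if match is None:
        raise ValueError(f"no Wayback timestamp in {wayback_url!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


def captured_before_resolution(wayback_url: str, resolved_at: str) -> bool:
    """True only if the capture is strictly earlier than resolved_at."""
    resolved = datetime.fromisoformat(resolved_at.replace("Z", "+00:00"))
    if resolved.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        resolved = resolved.replace(tzinfo=timezone.utc)
    return wayback_capture_time(wayback_url) < resolved
```

Parsing the timestamp from the URL is a convenience only. The capture still has to be fetched from the Internet Archive to confirm that it exists and contains the hash, and the registry-chain check in step 3 is not covered here because the day-file format is not described on this page.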
Why anchor is nullable in v0.1.0
The replay-bundle capture pipeline went live 2026-05-07. Records made before that date still carry the full reasoning + market + outcome triple — but the cryptographic-anchor block is absent because no bundle was captured at prediction time.
From v0.2 onwards every record is anchored; v0.2 will also retroactively add anchors for any records made between 2026-05-07 and the v0.1.0 release whose replay bundles still exist.
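To see how many records in a given JSONL file actually carry an anchor, a consumer could count them directly. The sketch below assumes only that anchor is either null or an object, as described above; anchor_coverage is a hypothetical helper name.

```python
import json
from typing import Iterable, Tuple


def anchor_coverage(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (anchored, total) for an iterable of JSONL lines."""
    anchored = total = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        record = json.loads(line)
        total += 1
        if record.get("anchor") is not None:
            anchored += 1
    return anchored, total


if __name__ == "__main__":
    # Illustrative input: one unanchored and one anchored record.
    sample = [
        '{"sample_id": "a", "anchor": null}',
        '{"sample_id": "b", "anchor": {"bundle_sha256": "00"}}',
    ]
    anchored, total = anchor_coverage(sample)
    print(f"{anchored}/{total} records anchored")
```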
Validating records
Four ways to validate a record (or a whole JSONL file). The jsonschema, Ajv, and check-jsonschema examples fetch the published schema by URL, so validation stays current with future releases; the pydantic example uses a hand-written model instead.
# Python + jsonschema: validate each JSONL line against the live schema.
import json, urllib.request
from jsonschema import Draft202012Validator

schema = json.loads(urllib.request.urlopen(
    "https://trenchsignals.io/api/dataset/schema.json").read())
validator = Draft202012Validator(schema)

with urllib.request.urlopen(
        "https://trenchsignals.io/dataset-files/sample/records.jsonl") as f:
    for i, line in enumerate(f):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        errors = list(validator.iter_errors(record))
        if errors:
            print(f"record {i} failed:", errors[0].message)
        else:
            print(f"record {i} ok")
# pydantic v2: validate with a hand-written model.
# This gives static type-checking but does not read the published JSON Schema.
# The model below covers the high-value fields; extend as needed.
import urllib.request
from typing import Literal
from pydantic import BaseModel

class Prediction(BaseModel):
    made_at: str
    model: str
    probability_yes: float
    confidence: float
    direction: Literal["ESCALATE", "DEESCALATE", "YES", "NO", "SKIP"]
    reasoning: str

class Resolution(BaseModel):
    resolved_at: str
    outcome: Literal["YES", "NO"]
    brier_score: float

class TrenchRecord(BaseModel):
    sample_id: str
    schema_version: str
    prediction: Prediction
    resolution: Resolution

with urllib.request.urlopen(
        "https://trenchsignals.io/dataset-files/sample/records.jsonl") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        rec = TrenchRecord.model_validate_json(line)
        print(rec.sample_id, "brier=", rec.resolution.brier_score)
// Node.js (ESM): Ajv is the canonical JSON Schema 2020-12 validator.
import Ajv2020 from "ajv/dist/2020.js";
import addFormats from "ajv-formats";

const schema = await (await fetch(
  "https://trenchsignals.io/api/dataset/schema.json")).json();
const ajv = new Ajv2020({allErrors: true, strict: false});
addFormats(ajv);
const validate = ajv.compile(schema);

const jsonl = await (await fetch(
  "https://trenchsignals.io/dataset-files/sample/records.jsonl")).text();
for (const line of jsonl.split("\n").filter(Boolean)) {
  const rec = JSON.parse(line);
  if (!validate(rec)) console.error("failed:", validate.errors);
}
# pipx install check-jsonschema, then bulk-validate every record.
curl -s https://trenchsignals.io/api/dataset/schema.json -o /tmp/trench-schema.json
curl -s https://trenchsignals.io/dataset-files/sample/records.jsonl -o /tmp/records.jsonl
# check-jsonschema validates each instance file as one JSON document, and the
# schema describes a single record, so split the JSONL into one file per record:
mkdir -p /tmp/trench-records
jq -c . /tmp/records.jsonl \
  | awk '{ f = sprintf("/tmp/trench-records/%04d.json", NR); print > f; close(f) }'
check-jsonschema \
  --schemafile /tmp/trench-schema.json \
  /tmp/trench-records/*.json
Loading the dataset
The storefront page at /dataset ships pure-Python,
pandas, and HuggingFace datasets snippets for
loading. The snippet below shows the schema-page-specific path: generate
typed loader code from the JSON Schema so your eval pipeline gets
field names and types that a static type checker can verify.
# datamodel-code-generator: schema → pydantic models in one shot.
# The [http] extra is needed for the --url option.
pip install "datamodel-code-generator[http]"
datamodel-codegen \
  --url https://trenchsignals.io/api/dataset/schema.json \
  --input-file-type jsonschema \
  --output-model-type pydantic_v2.BaseModel \
  --output trench_eval_models.py

# Now use the generated classes in your code
# (json_line is one line read from records.jsonl):
from trench_eval_models import TrenchEvalDatasetRecord
record = TrenchEvalDatasetRecord.model_validate_json(json_line)
print(record.prediction.probability_yes, record.resolution.outcome)
Versioning policy
Semver. major.minor.patch:
- Patch — bug fixes, doc clarifications, retroactive backfills (e.g. adding anchors to existing records). No code changes required.
- Minor — new fields (additive only), new enum values, new exchanges/theaters. Existing loaders keep working; new fields appear as optional.
- Major — field rename, field removal, type change, enum value removal. Announced at least one release in advance with a deprecation marker.
The schema_version field on every record
pins the spec it conforms to. Validators should check this before
validating the record body.
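One way to implement that pre-check is sketched below. It assumes a loader written against v0.1.0 and reads the policy above as: a different major version is incompatible, a newer minor version is additive and worth a warning, and patch differences need no action. The function names and the three status strings are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into three integers."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def compatibility(schema_version: str, built_for: str = "0.1.0") -> str:
    """Classify a record's schema_version relative to the loader's version."""
    record_v = parse_version(schema_version)
    loader_v = parse_version(built_for)
    if record_v[0] != loader_v[0]:
        return "incompatible"   # major bump: renames, removals, type changes
    if record_v[1] > loader_v[1]:
        return "newer-minor"    # additive changes the loader may not know about
    return "ok"                 # same or older minor; patch level is ignored


if __name__ == "__main__":
    for version in ("0.1.0", "0.2.0", "1.0.0"):
        print(version, compatibility(version))
```

A loader could proceed on "ok", log a warning on "newer-minor", and refuse to parse on "incompatible".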
Changelog
v0.1.0 2026-05-17 · current
Initial public release. 11 resolved predictions in the public sample (Iran theater), full reasoning traces, market price + outcome pairs, nightly regeneration.
- Schema: 5 required top-level objects + nullable anchor block
- JSON Schema 2020-12 published at /api/dataset/schema.json
- CSV mirror at /api/dataset/sample.csv
- Replay-bundle pipeline live for records on/after 2026-05-07
v0.2.0 pending · expected 2026-06
Multi-theater + multi-model.
- Russia/Ukraine, Taiwan/China, Lebanon, broader Middle East coverage
- Multi-model reasoning traces (Claude + GPT-4o + Gemini in parallel via the shadow ensemble)
- Theater backfill: retroactive tagging for v0.1.0 records that currently show theater: null
- Anchor backfill: populate anchor.* for any records made 2026-05-07+ whose replay bundles still exist on disk
v0.3.0 planned · Q3 2026
Numeric + categorical questions.
- New question.type enum values: numeric, categorical
- New optional prediction.distribution field for non-binary outputs
- Africa & cyber theater coverage (already wired in the live bot via 6 new RSS feeds)
License
The public sample is CC-BY-4.0. Attribute to "TrenchSignals / Trench Eval Dataset v0.1.0". The full dataset (all resolved predictions to date + continuous updates + replay-bundle URLs) is available under a commercial license; contact dataset@trenchsignals.io.
Live record browser
Browse all sample records (with full reasoning traces) on the storefront page: trenchsignals.io/dataset.