Trench Eval Dataset schema v0.1.0

Field-by-field reference for the forecasting-reasoning records published at /dataset. JSONL format, one record per resolved prediction. Canonical machine-readable schema: JSON Schema 2020-12. CSV mirror: /api/dataset/sample.csv.

33 documented fields.

Top-level shape

Each record has two required scalar fields (sample_id and schema_version), four required objects (question, prediction, context, resolution), and a nullable anchor block. The reasoning trace lives inside prediction. The anchor object is the only top-level field that may be null: it is null for records made before 2026-05-07, when the replay-bundle pipeline launched.

{
  "sample_id":       "trench-eval-<made_at>-<sha8>",
  "schema_version":  "0.1.0",
  "question":        { ... },  // the market that was traded
  "prediction":      { ... },  // what the model said + why
  "context":         { ... },  // market price + intel snapshot
  "resolution":      { ... },  // realized outcome + Brier
  "anchor":          { ... }   // hash-anchor proof; nullable, see below
}
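
Before running full schema validation, the top-level shape can be checked cheaply. The following is a minimal sketch, not part of the published tooling: the helper name check_top_level is hypothetical, and treating a missing anchor key the same as a null anchor is an assumption (the published JSON Schema is the authority on that point).

```python
# Keys shown in the top-level shape above; anchor is handled separately
# because it is the only block allowed to be null.
REQUIRED_KEYS = ("sample_id", "schema_version", "question",
                 "prediction", "context", "resolution")
OBJECT_KEYS = ("question", "prediction", "context", "resolution")


def check_top_level(record: dict) -> list[str]:
    """Return a list of human-readable problems (empty if the shape is fine)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in record]
    for key in OBJECT_KEYS:
        if key in record and not isinstance(record[key], dict):
            problems.append(f"{key} must be an object")
    # Assumption: an absent anchor key is treated like an explicit null.
    anchor = record.get("anchor")
    if anchor is not None and not isinstance(anchor, dict):
        problems.append("anchor must be an object or null")
    return problems
```

An empty list means the record has the expected skeleton; it says nothing about the fields inside each object, which the JSON Schema covers.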

Machine-readable JSON Schema

Every field documented below is also published as a strict JSON Schema 2020-12 at GET /api/dataset/schema.json. The schema and this HTML reference are generated from the same Python FIELDS_SPEC data structure — they cannot drift.

# Fetch the schema
curl -s https://trenchsignals.io/api/dataset/schema.json | jq .

# Or load it directly in Python
import json, urllib.request
schema = json.loads(urllib.request.urlopen(
    "https://trenchsignals.io/api/dataset/schema.json").read())
print(schema["title"], schema["$id"])

Stability key

Each field carries a stability marker so downstream eval harnesses can plan around future changes:

stable: Will not change semantics or types within v0.x. Field rename or removal requires a major-version bump.

experimental: May gain values (new enum entries, backfilled tags) within v0.x. Semantics are stable; coverage may improve.

nullable-pre-anchor: Populated only on records made on/after 2026-05-07. Earlier records have null for this field (and for the whole anchor block).

Field reference

Live examples are computed from the current public sample (/api/dataset/fields) and refresh nightly with the dataset itself. Fields are grouped as follows.

  • top-level: 2 fields, required on every record
  • question.*: 7 fields, the market that was traded
  • prediction.*: 7 fields, what the model said + why
  • context.*: 8 fields, market price + intel snapshot
  • resolution.*: 5 fields, realized outcome + Brier
  • anchor.* (whole object nullable): 4 fields, hash-anchor proof
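
The resolution group pairs the realized outcome with a Brier score. For a binary question, the standard Brier score is the squared difference between the forecast probability and the outcome coded as 1 (YES) or 0 (NO). The sketch below uses the probability_yes and outcome field names that appear in the validation examples on this page; whether the published brier_score is computed in exactly this way is an assumption, and the function names are hypothetical.

```python
def brier_score(probability_yes: float, outcome: str) -> float:
    """Squared error of a binary forecast: 0.0 is perfect, 1.0 is worst."""
    if not 0.0 <= probability_yes <= 1.0:
        raise ValueError("probability_yes must be in [0, 1]")
    target = {"YES": 1.0, "NO": 0.0}[outcome]  # KeyError on any other outcome
    return (probability_yes - target) ** 2


def mean_brier(records: list[dict]) -> float:
    """Average Brier score over records, recomputed from the raw fields."""
    scores = [
        brier_score(r["prediction"]["probability_yes"],
                    r["resolution"]["outcome"])
        for r in records
    ]
    return sum(scores) / len(scores)
```

Recomputing the score from the raw fields and comparing it to resolution.brier_score is a cheap consistency check for a downstream harness.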

Verifying a record

  1. Fetch the replay bundle. curl https://trenchsignals.io/api/replay-bundle/{replay_bundle_id}
    The response contains the bundle plus a computed sha256. Records made before 2026-05-07 have no bundle: the record's anchor is null and verification stops here.
  2. Recompute the hash locally. The SHA-256 of canonical_json(bundle) should equal anchor.bundle_sha256.
    Canonical means JSON with sorted keys, no whitespace, and ASCII-safe escaping. trench-core on PyPI ships a reference implementation; in pure Python, json.dumps(obj, sort_keys=True, separators=(",",":"), ensure_ascii=True) works.
  3. Confirm the hash is in the registry chain. curl {anchor.registry_url} and grep for the hash.
    The registry is a daily append-only chain; each day's file references the previous day's hash. Tampering with one record forces every subsequent day-file to be rewritten.
  4. Confirm the registry was captured before resolution. curl {anchor.wayback_url} and read the Internet Archive timestamp.
    The Internet Archive captures the registry day-file shortly after publication. The capture timestamp is independent third-party evidence and must be strictly earlier than resolution.resolved_at; that ordering is what shows the prediction was committed before the answer was known.
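
The local parts of these steps can be sketched in Python. This is an illustration under stated assumptions, not the trench-core reference implementation: the function names are hypothetical, timestamps are assumed to be ISO 8601 strings, and the caller is assumed to have already fetched the bundle object, the registry file body, and the Internet Archive capture time.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_json(obj) -> bytes:
    """Sorted keys, no whitespace, ASCII-safe escaping (step 2)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True)
    return text.encode("ascii")


def bundle_sha256(bundle) -> str:
    return hashlib.sha256(canonical_json(bundle)).hexdigest()


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def verify_offline(record: dict, bundle, registry_text: str,
                   capture_time: str) -> dict:
    """Run the checks from steps 2-4 on data that was already downloaded."""
    anchor = record.get("anchor")
    if anchor is None:  # pre-anchor record: nothing to verify
        return {"anchored": False}
    expected = anchor["bundle_sha256"]
    resolved_at = record["resolution"]["resolved_at"]
    return {
        "anchored": True,
        "hash_matches": bundle_sha256(bundle) == expected,
        "in_registry": expected in registry_text,
        "captured_before_resolution":
            parse_ts(capture_time) < parse_ts(resolved_at),
    }
```

A record passes only if every value in the returned dict is True. The sketch does not walk the day-to-day registry chain, because the day-file format is not described here.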

Why anchor is nullable in v0.1.0

The replay-bundle capture pipeline went live 2026-05-07. Records made before that date still carry the full reasoning + market + outcome triple, but their anchor block is null because no bundle was captured at prediction time.

From v0.2 onwards every record is anchored; v0.2 will also retroactively add anchors for any records made between 2026-05-07 and the v0.1.0 release whose replay bundles still exist.
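
Anchor coverage for a local copy of the records can be computed directly. The sketch below is illustrative: the helper names are hypothetical, and it assumes prediction.made_at is an ISO 8601 string whose first ten characters are the date.

```python
ANCHOR_CUTOFF = "2026-05-07"  # replay-bundle pipeline launch date


def anchor_coverage(records: list[dict]) -> float:
    """Fraction of records whose anchor block is populated (0.0 if empty)."""
    if not records:
        return 0.0
    anchored = sum(1 for r in records if r.get("anchor") is not None)
    return anchored / len(records)


def anchored_before_cutoff(records: list[dict]) -> list[str]:
    """sample_ids that carry an anchor despite predating the pipeline.

    Assumption: made_at starts with YYYY-MM-DD, so a plain string
    comparison orders dates correctly.
    """
    return [
        r["sample_id"]
        for r in records
        if r.get("anchor") is not None
        and r["prediction"]["made_at"][:10] < ANCHOR_CUTOFF
    ]
```

The second helper should return an empty list on a well-formed file; the reverse case (a null anchor on or after the cutoff) is allowed in v0.1.0, since the backfill is planned for v0.2.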

Validating records

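Four ways to validate a record (or a whole JSONL file): Python jsonschema, pydantic v2, Node.js Ajv, and the check-jsonschema CLI. The jsonschema, Ajv, and check-jsonschema examples fetch the published schema by URL, so validation stays current with future releases; the pydantic example uses a hand-written model instead.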

# Python: jsonschema with the Draft 2020-12 validator.
import json, urllib.request
from jsonschema import Draft202012Validator

schema = json.loads(urllib.request.urlopen(
    "https://trenchsignals.io/api/dataset/schema.json").read())
Draft202012Validator.check_schema(schema)  # fail fast on a malformed schema
validator = Draft202012Validator(schema)

with urllib.request.urlopen(
    "https://trenchsignals.io/dataset-files/sample/records.jsonl") as f:
    for i, line in enumerate(f):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        errors = list(validator.iter_errors(record))
        if errors:
            print(f"record {i} failed:", errors[0].message)
        else:
            print(f"record {i} ok")

# pydantic v2: validate with a hand-written model.
# The model below covers the high-value fields only; extend as needed.
# Keys not declared on a model are ignored by pydantic's default config.
import urllib.request
from typing import Literal
from pydantic import BaseModel

class Prediction(BaseModel):
    made_at: str
    model: str
    probability_yes: float
    confidence: float
    direction: Literal["ESCALATE", "DEESCALATE", "YES", "NO", "SKIP"]
    reasoning: str

class Resolution(BaseModel):
    resolved_at: str
    outcome: Literal["YES", "NO"]
    brier_score: float

class TrenchRecord(BaseModel):
    sample_id: str
    schema_version: str
    prediction: Prediction
    resolution: Resolution

with urllib.request.urlopen(
    "https://trenchsignals.io/dataset-files/sample/records.jsonl") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # Raises pydantic.ValidationError on the first invalid record.
        rec = TrenchRecord.model_validate_json(line)
        print(rec.sample_id, "brier=", rec.resolution.brier_score)

// Node.js: Ajv supports JSON Schema 2020-12 through its Ajv2020 class.
// Run as an ES module on Node 18+ (top-level await and global fetch).
import Ajv2020 from "ajv/dist/2020.js";
import addFormats from "ajv-formats";

const schema = await (await fetch(
    "https://trenchsignals.io/api/dataset/schema.json")).json();

const ajv = new Ajv2020({allErrors: true, strict: false});
addFormats(ajv);
const validate = ajv.compile(schema);

const jsonl = await (await fetch(
    "https://trenchsignals.io/dataset-files/sample/records.jsonl")).text();
const lines = jsonl.split("\n").filter(Boolean);
for (const [i, line] of lines.entries()) {
    const rec = JSON.parse(line);
    if (!validate(rec)) console.error(`record ${i} failed:`, validate.errors);
}

# check-jsonschema CLI (pipx install check-jsonschema): bulk-validate every record.
curl -s https://trenchsignals.io/api/dataset/schema.json -o /tmp/trench-schema.json
curl -s https://trenchsignals.io/dataset-files/sample/records.jsonl -o /tmp/records.jsonl

# check-jsonschema validates each instance file as one document, and the schema
# describes a single record, so write one .json file per JSONL line:
mkdir -p /tmp/trench-records
awk 'NF { f = sprintf("/tmp/trench-records/rec-%04d.json", NR); print > f; close(f) }' \
    /tmp/records.jsonl

check-jsonschema \
    --schemafile /tmp/trench-schema.json \
    /tmp/trench-records/rec-*.json

Loading the dataset

The storefront page at /dataset ships pure-Python, pandas, and Hugging Face datasets snippets for loading. The snippet below shows the path specific to this schema page: generate typed loader code from the JSON Schema, so your eval pipeline gets statically checkable field access and runtime validation instead of raw dict lookups.

# datamodel-code-generator: schema → pydantic models in one shot.
# The [http] extra is required for the --url option.
pip install 'datamodel-code-generator[http]'

datamodel-codegen \
    --url https://trenchsignals.io/api/dataset/schema.json \
    --input-file-type jsonschema \
    --output-model-type pydantic_v2.BaseModel \
    --output trench_eval_models.py

# Now use the generated classes in your Python code. The root class name is
# derived from the schema title; check trench_eval_models.py if the import fails.
from trench_eval_models import TrenchEvalDatasetRecord
# json_line is one line of the JSONL file (a single record as a JSON string).
record = TrenchEvalDatasetRecord.model_validate_json(json_line)
print(record.prediction.probability_yes, record.resolution.outcome)

Versioning policy

Semver: major.minor.patch.

The schema_version field on every record pins the spec it conforms to. Validators should check this before validating the record body.
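
A version gate in front of the validator might look like the sketch below. The compatibility rule used here (accept any patch release of a supported major.minor pair) is an assumption made for illustration, not a rule stated by this page, and the helper names are hypothetical.

```python
SUPPORTED_MAJOR_MINOR = {(0, 1)}  # versions this harness was written against


def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'major.minor.patch' into integers; raises ValueError if malformed."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def is_supported(record: dict) -> bool:
    """Check schema_version before validating the record body.

    Assumption: patch releases never change the schema shape, so only
    the (major, minor) pair is compared.
    """
    major, minor, _patch = parse_version(record["schema_version"])
    return (major, minor) in SUPPORTED_MAJOR_MINOR
```

Records that fail the gate are better skipped or routed to a newer validator than validated against a schema they were never meant to match.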

Changelog

v0.1.0 2026-05-17 · current

Initial public release. 11 resolved predictions in the public sample (Iran theater), full reasoning traces, market price + outcome pairs, nightly regeneration.

  • Schema: 2 required top-level fields + 4 required objects + nullable anchor block
  • JSON Schema 2020-12 published at /api/dataset/schema.json
  • CSV mirror at /api/dataset/sample.csv
  • Replay-bundle pipeline live for records on/after 2026-05-07

v0.2.0 pending · expected 2026-06

Multi-theater + multi-model.

  • Russia/Ukraine, Taiwan/China, Lebanon, broader Middle East coverage
  • Multi-model reasoning traces (Claude + GPT-4o + Gemini in parallel via the shadow ensemble)
  • Theater backfill — retroactive tagging for v0.1.0 records that currently show theater: null
  • Anchor backfill — populate anchor.* for any records made 2026-05-07+ whose replay bundles still exist on disk

v0.3.0 planned · Q3 2026

Numeric + categorical questions.

  • New question.type enum values: numeric, categorical
  • New optional prediction.distribution field for non-binary outputs
  • Africa & cyber theater coverage (already wired in the live bot via 6 new RSS feeds)

License

The public sample is licensed CC-BY-4.0. Attribute to "TrenchSignals / Trench Eval Dataset v0.1.0". The full dataset (all resolved predictions to date + continuous updates + replay-bundle URLs) is available under a commercial license; contact dataset@trenchsignals.io.

Live record browser

Browse all sample records (with full reasoning traces) on the storefront page: trenchsignals.io/dataset.