rendered 2026-07-03T12:35:48Z

MacroGuru — Scenario Lifecycle & Quality SOP

This is the canonical operating document for adding anything to the MacroGuru library — a what‑if scenario, a tradable market/asset, or a Week‑Ahead catalyst playbook. It is written so that the author (Vikas), Claude, and anyone who takes over the product can follow it without guessing. Every actionable step is a checkbox. If a step is not checked, the item is not done — do not assume, do not skip.

Last updated: 2026‑06‑28 · Owner: Vikas · Status: LIVE / authoritative. Keep this file in sync with the code — see §26 Maintaining this document.

0. How to use this document

Adding ONE scenario by hand? → work top‑to‑bottom through §5–§13 and tick every box.
Generating scenarios in bulk (the road to 100k)? → §16 Scaling to 100,000; the per‑scenario gates still apply to every record.
Adding a tradable asset/market? → §14 Sub‑process A.
Adding a Week‑Ahead catalyst? → §15 Sub‑process B.
Refreshing the weekly "Solid chance of happening" landing predictions? → §15A Sub‑process C.
Just need the one‑page run sheet? → §24 Master checklist.
Hit a weird build/deploy failure? → §22 Failure modes & gotchas.

Conventions in this doc: ✅ = built/required · 🔶 = partially built / manual today · ⛔ = hard stop (do not publish if this fails) · code = exact file/command. All Python runs as PYTHONPATH=. .venv/bin/python … from the repo root /Users/bloquelabs/learn/MacroGuruStrategy.

1. North Star & non‑negotiable invariants

The mission. Capture 100,000 scenarios from all over the world and make each one do exactly one job: help a person peek into the future, manage their emotions, and make a better decision. Every scenario is a small, falsifiable, emotionally‑intelligent rehearsal of a possible future — not a prediction, not hype, not filler.

Four things make a MacroGuru scenario different from a blog post or a tweet. They are invariants and may never be broken to hit a number:

⛔ Measured, not asserted. Every cross‑asset impact is an abnormal return (market‑beta stripped: AR = r − (α + β·r_SPX)) measured over real analogous historical events. No number in the product is unfalsifiable. (Source of truth: docs/HISTORICAL_ENGINE.md.)
⛔ Calibrated, never clairvoyant. Every probability is a base‑rate‑anchored prior, continuously scored against what actually happens (Brier‑style reliability), and labelled as a prior — never as a forecast.
⛔ No tradable left behind. Every asset listed on Hyperliquid must be covered: it appears in the scenarios that move it, it has price (or proxy) data, and it has measured evidence. The coverage gate (coverage_report.json) fails loudly on a gap. (Source: docs/SELF_IMPROVING_ENGINE.md.)
⛔ Both sides of every frontier. The future is not only crises. Where the library covers a force that re‑shapes our species — AI, medicine, science, energy, space, biology, nature — it must hold both the good outcome (the breakthrough, RISK‑ON) and its probable‑negative tail (the bust/shock, RISK‑OFF). Never ship the fear without the hope, or the hope without the tail. (Framework: docs/FRONTIER_SCENARIO_SPEC.md; operational steps in §16.6.)

The emotional contract (the reason we exist). A scenario that scares without guiding is a failure. Every published scenario must answer "so what do I do, and how do I not panic?" — see §19 Emotional‑integrity standard.

2. What a scenario IS — the data model

A scenario lives in three places, in this order of authority:

Layer	File	What it holds	Who writes it
Source record	`config/scenarios.yaml`	the human‑authored truth	you / `merge_ccar_scenarios.py`
Roots (drivers)	`config/scenario_roots_ccar.yaml`, `config/scenario_roots_batch3.yaml`	`id → {probability, timeline, roots:{factor:signed}}`	generator / by hand
Overrides (applied LAST)	`config/scenario_overrides.yaml`	`id → {probability, [roots]}` recalibrations	`bin/apply_probs.py` (machine‑managed)
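
To make the "applied LAST" rule concrete, here is a minimal sketch of an override winning over the curated prior. The function name `apply_override` and the plain-dict inputs are hypothetical; the actual merge happens inside the build scripts and may differ in detail.

```python
def apply_override(scenario, override=None):
    """Return a copy of `scenario` with override fields applied last.

    Assumption: only `probability` and (optionally) `roots` are overridable,
    mirroring the `id -> {probability, [roots]}` shape of the overrides file.
    """
    merged = dict(scenario)
    for key in ("probability", "roots"):
        if override and key in override:
            merged[key] = override[key]
    return merged


curated = {"id": 1, "probability": 0.06, "roots": {"oil_supply_risk": 0.8}}
effective = apply_override(curated, {"probability": 0.04})
print(effective["probability"])  # the override value, not the curated prior
```

The source record is left untouched, so the curated prior stays visible for review while the recalibrated value is what gets built.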

2.1 The `scenarios.yaml` record (exact fields)

- id: 1                       # int, unique, never reused (deactivate, never delete — invariant)
  category: geo_escalation    # MUST exist in bin/news_taxonomy.py SECTORS map (see §4)
  status: candidate           # candidate = awaiting review · keep = published · drop = excluded
  probability: 0.06           # float 0..1 (curated prior; overrides win at build time)
  timeline: 0–6 months        # EN‑DASH "–", one of the four allowed (§4)
  title: Hormuz closure       # 2–6 words, concrete, headline‑grade (no "markets fall")
  scenario: >                 # ONE precise sentence: shock + trigger + a real magnitude/episode
    Iran closes the Strait of Hormuz after a naval clash with US forces, choking ~20% of seaborne oil.

Countries are NOT hand‑set here. build_scenarios.py auto‑tags countries from the scenario text (and the generation hint). A bulk‑gen record (§16) may carry a countries: [...] hint, but the tagger is the source of truth.
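
The field rules in this record can be checked mechanically before a build. The sketch below is illustrative only: `validate_record` is a hypothetical helper, not part of the repo, and it deliberately skips the category check because that depends on bin/news_taxonomy.py.

```python
EN_DASH = "\u2013"  # timelines use an en-dash; a plain hyphen "-" is rejected
ALLOWED_TIMELINES = {
    f"0{EN_DASH}6 months",
    f"6{EN_DASH}18 months",
    f"1{EN_DASH}3 years",
    f"3{EN_DASH}10 years",
}
ALLOWED_STATUS = {"candidate", "keep", "drop"}


def validate_record(rec):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not isinstance(rec.get("id"), int):
        problems.append("id must be an int")
    if rec.get("status") not in ALLOWED_STATUS:
        problems.append("status must be candidate, keep or drop")
    prob = rec.get("probability")
    if not isinstance(prob, (int, float)) or not 0 <= prob <= 1:
        problems.append("probability must be a number in 0..1")
    if rec.get("timeline") not in ALLOWED_TIMELINES:
        problems.append("timeline must be one of the four en-dash values")
    if not 2 <= len(str(rec.get("title", "")).split()) <= 6:
        problems.append("title must be 2-6 words")
    if not str(rec.get("scenario", "")).strip():
        problems.append("scenario sentence is missing")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer fix a record in a single pass.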

2.2 The compiled, public objects (generated — never hand‑edit)

Output	Path	Built by
Cascade + ranked markets + guidance + countries	`web/public/data/scenarios/<id>.json` (+ `scenarios_index.json`)	`build_scenarios.py`
Measured historical evidence (event study)	`web/public/data/history/<id>.json` (+ `history_meta.json`)	`build_history.py`
The public page	`web/public/scenario/<slug>.html`	`build_seo_pages.py`
Probability path over time	`web/public/data/prob_history/<id>.json`	`build_prob_history.py`
Narrative / prob‑note / deviation / headline	`web/public/data/narratives.json`	the intelligence passes (§9)

3. The lifecycle at a glance

 (1) CONCEIVE ─► (2) SPECIFY ─► (3) ROOTS+CASCADE ─► (4) MEASURE HISTORY ─►
  source a real     write the      build_scenarios      build_history
  concern/episode   yaml record    (drivers→cascade)    (event study, evidence)
       │                                                        │
       ▼                                                        ▼
 (5) ENRICH ─► (6) QUALITY GATE ─► (7) PUBLISH ─► (8) VERIFY PROD ─► (9) MONITOR/CALIBRATE
  prob · dev      the finalize       build_seo_pages   clean‑URL +     coverage + Brier +
  · headline      POST‑CHECK ⛔      + git push        SHA checks      retire/update loop
  · guidance                                                              │
       └──────────────────────────── feeds the next conception ◄─────────┘

Each stage has a checklist. A scenario is DONE only when every ⛔ box in stages 6–8 is ticked.

4. Controlled vocabularies (copy‑paste reference)

⛔ Any value outside these sets will silently mis‑map (wrong sector), get skipped at build, or break the cascade. The generator (merge_ccar_scenarios.py) validates against these — honour them by hand too.

Timelines (use the EN‑DASH –, not a hyphen -): 0–6 months · 6–18 months · 1–3 years · 3–10 years

Categories (the CCAR‑expansion set — preferred for new macro/stress scenarios; 23): asia_macro, europe_macro, emerging_macro, central_banks_fx, sovereign_fiscal, oil_gas, power_energy, metals_mining, agri_food_water, crypto_markets, crypto_infra, ai_compute, tech_cyber, automation_labor, greatpower_geo, regional_conflict, trade_sanctions, financial_plumbing, realestate_demographics, health_biotech, society_politics, climate_frontier, markets_corporate

Legacy scenarios use other category strings (e.g. geo_escalation). The real gate is: the category MUST resolve to a sector in bin/news_taxonomy.py (SECTORS / sector_for()), else the page won't map to a section front. When in doubt, use one of the 23 above. Pre‑check: §6.

Root‑shock factors (use ONLY these — 50): risk_appetite, credit_spreads, geopolitical_risk, financial_conditions, crypto_confidence, oil_supply_risk, industrial_demand, ai_capex, recession_signal, climate_supply, trade_tension, crypto_liquidity, china_growth, inflation_surprise, automation_displacement, dollar_confidence, fed_hawkishness, growth_surprise, food_inflation, european_energy, semiconductor_risk, EM_FX, defense_spend, carry_appetite, robot_productivity, VIX, BTC, real_yields, pandemic, XCU, fertilizer, NG, labor_shortage, inflation_expectations, consumer_discretionary, oil_demand, diesel, XAU, global_growth, mortgage_rates, china_stimulus, DXY, gasoline, XAG, WHEAT, risk_parity_delever, jet_fuel, CORN, curve_slope, labor_surplus

Sign convention (positive = the factor intensifies): risk_appetite − = risk‑off · recession_signal + = recession rising · credit_spreads + = wider · financial_conditions + = tighter · dollar_confidence − = USD doubts · china_growth − = China slowing · real_yields + = yields up · EM_FX − = EM currencies fall · oil_supply_risk + = supply threatened · inflation_surprise + = hotter · crypto_confidence/crypto_liquidity − = crypto stress · risk_parity_delever + = forced deleveraging.

Root recipes (start here, adapt magnitude to severity); full list in docs/CCAR_SCENARIO_GENERATION_SPEC.md. Examples:
- Recession / equity crash: {recession_signal:0.6, risk_appetite:-0.6, credit_spreads:0.5, financial_conditions:0.5}
- Oil supply shock: {oil_supply_risk:0.8, inflation_surprise:0.5, geopolitical_risk:0.6}
- Crypto / stablecoin run: {crypto_confidence:-0.8, crypto_liquidity:-0.7, BTC:-0.6}
- EM sudden stop: {EM_FX:-0.8, credit_spreads:0.6, risk_appetite:-0.5, DXY:0.4}
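
A roots mapping can be checked against the 50-factor vocabulary above with a few lines. This is a sketch under stated assumptions: `validate_roots` is a hypothetical helper, the 2 to 5 factor limit comes from the §6 checklist, and the [-1, 1] magnitude bound is an assumption inferred from the recipes, not a documented rule.

```python
ROOT_FACTORS = frozenset(
    """
    risk_appetite credit_spreads geopolitical_risk financial_conditions
    crypto_confidence oil_supply_risk industrial_demand ai_capex
    recession_signal climate_supply trade_tension crypto_liquidity
    china_growth inflation_surprise automation_displacement dollar_confidence
    fed_hawkishness growth_surprise food_inflation european_energy
    semiconductor_risk EM_FX defense_spend carry_appetite robot_productivity
    VIX BTC real_yields pandemic XCU fertilizer NG labor_shortage
    inflation_expectations consumer_discretionary oil_demand diesel XAU
    global_growth mortgage_rates china_stimulus DXY gasoline XAG WHEAT
    risk_parity_delever jet_fuel CORN curve_slope labor_surplus
    """.split()
)


def validate_roots(roots, min_factors=2, max_factors=5):
    """Return a list of problems with a {factor: signed magnitude} mapping."""
    problems = [f"unknown factor: {name}" for name in roots if name not in ROOT_FACTORS]
    if not min_factors <= len(roots) <= max_factors:
        problems.append(f"expected {min_factors}-{max_factors} factors, got {len(roots)}")
    for name, value in roots.items():
        # Assumed bound: magnitudes are signed, non-zero, and within [-1, 1].
        if value == 0 or not -1.0 <= value <= 1.0:
            problems.append(f"{name}: magnitude must be non-zero and within [-1, 1]")
    return problems
```

For example, the oil supply shock recipe passes cleanly, while a made-up factor such as `oil_price` is reported as unknown instead of silently mis-mapping.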

5. Stage 1 — CONCEIVE

Goal: a scenario worth a person's attention — anchored to a real concern, not invented. Grounded in scenario‑planning practice (plausible, decision‑relevant, driver‑based) and intelligence analytic standards (ICD 203: sourced, assumptions explicit) — see §17 & §26 refs.

PRE‑CHECK, idea quality (tick all before writing the record):
- [ ] It traces to a real anchor: a regulator/stress‑test concern (Fed CCAR/DFAST, BoE, EBA/ECB/ESRB, BoJ, APRA, OSFI, FINMA, PBoC, RBI, MAS, HKMA, IMF FSAP/GFSR, FSB, BIS, NGFS…) or a documented historical episode. (No anchor → not a scenario yet.)
- [ ] It names a specific channel and magnitude/episode, not "markets fall" (e.g. "US office CRE crash," "Hormuz closure choking ~20% of seaborne oil").
- [ ] It is decision‑relevant: a reader could plausibly act on it (hedge, trim, wait, allocate).
- [ ] It is distinct, not a near‑duplicate of an existing scenario. (Search the library / Scenario Lab first; the bulk merger dedups by normalised title, but check by hand for singletons.)
- [ ] It is falsifiable: you can imagine the measured event study that would confirm or deny its cascade.
- [ ] It maps to ≥1 HL‑coverage need OR adds genuine world coverage (a country/theme we're thin on).
- [ ] Its driver(s) exist in the 50‑factor root vocabulary (§4). If the true driver isn't expressible, the scenario may be out of the engine's competence (flag it, don't force a wrong root).
- [ ] Assumptions are explicit (ICD 203): you can state the 1–2 load‑bearing assumptions in one line.

6. Stage 2 — SPECIFY (the record)

Goal: a clean scenarios.yaml record + its roots.

CHECKLIST:
- [ ] Pick the next free id = max(existing)+1. Never reuse or renumber an id (renumbering nukes live URLs/slugs).
- [ ] category ∈ allowed set and resolves in news_taxonomy (§4).
- [ ] status: candidate (promote to keep only after the Quality Gate, §10).
- [ ] timeline uses the en‑dash and is one of the four allowed values.
- [ ] title = 2–6 words, concrete, reads as a headline.
- [ ] scenario = one precise sentence: shock + trigger + magnitude/episode.
- [ ] Add the roots entry (id → {probability, timeline, roots:{…}}) in a roots file: 2–5 factors, each signed per the convention (§4), magnitudes scaled to severity (use the recipes).
- [ ] Probability is sober: severe‑tail 0.02–0.10; plausible‑cyclical 0.15–0.45; developing trend up to 0.85. (Calibration detail: §18.)
- [ ] If bulk‑generated, the record validates against merge_ccar_scenarios.py (vocab + dedup). (See §16.)

⛔ A scenario with no roots is skipped at publish (no cascade, no page). Roots are mandatory.

7. Stage 3 — ROOTS & CASCADE (`build_scenarios.py`)

Goal: turn drivers into the full butterfly effect. build_scenarios.py maps the scenario → signed root shocks → runs the deep causal‑propagation engine (macroguru/cascade/propagation.py) → ranks impacted markets by decision‑relevance (|move| × confidence) → gates each to Hyperliquid (tradable ⇒ trade button) → builds the left‑to‑right mindmap (trigger → 1st → 2nd order) → writes the long/short/cash guidance.

PYTHONPATH=. .venv/bin/python bin/build_scenarios.py
# → web/public/data/scenarios/<id>.json  (+ scenarios_index.json)  +  enriched config/scenarios.yaml
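
The decision-relevance ranking (|move| × confidence) can be illustrated in isolation. This is a sketch only: the field names `asset`, `move` and `confidence` and the sample numbers are hypothetical, and the real ranking inside build_scenarios.py may apply further rules.

```python
def rank_markets(impacts):
    """Sort impacted markets by |move| x confidence, most decision-relevant first."""
    return sorted(impacts, key=lambda m: abs(m["move"]) * m["confidence"], reverse=True)


impacts = [
    {"asset": "XAU", "move": 0.04, "confidence": 0.5},
    {"asset": "BTC", "move": -0.12, "confidence": 0.8},
    {"asset": "DXY", "move": 0.05, "confidence": 0.9},
]
ranked_assets = [m["asset"] for m in rank_markets(impacts)]
print(ranked_assets)
```

Using the absolute move means a large projected fall ranks as high as a large projected rise; the sign is kept on the record for the guidance step.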

CHECKLIST:
- [ ] Build runs clean (no traceback); scenarios/<id>.json exists for the new id.
- [ ] The cascade has immediate (1‑hop) and ripple (2+‑hop) nodes, not a single flat list.
- [ ] Materiality floor did its job: no spurious tiny nodes cluttering the mindmap (the floor prunes them; confirm the kept nodes are economically sensible).
- [ ] Impacted markets are ranked sensibly (the dominant market matches the thesis).
- [ ] HL gating is right: tradable assets show a trade button; non‑tradable are shown without one.
- [ ] Countries auto‑tagged correctly (spot‑check the countries in the compiled JSON vs the text).
- [ ] Guidance (long/short/cash) generated and directionally coherent with the cascade (longs are the up‑moves, shorts the down‑moves, cash if net risk‑off).

8. Stage 4 — MEASURE HISTORY (`build_history.py`)

Goal: prove the cascade against reality. This is what separates us from opinion. The event study measures abnormal returns for every (scenario × asset) over the analogous historical events matched from config/historical_events.yaml, and records hit‑rate, sample size n, a confidence score, and whether the measured sign agrees with the projected cascade.

PYTHONPATH=. .venv/bin/python bin/build_history.py
# → web/public/data/history/<id>.json  (+ history_meta.json)
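
For readers new to event studies, the abnormal-return formula from §1, AR = r − (α + β·r_SPX), can be sketched in plain Python. This is a minimal illustration, assuming α and β are fitted by ordinary least squares on a pre-event estimation window; the function names are hypothetical and the authoritative method is described in docs/HISTORICAL_ENGINE.md.

```python
def fit_market_model(asset_returns, spx_returns):
    """Estimate alpha and beta of r_asset = alpha + beta * r_SPX by least squares."""
    n = len(asset_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(spx_returns) / n
    var_m = sum((m - mean_m) ** 2 for m in spx_returns)
    if var_m == 0:
        raise ValueError("market returns have zero variance; beta is undefined")
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset_returns, spx_returns))
    beta = cov / var_m
    alpha = mean_a - beta * mean_m
    return alpha, beta


def abnormal_returns(asset_returns, spx_returns, alpha, beta):
    """AR_t = r_t - (alpha + beta * r_SPX_t) for each day in the event window."""
    return [r - (alpha + beta * m) for r, m in zip(asset_returns, spx_returns)]


def cumulative_abnormal_return(ars, days):
    """Sum of the first `days` abnormal returns (e.g. days=5 or days=20)."""
    return sum(ars[:days])
```

Stripping the market-beta component this way is what lets a 5d or 20d figure be attributed to the event rather than to a broad market move on the same days.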

CHECKLIST:
- [ ] history/<id>.json exists and contains events_used (≥1 analogue). ⛔ No events_used ⇒ the page will be noindex (thin); see §10.
- [ ] The analogues are genuinely analogous (right tags/theme), not coincidental keyword hits.
- [ ] Each analogue has a real, working source_url (ICD 203 sourcing; YMYL trust).
- [ ] The per‑asset table shows abnormal returns at 20d and 5d, hit‑rate, n, confidence.
- [ ] Where the measured sign disagrees with the cascade, that's expected sometimes; it must be explained by a deviation thesis (§9), not hidden.
- [ ] If analogues are too thin/stale (n tiny, all pre‑2010): either widen the event library (config/historical_events.yaml via seeds or the adversarial sourcing workflow → merge_history.py) or mark the scenario low‑conviction. (See §16.4 the evidence bottleneck.)

8.1 Adding the historical evidence (event library)

A new event in config/historical_events.yaml:

- id: b2-trade_sanctions-436        # stable unique id
  date: '1929-05-28'                # ISO; date_confidence captures fuzziness
  name: Smoot-Hawley clears the US House (protectionism signal)
  category: United States
  tags: [risk_off, trade_war, geopolitical]   # tags drive analogue matching
  summary: <one neutral sentence>
  source_url: https://…             # ⛔ must resolve
  source_name: en.wikipedia.org
  source: workflow                  # seed | workflow (adversarially verified)
  date_confidence: 0.8
  verified: true
  region: United States
  theme: trade_sanctions

[ ] New events are tagged so the matcher can find them for the right scenarios.
[ ] Events go through bin/merge_history.py (de‑dup/normalise) before build_history.py.

9. Stage 5 — ENRICH (the intelligence passes)

Goal: a calibrated probability, an honest "why we may diverge," a human headline, and actionable guidance. These are the token‑costing monthly tier (bin/REFRESH_RUNBOOK.md). Run them for the new ids (or library‑wide).

Pass	Command chain	Writes	Standard it serves
Probability (re)calibration	`prep_probs.py` → 40‑agent Workflow → `apply_probs.py`	`scenario_overrides.yaml` + `narratives.json::prob_notes`	base‑rate anchoring (superforecasting)
Deviation thesis	`prep_deviation.py` → Workflow → `apply_deviation.py`	`narratives.json::deviations`	analytic rigor / ICD 203
Headline (only when new scenarios were added)	`prep_headlines.py` → Workflow → `apply_headlines.py`	`narratives.json::headlines`	E‑E‑A‑T / readability

CHECKLIST:
- [ ] Probability is base‑rate‑anchored to n_analogues (frequent patterns higher; rare/tail lower) and well‑spread, not clustered. (Bands: §18.)
- [ ] Each probability carries a ≤18‑word rationale (prob_notes), never a bare number (ICD 203: pair the estimate with its basis).
- [ ] Where cascade ≠ measured history on a material asset, a deviation thesis (≤34 words) names the failure mode: regime contamination · swamped channel · thin/stale sample · structural change · history‑wins, and says which side to trust.
- [ ] Headline is a natural, grammatical "What if …?" (not the terse internal title).
- [ ] ⛔ apply_headlines.py MERGES into the headlines dict; confirm it did not wipe the existing 1,200+ (it setdefault().update()s; never replace the dict).
- [ ] Guidance (from build_scenarios) passes the emotional‑integrity standard (§19): it tells the reader what to do and steadies them.
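
The merge-not-replace rule for headlines is easy to get wrong, so here is a minimal sketch of the `setdefault().update()` pattern. The wrapper `merge_headlines`, the string ids and the sample headlines are illustrative assumptions; only the pattern itself is taken from the checklist.

```python
def merge_headlines(narratives, new_headlines):
    """Merge new headlines into narratives["headlines"] in place, keeping existing ones."""
    narratives.setdefault("headlines", {}).update(new_headlines)
    return narratives


narratives = {"headlines": {"1": "What if Iran closes the Strait of Hormuz?"}}
merge_headlines(narratives, {"5142": "What if US office real estate crashes?"})
print(len(narratives["headlines"]))  # both the old and the new headline are present
```

Assigning `narratives["headlines"] = new_headlines` instead would discard every existing headline, which is exactly the failure the ⛔ item guards against.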

10. Stage 6 — QUALITY GATE ⛔ (the finalize POST‑CHECK)

This is the gate the user asked for: "what we pre‑check and post‑check once the scenario is finalised." A scenario may be promoted to status: keep and published only when every box here is ticked. If any ⛔ fails, fix it or leave the scenario as candidate (it will render noindex/thin until fixed).

A. Integrity (the first three invariants, §1):
- [ ] ⛔ Measured: history/<id>.json has events_used ≥ 1 with real sources; the per‑asset abnormal returns render.
- [ ] ⛔ Calibrated: probability is a sober, base‑rate‑anchored prior with a one‑line rationale; not a forecast.
- [ ] ⛔ Coverage: if the scenario introduces/relies on an HL asset, that asset is in coverage_report.json as covered (price/proxy + ≥1 scenario + ≥1 analogue). Run bin/coverage_report.py; gaps must not increase.

B. Content quality ("no weak proposition", §17):
- [ ] Distinct (no near‑duplicate); concrete channel + magnitude; precise one‑sentence scenario.
- [ ] Cascade is economically coherent (signs make sense; ripple chain is real, not decorative).
- [ ] Deviation thesis present wherever cascade and measured history disagree on a material asset.
- [ ] No fabricated precise statistics; magnitudes realistic and consistent with the cited concern.

C. Emotional integrity (§19):
- [ ] "What to do if this happens" renders with Long/Short/Cash guidance + a plain‑English common‑man line.
- [ ] The tone steadies (it frames probability + horizon + what to watch), it does not induce panic.

D. SEO / trust / YMYL (§20):
- [ ] Indexable test passes: probability and events_used both present (else it is correctly noindex).
- [ ] Title/description are honest and specific; JSON‑LD (Article + Dataset + BreadcrumbList) is intact.
- [ ] Sources are cited and resolve; the probabilistic‑future + not‑investment‑advice disclaimer is present.

E. Data quality (DAMA 6 dimensions, §17.1):
- [ ] Accuracy (matches the cited reality) · Completeness (roots+cascade+history+guidance all present) · Consistency (signs/units agree across cascade, guidance, history) · Timeliness (probability reflects current regime) · Validity (vocab honoured) · Uniqueness (no duplicate id/title).

[ ] Promote status: candidate → keep only after A–E all pass.
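
The indexable test from section D reduces to a two-condition check. The sketch below assumes the compiled scenario and history objects are loaded as plain dicts with `probability` and `events_used` keys; `is_indexable` is a hypothetical name, not the function used by build_seo_pages.py.

```python
def is_indexable(scenario, history):
    """True only when a probability AND at least one measured analogue are present."""
    has_probability = scenario.get("probability") is not None
    has_events = bool(history and history.get("events_used"))
    return has_probability and has_events
```

A scenario that fails this check is not broken; it simply renders as noindex until evidence is added.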

11. Stage 7 — PUBLISH (`build_seo_pages.py` → GitHub)

Goal: render the page + all hubs, run tests, ship via Git.

PYTHONPATH=. .venv/bin/python bin/build_seo_pages.py     # pages + hubs + sitemaps + stats + search_index
PYTHONPATH=. .venv/bin/python -m pytest -q               # ⛔ expect "227 passed" (or current count)
git add -A && git commit -m "…" && git push origin main  # ⛔ DEPLOY = git push (auto-builds on Vercel)

CHECKLIST:
- [ ] build_seo_pages.py ran clean; the new web/public/scenario/<slug>.html exists.
- [ ] data/stats.json count went up by the number added (counts self‑update; never hardcode a count anywhere; counts.js + [data-mg-count] fill them).
- [ ] data/search_index.json includes the new scenario (with c=kicker, g=sector).
- [ ] Sitemaps regenerated; the new indexable page is in sitemap-scenarios.xml.
- [ ] ⛔ Tests pass (pytest -q).
- [ ] ⛔ Deploy via git push origin main: the Vercel project public (rootDirectory web/public, branch main) auto‑builds server‑side. Do NOT use vercel deploy --archive=tgz (it re‑uploads the whole site and exhausts the free upload quota: api-upload-free 429). The refresh_scenarios.py docstring still says "archive deploy"; that is stale, ignore it.
- [ ] Commit only the intended files; do not sweep in unrelated periodic‑refresh data churn (charts/history) unless that's the point of the commit.

12. Stage 8 — VERIFY ON PROD ⛔

Goal: confirm it's actually live and correct — not just deployed. Prod alias: https://macroguru.app.

CHECKLIST:
- [ ] Poll the deploy by commit SHA (the first READY you see is often the previous deploy). vercel ls public --yes → vercel inspect <building-url> until Ready.
- [ ] ⛔ Fetch the clean URL (cleanUrls strips .html): curl -sL https://…/scenario/<slug>, not …/scenario/<slug>.html (that returns a ~15‑byte redirect stub and looks broken). For static assets, byte‑compare (curl … | wc -c vs local) to confirm freshness.
- [ ] The page shows: oddsbar (prob + crowd), markets table, "What to do if this happens," "Historical precedent" table, related scenarios.
- [ ] Search (type a token) returns the scenario with a kicker + highlighted match + prob pill.
- [ ] Mobile: at 390px and 360px the page has zero horizontal overflow (scrollWidth == viewport) and no overlap. (Mobile standards: §21.)
- [ ] No console errors; the price chart line renders (the --line var fallback is present).

13. Stage 9 — MONITOR, CALIBRATE & RETIRE (the living scenario)

A scenario is never "done forever" — it is scored against reality and updated. This is the self‑improving loop (docs/SELF_IMPROVING_ENGINE.md, stages D/E/G).

CHECKLIST (ongoing):
- [ ] Coverage stays green: bin/coverage_report.py → coverage_report.json shows no new gaps after the add.
- [ ] Calibration is watched: calibration.json (reliability curve by confidence band, corroboration score). High‑probability scenarios should fire more often than low‑probability ones; mis‑calibrated buckets get nudged toward the realized base rate (apply_probs next cycle).
- [ ] Prob‑history updates: bin/build_prob_history.py redraws how the probability moved + the events that moved it.
- [ ] Fires are logged: when a matched real event fires, score the realized abnormal returns vs the projected cascade (sign hit‑rate); a systematic miss → re‑map roots + fresh deviation thesis.
- [ ] Retire/deactivate, never delete: an obsolete scenario is set status: drop (or an asset deactivated); history is preserved for future re‑use (invariant).

14. Sub‑process A — Adding a tradable asset/market

"All the pointers related to adding the market in the product." Triggered when a new market matters (a new Hyperliquid listing, or a market we want charts/evidence for). Goal: no tradable left behind.

CHECKLIST:
- [ ] Classify the asset (crypto / commodity / tokenized‑equity / FX / index / rate / vol).
- [ ] Price source + ticker map: register it in the price layer (macroguru price map / ASSET_YF) and, for the chart, add it to bin/build_asset_charts.py's set; run it → web/public/data/charts/<tk>.json (+ charts_index.json).
- [ ] Short‑history proxy: if the asset has < ~1y of data, map a long‑history peer/proxy (new L2 → ETH/SOL beta; new gold product → XAU) so the event study has something to measure.
- [ ] HL alias: add the HL symbol alias in hl_universe.TICKER_ALIASES (e.g. our XAU ↔ HL PAXG).
- [ ] Scenario inclusion: for every existing scenario, does the asset's class intersect the scenario's root shocks? If yes it inherits the cascade automatically (it becomes a new leaf in cascade/propagation.py).
- [ ] Evidence: run build_history.py so the asset gets measured abnormal returns across the matched analogues.
- [ ] Net‑new asset‑specific scenarios: generate the scenarios the asset makes relevant that the library lacks (e.g. a new staking token → "staking‑yield collapse"). Run them through §5–§11.
- [ ] Chart var sanity: confirm the price line renders on news.css pages (the --line fallback). Click‑to‑chart is gated to charted tickers.
- [ ] ⛔ Coverage assertion: coverage_report.py shows the asset as covered (price/proxy + ≥1 scenario + ≥1 analogue). A remaining gap fails the loop: fix the price source or proxy (e.g. the known HYPE gap → wire HL's candle API / CoinGecko).
- [ ] Rebuild + publish (§11) + verify (§12).
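
The coverage assertion (price/proxy + ≥1 scenario + ≥1 analogue) can be expressed as a small check. This is a sketch only: the per-asset keys `has_price`, `proxy`, `scenarios` and `analogues` are hypothetical and do not claim to match the real coverage_report.json schema.

```python
def coverage_gaps(assets):
    """Map each uncovered ticker to the list of requirements it is missing."""
    gaps = {}
    for ticker, info in assets.items():
        missing = []
        if not (info.get("has_price") or info.get("proxy")):
            missing.append("price/proxy")
        if info.get("scenarios", 0) < 1:
            missing.append("scenario")
        if info.get("analogues", 0) < 1:
            missing.append("analogue")
        if missing:
            gaps[ticker] = missing
    return gaps


assets = {
    "XAU": {"has_price": True, "proxy": None, "scenarios": 40, "analogues": 12},
    "HYPE": {"has_price": False, "proxy": None, "scenarios": 3, "analogues": 2},
}
gaps = coverage_gaps(assets)
print(gaps)  # only the ticker without a price source or proxy is reported
```

Reporting the missing requirement per ticker, rather than a bare pass/fail, tells you whether the fix is a price source, a proxy mapping, or more scenarios and analogues.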

15. Sub‑process B — Adding a Week‑Ahead catalyst playbook

For scheduled catalysts (NFP, CPI, FOMC, OPEC+, options expiry, Jackson Hole, shutdown, elections…). Content lives in bin/catalyst_playbook.py; pages render at /week-ahead/<type>.

CHECKLIST:
- [ ] Add a PLAYBOOK[<ctype>] entry: slug, label, sector, kicker, blurb, watch, poly_q, kalshi_q, and 3 outcomes.
- [ ] Each outcome has: name, prob (base rates summing ≈ 1 across the three), tag (risk-on|risk-off|mixed), a thesis, a cascade (_imm/_rip nodes: asset, signed direction, magnitude prior, one‑line mechanism), and a guide (stance, long, short, cash?, common).
- [ ] Magnitudes are reaction‑function priors grounded in published cross‑asset consensus, tuned to the current regime, labelled as priors (not measured returns).
- [ ] Add a CALENDAR row in bin/build_upcoming.py with the matching ctype.
- [ ] Run build_upcoming.py then build_seo_pages.py; verify /week-ahead/<slug> renders 3 outcomes + cascade + "what to do," and the landing card links to it.
- [ ] Mobile + verify on prod (§12).
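
The outcome rules (exactly three outcomes, probabilities summing to about 1, a valid tag) can be checked before a rebuild. The helper `check_outcomes`, the 0.02 tolerance and the sample outcome names are assumptions for illustration; bin/catalyst_playbook.py may enforce this differently or not at all.

```python
VALID_TAGS = {"risk-on", "risk-off", "mixed"}


def check_outcomes(outcomes, tolerance=0.02):
    """Return a list of problems with a playbook entry's outcomes."""
    problems = []
    if len(outcomes) != 3:
        problems.append(f"expected 3 outcomes, got {len(outcomes)}")
    total = sum(o.get("prob", 0.0) for o in outcomes)
    if abs(total - 1.0) > tolerance:
        problems.append(f"probabilities sum to {total:.2f}, expected about 1")
    for o in outcomes:
        if o.get("tag") not in VALID_TAGS:
            problems.append(f"{o.get('name')}: tag must be risk-on, risk-off or mixed")
    return problems


outcomes = [
    {"name": "hot print", "prob": 0.3, "tag": "risk-off"},
    {"name": "in-line print", "prob": 0.5, "tag": "mixed"},
    {"name": "cool print", "prob": 0.2, "tag": "risk-on"},
]
print(check_outcomes(outcomes))  # an empty list means the entry passes
```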

15A. Sub‑process C — The weekly "Solid chance of happening" predictions

The top section of the landing page (index.html) — a small, curated list of the highest‑probability, falsifiable macro calls for the next ~7 days, blended from the scheduled calendar, analyst consensus, prediction‑market crowd odds, the verified world‑state, and the engine's own scenarios. It is the product's sharpest expression of the mission: peek into the coming week, with calibrated odds, not hype.

Data flow: config/predictions_week.yaml (the research‑authored weekly seed) → bin/build_predictions.py (matches each call to the nearest library scenario for a deep‑dive link, resolves asset tickers→names, validates the honesty contract, writes web/public/data/predictions.json) → web/public/predictions.js renders the cards into #solid-predictions. build_predictions.py runs automatically at the tail of build_seo_pages.py (so the scenario deep‑dive links stay synced to fresh slugs).

Four surfaces, one feed (predictions.json):
(1) the landing section (index.html via predictions.js);
(2) the /news lead band (server‑rendered render_solid_predictions());
(3) the site‑wide ribbon (web/public/predictions-ribbon.js: a STATIC, dismissible, theme‑aware strip of the top calls injected on every page after the breaking‑bar; no auto‑scroll per WCAG 2.2.2; dismissal persists per week_of; self‑suppresses on /predictions);
(4) the dedicated /predictions page (render_predictions_page() → predictions.html): the full week with per‑card "≈ N in 100", ours‑vs‑crowd edge, the fixed resolution rule (joined from predictions_log.json), Article+ItemList JSON‑LD, an evergreen dateless URL re‑rendered in place.
All four refresh from the same weekly rebuild, with no extra steps.

⛔ Honesty contract (non‑negotiable): every prediction ships with a calibrated probability, an explicit basis (scheduled | consensus | base_rate | seasonality | crowd | regime), and real source URLs. "scheduled" = the event WILL occur (~certain); the outcome is the prediction — never conflate them. The section header says "Calibrated odds, not advice." Calibrated, never clairvoyant.

WEEKLY REFRESH CHECKLIST (run every Sunday/Monday for the new week):
- [ ] Re‑run the 4 research streams (macro calendar + consensus · geopolitics/energy · markets/crypto/corporate · prediction‑market odds) for the new Mon–Sun window; verify dates against ≥2 sources; capture the current regime + verified world‑state. (See the deep‑research prompts in the session log / §27.)
- [ ] Rewrite config/predictions_week.yaml: week_of, window, as_of, regime, and ~6–9 predictions. Each needs: title, claim (falsifiable), probability (0–1, calibrated), optional crowd_probability, basis, timeline (the date), category (a VALID_CAT), what, assets (engine tickers + up|down|flat), recommendation, optional edge, a match hint (keywords → scenario), and ≥1 sources.
- [ ] Keep the mix honest + diverse: a few scheduled‑certain anchors, the highest‑conviction outcome calls, and 1–2 genuine edges (a seasonality base rate, an OURS‑vs‑crowd gap). Don't pad with filler.
- [ ] Write every claim / recommendation / edge in the MacroGuru voice (docs/VOICE.md): lead with the call, the number is the confidence, one‑line caveat, cut the process‑meta. No "graded against / according to / we track whether". B2C: sharp, not chatty.
- [ ] Run PYTHONPATH=. .venv/bin/python bin/build_predictions.py → confirm every call links to a sensible scenario (tighten the match hint if a link is off‑theme/direction).
- [ ] Rebuild the site (build_seo_pages.py re‑runs predictions automatically) → verify the landing section, the ribbon (top of every page; dismiss re‑shows next week), and the /predictions page all render, are mobile‑clean, and links resolve. Commit + deploy + verify on prod (§12).
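
The honesty contract lends itself to a pre-build check on each weekly prediction. The sketch below is an assumption-laden illustration: `honesty_problems` is a hypothetical helper, and it assumes `sources` is a list of URL strings, which may not match the real predictions_week.yaml shape or the validation in build_predictions.py.

```python
VALID_BASIS = {"scheduled", "consensus", "base_rate", "seasonality", "crowd", "regime"}


def honesty_problems(pred):
    """Return a list of honesty-contract violations for one prediction dict."""
    problems = []
    prob = pred.get("probability")
    if not isinstance(prob, (int, float)) or not 0 <= prob <= 1:
        problems.append("probability must be a number in 0..1")
    if pred.get("basis") not in VALID_BASIS:
        problems.append("basis must be one of: " + ", ".join(sorted(VALID_BASIS)))
    sources = pred.get("sources") or []
    if not sources or not all(str(s).startswith(("http://", "https://")) for s in sources):
        problems.append("at least one source URL is required")
    return problems
```

A check like this cannot tell whether a source is genuine or the probability well calibrated; it only catches calls that ship without the three required ingredients.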

15B. Sub‑process D — Resolving predictions & the Reality Check (the metric that matters most)

"How close to reality are we?" is the only test our work is ultimately judged on. Every published prediction is logged with a fixed, source‑tied resolution rule and scored against what actually happens — wins and losses, in public, at /reality-check. This is the accountability spine of the product. ⛔ Never cherry‑pick: once a call is published it stays on the record, win or lose.

Data flow: each weekly prediction (with resolves_on, resolution_criteria, scheduled_certain) auto‑syncs from config/predictions_week.yaml into the append‑only ledger config/predictions_log.json via bin/build_predictions.py (open entries; it NEVER overwrites a human‑set resolution). bin/build_scorecard.py scores the ledger → web/public/data/scorecard.json → the /reality-check page renders it. Both run at the tail of build_seo_pages.py. Methodology = proper scoring rules (one‑sided Brier in [0,1], Murphy reliability/resolution decomposition, Brier‑skill‑score vs the base rate AND vs the prediction‑market crowd, calibration‑by‑bucket with Wilson bands), per Tetlock/GJP + Metaculus practice.

WEEKLY RESOLUTION CHECKLIST (run every Monday for the week that just ended):
- [ ] For each now‑past forecast in config/predictions_log.json, research what ACTUALLY happened against its pre‑registered resolution_criteria (verify with ≥2 sources).
- [ ] Set status: "resolved", outcome: true|false (or partial), resolved_on, and evidence: {text, url}. If the premise was voided/ambiguous, set status: "annulled" (counts for nobody, stays visible).
- [ ] ⛔ Do NOT edit probability or the claim after publication; the forecast is frozen at publication. Only add the resolution fields.
- [ ] Scheduled‑certain calls (scheduled_certain: true) stay in the ledger but are excluded from the skill Brier/BSS; don't let calendar gimmes inflate the headline.
- [ ] Run bin/build_scorecard.py (or any build_seo_pages.py); confirm the Brier, calibration curve, and our‑vs‑crowd edge update, and the resolved calls show ✓/✗ with their evidence. Commit + deploy + verify (§12).
- [ ] Sanity: report counts + a confidence caveat while n is small; never headline a Brier off <~15 resolved calls.
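
To show what the scorecard measures, here is a minimal sketch of a Brier score and a Brier skill score over the ledger. It uses the ledger fields named above (status, outcome, probability, scheduled_certain), but the function names are hypothetical, partial outcomes are simply skipped, and the reference forecast is the in-sample base rate; bin/build_scorecard.py may differ on all three points.

```python
def scoreable(ledger):
    """Keep resolved true/false calls, excluding scheduled-certain 'gimmes'."""
    return [
        (e["probability"], e["outcome"])
        for e in ledger
        if e.get("status") == "resolved"
        and isinstance(e.get("outcome"), bool)
        and not e.get("scheduled_certain")
    ]


def brier(pairs):
    """Mean squared error between forecast probability and the 0/1 outcome."""
    return sum((p - float(o)) ** 2 for p, o in pairs) / len(pairs)


def brier_skill_score(pairs):
    """1 - BS/BS_ref, where the reference always forecasts the observed base rate."""
    base_rate = sum(o for _, o in pairs) / len(pairs)
    reference = brier([(base_rate, o) for _, o in pairs])
    return None if reference == 0 else 1 - brier(pairs) / reference
```

A skill score above zero means the calls beat the naive base-rate forecast; with only a handful of resolved calls the figure is noisy, which is why the checklist asks for a caveat at small n.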

16. Scaling to 100,000

The path from ~5k → 100k is industrialised generation + the same per‑scenario gates on every record. Volume never lowers the bar.

16.1 The lane method (proven for 1,200 → 5,141)

[ ] Decompose the world into lanes = jurisdiction × mechanism × magnitude × horizon (the CCAR landscape: every stress‑test body × every channel). See docs/CCAR_STRESS_TEST_LANDSCAPE.md.
[ ] One generation agent per lane writes a JSON array to data/gen/L<NN>_<theme>.json, each record = {category, probability, timeline, title, scenario, countries, roots} per docs/CCAR_SCENARIO_GENERATION_SPEC.md.
[ ] Prefer 150 strong over 180 with a weak tail (the spec's rule). Quality over count.

16.2 Merge (validate + dedup)

PYTHONPATH=. .venv/bin/python bin/merge_ccar_scenarios.py    # validates vocab, dedups by normalised title, fresh ids, appends

[ ] It validates every record against the category + root‑factor + timeline vocab (§4).
[ ] It de‑duplicates by normalised title (vs the whole library AND within the batch).
[ ] It assigns fresh ids max+1, appends YAML (minimal diff), writes roots to scenario_roots_ccar.yaml.
[ ] Re‑running is safe (skips titles already present).
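
The dedup step can be pictured with a small sketch. The normalisation rule shown here (strip accents, lowercase, collapse punctuation and whitespace) is an assumption; the real rule lives in bin/merge_ccar_scenarios.py and may differ. The function names and sample titles are hypothetical.

```python
import re
import unicodedata


def normalise_title(title):
    """Assumed rule: strip accents, lowercase, turn punctuation/whitespace runs into one space."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_batch(batch, existing_titles):
    """Drop records whose normalised title is already in the library or earlier in the batch."""
    seen = {normalise_title(t) for t in existing_titles}
    kept = []
    for record in batch:
        key = normalise_title(record["title"])
        if key in seen:
            continue  # duplicate: skip, which is also what makes a re-run safe
        seen.add(key)
        kept.append(record)
    return kept


existing = ["Fed Emergency Cut"]
batch = [
    {"title": "fed  emergency cut!"},   # duplicate of the library title
    {"title": "Yen Carry Unwind"},
    {"title": "Yen carry-unwind"},      # duplicate within the batch
]
new_records = dedup_batch(batch, existing)
print([r["title"] for r in new_records])
```

Because the check runs against both the library and the batch, feeding the same batch in a second time keeps nothing new.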

16.3 Then the standard pipeline on the whole batch

[ ] build_scenarios.py → cascades · build_history.py → evidence · enrich (§9: probs + headlines required for new ids) · build_seo_pages.py · test · push · verify.
[ ] Regenerate config/scenarios_clean.md (bin/dump_scenarios_md.py) so the human‑readable mirror stays current.

16.4 The evidence bottleneck (the real limit on indexable scale) ⛔

A scenario is only indexable (first‑class, public, ranked) when it has measured events_used. So reaching 100k indexable scenarios requires the event library to scale with them:

[ ] Grow config/historical_events.yaml (currently ~993) via the adversarial event‑sourcing workflow → merge_history.py, so every new lane has real analogues to match.
[ ] Ensure new scenarios' tags/themes overlap the event library so build_history.py finds analogues (a scenario with no matchable analogue stays noindex — that's correct, not a bug).
[ ] Track the indexable ratio in stats.json (indexable / scenarios). Driving that ratio up is the work of scaling — raw count without evidence is vanity.
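
As a quick illustration of the ratio being tracked, the sketch below applies the indexability rule (probability AND events_used) to a list of records. It assumes each scenario is a flat dict carrying both fields, which is a simplification; in the pipeline events_used comes from the history build, and the function names here are hypothetical.

```python
def is_indexable(scenario):
    """Indexable = has a probability AND at least one measured analogue in events_used."""
    return scenario.get("probability") is not None and bool(scenario.get("events_used"))


def indexable_ratio(scenarios):
    """Share of scenarios that are indexable; 0.0 for an empty library."""
    if not scenarios:
        return 0.0
    return sum(1 for s in scenarios if is_indexable(s)) / len(scenarios)


# Hypothetical records, for illustration only.
library = [
    {"id": 1, "probability": 0.12, "events_used": ["e-1987-crash"]},
    {"id": 2, "probability": 0.30, "events_used": []},   # no analogue: stays noindex
    {"id": 3, "probability": None, "events_used": ["e-2013-taper"]},
    {"id": 4, "events_used": []},
]
print(indexable_ratio(library))
```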

16.5 Global coverage discipline (so it's "from all over the world")

[ ] Watch the country distribution (country hubs in stats.json); deliberately commission lanes for under‑represented regions/themes.
[ ] Every jurisdiction's real stress‑test body should be represented (the landscape doc is the checklist of bodies).

16.6 The frontier-of-human-development axis (good AND bad)

Coverage is not only geographies and crises. The second coverage axis is the frontier of our species — the developments that re‑shape how we live: medicine, AI, technology, science journals, patents & discoveries, biology, nature. This is the home of invariant #4. The full framework (lanes, sources, vocab, ontology, the good/bad rule, worked examples) lives in docs/FRONTIER_SCENARIO_SPEC.md — read it before commissioning a frontier lane. The operational checklist:

[ ] Pair every breakthrough with its tail. For each frontier development, author both the good outcome (RISK‑ON: e.g. "Universal cancer vaccine works") and its probable‑negative (RISK‑OFF: e.g. "Engineered pathogen escapes a lab"). A one‑sided frontier theme is incomplete — do not ship it half‑done.
[ ] Use the frontier root factors (signed, bidirectional) so the engine routes good→RISK‑ON / bad→RISK‑OFF automatically: ai_breakthrough, scientific_breakthrough, biotech_breakthrough, longevity, clean_energy, space_economy, neuro_interface, biosecurity_risk, biodiversity_loss. These are wired in macroguru/cascade/propagation.py, labelled in bin/build_scenarios.py, accepted by merge_ccar_scenarios.py (FRONTIER_FACTORS), and tagged for evidence in bin/build_history.py (root_tags). Add to all four when introducing a new frontier factor.
[ ] Match real analogues. Frontier scenarios match the e4-frontier-* events in config/historical_events.yaml (AlphaFold, ChatGPT, GPT‑4, the Nvidia AI‑capex wave, mRNA/GLP‑1/CRISPR, NIF fusion, Neuralink, LK‑99, JWST, SpaceX landing, IPBES biodiversity…). A novel scenario will show agree=False vs an imperfect analogue — that is the deviation thesis, surfaced honestly, not a bug.
[ ] Keep the live feed in sync. New frontier topics must be classifiable from the wires: macroguru/data/science_news.py (Nature, Science, arXiv, MIT Tech Review, MIT News, FDA) → corroboration.TOPIC_KEYWORDS (frontier topics) → alerts.TOPIC_TO_EVENT → a frontier ontology event in config/event_ontology.yaml (ai_capability_breakthrough, ai_capex_bust, biomedical_breakthrough, biosecurity_shock, clean_energy_abundance, space_economy_milestone, scientific_discovery). A topic with no ontology event is context‑only (it never drives a cascade). After editing the ontology, recompile (compile_ontology() in bin/refresh_site.py).
[ ] ⛔ Precision over recall in the classifier. Frontier keywords must use word boundaries (\bagi\b, not agi — which matches "magic", "agitation"). After any vocab change, run the false‑positive guard before committing. The corroboration gate (2 independent classes; a lone Nature paper or FDA approval = WATCH, not ALLOW) is the second line of defence.
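
The word‑boundary rule is easy to check by hand. This sketch compares a bare substring pattern with the bounded one on three made‑up headlines; it is not the project's false‑positive guard, only a demonstration of why the boundary matters.

```python
import re

naive = re.compile(r"agi", re.IGNORECASE)        # substring match: too greedy
bounded = re.compile(r"\bagi\b", re.IGNORECASE)  # whole-word match only

# Made-up headlines, for illustration only.
headlines = [
    "Lab claims AGI milestone",
    "The magic of compound interest",
    "Political agitation hits bond markets",
]

naive_hits = [h for h in headlines if naive.search(h)]
bounded_hits = [h for h in headlines if bounded.search(h)]

print(len(naive_hits), "naive hits;", len(bounded_hits), "bounded hits")
```

The naive pattern fires on all three headlines; the bounded pattern fires only on the one that is actually about AGI.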

17. The "no weak proposition" rubric

A scenario is strong only if it passes all of these (synthesis of scenario‑planning, superforecasting, ICD 203, and DAMA data‑quality consensus — §27):

[ ] Anchored — a named real concern or episode (not invented).
[ ] Specific — concrete channel + magnitude; one precise sentence (no "markets fall," no vague filler).
[ ] Plausible — internally coherent and physically/economically possible (scenario‑planning's core test).
[ ] Distinct — not a near‑duplicate.
[ ] Falsifiable & measured — has an event study with real analogues and sources.
[ ] Calibrated — sober, base‑rate‑anchored probability with a stated rationale.
[ ] Decision‑relevant — a reader can act; the guidance is concrete.
[ ] Honest — disagreements with history are surfaced (deviation thesis), not hidden; uncertainty is shown.

17.1 Data‑quality dimensions (DAMA 6) mapped to a scenario

Dimension	For a scenario, it means
Accuracy	impacts/sources match cited reality
Completeness	roots + cascade + history + guidance + probability all present
Consistency	signs/units agree across cascade ↔ guidance ↔ measured history
Timeliness	probability reflects the current regime; prices/history are fresh
Validity	category/timeline/roots honour the controlled vocab
Uniqueness	unique id; no duplicate title

18. Probability & calibration standards

Priors, not forecasts. Always labelled as base‑rate‑anchored priors.
Anchor to base rates first (superforecasting): start from "how often has this type of event happened?" via the count of measured analogues (n_analogues); then adjust for current conditions.
Bands (well‑spread, not clustered): 0–6mo one‑offs 0.05–0.40 · structural/tail 0.01–0.10 · developing trends 0.50–0.85. CCAR spec: severe‑tail 0.02–0.10, plausible‑cyclical 0.15–0.45.
Pair every estimate with its basis (ICD 203 words‑of‑estimative‑probability): a ≤18‑word rationale; never a bare number.
Score it (Brier): calibration.json holds the rolling reliability curve by confidence band; the target is that predicted ≈ realized frequency. (Human superforecasters land Brier ≈ 0.15–0.20; that's the bar to beat over time.)
Update, but not too much: nudge mis‑calibrated buckets toward the realized base rate each cycle (apply_probs).
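
A reliability curve of the kind described above can be sketched in a few lines: bucket the forecasts by predicted probability, compare the mean prediction with the realized frequency, and attach a Wilson interval (the band named in the reality‑check methodology) so thin buckets are visibly uncertain. The bucket count, function names and sample data are assumptions; the layout of calibration.json is not shown here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def reliability_curve(forecasts, n_buckets=5):
    """forecasts: list of (probability, outcome_bool). Returns one row per non-empty bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for p, outcome in forecasts:
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 falls in the top bucket
        buckets[idx].append((p, outcome))
    rows = []
    for members in buckets:
        if not members:
            continue
        n = len(members)
        hits = sum(1 for _, o in members if o)
        rows.append({
            "mean_predicted": sum(p for p, _ in members) / n,
            "realized": hits / n,
            "n": n,
            "wilson_95": wilson_interval(hits, n),
        })
    return rows


# Made-up resolved forecasts, for illustration only.
sample = [(0.10, False), (0.15, False), (0.90, True), (0.85, True), (0.95, False)]
curve = reliability_curve(sample)
for row in curve:
    print(row)
```

A well‑calibrated bucket has mean_predicted inside its Wilson band; with a handful of calls the band is wide, which is the honest answer.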

18.1 Derive BOTH numbers from history, then track the variance (the accountability spine)

Full contract: docs/CALIBRATION_METHODOLOGY.md. Engine: macroguru/calibration/derive.py. Neither the probability nor the per‑asset impact % is asserted any more — each is built from a reference class, the build is recorded so it can be audited, and after the fact it is scored against reality.

Probability — derive_probability() in build_history.py: shrink the assigned prior toward the reference‑class (category×timeline) base rate by precedent strength, log‑odds‑pool with the crowd, extremize mildly, reserve mass for unknown‑unknowns + Cromwell‑clamp [2%,97%], and report a credible interval that widens with thin precedent. Written to each scenario's prob_derivation and surfaced as "History‑derived X% · 90% range A–B%" on every scenario page + the Lab/What‑If reasoning panel.
Impact % — derive_impact(): shrink the measured analogue abnormal return toward 0 by reliability (sample × consistency × confidence), blend with the cascade prior, band it (fat‑tail aware). Written per market as impact; surfaced as the "hist A–B%" range next to every projected move.
Tracking — impact_accuracy.json (built by build_history.py) scores the published % vs the measured analogue move (MAE/RMSE/bias/dir‑hit/coverage/skill‑vs‑no‑move, by confidence band); build_scorecard.py folds it into scorecard.json; both show at /reality-check alongside the probability Brier/calibration.
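
The probability steps listed above (shrink, pool, extremize, clamp) can be read as the following sketch. It is not the code in macroguru/calibration/derive.py: the function name, the shrinkage constant k, the crowd weight, and the extremizing factor are all assumed values chosen for illustration, the unknown‑unknowns reserve is folded into the clamp, and the credible interval is left out. Only the [2%, 97%] clamp is taken from this section.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def derive_probability_sketch(prior, base_rate, n_analogues, crowd=None,
                              k=10.0, crowd_weight=0.5, extremize=1.1,
                              floor=0.02, ceiling=0.97):
    """Illustrative only; k, crowd_weight and extremize are assumed, not the engine's values."""
    # 1. Shrink the assigned prior toward the reference-class base rate.
    #    More analogues means stronger precedent, so more weight on the base rate.
    w = n_analogues / (n_analogues + k)
    p = (1.0 - w) * prior + w * base_rate
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep logit finite

    # 2. Pool with the crowd in log-odds space, if a crowd price exists.
    x = logit(p)
    if crowd is not None:
        crowd = min(max(crowd, 1e-6), 1.0 - 1e-6)
        x = (1.0 - crowd_weight) * x + crowd_weight * logit(crowd)

    # 3. Extremize mildly: scale log-odds away from 50/50.
    x *= extremize

    # 4. Cromwell clamp: never publish a certainty.
    return min(ceiling, max(floor, inv_logit(x)))


print(derive_probability_sketch(prior=0.40, base_rate=0.10, n_analogues=30))
print(derive_probability_sketch(prior=0.20, base_rate=0.20, n_analogues=0, crowd=0.60))
```

The design point worth keeping from any implementation is the order: anchor to the reference class first, adjust second, and clamp last so no later step can reintroduce a 0% or 100%.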

18.2 Sub‑process E — the weekly recalibration (learning loop)

Run bin/recalibrate.py (auto‑runs at the tail of build_seo_pages.py) → recalibration.json. It proposes (never auto‑applies):

1. Impact bias offset — if magnitude bias is materially ≠ 0, apply −bias to published moves.
2. Confidence recalibration — if directional accuracy doesn't rise with the confidence score, down‑weight/re‑fit it.
3. Probability — once forecasts resolve, fit a Platt recalibration on the resolved set; and review the scenarios flagged where the history‑derived probability diverges ≥15pts from the assigned prior.

Apply the accepted proposals via scenario_overrides.yaml / propagation.py (mirrors the apply_probs flow), then rebuild. Each cycle closes the gap to reality a little more.

19. The emotional‑integrity standard (the product purpose)

We exist to help people peek into the future, manage their emotions, and decide better. Behavioral‑finance research is explicit about the failure modes we must counter: loss aversion, recency bias, and panic selling driven by fear of immediate loss. So every published scenario must:

[ ] Name what to do — concrete Long/Short/Cash actions (not just "risk is high").
[ ] Give a plain‑English "common‑man" line — what a normal, stock‑heavy portfolio should consider.
[ ] Frame, don't frighten — always pair the scary part with probability + horizon + what to watch, so the reader sees it as one rehearsed branch of many, not a certainty.
[ ] Pre‑commit the response — the guidance is the reader's plan before the moment, which is exactly the discipline behavioral finance prescribes to avoid impulsive, fear‑driven decisions.
[ ] Tell the truth about uncertainty — show the reliability/where‑we‑may‑diverge, so trust is earned, not assumed.

Litmus test: would this scenario make an anxious reader calmer and better‑prepared, or just more scared? If the latter, it fails — add the guidance and the framing.

20. SEO / E‑E‑A‑T / YMYL standards

Finance is YMYL ("Your Money or Your Life") — Google holds it to the strictest E‑E‑A‑T bar, and so do we.

[ ] Experience/Expertise/Authoritativeness/Trust: original measured data (the event study), clear method, honest disclaimers — not thin AI filler.
[ ] Sources cited and resolving (every analogue's source_url).
[ ] People‑first / helpful: the page answers the reader's real question (what could happen, how likely, what to do) — not written for keywords.
[ ] Structured data intact: Article + Dataset + BreadcrumbList JSON‑LD (not the retired ClaimReview/FAQ).
[ ] Indexability honesty: thin/unmeasured scenarios are noindex until they earn evidence (don't index vanity pages).
[ ] Accountability: the model + method are named; "probabilistic model of the future, not investment advice" appears.

21. Accessibility & mobile standards

Every page (scenario, hub, week‑ahead, landing, legacy) must pass:

[ ] Zero horizontal overflow at 390px and 360px (scrollWidth == viewport).
[ ] Touch targets ≥ 44px (Apple HIG) / ≥ 24px min (WCAG 2.5.8); inputs ≥ 16px (no iOS zoom‑on‑focus).
[ ] Tables scroll inside their own container (comparison tables) or stack (content tables) — never push the page.
[ ] Marquees/animation pause on hover/focus + honour prefers-reduced-motion (WCAG 2.2.2).
[ ] No sticky element covers content on mobile.
[ ] Verify with cache‑busted CSS or on prod (the local preview caches CSS/JS across reloads).

22. Failure modes & gotchas (hard‑won)

⛔ Deploy = git push, never vercel deploy --archive=tgz (re‑uploads ~5k+ files → api-upload-free 429, a 24h lockout). The refresh_scenarios.py docstring is stale on this.
⛔ cleanUrls: verifying a page with curl …/x.html returns a 15‑byte redirect stub (looks like a failed deploy). Use the clean path …/x or curl -L. .css/.js/.json are unaffected.
⛔ Deploy‑by‑SHA: the first READY in vercel ls is often the previous build; match the commit SHA before declaring success.
⛔ apply_headlines.py must MERGE (setdefault().update()), never replace the headlines dict, or it wipes the existing thousands.
⛔ Generated vs hand‑maintained pages: build_seo_pages.py owns news.html, scenario/*, sectors/*, countries/*, assets/<slug> (hubs), risks/*, week-ahead/*. refresh_site.py renders data_quality.html + honest_limits.html from the root *.md via its _DOC_TMPL (and the periodic job re‑runs it — edit the template, not the output, or your change is wiped). Everything else (scenarios.html, monitor.html, whatif.html, world.html, assets.html, index.html, dashboards) is hand‑maintained.
⛔ Roots are mandatory — a scenario with no roots is silently skipped at publish.
⛔ Indexable = probability AND events_used — a scenario with no measured analogue renders noindex (correct, but means "not first‑class yet").
⛔ Never hardcode counts — emit to stats.json; fill via [data-mg-count] + counts.js.
⛔ Never renumber ids / change slugs casually — it nukes live URLs.
Idempotency: build_history/build_scenarios write‑if‑changed (α/β quantised) — re‑running on unchanged prices writes ~0 files; a huge diff means inputs really changed (or upstream data drifted).
UTF‑8 grep: the cascade arrows ▲▼◆ and › break a plain grep in zsh (character not in range) — export LC_ALL=en_US.UTF-8 first.
Local preview caches CSS/JS across reloads and is localhost‑only — cache‑bust (?bust=) or verify on prod.
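
The apply_headlines.py gotcha above is the classic merge‑versus‑replace bug. The sketch below shows the safe pattern on an in‑memory dict; the store layout, the ids and the headline text are hypothetical, and only the setdefault().update() idiom is taken from this section.

```python
def merge_headlines(store, new_headlines):
    """Merge new headlines into the existing dict in place; never assign over it."""
    store.setdefault("headlines", {}).update(new_headlines)
    return store


def replace_headlines(store, new_headlines):
    """The bug: assignment throws away every headline that was already there."""
    store["headlines"] = dict(new_headlines)
    return store


# Hypothetical store, for illustration only.
good = {"headlines": {"101": "What if the Fed cuts early?"}}
bad = {"headlines": {"101": "What if the Fed cuts early?"}}

merge_headlines(good, {"5142": "What if a key port closes?"})
replace_headlines(bad, {"5142": "What if a key port closes?"})

print(len(good["headlines"]), len(bad["headlines"]))
```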

23. Roles, cadence & ownership

Cadence	What runs	Mechanism	Cost
5 min	alert bus, hypothesis monitor, live‑monitor feed	event_tick.py / reactive_engine.py (launchd)	free
Daily	prices → history → cascades → coverage → calibration → deploy	refresh_scenarios.py (launchd)	free
Monthly / on‑trigger	probability recalibration, deviation theses, headlines (new ids), deep event sourcing, bulk scenario generation	bin/REFRESH_RUNBOOK.md (ops/com.macroguru.intel.plist)	~5–10M tokens
On new HL listing	new‑asset deep round (§14)	event‑triggered fast‑path	bounded tokens

Owner of this SOP: Vikas. Executors: Vikas, Claude, or claude -p unattended.
Definition of done for any add: every ⛔ box in §10, §11, §12 ticked + pytest green + prod verified by SHA.

24. Master checklist (tear‑off)

One scenario, idea → live. (Bulk: do §16 first, then this per record.)

PRE‑CHECK (idea)
[ ] real anchor (regulator concern / historical episode)
[ ] specific channel + magnitude; distinct; decision‑relevant; falsifiable
[ ] driver expressible in the 50‑factor root vocab; assumptions explicit

SPECIFY
[ ] next free id (never reuse) · category ∈ taxonomy · status: candidate
[ ] timeline en‑dash · title 2–6 words · one precise sentence
[ ] roots entry: 2–5 signed factors · sober probability

BUILD
[ ] build_scenarios.py  → scenarios/<id>.json (cascade imm+ripple, markets ranked, HL gated, countries, guidance)
[ ] build_history.py    → history/<id>.json with events_used ≥1, real sources, 20d/5d abnormal returns

ENRICH (monthly tier)
[ ] probability recalibrated + ≤18‑word rationale
[ ] deviation thesis where cascade ≠ measured history
[ ] headline "What if …?" (apply_headlines MERGES)

QUALITY GATE ⛔ (finalize)
[ ] Integrity: measured ✓ calibrated ✓ coverage (no new gap) ✓
[ ] Content: distinct, concrete, coherent cascade, deviation explained
[ ] Emotional: what‑to‑do + common‑man + steadying frame
[ ] SEO/YMYL: indexable (prob+events_used), sources resolve, JSON‑LD, disclaimer
[ ] DQ: accuracy·completeness·consistency·timeliness·validity·uniqueness
[ ] promote status: candidate → keep

PUBLISH
[ ] build_seo_pages.py · stats.json count up · search_index includes it · sitemap updated
[ ] pytest -q green
[ ] git push origin main   (NOT archive=tgz)

VERIFY PROD ⛔
[ ] poll deploy by SHA → READY
[ ] curl clean URL (…/scenario/<slug>, with -L) shows oddsbar + markets + what‑to‑do + historical precedent
[ ] search returns it; mobile 390/360 zero overflow

MONITOR
[ ] coverage_report green · calibration watched · prob_history drawn · retire = status:drop (never delete)

25. Glossary

Root shock / driver — one of the 50 signed macro factors a scenario moves (the input to the cascade).
Cascade / butterfly effect — the propagated, ranked cross‑asset impact (immediate 1‑hop → ripple 2+‑hop).
Event study — measuring abnormal return AR = r − (α + β·r_SPX) around analogous historical events.
Abnormal return — the asset's move with market beta stripped out (the event's own effect).
Analogue — a real past event (in historical_events.yaml) matched to a scenario by tags/theme.
events_used — the analogues actually used to measure a scenario; required for indexability.
Indexable — a scenario with probability AND events_used → first‑class public page (in sitemap).
Prior — our base‑rate‑anchored probability (not a forecast).
Brier score / reliability curve — how well predicted probabilities match realized frequencies.
Coverage gate — the "no asset left behind" check (coverage_report.json).
Deviation thesis — the ≤34‑word explanation of why the forward cascade departs from measured history.
Lane — a jurisdiction × mechanism × magnitude × horizon slice used for bulk generation.
HL gating — marking which impacted markets are tradable on Hyperliquid (trade button vs shown‑only).
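
The abnormal‑return definition in the glossary, AR = r − (α + β·r_SPX), can be worked through numerically. This sketch assumes α and β are fitted by ordinary least squares on a pre‑event estimation window, which is the textbook market‑model approach and not necessarily how docs/HISTORICAL_ENGINE.md specifies it. The function names and the return series are made up.

```python
def fit_market_model(asset_returns, spx_returns):
    """OLS fit of asset = alpha + beta * spx over an estimation window (assumed method)."""
    n = len(asset_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(spx_returns) / n
    cov = sum((m - mean_m) * (a - mean_a) for a, m in zip(asset_returns, spx_returns))
    var = sum((m - mean_m) ** 2 for m in spx_returns)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


def abnormal_returns(asset_returns, spx_returns, alpha, beta):
    """AR_t = r_t - (alpha + beta * r_SPX_t): the move left after stripping market beta."""
    return [r - (alpha + beta * m) for r, m in zip(asset_returns, spx_returns)]


# Made-up estimation window: the asset tracks SPX with beta 1.5 and a small alpha.
est_spx = [0.01, -0.02, 0.015, 0.0, -0.005]
est_asset = [0.001 + 1.5 * m for m in est_spx]
alpha, beta = fit_market_model(est_asset, est_spx)

# Made-up event window: day 1 falls further than beta alone explains.
event_spx = [-0.03, 0.01]
event_asset = [-0.065, 0.016]
ar = abnormal_returns(event_asset, event_spx, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the window
print(alpha, beta, ar, car)
```

On day 1 the market model predicts about −4.4% for the asset, so the roughly −2.1% that remains is the event's own effect; day 2 is fully explained by the market and contributes about zero.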

26. Maintaining this document

[ ] When the pipeline changes (a script renamed, a gate added, a vocab extended), update the relevant section here in the same commit. This doc is load‑bearing — drift makes it dangerous.
[ ] When a new failure mode is discovered, add it to §22 immediately (that's how we "never assume" twice).
[ ] Keep the controlled vocabularies (§4) in sync with docs/CCAR_SCENARIO_GENERATION_SPEC.md and bin/news_taxonomy.py (those + merge_ccar_scenarios.py's validator are the machine‑enforced truth).
[ ] Bump the Last updated date at the top.

Companion docs (read alongside this one): SELF_IMPROVING_ENGINE.md (the master loop) · HISTORICAL_ENGINE.md (event‑study mechanics) · CCAR_SCENARIO_GENERATION_SPEC.md (the generation contract) · CCAR_STRESS_TEST_LANDSCAPE.md (the world's stress‑test bodies = the lane map) · ../bin/REFRESH_RUNBOOK.md (the monthly intelligence refresh) · the rendered operator docs DATA_QUALITY.md & HONEST_LIMITS.md.

27. References (external consensus)

The standards above are grounded in published best practice, not invented:

Scenario planning (plausibility, driving forces, decision‑relevance): Shell/GBN methodology — overview via sciencedirect.com review of reviews, Policy Horizons Canada foresight manual.
Superforecasting & calibration (base rates, Brier ≈ 0.15–0.20, update incrementally): Tetlock's Good Judgment Project — AI Impacts summary, notes on Superforecasting.
Analytic standards (sourcing, explicit assumptions, words‑of‑estimative‑probability with confidence): ODNI ICD 203.
Data‑quality dimensions (the 6: accuracy, completeness, consistency, timeliness, validity, uniqueness): DAMA‑NL DDQ paper, IBM data‑quality dimensions.
Behavioral finance (loss aversion, recency bias, panic selling, pre‑committed plans): Morgan Stanley — behavioral finance, hyperbolic discounting & panic selling (NIH/PMC).
E‑E‑A‑T / YMYL (financial content held to the strictest trust bar; people‑first, named accountability): Google — creating helpful, reliable, people‑first content, Search Engine Land — YMYL guide.
Frontier‑of‑human‑development axis (the good/bad scenario framework, sources, vocab, ontology): docs/FRONTIER_SCENARIO_SPEC.md — the companion spec to this SOP.

End of SOP. If you followed every checkbox, the scenario is measured, calibrated, covered, emotionally honest, indexable, accessible, and live — and it does the one thing we exist for: help someone meet the future calmly and decide better. Now do it 100,000 times.