
MacroGuru — Scenario Lifecycle & Quality SOP

This is the canonical operating document for adding anything to the MacroGuru library — a what‑if scenario, a tradable market/asset, or a Week‑Ahead catalyst playbook. It is written so that the author (Vikas), Claude, and anyone who takes over the product can follow it without guessing. Every actionable step is a checkbox. If a step is not checked, the item is not done — do not assume, do not skip.

Last updated: 2026‑06‑28 · Owner: Vikas · Status: LIVE / authoritative. Keep this file in sync with the code — see §26 Maintaining this document.


0. How to use this document

Conventions in this doc: ✅ = built/required · 🔶 = partially built / manual today · ⛔ = hard stop (do not publish if this fails) · code = exact file/command. All Python runs as PYTHONPATH=. .venv/bin/python … from the repo root /Users/bloquelabs/learn/MacroGuruStrategy.


1. North Star & non‑negotiable invariants

The mission. Capture 100,000 scenarios from all over the world and make each one do exactly one job: help a person peek into the future, manage their emotions, and make a better decision. Every scenario is a small, falsifiable, emotionally‑intelligent rehearsal of a possible future — not a prediction, not hype, not filler.

Four things make a MacroGuru scenario different from a blog post or a tweet, and they are invariants that may never be broken to hit a number:

  1. Measured, not asserted. Every cross‑asset impact is an abnormal return (market‑beta stripped: AR = r − (α + β·r_SPX)) measured over real analogous historical events. No number in the product is unfalsifiable. (Source of truth: docs/HISTORICAL_ENGINE.md.)
  2. Calibrated, never clairvoyant. Every probability is a base‑rate‑anchored prior, continuously scored against what actually happens (Brier‑style reliability), and labelled as a prior — never as a forecast.
  3. No tradable left behind. Every asset listed on Hyperliquid must be covered: it appears in the scenarios that move it, it has price (or proxy) data, and it has measured evidence. The coverage gate (coverage_report.json) fails loudly on a gap. (Source: docs/SELF_IMPROVING_ENGINE.md.)
  4. Both sides of every frontier. The future is not only crises. Where the library covers a force that re‑shapes our species — AI, medicine, science, energy, space, biology, nature — it must hold both the good outcome (the breakthrough, RISK‑ON) and its probable‑negative tail (the bust/shock, RISK‑OFF). Never ship the fear without the hope, or the hope without the tail. (Framework: docs/FRONTIER_SCENARIO_SPEC.md; operational steps in §16.6.)
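The abnormal-return formula in invariant 1 can be illustrated with a minimal sketch. It assumes a market model fitted by ordinary least squares over an estimation window; the function names, the window split, and the toy returns are illustrative assumptions, not the engine's implementation (the source of truth stays docs/HISTORICAL_ENGINE.md).

```python
def fit_market_model(asset_returns, market_returns):
    """Ordinary least squares fit of r_asset = alpha + beta * r_market."""
    n = len(asset_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (a - mean_a)
              for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


def abnormal_returns(asset_returns, market_returns, alpha, beta):
    """AR = r - (alpha + beta * r_SPX), one value per day in the event window."""
    return [r - (alpha + beta * m)
            for r, m in zip(asset_returns, market_returns)]


# Toy estimation window (assumed data): the asset tracks the market with
# alpha = 0.001 and beta = 1.5 exactly, so the fit should recover both.
est_market = [0.01, -0.02, 0.015, 0.0, -0.01]
est_asset = [0.001 + 1.5 * m for m in est_market]
alpha, beta = fit_market_model(est_asset, est_market)

# Toy event window: the asset falls far more than its beta alone explains.
event_market = [-0.03, -0.01]
event_asset = [-0.10, 0.02]
ar = abnormal_returns(event_asset, event_market, alpha, beta)
cumulative_ar = sum(ar)
```

The point of stripping beta is visible on the first event day: the market fell 3%, which explains about 4.4% of the asset's 10% drop, and only the remaining part counts as the scenario's measured impact.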

The emotional contract (the reason we exist). A scenario that scares without guiding is a failure. Every published scenario must answer "so what do I do, and how do I not panic?" — see §19 Emotional‑integrity standard.


2. What a scenario IS — the data model

A scenario lives in three places, in this order of authority:

| Layer | File | What it holds | Who writes it |
|---|---|---|---|
| Source record | config/scenarios.yaml | the human‑authored truth | you / merge_ccar_scenarios.py |
| Roots (drivers) | config/scenario_roots_ccar.yaml, config/scenario_roots_batch3.yaml | id → {probability, timeline, roots:{factor:signed}} | generator / by hand |
| Overrides (applied LAST) | config/scenario_overrides.yaml | id → {probability, [roots]} recalibrations | bin/apply_probs.py (machine‑managed) |
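How the three layers combine can be sketched as below. This is a hedged illustration only: the function name and the in-memory dict shapes are hypothetical, and treating an override's roots as a wholesale replacement (rather than a per-factor merge) is an assumption. What the table does establish is that overrides are applied last and win over the curated prior.

```python
def resolve_scenario(scenario_id, source, roots, overrides):
    """Combine the three layers; overrides are applied last and win."""
    record = dict(source[scenario_id])  # copy so the source record is not mutated
    record["roots"] = dict(roots.get(scenario_id, {}).get("roots", {}))
    override = overrides.get(scenario_id, {})
    if "probability" in override:
        record["probability"] = override["probability"]
    if "roots" in override:
        # Assumption: override roots replace the roots-file entry wholesale.
        record["roots"] = dict(override["roots"])
    return record


# Toy layers (assumed values) for the Hormuz example record.
source = {1: {"title": "Hormuz closure", "probability": 0.06}}
roots = {1: {"roots": {"oil_supply_risk": 0.8, "geopolitical_risk": 0.6}}}
overrides = {1: {"probability": 0.04}}

resolved = resolve_scenario(1, source, roots, overrides)
```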

2.1 The scenarios.yaml record (exact fields)

- id: 1                       # int, unique, never reused (deactivate, never delete — invariant)
  category: geo_escalation    # MUST exist in bin/news_taxonomy.py SECTORS map (see §4)
  status: candidate           # candidate = awaiting review · keep = published · drop = excluded
  probability: 0.06           # float 0..1 (curated prior; overrides win at build time)
  timeline: 0–6 months        # EN‑DASH "–", one of the four allowed (§4)
  title: Hormuz closure       # 2–6 words, concrete, headline‑grade (no "markets fall")
  scenario: >                 # ONE precise sentence: shock + trigger + a real magnitude/episode
    Iran closes the Strait of Hormuz after a naval clash with US forces, choking ~20% of seaborne oil.

Countries are NOT hand‑set here. build_scenarios.py auto‑tags countries from the scenario text (and the generation hint). A bulk‑gen record (§16) may carry a countries: [...] hint, but the tagger is the source of truth.

2.2 The compiled, public objects (generated — never hand‑edit)

| Output | Path | Built by |
|---|---|---|
| Cascade + ranked markets + guidance + countries | web/public/data/scenarios/<id>.json (+ scenarios_index.json) | build_scenarios.py |
| Measured historical evidence (event study) | web/public/data/history/<id>.json (+ history_meta.json) | build_history.py |
| The public page | web/public/scenario/<slug>.html | build_seo_pages.py |
| Probability path over time | web/public/data/prob_history/<id>.json | build_prob_history.py |
| Narrative / prob‑note / deviation / headline | web/public/data/narratives.json | the intelligence passes (§9) |

3. The lifecycle at a glance

 (1) CONCEIVE ─► (2) SPECIFY ─► (3) ROOTS+CASCADE ─► (4) MEASURE HISTORY ─►
  source a real     write the      build_scenarios      build_history
  concern/episode   yaml record    (drivers→cascade)    (event study, evidence)
       │                                                        │
       ▼                                                        ▼
 (5) ENRICH ─► (6) QUALITY GATE ─► (7) PUBLISH ─► (8) VERIFY PROD ─► (9) MONITOR/CALIBRATE
  prob · dev      the finalize       build_seo_pages   clean‑URL +     coverage + Brier +
  · headline      POST‑CHECK ⛔      + git push        SHA checks      retire/update loop
  · guidance                                                              │
       └──────────────────────────── feeds the next conception ◄─────────┘

Each stage has a checklist. A scenario is DONE only when every ⛔ box in stages 6–8 is ticked.


4. Controlled vocabularies (copy‑paste reference)

⛔ Any value outside these sets will silently mis‑map (wrong sector), get skipped at build, or break the cascade. The generator (merge_ccar_scenarios.py) validates against these — honour them by hand too.

Timelines (use the EN‑DASH "–", not a hyphen "-"): 0–6 months · 6–18 months · 1–3 years · 3–10 years

Categories (the CCAR‑expansion set — preferred for new macro/stress scenarios; 23): asia_macro, europe_macro, emerging_macro, central_banks_fx, sovereign_fiscal, oil_gas, power_energy, metals_mining, agri_food_water, crypto_markets, crypto_infra, ai_compute, tech_cyber, automation_labor, greatpower_geo, regional_conflict, trade_sanctions, financial_plumbing, realestate_demographics, health_biotech, society_politics, climate_frontier, markets_corporate

Legacy scenarios use other category strings (e.g. geo_escalation). The real gate is: the category MUST resolve to a sector in bin/news_taxonomy.py (SECTORS / sector_for()), else the page won't map to a section front. When in doubt, use one of the 23 above. Pre‑check: §6.

Root‑shock factors (use ONLY these — 50): risk_appetite, credit_spreads, geopolitical_risk, financial_conditions, crypto_confidence, oil_supply_risk, industrial_demand, ai_capex, recession_signal, climate_supply, trade_tension, crypto_liquidity, china_growth, inflation_surprise, automation_displacement, dollar_confidence, fed_hawkishness, growth_surprise, food_inflation, european_energy, semiconductor_risk, EM_FX, defense_spend, carry_appetite, robot_productivity, VIX, BTC, real_yields, pandemic, XCU, fertilizer, NG, labor_shortage, inflation_expectations, consumer_discretionary, oil_demand, diesel, XAU, global_growth, mortgage_rates, china_stimulus, DXY, gasoline, XAG, WHEAT, risk_parity_delever, jet_fuel, CORN, curve_slope, labor_surplus

Sign convention (positive = the factor intensifies): risk_appetite − = risk‑off · recession_signal + = recession rising · credit_spreads + = wider · financial_conditions + = tighter · dollar_confidence − = USD doubts · china_growth − = China slowing · real_yields + = yields up · EM_FX − = EM currencies fall · oil_supply_risk + = supply threatened · inflation_surprise + = hotter · crypto_confidence/crypto_liquidity − = crypto stress · risk_parity_delever + = forced deleveraging.

Root recipes (start here, adapt magnitude to severity); full list in docs/CCAR_SCENARIO_GENERATION_SPEC.md. Examples:
- Recession / equity crash: {recession_signal:0.6, risk_appetite:-0.6, credit_spreads:0.5, financial_conditions:0.5}
- Oil supply shock: {oil_supply_risk:0.8, inflation_surprise:0.5, geopolitical_risk:0.6}
- Crypto / stablecoin run: {crypto_confidence:-0.8, crypto_liquidity:-0.7, BTC:-0.6}
- EM sudden stop: {EM_FX:-0.8, credit_spreads:0.6, risk_appetite:-0.5, DXY:0.4}
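A by-hand vocabulary check might look like the following minimal sketch. The sets are copied from this section; the function name and error wording are hypothetical, and this is not the validation code inside merge_ccar_scenarios.py. Category checking is left out because the real gate for it is bin/news_taxonomy.py.

```python
ALLOWED_TIMELINES = {"0–6 months", "6–18 months", "1–3 years", "3–10 years"}

ROOT_FACTORS = set("""
risk_appetite credit_spreads geopolitical_risk financial_conditions
crypto_confidence oil_supply_risk industrial_demand ai_capex recession_signal
climate_supply trade_tension crypto_liquidity china_growth inflation_surprise
automation_displacement dollar_confidence fed_hawkishness growth_surprise
food_inflation european_energy semiconductor_risk EM_FX defense_spend
carry_appetite robot_productivity VIX BTC real_yields pandemic XCU fertilizer
NG labor_shortage inflation_expectations consumer_discretionary oil_demand
diesel XAU global_growth mortgage_rates china_stimulus DXY gasoline XAG WHEAT
risk_parity_delever jet_fuel CORN curve_slope labor_surplus
""".split())


def vocab_errors(timeline, roots):
    """Return human-readable problems; an empty list means the vocab is honoured."""
    errors = []
    if timeline not in ALLOWED_TIMELINES:
        # The most common slip is an ASCII hyphen where the en-dash belongs.
        looks_hyphenated = timeline.replace("-", "–") in ALLOWED_TIMELINES
        hint = " (hyphen used instead of en-dash?)" if looks_hyphenated else ""
        errors.append(f"timeline {timeline!r} not allowed{hint}")
    for factor in roots:
        if factor not in ROOT_FACTORS:
            errors.append(f"unknown root factor {factor!r}")
    return errors


ok = vocab_errors("0–6 months", {"oil_supply_risk": 0.8, "geopolitical_risk": 0.6})
bad = vocab_errors("0-6 months", {"oil_risk": 0.8})
```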


5. Stage 1 — CONCEIVE

Goal: a scenario worth a person's attention — anchored to a real concern, not invented. Grounded in scenario‑planning practice (plausible, decision‑relevant, driver‑based) and intelligence analytic standards (ICD 203: sourced, assumptions explicit) — see §17 & §26 refs.

PRE‑CHECK (idea quality; tick all before writing the record):
- [ ] It traces to a real anchor: a regulator/stress‑test concern (Fed CCAR/DFAST, BoE, EBA/ECB/ESRB, BoJ, APRA, OSFI, FINMA, PBoC, RBI, MAS, HKMA, IMF FSAP/GFSR, FSB, BIS, NGFS…) or a documented historical episode. (No anchor → not a scenario yet.)
- [ ] It names a specific channel and magnitude/episode, not "markets fall" (e.g. "US office CRE crash," "Hormuz closure choking ~20% of seaborne oil").
- [ ] It is decision‑relevant: a reader could plausibly act on it (hedge, trim, wait, allocate).
- [ ] It is distinct, not a near‑duplicate of an existing scenario. (Search the library / Scenario Lab first; the bulk merger dedups by normalised title, but check by hand for singletons.)
- [ ] It is falsifiable: you can imagine the measured event study that would confirm or deny its cascade.
- [ ] It maps to ≥1 HL‑coverage need OR adds genuine world coverage (a country/theme we're thin on).
- [ ] Its driver(s) exist in the 50‑factor root vocabulary (§4). If the true driver isn't expressible, the scenario may be out of the engine's competence (flag it, don't force a wrong root).
- [ ] Assumptions are explicit (ICD 203): you can state the 1–2 load‑bearing assumptions in one line.


6. Stage 2 — SPECIFY (the record)

Goal: a clean scenarios.yaml record + its roots.

CHECKLIST:
- [ ] Pick the next free id = max(existing)+1. Never reuse or renumber an id (renumbering nukes live URLs/slugs).
- [ ] category ∈ allowed set and resolves in news_taxonomy (§4).
- [ ] status: candidate (promote to keep only after the Quality Gate, §10).
- [ ] timeline uses the en‑dash and is one of the four allowed values.
- [ ] title = 2–6 words, concrete, reads as a headline.
- [ ] scenario = one precise sentence: shock + trigger + magnitude/episode.
- [ ] Add the roots entry (id → {probability, timeline, roots:{…}}) in a roots file: 2–5 factors, each signed per the convention (§4), magnitudes scaled to severity (use the recipes).
- [ ] Probability is sober: severe‑tail 0.02–0.10; plausible‑cyclical 0.15–0.45; developing trend up to 0.85. (Calibration detail: §18.)
- [ ] If bulk‑generated, the record validates against merge_ccar_scenarios.py (vocab + dedup). (See §16.)

A scenario with no roots is skipped at publish (no cascade, no page). Roots are mandatory.
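Several of the Stage 2 boxes are mechanical and could be checked with a sketch like this one. The helper names and messages are hypothetical, and the 0.85 ceiling is read directly from the "developing trend up to 0.85" band above; the real validation lives in the build and merge scripts.

```python
def next_free_id(existing_ids):
    """Next id is max(existing) + 1; ids are never reused or renumbered."""
    return max(existing_ids) + 1


def record_errors(record, roots):
    """Check the mechanical Stage 2 rules for one new record and its roots."""
    errors = []
    if not 2 <= len(record["title"].split()) <= 6:
        errors.append("title must be 2 to 6 words")
    if record["status"] != "candidate":
        errors.append("new records start as status: candidate")
    if not 0.0 <= record["probability"] <= 0.85:
        errors.append("probability outside the 0..0.85 range")
    if not roots:
        errors.append("no roots: the scenario would be skipped at publish")
    elif not 2 <= len(roots) <= 5:
        errors.append("roots should name 2 to 5 factors")
    return errors


new_id = next_free_id([1, 2, 5141])  # toy id list
good = {"id": new_id, "status": "candidate", "probability": 0.06,
        "title": "Hormuz closure"}
good_roots = {"oil_supply_risk": 0.8, "inflation_surprise": 0.5,
              "geopolitical_risk": 0.6}
weak = {"id": new_id, "status": "candidate", "probability": 0.06,
        "title": "Markets"}

good_errors = record_errors(good, good_roots)
weak_errors = record_errors(weak, {})
```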


7. Stage 3 — ROOTS & CASCADE (build_scenarios.py)

Goal: turn drivers into the full butterfly effect. build_scenarios.py maps the scenario → signed root shocks → runs the deep causal‑propagation engine (macroguru/cascade/propagation.py) → ranks impacted markets by decision‑relevance (|move| × confidence) → gates each to Hyperliquid (tradable ⇒ trade button) → builds the left‑to‑right mindmap (trigger → 1st → 2nd order) → writes the long/short/cash guidance.

PYTHONPATH=. .venv/bin/python bin/build_scenarios.py
# → web/public/data/scenarios/<id>.json  (+ scenarios_index.json)  +  enriched config/scenarios.yaml

CHECKLIST:
- [ ] Build runs clean (no traceback); scenarios/<id>.json exists for the new id.
- [ ] The cascade has immediate (1‑hop) and ripple (2+‑hop) nodes, not a single flat list.
- [ ] Materiality floor did its job: no spurious tiny nodes cluttering the mindmap (the floor prunes them; confirm the kept nodes are economically sensible).
- [ ] Impacted markets are ranked sensibly (the dominant market matches the thesis).
- [ ] HL gating is right: tradable assets show a trade button; non‑tradable are shown without one.
- [ ] Countries auto‑tagged correctly (spot‑check the countries in the compiled JSON vs the text).
- [ ] Guidance (long/short/cash) generated and directionally coherent with the cascade (longs are the up‑moves, shorts the down‑moves, cash if net risk‑off).
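Two of the checks above (ranking by |move| × confidence, and guidance that agrees with the cascade's signs) can be spot-checked with a small sketch. The field names ("asset", "move", "confidence", "long", "short") and the toy numbers are assumptions for illustration, not the schema of the compiled scenarios/<id>.json.

```python
def rank_markets(impacts):
    """Sort by decision-relevance = |move| x confidence, largest first."""
    return sorted(impacts,
                  key=lambda m: abs(m["move"]) * m["confidence"],
                  reverse=True)


def guidance_is_coherent(impacts, guidance):
    """Longs must be projected up-moves and shorts projected down-moves."""
    moves = {m["asset"]: m["move"] for m in impacts}
    longs_ok = all(moves[a] > 0 for a in guidance["long"])
    shorts_ok = all(moves[a] < 0 for a in guidance["short"])
    return longs_ok and shorts_ok


# Toy cascade output (assumed values).
impacts = [
    {"asset": "XAU", "move": 0.04, "confidence": 0.5},
    {"asset": "SPX", "move": -0.05, "confidence": 0.9},
    {"asset": "BTC", "move": -0.12, "confidence": 0.3},
]
ranked = [m["asset"] for m in rank_markets(impacts)]
coherent = guidance_is_coherent(impacts, {"long": ["XAU"], "short": ["SPX", "BTC"]})
incoherent = guidance_is_coherent(impacts, {"long": ["SPX"], "short": []})
```

Note that the largest raw move (BTC) does not rank first here: its low confidence pulls its decision-relevance below SPX, which is the behaviour the ranking rule is meant to produce.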


8. Stage 4 — MEASURE HISTORY (build_history.py)

Goal: prove the cascade against reality. This is what separates us from opinion. The event study measures abnormal returns for every (scenario × asset) over the analogous historical events matched from config/historical_events.yaml, and records hit‑rate, sample size n, a confidence score, and whether the measured sign agrees with the projected cascade.

PYTHONPATH=. .venv/bin/python bin/build_history.py
# → web/public/data/history/<id>.json  (+ history_meta.json)

CHECKLIST:
- [ ] history/<id>.json exists and contains events_used (≥1 analogue). ⛔ No events_used ⇒ the page will be noindex (thin); see §10.
- [ ] The analogues are genuinely analogous (right tags/theme), not coincidental keyword hits.
- [ ] Each analogue has a real, working source_url (ICD 203 sourcing; YMYL trust).
- [ ] The per‑asset table shows abnormal returns at 20d and 5d, hit‑rate, n, confidence.
- [ ] Where the measured sign disagrees with the cascade, that's expected sometimes; it must be explained by a deviation thesis (§9), not hidden.
- [ ] If analogues are too thin/stale (n tiny, all pre‑2010): either widen the event library (config/historical_events.yaml via seeds or the adversarial sourcing workflow → merge_history.py) or mark the scenario low‑conviction. (See §16.4 the evidence bottleneck.)
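The per-asset statistics named above (n, hit-rate, and sign agreement with the cascade) could be aggregated roughly as follows. This is a sketch under stated assumptions: hit-rate is taken to mean the share of analogues whose abnormal return has the projected sign, and the confidence score is omitted because its formula is not given here. The function and key names are hypothetical.

```python
def summarize_asset(abnormal_returns, projected_sign):
    """Aggregate one (scenario x asset) cell of the event study.

    abnormal_returns: one abnormal return per matched analogue event.
    projected_sign:   +1 or -1, the direction the cascade projects.
    """
    n = len(abnormal_returns)
    mean_ar = sum(abnormal_returns) / n
    hits = sum(1 for ar in abnormal_returns if ar * projected_sign > 0)
    measured_sign = (mean_ar > 0) - (mean_ar < 0)  # +1, 0 or -1
    return {
        "n": n,
        "mean_ar": mean_ar,
        "hit_rate": hits / n,
        "sign_agrees": measured_sign == projected_sign,
    }


# Toy 20d abnormal returns across four analogues (assumed values).
agree = summarize_asset([0.04, 0.02, -0.01, 0.03], projected_sign=1)
# Same history, but the cascade projects a fall: this cell needs a deviation thesis.
disagree = summarize_asset([0.04, 0.02, -0.01, 0.03], projected_sign=-1)
```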

8.1 Adding the historical evidence (event library)

A new event in config/historical_events.yaml:

- id: b2-trade_sanctions-436        # stable unique id
  date: '1929-05-28'                # ISO; date_confidence captures fuzziness
  name: Smoot-Hawley clears the US House (protectionism signal)
  category: United States
  tags: [risk_off, trade_war, geopolitical]   # tags drive analogue matching
  summary: <one neutral sentence>
  source_url: https://…             # ⛔ must resolve
  source_name: en.wikipedia.org
  source: workflow                  # seed | workflow (adversarially verified)
  date_confidence: 0.8
  verified: true
  region: United States
  theme: trade_sanctions

9. Stage 5 — ENRICH (the intelligence passes)

Goal: a calibrated probability, an honest "why we may diverge," a human headline, and actionable guidance. These are the token‑costing monthly tier (bin/REFRESH_RUNBOOK.md). Run them for the new ids (or library‑wide).

| Pass | Command chain | Writes | Standard it serves |
|---|---|---|---|
| Probability (re)calibration | prep_probs.py → 40‑agent Workflow → apply_probs.py | scenario_overrides.yaml + narratives.json::prob_notes | base‑rate anchoring (superforecasting) |
| Deviation thesis | prep_deviation.py → Workflow → apply_deviation.py | narratives.json::deviations | analytic rigor / ICD 203 |
| Headline (only when new scenarios were added) | prep_headlines.py → Workflow → apply_headlines.py | narratives.json::headlines | E‑E‑A‑T / readability |

CHECKLIST:
- [ ] Probability is base‑rate‑anchored to n_analogues (frequent patterns higher; rare/tail lower) and well‑spread, not clustered. (Bands: §18.)
- [ ] Each probability carries a ≤18‑word rationale (prob_notes), never a bare number (ICD 203: pair the estimate with its basis).
- [ ] Where cascade ≠ measured history on a material asset, a deviation thesis (≤34 words) names the failure mode (regime contamination · swamped channel · thin/stale sample · structural change · history‑wins) and says which side to trust.
- [ ] Headline is a natural, grammatical "What if …?" (not the terse internal title).
- [ ] ⛔ apply_headlines.py MERGES into the headlines dict; confirm it did not wipe the existing 1,200+ (it setdefault().update()s; never replace the dict).
- [ ] Guidance (from build_scenarios) passes the emotional‑integrity standard (§19): it tells the reader what to do and steadies them.
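The merge-not-replace rule for headlines can be shown in a few lines. The setdefault().update() pattern is the one named above; the function names, the string ids, and the sample headlines are illustrative assumptions rather than the contents of apply_headlines.py.

```python
def merge_headlines(narratives, new_headlines):
    """Merge new headlines into narratives['headlines'] without dropping old ones."""
    narratives.setdefault("headlines", {}).update(new_headlines)
    return narratives


def lost_headline_ids(before, after):
    """Ids present before the apply step but missing afterwards (should be empty)."""
    return sorted(set(before) - set(after))


# Toy narratives.json content (assumed shape).
narratives = {
    "headlines": {"1": "What if Iran closes the Strait of Hormuz?"},
    "prob_notes": {"1": "Rare tail; few true closures in the analogue set."},
}
snapshot = dict(narratives["headlines"])

merge_headlines(narratives, {"5142": "What if a new scenario is added?"})
lost_after_merge = lost_headline_ids(snapshot, narratives["headlines"])

# What the checklist guards against: assigning a fresh dict wipes the library.
lost_after_replace = lost_headline_ids(snapshot, {"5142": "What if a new scenario is added?"})
```

Taking a snapshot of the ids before the apply step and diffing afterwards is a cheap way to tick the ⛔ box with evidence instead of by eye.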


10. Stage 6 — QUALITY GATE ⛔ (the finalize POST‑CHECK)

This is the gate the user asked for: "what we pre‑check and post‑check once the scenario is finalised." A scenario may be promoted to status: keep and published only when every box here is ticked. If any ⛔ fails, fix it or leave the scenario as candidate (it will render noindex/thin until fixed).

A. Integrity (invariants 1–3):
- [ ] ⛔ Measured: history/<id>.json has events_used ≥ 1 with real sources; the per‑asset abnormal returns render.
- [ ] ⛔ Calibrated: probability is a sober, base‑rate‑anchored prior with a one‑line rationale; not a forecast.
- [ ] ⛔ Coverage: if the scenario introduces/relies on an HL asset, that asset is in coverage_report.json as covered (price/proxy + ≥1 scenario + ≥1 analogue). Run bin/coverage_report.py; gaps must not increase.

B. Content quality ("no weak proposition", §17):
- [ ] Distinct (no near‑duplicate); concrete channel + magnitude; precise one‑sentence scenario.
- [ ] Cascade is economically coherent (signs make sense; ripple chain is real, not decorative).
- [ ] Deviation thesis present wherever cascade and measured history disagree on a material asset.
- [ ] No fabricated precise statistics; magnitudes realistic and consistent with the cited concern.

C. Emotional integrity (§19):
- [ ] "What to do if this happens" renders with Long/Short/Cash guidance + a plain‑English common‑man line.
- [ ] The tone steadies (it frames probability + horizon + what to watch); it does not induce panic.

D. SEO / trust / YMYL (§20):
- [ ] Indexable test passes: probability and events_used both present (else it is correctly noindex).
- [ ] Title/description are honest and specific; JSON‑LD (Article + Dataset + BreadcrumbList) is intact.
- [ ] Sources are cited and resolve; the probabilistic‑future + not‑investment‑advice disclaimer is present.

E. Data quality (DAMA 6 dimensions, §17.1):
- [ ] Accuracy (matches the cited reality) · Completeness (roots+cascade+history+guidance all present) · Consistency (signs/units agree across cascade, guidance, history) · Timeliness (probability reflects current regime) · Validity (vocab honoured) · Uniqueness (no duplicate id/title).
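The indexable test in block D reduces to a two-condition predicate, sketched below together with the indexable ratio that §16.4 asks to be tracked. The function names and the assumption that probability sits in the scenario JSON while events_used sits in the history JSON are illustrative; the authoritative logic is in build_seo_pages.py.

```python
def is_indexable(scenario, history):
    """Indexable only when a probability AND at least one analogue event exist."""
    has_probability = scenario.get("probability") is not None
    has_evidence = len(history.get("events_used") or []) >= 1
    return has_probability and has_evidence


def indexable_ratio(pairs):
    """Share of (scenario, history) pairs that pass the indexable test."""
    passed = sum(1 for scenario, history in pairs if is_indexable(scenario, history))
    return passed / len(pairs)


# Toy library of three scenarios (assumed values and event ids).
library = [
    ({"id": 1, "probability": 0.06}, {"events_used": ["event-a"]}),
    ({"id": 2, "probability": 0.20}, {"events_used": []}),   # thin: stays noindex
    ({"id": 3, "probability": None}, {"events_used": ["event-b"]}),
]
flags = [is_indexable(s, h) for s, h in library]
ratio = indexable_ratio(library)
```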


11. Stage 7 — PUBLISH (build_seo_pages.py → GitHub)

Goal: render the page + all hubs, run tests, ship via Git.

PYTHONPATH=. .venv/bin/python bin/build_seo_pages.py     # pages + hubs + sitemaps + stats + search_index
PYTHONPATH=. .venv/bin/python -m pytest -q               # ⛔ expect "227 passed" (or current count)
git add -A && git commit -m "…" && git push origin main  # ⛔ DEPLOY = git push (auto-builds on Vercel)

CHECKLIST:
- [ ] build_seo_pages.py ran clean; the new web/public/scenario/<slug>.html exists.
- [ ] data/stats.json count went up by the number added (counts self‑update; never hardcode a count anywhere; counts.js + [data-mg-count] fill them).
- [ ] data/search_index.json includes the new scenario (with c=kicker, g=sector).
- [ ] Sitemaps regenerated; the new indexable page is in sitemap-scenarios.xml.
- [ ] ⛔ Tests pass (pytest -q).
- [ ] ⛔ Deploy via git push origin main: the Vercel project public (rootDirectory web/public, branch main) auto‑builds server‑side. Do NOT use vercel deploy --archive=tgz (it re‑uploads the whole site and exhausts the free upload quota: api-upload-free 429). The refresh_scenarios.py docstring still says "archive deploy"; that is stale, so ignore it.
- [ ] Commit only the intended files; do not sweep in unrelated periodic‑refresh data churn (charts/history) unless that's the point of the commit.


12. Stage 8 — VERIFY ON PROD ⛔

Goal: confirm it's actually live and correct — not just deployed. Prod alias: https://macroguru.app.

CHECKLIST:
- [ ] Poll the deploy by commit SHA (the first READY you see is often the previous deploy): vercel ls public --yes → vercel inspect <building-url> until Ready.
- [ ] ⛔ Fetch the clean URL (cleanUrls strips .html): curl -sL https://…/scenario/<slug>, not …/scenario/<slug>.html (that returns a ~15‑byte redirect stub and looks broken). For static assets, byte‑compare (curl … | wc -c vs local) to confirm freshness.
- [ ] The page shows: oddsbar (prob + crowd), markets table, "What to do if this happens," "Historical precedent" table, related scenarios.
- [ ] Search (type a token) returns the scenario with a kicker + highlighted match + prob pill.
- [ ] Mobile: at 390px and 360px the page has zero horizontal overflow (scrollWidth == viewport) and no overlap. (Mobile standards: §21.)
- [ ] No console errors; the price chart line renders (the --line var fallback is present).


13. Stage 9 — MONITOR, CALIBRATE & RETIRE (the living scenario)

A scenario is never "done forever" — it is scored against reality and updated. This is the self‑improving loop (docs/SELF_IMPROVING_ENGINE.md, stages D/E/G).

CHECKLIST (ongoing):
- [ ] Coverage stays green: bin/coverage_report.py → coverage_report.json shows no new gaps after the add.
- [ ] Calibration is watched: calibration.json (reliability curve by confidence band, corroboration score). High‑probability scenarios should fire more often than low‑probability ones; mis‑calibrated buckets get nudged toward the realized base rate (apply_probs next cycle).
- [ ] Prob‑history updates: bin/build_prob_history.py redraws how the probability moved + the events that moved it.
- [ ] Fires are logged: when a matched real event fires, score the realized abnormal returns vs the projected cascade (sign hit‑rate); a systematic miss → re‑map roots + fresh deviation thesis.
- [ ] Retire/deactivate, never delete: an obsolete scenario is set status: drop (or an asset deactivated); history is preserved for future re‑use (invariant).


14. Sub‑process A — Adding a tradable asset/market

"All the pointers related to adding the market in the product." Triggered when a new market matters (a new Hyperliquid listing, or a market we want charts/evidence for). Goal: no tradable left behind.

CHECKLIST:
- [ ] Classify the asset (crypto / commodity / tokenized‑equity / FX / index / rate / vol).
- [ ] Price source + ticker map: register it in the price layer (macroguru price map / ASSET_YF) and, for the chart, add it to bin/build_asset_charts.py's set; run it → web/public/data/charts/<tk>.json (+ charts_index.json).
- [ ] Short‑history proxy: if the asset has < ~1y of data, map a long‑history peer/proxy (new L2 → ETH/SOL beta; new gold product → XAU) so the event study has something to measure.
- [ ] HL alias: add the HL symbol alias in hl_universe.TICKER_ALIASES (e.g. our XAU ↔ HL PAXG).
- [ ] Scenario inclusion: for every existing scenario, does the asset's class intersect the scenario's root shocks? If yes it inherits the cascade automatically (it becomes a new leaf in cascade/propagation.py).
- [ ] Evidence: run build_history.py so the asset gets measured abnormal returns across the matched analogues.
- [ ] Net‑new asset‑specific scenarios: generate the scenarios the asset makes relevant that the library lacks (e.g. a new staking token → "staking‑yield collapse"). Run them through §5–§11.
- [ ] Chart var sanity: confirm the price line renders on news.css pages (the --line fallback). Click‑to‑chart is gated to charted tickers.
- [ ] ⛔ Coverage assertion: coverage_report.py shows the asset as covered (price/proxy + ≥1 scenario + ≥1 analogue). A remaining gap fails the loop; fix the price source or proxy (e.g. the known HYPE gap → wire HL's candle API / CoinGecko).
- [ ] Rebuild + publish (§11) + verify (§12).
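The coverage rule (price or proxy, plus at least one scenario, plus at least one analogue) can be expressed as a small gap finder. This is a sketch only: the per-asset keys, the counts, and the "NEWL2" ticker are hypothetical, and the real report is produced by coverage_report.py.

```python
def coverage_gaps(assets):
    """Map each uncovered ticker to the coverage requirements it is missing."""
    gaps = {}
    for ticker, info in assets.items():
        missing = []
        if not (info.get("has_price") or info.get("proxy")):
            missing.append("price/proxy")
        if info.get("scenarios", 0) < 1:
            missing.append("scenario")
        if info.get("analogues", 0) < 1:
            missing.append("analogue")
        if missing:
            gaps[ticker] = missing
    return gaps


# Toy universe (assumed values).
assets = {
    "XAU": {"has_price": True, "proxy": None, "scenarios": 40, "analogues": 12},
    "NEWL2": {"has_price": False, "proxy": "ETH", "scenarios": 3, "analogues": 5},
    "HYPE": {"has_price": False, "proxy": None, "scenarios": 2, "analogues": 0},
}
gaps = coverage_gaps(assets)
```

A short-history asset with a mapped proxy (the NEWL2 row) counts as covered, which is why the proxy step comes before the coverage assertion in the checklist.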


15. Sub‑process B — Adding a Week‑Ahead catalyst playbook

For scheduled catalysts (NFP, CPI, FOMC, OPEC+, options expiry, Jackson Hole, shutdown, elections…). Content lives in bin/catalyst_playbook.py; pages render at /week-ahead/<type>.

CHECKLIST:
- [ ] Add a PLAYBOOK[<ctype>] entry: slug, label, sector, kicker, blurb, watch, poly_q, kalshi_q, and 3 outcomes.
- [ ] Each outcome has: name, prob (base rates summing ≈ 1 across the three), tag (risk-on|risk-off|mixed), a thesis, a cascade (_imm/_rip nodes: asset, signed direction, magnitude prior, one‑line mechanism), and a guide (stance, long, short, cash?, common).
- [ ] Magnitudes are reaction‑function priors grounded in published cross‑asset consensus, tuned to the current regime, labelled as priors (not measured returns).
- [ ] Add a CALENDAR row in bin/build_upcoming.py with the matching ctype.
- [ ] Run build_upcoming.py then build_seo_pages.py; verify /week-ahead/<slug> renders 3 outcomes + cascade + "what to do," and the landing card links to it.
- [ ] Mobile + verify on prod (§12).
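The outcome rules above (exactly three outcomes, base rates summing to roughly 1, tags from the fixed set) lend themselves to a quick check. The function name and the 0.02 tolerance for "≈ 1" are assumptions; the outcome names and probabilities are toy values, not entries from bin/catalyst_playbook.py.

```python
ALLOWED_TAGS = {"risk-on", "risk-off", "mixed"}


def outcome_errors(outcomes, tolerance=0.02):
    """Check one playbook entry's outcomes; tolerance for the sum is assumed."""
    errors = []
    if len(outcomes) != 3:
        errors.append(f"expected 3 outcomes, found {len(outcomes)}")
    total = sum(o["prob"] for o in outcomes)
    if abs(total - 1.0) > tolerance:
        errors.append(f"probabilities sum to {total:.2f}, not ~1")
    for o in outcomes:
        if o["tag"] not in ALLOWED_TAGS:
            errors.append(f"outcome {o['name']!r} has unknown tag {o['tag']!r}")
    return errors


good = [
    {"name": "Hot print", "prob": 0.3, "tag": "risk-off"},
    {"name": "In line", "prob": 0.5, "tag": "mixed"},
    {"name": "Cool print", "prob": 0.2, "tag": "risk-on"},
]
bad = [
    {"name": "Hot print", "prob": 0.4, "tag": "bearish"},
    {"name": "In line", "prob": 0.5, "tag": "mixed"},
    {"name": "Cool print", "prob": 0.3, "tag": "risk-on"},
]
good_errors = outcome_errors(good)
bad_errors = outcome_errors(bad)
```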


15A. Sub‑process C — The weekly "Solid chance of happening" predictions

The top section of the landing page (index.html) — a small, curated list of the highest‑probability, falsifiable macro calls for the next ~7 days, blended from the scheduled calendar, analyst consensus, prediction‑market crowd odds, the verified world‑state, and the engine's own scenarios. It is the product's sharpest expression of the mission: peek into the coming week, with calibrated odds, not hype.

Data flow: config/predictions_week.yaml (the research‑authored weekly seed) → bin/build_predictions.py (matches each call to the nearest library scenario for a deep‑dive link, resolves asset tickers→names, validates the honesty contract, writes web/public/data/predictions.json) → web/public/predictions.js renders the cards into #solid-predictions. build_predictions.py runs automatically at the tail of build_seo_pages.py (so the scenario deep‑dive links stay synced to fresh slugs).

Four surfaces, one feed (predictions.json): (1) the landing section (index.html via predictions.js); (2) the /news lead band (server‑rendered render_solid_predictions()); (3) the site‑wide ribbon (web/public/predictions-ribbon.js: a STATIC, dismissible, theme‑aware strip of the top calls injected on every page after the breaking‑bar; no auto‑scroll per WCAG 2.2.2; dismissal persists per week_of; self‑suppresses on /predictions); and (4) the dedicated /predictions page (render_predictions_page() → predictions.html): the full week with per‑card "≈ N in 100", ours‑vs‑crowd edge, the fixed resolution rule (joined from predictions_log.json), Article+ItemList JSON‑LD, and an evergreen dateless URL re‑rendered in place. All four refresh from the same weekly rebuild, with no extra steps.

Honesty contract (non‑negotiable): every prediction ships with a calibrated probability, an explicit basis (scheduled | consensus | base_rate | seasonality | crowd | regime), and real source URLs. "scheduled" = the event WILL occur (~certain); the outcome is the prediction — never conflate them. The section header says "Calibrated odds, not advice." Calibrated, never clairvoyant.

WEEKLY REFRESH CHECKLIST (run every Sunday/Monday for the new week):
- [ ] Re‑run the 4 research streams (macro calendar + consensus · geopolitics/energy · markets/crypto/corporate · prediction‑market odds) for the new Mon–Sun window; verify dates against ≥2 sources; capture the current regime + verified world‑state. (See the deep‑research prompts in the session log / §27.)
- [ ] Rewrite config/predictions_week.yaml: week_of, window, as_of, regime, and ~6–9 predictions. Each needs: title, claim (falsifiable), probability (0–1, calibrated), optional crowd_probability, basis, timeline (the date), category (a VALID_CAT), what, assets (engine tickers + up|down|flat), recommendation, optional edge, a match hint (keywords → scenario), and ≥1 sources.
- [ ] Keep the mix honest + diverse: a few scheduled‑certain anchors, the highest‑conviction outcome calls, and 1–2 genuine edges (a seasonality base rate, an OURS‑vs‑crowd gap). Don't pad with filler.
- [ ] Write every claim / recommendation / edge in the MacroGuru voice (docs/VOICE.md): lead with the call, the number is the confidence, one‑line caveat, cut the process‑meta. No "graded against / according to / we track whether". B2C: sharp, not chatty.
- [ ] Run PYTHONPATH=. .venv/bin/python bin/build_predictions.py → confirm every call links to a sensible scenario (tighten the match hint if a link is off‑theme/direction).
- [ ] Rebuild the site (build_seo_pages.py re‑runs predictions automatically) → verify the landing section, the ribbon (top of every page; dismiss re‑shows next week), and the /predictions page all render, are mobile‑clean, and links resolve. Commit + deploy + verify on prod (§12).
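Before running the build, the honesty contract can be pre-checked on each entry of the weekly seed with a sketch like the one below. The basis values come from the contract above; the function name, messages, and sample prediction (including its placeholder URL) are hypothetical, and build_predictions.py remains the real validator.

```python
ALLOWED_BASIS = {"scheduled", "consensus", "base_rate",
                 "seasonality", "crowd", "regime"}


def honesty_errors(prediction):
    """Check one weekly prediction against the honesty contract."""
    errors = []
    prob = prediction.get("probability")
    if not isinstance(prob, (int, float)) or not 0 <= prob <= 1:
        errors.append("probability must be a number in 0..1")
    if prediction.get("basis") not in ALLOWED_BASIS:
        errors.append(f"basis {prediction.get('basis')!r} is not an allowed value")
    if not prediction.get("sources"):
        errors.append("at least one source URL is required")
    if not prediction.get("claim"):
        errors.append("a falsifiable claim is required")
    return errors


# Toy entries (assumed content; the URL is a placeholder).
sound = {
    "claim": "Headline CPI prints at or below consensus",
    "probability": 0.62,
    "basis": "consensus",
    "sources": ["https://example.com/calendar"],
}
unsound = {"claim": "Stocks go up", "probability": 1.4, "basis": "gut", "sources": []}

sound_errors = honesty_errors(sound)
unsound_errors = honesty_errors(unsound)
```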


15B. Sub‑process D — Resolving predictions & the Reality Check (the metric that matters most)

"How close to reality are we?" is the only test our work is ultimately judged on. Every published prediction is logged with a fixed, source‑tied resolution rule and scored against what actually happens — wins and losses, in public, at /reality-check. This is the accountability spine of the product. ⛔ Never cherry‑pick: once a call is published it stays on the record, win or lose.

Data flow: each weekly prediction (with resolves_on, resolution_criteria, scheduled_certain) auto‑syncs from config/predictions_week.yaml into the append‑only ledger config/predictions_log.json via bin/build_predictions.py (open entries; it NEVER overwrites a human‑set resolution). bin/build_scorecard.py scores the ledger → web/public/data/scorecard.json → the /reality-check page renders it. Both run at the tail of build_seo_pages.py. Methodology = proper scoring rules (one‑sided Brier in [0,1], Murphy reliability/resolution decomposition, Brier‑skill‑score vs the base rate AND vs the prediction‑market crowd, calibration‑by‑bucket with Wilson bands), per Tetlock/GJP + Metaculus practice.

WEEKLY RESOLUTION CHECKLIST (run every Monday for the week that just ended):
- [ ] For each now‑past forecast in config/predictions_log.json, research what ACTUALLY happened against its pre‑registered resolution_criteria (verify with ≥2 sources).
- [ ] Set status: "resolved", outcome: true|false (or partial), resolved_on, and evidence: {text, url}. If the premise was voided/ambiguous, set status: "annulled" (counts for nobody, stays visible).
- [ ] ⛔ Do NOT edit probability or the claim after publication; the forecast is frozen at publication. Only add the resolution fields.
- [ ] Scheduled‑certain calls (scheduled_certain: true) stay in the ledger but are excluded from the skill Brier/BSS; don't let calendar gimmes inflate the headline.
- [ ] Run bin/build_scorecard.py (or any build_seo_pages.py); confirm the Brier, calibration curve, and our‑vs‑crowd edge update, and the resolved calls show ✓/✗ with their evidence. Commit + deploy + verify (§12).
- [ ] Sanity: report counts + a confidence caveat while n is small; never headline a Brier off <~15 resolved calls.
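The scoring rules in this sub-process can be sketched as follows: Brier score over resolved calls, scheduled-certain and annulled entries excluded, and a Brier skill score against the base rate. This is an illustration under assumptions, not bin/build_scorecard.py: partial outcomes are skipped, the base rate is computed in-sample from the same resolved set, and the function names and toy ledger are hypothetical.

```python
MIN_RESOLVED_FOR_HEADLINE = 15  # from the "<~15 resolved calls" sanity rule


def skill_pairs(ledger):
    """(probability, outcome) pairs that count toward the skill score."""
    return [(e["probability"], 1.0 if e["outcome"] else 0.0)
            for e in ledger
            if e.get("status") == "resolved"
            and isinstance(e.get("outcome"), bool)       # partials skipped here
            and not e.get("scheduled_certain", False)]   # calendar gimmes excluded


def brier(pairs):
    """Mean squared error between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def brier_skill_score(pairs):
    """1 - BS / BS_ref, with the in-sample base rate as the reference forecast."""
    base_rate = sum(o for _, o in pairs) / len(pairs)
    reference = brier([(base_rate, o) for _, o in pairs])
    if reference == 0:
        return None  # all outcomes identical: skill is undefined
    return 1.0 - brier(pairs) / reference


# Toy ledger (assumed entries).
ledger = [
    {"probability": 0.8, "status": "resolved", "outcome": True},
    {"probability": 0.3, "status": "resolved", "outcome": False},
    {"probability": 0.6, "status": "resolved", "outcome": False},
    {"probability": 0.99, "status": "resolved", "outcome": True,
     "scheduled_certain": True},
    {"probability": 0.5, "status": "annulled", "outcome": None},
    {"probability": 0.7, "status": "open"},
]
pairs = skill_pairs(ledger)
score = brier(pairs)
skill = brier_skill_score(pairs)
headline_ready = len(pairs) >= MIN_RESOLVED_FOR_HEADLINE
```

With only three scored calls the example correctly reports that the Brier is not yet fit to headline, even though the skill score is positive.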


16. Scaling to 100,000

The path from ~5k → 100k is industrialised generation + the same per‑scenario gates on every record. Volume never lowers the bar.

16.1 The lane method (proven for 1,200 → 5,141)

16.2 Merge (validate + dedup)

PYTHONPATH=. .venv/bin/python bin/merge_ccar_scenarios.py    # validates vocab, dedups by normalised title, fresh ids, appends

16.3 Then the standard pipeline on the whole batch

16.4 The evidence bottleneck (the real limit on indexable scale) ⛔

A scenario is only indexable (first‑class, public, ranked) when it has measured events_used. So 100k indexable scenarios requires the event library to scale with them:
- [ ] Grow config/historical_events.yaml (currently ~993) via the adversarial event‑sourcing workflow → merge_history.py, so every new lane has real analogues to match.
- [ ] Ensure new scenarios' tags/themes overlap the event library so build_history.py finds analogues (a scenario with no matchable analogue stays noindex; that's correct, not a bug).
- [ ] Track the indexable ratio in stats.json (indexable / scenarios). Driving that ratio up is the work of scaling; raw count without evidence is vanity.

16.5 Global coverage discipline (so it's "from all over the world")

16.6 The frontier-of-human-development axis (good AND bad)

Coverage is not only geographies and crises. The second coverage axis is the frontier of our species — the developments that re‑shape how we live: medicine, AI, technology, science journals, patents & discoveries, biology, nature. This is the home of invariant #4. The full framework (lanes, sources, vocab, ontology, the good/bad rule, worked examples) lives in docs/FRONTIER_SCENARIO_SPEC.md — read it before commissioning a frontier lane. The operational checklist:


17. The "no weak proposition" rubric

A scenario is strong only if it passes all of these (synthesis of scenario‑planning, superforecasting, ICD 203, and DAMA data‑quality consensus — §27):

17.1 Data‑quality dimensions (DAMA 6) mapped to a scenario

| Dimension | For a scenario, it means |
|---|---|
| Accuracy | impacts/sources match cited reality |
| Completeness | roots + cascade + history + guidance + probability all present |
| Consistency | signs/units agree across cascade ↔ guidance ↔ measured history |
| Timeliness | probability reflects the current regime; prices/history are fresh |
| Validity | category/timeline/roots honour the controlled vocab |
| Uniqueness | unique id; no duplicate title |
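The Completeness row lends itself to a mechanical check. The sketch below assumes a scenario record is a dict whose keys match the five parts named in that row; the key names and the helper missing_parts are illustrative, not the project's actual schema.

```python
# Parts the Completeness dimension requires (key names are assumed).
REQUIRED_PARTS = ("roots", "cascade", "history", "guidance", "probability")


def missing_parts(scenario: dict) -> list[str]:
    """Return the required parts that are absent or empty in a scenario record."""
    empty_values = (None, "", [], {})
    # A probability of 0 is a value, not a gap, so it is not treated as empty.
    return [key for key in REQUIRED_PARTS if scenario.get(key) in empty_values]
```

An empty list back means the record passes this one dimension only; the other five still need their own checks.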

18. Probability & calibration standards

18.1 Derive BOTH numbers from history, then track the variance (the accountability spine)

Full contract: docs/CALIBRATION_METHODOLOGY.md. Engine: macroguru/calibration/derive.py. Neither the probability nor the per‑asset impact % is asserted any more — each is built from a reference class, the build is recorded so it can be audited, and after the fact it is scored against reality.

18.2 Sub‑process E — the weekly recalibration (learning loop)

Run bin/recalibrate.py (auto‑runs at the tail of build_seo_pages.py) → recalibration.json. It proposes (never auto‑applies):
1. Impact bias offset: if magnitude bias is materially ≠ 0, apply −bias to published moves.
2. Confidence recalibration: if directional accuracy doesn't rise with the confidence score, down‑weight/re‑fit it.
3. Probability: once forecasts resolve, fit a Platt recalibration on the resolved set; and review the scenarios flagged where the history‑derived probability diverges ≥15pts from the assigned prior.

Apply the accepted proposals via scenario_overrides.yaml / propagation.py (mirrors the apply_probs flow), then rebuild. Each cycle closes the gap to reality a little more.
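Two of the proposals above (the impact bias offset and the ≥15pt divergence flag) can be sketched as follows. This is an illustration under stated assumptions, not the logic of bin/recalibrate.py: bias is taken here as the mean of published minus realised move (in percentage points), "materially" is modelled by a hypothetical tolerance parameter, and the field names derived_prob_pct and assigned_prob_pct are invented for the example.

```python
from statistics import mean


def impact_bias_offset(pairs, tolerance=0.5):
    """Propose an offset for published moves.

    pairs: iterable of (published_move_pct, realised_move_pct).
    Returns -bias when |bias| exceeds the (hypothetical) tolerance, else 0.0.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    bias = mean(published - realised for published, realised in pairs)
    return -bias if abs(bias) > tolerance else 0.0


def divergent_scenarios(scenarios, threshold_pts=15.0):
    """Ids of scenarios whose history-derived probability is >= threshold_pts
    away from the assigned prior (both in percentage points; names assumed)."""
    return [
        s["id"] for s in scenarios
        if abs(s["derived_prob_pct"] - s["assigned_prob_pct"]) >= threshold_pts
    ]
```

Both functions only return proposals, in keeping with the "never auto‑applies" rule; applying them remains a reviewed step.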


19. The emotional‑integrity standard (the product purpose)

We exist to help people peek into the future, manage their emotions, and decide better. Behavioral‑finance research is explicit about the failure modes we must counter: loss aversion, recency bias, and panic selling driven by fear of immediate loss. So every published scenario must:

Litmus test: would this scenario make an anxious reader calmer and better‑prepared, or just more scared? If the latter, it fails — add the guidance and the framing.


20. SEO / E‑E‑A‑T / YMYL standards

Finance is YMYL ("Your Money or Your Life") — Google holds it to the strictest E‑E‑A‑T bar, and so do we.


21. Accessibility & mobile standards

Every page (scenario, hub, week‑ahead, landing, legacy) must pass:
- [ ] Zero horizontal overflow at 390px and 360px (scrollWidth == viewport).
- [ ] Touch targets ≥ 44px (Apple HIG) / ≥ 24px min (WCAG 2.5.8); inputs ≥ 16px (no iOS zoom‑on‑focus).
- [ ] Tables scroll inside their own container (comparison tables) or stack (content tables); never push the page.
- [ ] Marquees/animation pause on hover/focus + honour prefers-reduced-motion (WCAG 2.2.2).
- [ ] No sticky element covers content on mobile.
- [ ] Verify with cache‑busted CSS or on prod (the local preview caches CSS/JS across reloads).


22. Failure modes & gotchas (hard‑won)


23. Roles, cadence & ownership

| Cadence | What runs | Mechanism | Cost |
|---|---|---|---|
| 5 min | alert bus, hypothesis monitor, live‑monitor feed | event_tick.py / reactive_engine.py (launchd) | free |
| Daily | prices → history → cascades → coverage → calibration → deploy | refresh_scenarios.py (launchd) | free |
| Monthly / on‑trigger | probability recalibration, deviation theses, headlines (new ids), deep event sourcing, bulk scenario generation | bin/REFRESH_RUNBOOK.md (ops/com.macroguru.intel.plist) | ~5–10M tokens |
| On new HL listing | new‑asset deep round (§14) | event‑triggered fast‑path | bounded tokens |

24. Master checklist (tear‑off)

One scenario, idea → live. (Bulk: do §16 first, then this per record.)

PRE‑CHECK (idea)
[ ] real anchor (regulator concern / historical episode)
[ ] specific channel + magnitude; distinct; decision‑relevant; falsifiable
[ ] driver expressible in the 50‑factor root vocab; assumptions explicit

SPECIFY
[ ] next free id (never reuse) · category ∈ taxonomy · status: candidate
[ ] timeline en‑dash · title 2–6 words · one precise sentence
[ ] roots entry: 2–5 signed factors · sober probability

BUILD
[ ] build_scenarios.py  → scenarios/<id>.json (cascade imm+ripple, markets ranked, HL gated, countries, guidance)
[ ] build_history.py    → history/<id>.json with events_used ≥1, real sources, 20d/5d abnormal returns

ENRICH (monthly tier)
[ ] probability recalibrated + ≤18‑word rationale
[ ] deviation thesis where cascade ≠ measured history
[ ] headline "What if …?" (apply_headlines MERGES)

QUALITY GATE ⛔ (finalize)
[ ] Integrity: measured ✓ calibrated ✓ coverage (no new gap) ✓
[ ] Content: distinct, concrete, coherent cascade, deviation explained
[ ] Emotional: what‑to‑do + common‑man + steadying frame
[ ] SEO/YMYL: indexable (prob+events_used), sources resolve, JSON‑LD, disclaimer
[ ] DQ: accuracy·completeness·consistency·timeliness·validity·uniqueness
[ ] promote status: candidate → keep

PUBLISH
[ ] build_seo_pages.py · stats.json count up · search_index includes it · sitemap updated
[ ] pytest -q green
[ ] git push origin main   (NOT archive=tgz)

VERIFY PROD ⛔
[ ] poll deploy by SHA → READY
[ ] curl clean URL (…/scenario/<slug>, with -L) shows oddsbar + markets + what‑to‑do + historical precedent
[ ] search returns it; mobile 390/360 zero overflow

MONITOR
[ ] coverage_report green · calibration watched · prob_history drawn · retire = status:drop (never delete)

25. Glossary


26. Maintaining this document

Companion docs (read alongside this one): SELF_IMPROVING_ENGINE.md (the master loop) · HISTORICAL_ENGINE.md (event‑study mechanics) · CCAR_SCENARIO_GENERATION_SPEC.md (the generation contract) · CCAR_STRESS_TEST_LANDSCAPE.md (the world's stress‑test bodies = the lane map) · ../bin/REFRESH_RUNBOOK.md (the monthly intelligence refresh) · the rendered operator docs DATA_QUALITY.md & HONEST_LIMITS.md.


27. References (external consensus)

The standards above are grounded in published best practice, not invented:


End of SOP. If you followed every checkbox, the scenario is measured, calibrated, covered, emotionally honest, indexable, accessible, and live — and it does the one thing we exist for: help someone meet the future calmly and decide better. Now do it 100,000 times.