This is the canonical operating document for adding anything to the MacroGuru library — a what‑if scenario, a tradable market/asset, or a Week‑Ahead catalyst playbook. It is written so that the author (Vikas), Claude, and anyone who takes over the product can follow it without guessing. Every actionable step is a checkbox. If a step is not checked, the item is not done — do not assume, do not skip.
Last updated: 2026‑06‑28 · Owner: Vikas · Status: LIVE / authoritative. Keep this file in sync with the code — see §26 Maintaining this document.
Conventions in this doc: ✅ = built/required · 🔶 = partially built / manual today · ⛔ = hard stop (do not publish if this fails) · code = exact file/command. All Python runs as PYTHONPATH=. .venv/bin/python … from the repo root /Users/bloquelabs/learn/MacroGuruStrategy.
The mission. Capture 100,000 scenarios from all over the world and make each one do exactly one job: help a person peek into the future, manage their emotions, and make a better decision. Every scenario is a small, falsifiable, emotionally‑intelligent rehearsal of a possible future — not a prediction, not hype, not filler.
Three things make a MacroGuru scenario different from a blog post or a tweet, and they are invariants — they may never be broken to hit a number:
AR = r − (α + β·r_SPX)) measured over real analogous historical events. No number in the product is unfalsifiable. (Source of truth: docs/HISTORICAL_ENGINE.md.)
coverage_report.json) fails loudly on a gap. (Source: docs/SELF_IMPROVING_ENGINE.md.)
docs/FRONTIER_SCENARIO_SPEC.md; operational steps in §16.6.)
The emotional contract (the reason we exist). A scenario that scares without guiding is a failure. Every published scenario must answer "so what do I do, and how do I not panic?" — see §19 Emotional‑integrity standard.
A scenario lives in three places, in this order of authority:
| Layer | File | What it holds | Who writes it |
|---|---|---|---|
| Source record | config/scenarios.yaml | the human‑authored truth | you / merge_ccar_scenarios.py |
| Roots (drivers) | config/scenario_roots_ccar.yaml, config/scenario_roots_batch3.yaml | id → {probability, timeline, roots:{factor:signed}} | generator / by hand |
| Overrides (applied LAST) | config/scenario_overrides.yaml | id → {probability, [roots]} recalibrations | bin/apply_probs.py (machine‑managed) |
scenarios.yaml record (exact fields)
- id: 1                      # int, unique, never reused (deactivate, never delete — invariant)
  category: geo_escalation   # MUST exist in bin/news_taxonomy.py SECTORS map (see §4)
  status: candidate          # candidate = awaiting review · keep = published · drop = excluded
  probability: 0.06          # float 0..1 (curated prior; overrides win at build time)
  timeline: 0–6 months       # EN‑DASH "–", one of the four allowed (§4)
  title: Hormuz closure      # 2–6 words, concrete, headline‑grade (no "markets fall")
  scenario: >                # ONE precise sentence: shock + trigger + a real magnitude/episode
    Iran closes the Strait of Hormuz after a naval clash with US forces, choking ~20% of seaborne oil.
Countries are NOT hand‑set here. build_scenarios.py auto‑tags countries from the scenario text (and the generation hint). A bulk‑gen record (§16) may carry a countries: [...] hint, but the tagger is the source of truth.
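The field rules above can be checked mechanically before a record is committed. The following is a minimal sketch, not the project's validator: `validate_record` and its constants are hypothetical names, the record is assumed to be already parsed into a dict, and the category check is left out because it depends on `bin/news_taxonomy.py`.

```python
# Hypothetical pre-commit check for one scenarios.yaml record (already parsed to a dict).
ALLOWED_TIMELINES = {"0–6 months", "6–18 months", "1–3 years", "3–10 years"}  # en-dash
ALLOWED_STATUS = {"candidate", "keep", "drop"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(rec: dict) -> list:
    """Return a list of problems; an empty list means the record passes these checks."""
    problems = []
    if not isinstance(rec.get("id"), int) or isinstance(rec.get("id"), bool):
        problems.append("id must be an int")
    if rec.get("status") not in ALLOWED_STATUS:
        problems.append("status must be candidate, keep or drop")
    prob = rec.get("probability")
    if not _is_number(prob) or not 0.0 <= prob <= 1.0:
        problems.append("probability must be a number in 0..1")
    if rec.get("timeline") not in ALLOWED_TIMELINES:
        problems.append("timeline must be one of the four allowed values (en-dash)")
    if not 2 <= len(str(rec.get("title", "")).split()) <= 6:
        problems.append("title must be 2-6 words")
    if not str(rec.get("scenario", "")).strip():
        problems.append("scenario sentence is missing")
    return problems


example = {
    "id": 1, "category": "geo_escalation", "status": "candidate",
    "probability": 0.06, "timeline": "0–6 months", "title": "Hormuz closure",
    "scenario": "Iran closes the Strait of Hormuz after a naval clash with US forces.",
}
print(validate_record(example))
```

A record typed with a plain hyphen in the timeline fails the timeline check, which is the most common hand-authoring slip this sketch is meant to catch.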
| Output | Path | Built by |
|---|---|---|
| Cascade + ranked markets + guidance + countries | web/public/data/scenarios/<id>.json (+ scenarios_index.json) | build_scenarios.py |
| Measured historical evidence (event study) | web/public/data/history/<id>.json (+ history_meta.json) | build_history.py |
| The public page | web/public/scenario/<slug>.html | build_seo_pages.py |
| Probability path over time | web/public/data/prob_history/<id>.json | build_prob_history.py |
| Narrative / prob‑note / deviation / headline | web/public/data/narratives.json | the intelligence passes (§9) |
(1) CONCEIVE ─► (2) SPECIFY ─► (3) ROOTS+CASCADE ─► (4) MEASURE HISTORY ─►
source a real write the build_scenarios build_history
concern/episode yaml record (drivers→cascade) (event study, evidence)
│ │
▼ ▼
(5) ENRICH ─► (6) QUALITY GATE ─► (7) PUBLISH ─► (8) VERIFY PROD ─► (9) MONITOR/CALIBRATE
prob · dev the finalize build_seo_pages clean‑URL + coverage + Brier +
· headline POST‑CHECK ⛔ + git push SHA checks retire/update loop
· guidance │
└──────────────────────────── feeds the next conception ◄─────────┘
Each stage has a checklist. A scenario is DONE only when every ⛔ box in stages 6–8 is ticked.
⛔ Any value outside these sets will silently mis‑map (wrong sector), get skipped at build, or break the cascade. The generator (
merge_ccar_scenarios.py) validates against these — honour them by hand too.
Timelines (use the EN‑DASH –, not a hyphen -):
0–6 months · 6–18 months · 1–3 years · 3–10 years
Categories (the CCAR‑expansion set — preferred for new macro/stress scenarios; 23):
asia_macro, europe_macro, emerging_macro, central_banks_fx, sovereign_fiscal, oil_gas, power_energy, metals_mining, agri_food_water, crypto_markets, crypto_infra, ai_compute, tech_cyber, automation_labor, greatpower_geo, regional_conflict, trade_sanctions, financial_plumbing, realestate_demographics, health_biotech, society_politics, climate_frontier, markets_corporate
Legacy scenarios use other category strings (e.g. geo_escalation). The real gate is: the category MUST resolve to a sector in bin/news_taxonomy.py (SECTORS / sector_for()), else the page won't map to a section front. When in doubt, use one of the 23 above. Pre‑check: §6.
Root‑shock factors (use ONLY these — 50):
risk_appetite, credit_spreads, geopolitical_risk, financial_conditions, crypto_confidence, oil_supply_risk, industrial_demand, ai_capex, recession_signal, climate_supply, trade_tension, crypto_liquidity, china_growth, inflation_surprise, automation_displacement, dollar_confidence, fed_hawkishness, growth_surprise, food_inflation, european_energy, semiconductor_risk, EM_FX, defense_spend, carry_appetite, robot_productivity, VIX, BTC, real_yields, pandemic, XCU, fertilizer, NG, labor_shortage, inflation_expectations, consumer_discretionary, oil_demand, diesel, XAU, global_growth, mortgage_rates, china_stimulus, DXY, gasoline, XAG, WHEAT, risk_parity_delever, jet_fuel, CORN, curve_slope, labor_surplus
Sign convention (positive = the factor intensifies):
risk_appetite − = risk‑off · recession_signal + = recession rising · credit_spreads + = wider ·
financial_conditions + = tighter · dollar_confidence − = USD doubts · china_growth − = China slowing ·
real_yields + = yields up · EM_FX − = EM currencies fall · oil_supply_risk + = supply threatened ·
inflation_surprise + = hotter · crypto_confidence/crypto_liquidity − = crypto stress ·
risk_parity_delever + = forced deleveraging.
Root recipes (start here, adapt magnitude to severity) — full list in
docs/CCAR_SCENARIO_GENERATION_SPEC.md. Examples:
- Recession / equity crash: {recession_signal:0.6, risk_appetite:-0.6, credit_spreads:0.5, financial_conditions:0.5}
- Oil supply shock: {oil_supply_risk:0.8, inflation_surprise:0.5, geopolitical_risk:0.6}
- Crypto / stablecoin run: {crypto_confidence:-0.8, crypto_liquidity:-0.7, BTC:-0.6}
- EM sudden stop: {EM_FX:-0.8, credit_spreads:0.6, risk_appetite:-0.5, DXY:0.4}
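A roots entry can be sanity-checked against the vocabulary and the 2–5 factor rule before it reaches the build. This is a hedged sketch: `check_roots` is a hypothetical helper, `VOCAB_EXCERPT` holds only a few of the 50 factors for brevity, and the bound |shock| ≤ 1 is an assumption inferred from the recipe magnitudes, not a documented limit.

```python
# Excerpt of the 50-factor vocabulary; a real check would load the full list.
VOCAB_EXCERPT = {
    "recession_signal", "risk_appetite", "credit_spreads", "financial_conditions",
    "oil_supply_risk", "inflation_surprise", "geopolitical_risk",
}


def check_roots(roots: dict, vocab: set) -> list:
    """Return problems for a {factor: signed_shock} mapping; empty list means OK."""
    problems = []
    if not 2 <= len(roots) <= 5:
        problems.append(f"expected 2-5 factors, got {len(roots)}")
    for factor, shock in roots.items():
        if factor not in vocab:
            problems.append(f"unknown factor: {factor}")
            continue
        is_number = isinstance(shock, (int, float)) and not isinstance(shock, bool)
        # Assumption: shocks are non-zero and scaled to [-1, 1].
        if not is_number or shock == 0 or abs(shock) > 1:
            problems.append(f"bad shock for {factor}: {shock!r}")
    return problems


oil_shock = {"oil_supply_risk": 0.8, "inflation_surprise": 0.5, "geopolitical_risk": 0.6}
print(check_roots(oil_shock, VOCAB_EXCERPT))
```

Signs are not judged here: whether `risk_appetite` should be negative for a given scenario is an analytic call that the sign convention above governs.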
Goal: a scenario worth a person's attention — anchored to a real concern, not invented. Grounded in scenario‑planning practice (plausible, decision‑relevant, driver‑based) and intelligence analytic standards (ICD 203: sourced, assumptions explicit) — see §17 & §26 refs.
PRE‑CHECK — idea quality (tick all before writing the record):
- [ ] It traces to a real anchor: a regulator/stress‑test concern (Fed CCAR/DFAST, BoE, EBA/ECB/ESRB, BoJ, APRA, OSFI, FINMA, PBoC, RBI, MAS, HKMA, IMF FSAP/GFSR, FSB, BIS, NGFS…) or a documented historical episode. (No anchor → not a scenario yet.)
- [ ] It names a specific channel and magnitude/episode — not "markets fall." (e.g. "US office CRE crash," "Hormuz closure choking ~20% of seaborne oil").
- [ ] It is decision‑relevant: a reader could plausibly act on it (hedge, trim, wait, allocate).
- [ ] It is distinct — not a near‑duplicate of an existing scenario. (Search the library / Scenario Lab first; the bulk merger dedups by normalised title, but check by hand for singletons.)
- [ ] It is falsifiable: you can imagine the measured event study that would confirm or deny its cascade.
- [ ] It maps to ≥1 HL‑coverage need OR adds genuine world coverage (a country/theme we're thin on).
- [ ] Its driver(s) exist in the 50‑factor root vocabulary (§4) — if the true driver isn't expressible, the scenario may be out of the engine's competence (flag it, don't force a wrong root).
- [ ] Assumptions are explicit (ICD 203): you can state the 1–2 load‑bearing assumptions in one line.
Goal: a clean scenarios.yaml record + its roots.
CHECKLIST:
- [ ] Pick the next free id = max(existing)+1. Never reuse or renumber an id (renumbering nukes live URLs/slugs).
- [ ] category ∈ allowed set and resolves in news_taxonomy (§4).
- [ ] status: candidate (promote to keep only after the Quality Gate, §10).
- [ ] timeline uses the en‑dash and is one of the four allowed values.
- [ ] title = 2–6 words, concrete, reads as a headline.
- [ ] scenario = one precise sentence: shock + trigger + magnitude/episode.
- [ ] Add the roots entry (id → {probability, timeline, roots:{…}}) in a roots file: 2–5 factors, each signed per the convention (§4), magnitudes scaled to severity (use the recipes).
- [ ] Probability is sober: severe‑tail 0.02–0.10; plausible‑cyclical 0.15–0.45; developing trend up to 0.85. (Calibration detail: §18.)
- [ ] If bulk‑generated, the record validates against merge_ccar_scenarios.py (vocab + dedup). (See §16.)
⛔ A scenario with no roots is skipped at publish (no cascade, no page). Roots are mandatory.
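The id rule (max+1, never reused) and the merger's dedup by normalised title can be illustrated together. This is a sketch under assumptions: the normalisation shown (lowercase, punctuation stripped, whitespace collapsed) is a guess at what "normalised" means, and `normalise_title`, `next_free_id` and `merge_new` are hypothetical names, not the functions in `merge_ccar_scenarios.py`.

```python
import re


def normalise_title(title: str) -> str:
    # Assumed normalisation: lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def next_free_id(existing_ids) -> int:
    # Dropped scenarios keep their id in the file, so max+1 never reuses one.
    return max(existing_ids, default=0) + 1


def merge_new(existing: list, incoming: list) -> list:
    """Return the incoming records that are genuinely new, with fresh ids assigned."""
    seen = {normalise_title(r["title"]) for r in existing}
    next_id = next_free_id(r["id"] for r in existing)
    added = []
    for rec in incoming:
        key = normalise_title(rec["title"])
        if key in seen:
            continue  # near-identical title already in the library
        seen.add(key)
        added.append({**rec, "id": next_id, "status": "candidate"})
        next_id += 1
    return added


library = [{"id": 1, "title": "Hormuz closure"}, {"id": 7, "title": "EM sudden stop"}]
batch = [{"title": "Hormuz  Closure!"}, {"title": "US office CRE crash"}]
print(merge_new(library, batch))
```

Title matching only catches literal repeats; two differently worded scenarios about the same shock still need the by-hand distinctness check.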
build_scenarios.py)
Goal: turn drivers into the full butterfly effect. build_scenarios.py maps the scenario → signed root shocks
→ runs the deep causal‑propagation engine (macroguru/cascade/propagation.py) → ranks impacted markets by
decision‑relevance (|move| × confidence) → gates each to Hyperliquid (tradable ⇒ trade button) → builds the
left‑to‑right mindmap (trigger → 1st → 2nd order) → writes the long/short/cash guidance.
PYTHONPATH=. .venv/bin/python bin/build_scenarios.py
# → web/public/data/scenarios/<id>.json (+ scenarios_index.json) + enriched config/scenarios.yaml
CHECKLIST:
- [ ] Build runs clean (no traceback); scenarios/<id>.json exists for the new id.
- [ ] The cascade has immediate (1‑hop) and ripple (2+‑hop) nodes — not a single flat list.
- [ ] Materiality floor did its job: no spurious tiny nodes cluttering the mindmap (the floor prunes them — confirm the kept nodes are economically sensible).
- [ ] Impacted markets are ranked sensibly (the dominant market matches the thesis).
- [ ] HL gating is right: tradable assets show a trade button; non‑tradable are shown without one.
- [ ] Countries auto‑tagged correctly (spot‑check the countries in the compiled JSON vs the text).
- [ ] Guidance (long/short/cash) generated and directionally coherent with the cascade (longs are the up‑moves, shorts the down‑moves, cash if net risk‑off).
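The decision-relevance ranking described for this stage, |move| × confidence, is simple enough to sketch. The `Impact` class, its field names and the sample numbers below are illustrative assumptions; the real engine's data model lives in `macroguru/cascade/propagation.py` and may differ.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    asset: str
    move: float        # signed projected move, e.g. 0.15 means +15%
    confidence: float  # 0..1
    tradable: bool     # True if the asset is gated to Hyperliquid


def decision_relevance(impact: Impact) -> float:
    # Size of the move weighted by how much we trust it; the sign is ignored for ranking.
    return abs(impact.move) * impact.confidence


def rank_impacts(impacts: list) -> list:
    return sorted(impacts, key=decision_relevance, reverse=True)


sample = [
    Impact("BTC", -0.10, 0.3, True),
    Impact("OIL", 0.15, 0.8, True),
    Impact("SPX", -0.06, 0.9, False),
]
for item in rank_impacts(sample):
    print(item.asset, round(decision_relevance(item), 3), "trade button" if item.tradable else "no button")
```

A large but low-confidence move (BTC here) ranks below a smaller, well-supported one, which is the behaviour the "dominant market matches the thesis" check looks for.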
build_history.py)
Goal: prove the cascade against reality. This is what separates us from opinion. The event study measures
abnormal returns for every (scenario × asset) over the analogous historical events matched from
config/historical_events.yaml, and records hit‑rate, sample size n, a confidence score, and whether the
measured sign agrees with the projected cascade.
PYTHONPATH=. .venv/bin/python bin/build_history.py
# → web/public/data/history/<id>.json (+ history_meta.json)
CHECKLIST:
- [ ] history/<id>.json exists and contains events_used (≥1 analogue). ⛔ No events_used ⇒ the page will be noindex (thin) — see §10.
- [ ] The analogues are genuinely analogous (right tags/theme), not coincidental keyword hits.
- [ ] Each analogue has a real, working source_url (ICD 203 sourcing; YMYL trust).
- [ ] The per‑asset table shows abnormal returns at 20d and 5d, hit‑rate, n, confidence.
- [ ] Where the measured sign disagrees with the cascade, that's expected sometimes — it must be explained by a deviation thesis (§9), not hidden.
- [ ] If analogues are too thin/stale (n tiny, all pre‑2010): either widen the event library (config/historical_events.yaml via seeds or the adversarial sourcing workflow → merge_history.py) or mark the scenario low‑conviction. (See §16.4 the evidence bottleneck.)
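The abnormal-return definition used by the event study, AR = r − (α + β·r_SPX), can be sketched with a plain least-squares market model. This is an illustration only: the estimation window, the function names and the way hit-rate is computed are assumptions, and the authoritative method is whatever docs/HISTORICAL_ENGINE.md specifies.

```python
def fit_market_model(asset_returns: list, spx_returns: list) -> tuple:
    """Ordinary least squares of asset returns on SPX returns; returns (alpha, beta)."""
    n = len(asset_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(spx_returns) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset_returns, spx_returns))
    var = sum((m - mean_m) ** 2 for m in spx_returns)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


def abnormal_returns(asset_returns: list, spx_returns: list, alpha: float, beta: float) -> list:
    # AR = r - (alpha + beta * r_SPX), one value per day in the event window.
    return [r - (alpha + beta * m) for r, m in zip(asset_returns, spx_returns)]


def sign_hit_rate(cumulative_ars: list, projected_sign: int) -> float:
    """Share of analogue events whose cumulative AR has the sign the cascade projected."""
    hits = sum(1 for car in cumulative_ars if car * projected_sign > 0)
    return hits / len(cumulative_ars)


# Estimation window: the asset moves exactly 0.001 + 2 * SPX, so alpha=0.001, beta=2.
spx_est = [0.01, -0.02, 0.005, 0.0, 0.015]
asset_est = [0.001 + 2 * m for m in spx_est]
alpha, beta = fit_market_model(asset_est, spx_est)

# Event day: asset +5% while SPX +1%; the market model explains 2.1% of that.
event_ar = abnormal_returns([0.05], [0.01], alpha, beta)
print(alpha, beta, event_ar)
```

Summing the daily values over a 5-day or 20-day window gives the cumulative abnormal return per analogue; `sign_hit_rate` then compares those sums with the cascade's projected direction.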
A new event in config/historical_events.yaml:
- id: b2-trade_sanctions-436      # stable unique id
  date: '1929-05-28'              # ISO; date_confidence captures fuzziness
  name: Smoot-Hawley clears the US House (protectionism signal)
  category: United States
  tags: [risk_off, trade_war, geopolitical]   # tags drive analogue matching
  summary: <one neutral sentence>
  source_url: https://…           # ⛔ must resolve
  source_name: en.wikipedia.org
  source: workflow                # seed | workflow (adversarially verified)
  date_confidence: 0.8
  verified: true
  region: United States
  theme: trade_sanctions
bin/merge_history.py (de‑dup/normalise) before build_history.py.
Goal: a calibrated probability, an honest "why we may diverge," a human headline, and actionable guidance.
These are the token‑costing monthly tier (bin/REFRESH_RUNBOOK.md). Run them for the new ids (or library‑wide).
| Pass | Command chain | Writes | Standard it serves |
|---|---|---|---|
| Probability (re)calibration | prep_probs.py → 40‑agent Workflow → apply_probs.py |
scenario_overrides.yaml + narratives.json::prob_notes |
base‑rate anchoring (superforecasting) |
| Deviation thesis | prep_deviation.py → Workflow → apply_deviation.py |
narratives.json::deviations |
analytic rigor / ICD 203 |
| Headline (only when new scenarios were added) | prep_headlines.py → Workflow → apply_headlines.py |
narratives.json::headlines |
E‑E‑A‑T / readability |
CHECKLIST:
- [ ] Probability is base‑rate‑anchored to n_analogues (frequent patterns higher; rare/tail lower) and well‑spread, not clustered. (Bands: §18.)
- [ ] Each probability carries a ≤18‑word rationale (prob_notes) — never a bare number (ICD 203: pair the estimate with its basis).
- [ ] Where cascade ≠ measured history on a material asset, a deviation thesis (≤34 words) names the failure mode: regime contamination · swamped channel · thin/stale sample · structural change · history‑wins — and says which side to trust.
- [ ] Headline is a natural, grammatical "What if …?" (not the terse internal title).
- [ ] ⛔ apply_headlines.py MERGES into the headlines dict — confirm it did not wipe the existing 1,200+ (it setdefault().update()s; never replace the dict).
- [ ] Guidance (from build_scenarios) passes the emotional‑integrity standard (§19): it tells the reader what to do and steadies them.
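The merge-versus-replace hazard flagged for `apply_headlines.py` comes down to one dict idiom. The sketch below assumes `narratives.json` loads to a dict with a `headlines` mapping keyed by scenario id as a string; `merge_headlines` is a hypothetical stand-in, and the headline text is illustrative.

```python
def merge_headlines(narratives: dict, new_headlines: dict) -> dict:
    """Add or update headlines in place without discarding existing entries."""
    before = len(narratives.get("headlines", {}))
    # setdefault returns the existing dict (or creates one); update merges into it.
    narratives.setdefault("headlines", {}).update(new_headlines)
    # Guard: a merge can only grow or keep the count, never shrink it.
    assert len(narratives["headlines"]) >= before, "headlines were wiped"
    return narratives


narratives = {
    "headlines": {"1": "What if Iran closes the Strait of Hormuz?"},
    "prob_notes": {"1": "Rare but precedented; tail prior."},
}
merge_headlines(narratives, {"2": "What if US office real estate crashes?"})
print(sorted(narratives["headlines"]))

# The failure mode to avoid: assignment replaces the whole mapping.
# narratives["headlines"] = {"2": "..."}   # would silently drop headline "1"
```

Comparing the headline count before and after the apply step is a cheap way to confirm the existing entries survived.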
This is the gate the user asked for: "what we pre‑check and post‑check once the scenario is finalised." A scenario may be promoted to status: keep and published only when every box here is ticked. If any ⛔ fails, fix it or leave the scenario as candidate (it will render noindex/thin until fixed).
A. Integrity (the three invariants):
- [ ] ⛔ Measured: history/<id>.json has events_used ≥ 1 with real sources; the per‑asset abnormal returns render.
- [ ] ⛔ Calibrated: probability is a sober, base‑rate‑anchored prior with a one‑line rationale; not a forecast.
- [ ] ⛔ Coverage: if the scenario introduces/relies on an HL asset, that asset is in coverage_report.json as covered (price/proxy + ≥1 scenario + ≥1 analogue). Run bin/coverage_report.py; gaps must not increase.
B. Content quality ("no weak proposition" — §17):
- [ ] Distinct (no near‑duplicate); concrete channel + magnitude; precise one‑sentence scenario.
- [ ] Cascade is economically coherent (signs make sense; ripple chain is real, not decorative).
- [ ] Deviation thesis present wherever cascade and measured history disagree on a material asset.
- [ ] No fabricated precise statistics; magnitudes realistic and consistent with the cited concern.
C. Emotional integrity (§19):
- [ ] "What to do if this happens" renders with Long/Short/Cash guidance + a plain‑English common‑man line.
- [ ] The tone steadies (it frames probability + horizon + what to watch), it does not induce panic.
D. SEO / trust / YMYL (§20):
- [ ] Indexable test passes: probability and events_used both present (else it is correctly noindex).
- [ ] Title/description are honest and specific; JSON‑LD (Article + Dataset + BreadcrumbList) is intact.
- [ ] Sources are cited and resolve; the probabilistic‑future + not‑investment‑advice disclaimer is present.
E. Data quality (DAMA 6 dimensions — §17.1):
- [ ] Accuracy (matches the cited reality) · Completeness (roots+cascade+history+guidance all present) · Consistency (signs/units agree across cascade, guidance, history) · Timeliness (probability reflects current regime) · Validity (vocab honoured) · Uniqueness (no duplicate id/title).
status: candidate → keep only after A–E all pass.
build_seo_pages.py → GitHub)
Goal: render the page + all hubs, run tests, ship via Git.
PYTHONPATH=. .venv/bin/python bin/build_seo_pages.py # pages + hubs + sitemaps + stats + search_index
PYTHONPATH=. .venv/bin/python -m pytest -q # ⛔ expect "227 passed" (or current count)
git add -A && git commit -m "…" && git push origin main # ⛔ DEPLOY = git push (auto-builds on Vercel)
CHECKLIST:
- [ ] build_seo_pages.py ran clean; the new web/public/scenario/<slug>.html exists.
- [ ] data/stats.json count went up by the number added (counts self‑update; never hardcode a count anywhere — counts.js + [data-mg-count] fill them).
- [ ] data/search_index.json includes the new scenario (with c=kicker, g=sector).
- [ ] Sitemaps regenerated; the new indexable page is in sitemap-scenarios.xml.
- [ ] ⛔ Tests pass (pytest -q).
- [ ] ⛔ Deploy via git push origin main — the Vercel project public (rootDirectory web/public, branch main) auto‑builds server‑side. Do NOT use vercel deploy --archive=tgz (it re‑uploads the whole site and exhausts the free upload quota — api-upload-free 429). The refresh_scenarios.py docstring still says "archive deploy" — that is stale; ignore it.
- [ ] Commit only the intended files; do not sweep in unrelated periodic‑refresh data churn (charts/history) unless that's the point of the commit.
Goal: confirm it's actually live and correct — not just deployed. Prod alias: https://macroguru.app.
CHECKLIST:
- [ ] Poll the deploy by commit SHA (the first READY you see is often the previous deploy). vercel ls public --yes → vercel inspect <building-url> until Ready.
- [ ] ⛔ Fetch the clean URL (cleanUrls strips .html): curl -sL https://…/scenario/<slug> — not …/scenario/<slug>.html (that returns a ~15‑byte redirect stub and looks broken). For static assets, byte‑compare (curl … | wc -c vs local) to confirm freshness.
- [ ] The page shows: oddsbar (prob + crowd), markets table, "What to do if this happens," "Historical precedent" table, related scenarios.
- [ ] Search (type a token) returns the scenario with a kicker + highlighted match + prob pill.
- [ ] Mobile: at 390px and 360px the page has zero horizontal overflow (scrollWidth == viewport) and no overlap. (Mobile standards: §21.)
- [ ] No console errors; the price chart line renders (the --line var fallback is present).
A scenario is never "done forever" — it is scored against reality and updated. This is the self‑improving loop
(docs/SELF_IMPROVING_ENGINE.md, stages D/E/G).
CHECKLIST (ongoing):
- [ ] Coverage stays green: bin/coverage_report.py → coverage_report.json shows no new gaps after the add.
- [ ] Calibration is watched: calibration.json (reliability curve by confidence band, corroboration score) — high‑probability scenarios should fire more often than low‑probability ones; mis‑calibrated buckets get nudged toward the realized base rate (apply_probs next cycle).
- [ ] Prob‑history updates: bin/build_prob_history.py redraws how the probability moved + the events that moved it.
- [ ] Fires are logged: when a matched real event fires, score the realized abnormal returns vs the projected cascade (sign hit‑rate); a systematic miss → re‑map roots + fresh deviation thesis.
- [ ] Retire/deactivate, never delete: an obsolete scenario is set status: drop (or an asset deactivated) — history is preserved for future re‑use (invariant).
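The reliability check described above (predicted versus realized frequency by confidence band) can be sketched as a small bucketing routine. The input shape, the five equal-width bands and the name `reliability_table` are assumptions for illustration; the real `calibration.json` layout is not specified here.

```python
def reliability_table(forecasts: list, n_buckets: int = 5) -> list:
    """forecasts: list of (probability, fired) pairs. Returns one row per non-empty band."""
    buckets = [[] for _ in range(n_buckets)]
    for prob, fired in forecasts:
        # A probability of exactly 1.0 belongs in the top band, hence the min().
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets[idx].append((prob, fired))
    rows = []
    for idx, items in enumerate(buckets):
        if not items:
            continue
        rows.append({
            "band": (idx / n_buckets, (idx + 1) / n_buckets),
            "n": len(items),
            "predicted": sum(p for p, _ in items) / len(items),
            "realized": sum(1 for _, fired in items if fired) / len(items),
        })
    return rows


history = [(0.10, False), (0.10, False), (0.15, True), (0.90, True), (0.85, True)]
for row in reliability_table(history):
    print(row)
```

A well-calibrated library shows `predicted` close to `realized` in every band; a band where realized frequency sits well below predicted is a candidate for a downward nudge in the next `apply_probs` cycle.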
"All the pointers related to adding the market in the product." Triggered when a new market matters (a new Hyperliquid listing, or a market we want charts/evidence for). Goal: no tradable left behind.
CHECKLIST:
- [ ] Classify the asset (crypto / commodity / tokenized‑equity / FX / index / rate / vol).
- [ ] Price source + ticker map: register it in the price layer (macroguru price map / ASSET_YF) and, for the chart, add it to bin/build_asset_charts.py's set; run it → web/public/data/charts/<tk>.json (+ charts_index.json).
- [ ] Short‑history proxy: if the asset has < ~1y of data, map a long‑history peer/proxy (new L2 → ETH/SOL beta; new gold product → XAU) so the event study has something to measure.
- [ ] HL alias: add the HL symbol alias in hl_universe.TICKER_ALIASES (e.g. our XAU ↔ HL PAXG).
- [ ] Scenario inclusion: for every existing scenario, does the asset's class intersect the scenario's root shocks? If yes it inherits the cascade automatically (it becomes a new leaf in cascade/propagation.py).
- [ ] Evidence: run build_history.py so the asset gets measured abnormal returns across the matched analogues.
- [ ] Net‑new asset‑specific scenarios: generate the scenarios the asset makes relevant that the library lacks (e.g. a new staking token → "staking‑yield collapse"). Run them through §5–§11.
- [ ] Chart var sanity: confirm the price line renders on news.css pages (the --line fallback). Click‑to‑chart is gated to charted tickers.
- [ ] ⛔ Coverage assertion: coverage_report.py shows the asset as covered (price/proxy + ≥1 scenario + ≥1 analogue). A remaining gap fails the loop — fix the price source or proxy (e.g. the known HYPE gap → wire HL's candle API / CoinGecko).
- [ ] Rebuild + publish (§11) + verify (§12).
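The coverage rule (price/proxy + ≥1 scenario + ≥1 analogue) reduces to a three-part predicate per asset. The sketch below uses hypothetical field names (`has_price`, `n_scenarios`, `n_analogues`) and illustrative counts; it does not reproduce the schema of `coverage_report.json`.

```python
def coverage_gaps(assets: dict) -> dict:
    """Map each uncovered ticker to the list of requirements it is missing."""
    gaps = {}
    for ticker, info in assets.items():
        missing = []
        if not info.get("has_price"):
            missing.append("price/proxy")
        if info.get("n_scenarios", 0) < 1:
            missing.append("scenario")
        if info.get("n_analogues", 0) < 1:
            missing.append("analogue")
        if missing:
            gaps[ticker] = missing
    return gaps


universe = {
    "XAU": {"has_price": True, "n_scenarios": 12, "n_analogues": 30},
    "HYPE": {"has_price": False, "n_scenarios": 3, "n_analogues": 0},
}
gaps = coverage_gaps(universe)
print(gaps)
# A loop that must fail loudly would raise here instead of printing, for example:
# if gaps: raise SystemExit(f"coverage gaps: {gaps}")
```

An asset with no price source usually has no analogues either, because the event study has nothing to measure, so fixing the price or proxy tends to clear both gaps at once.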
For scheduled catalysts (NFP, CPI, FOMC, OPEC+, options expiry, Jackson Hole, shutdown, elections…). Content
lives in bin/catalyst_playbook.py; pages render at /week-ahead/<type>.
CHECKLIST:
- [ ] Add a PLAYBOOK[<ctype>] entry: slug, label, sector, kicker, blurb, watch, poly_q, kalshi_q, and 3 outcomes.
- [ ] Each outcome has: name, prob (base rates summing ≈ 1 across the three), tag (risk-on|risk-off|mixed), a thesis, a cascade (_imm/_rip nodes: asset, signed direction, magnitude prior, one‑line mechanism), and a guide (stance, long, short, cash?, common).
- [ ] Magnitudes are reaction‑function priors grounded in published cross‑asset consensus, tuned to the current regime — labelled as priors (not measured returns).
- [ ] Add a CALENDAR row in bin/build_upcoming.py with the matching ctype.
- [ ] Run build_upcoming.py then build_seo_pages.py; verify /week-ahead/<slug> renders 3 outcomes + cascade + "what to do," and the landing card links to it.
- [ ] Mobile + verify on prod (§12).
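Two of the playbook rules are easy to check by machine: exactly three outcomes whose base rates sum to roughly 1, and a tag drawn from the allowed set. This is a sketch only: `check_outcomes` is a hypothetical helper, the 0.02 tolerance is an assumption for "≈ 1", and the outcome names are made up for the example.

```python
ALLOWED_TAGS = {"risk-on", "risk-off", "mixed"}


def check_outcomes(outcomes: list, tolerance: float = 0.02) -> list:
    """Return problems for the outcomes list of one PLAYBOOK entry."""
    problems = []
    if len(outcomes) != 3:
        problems.append(f"expected 3 outcomes, got {len(outcomes)}")
    total = sum(o["prob"] for o in outcomes)
    if abs(total - 1.0) > tolerance:
        problems.append(f"outcome probs sum to {total:.2f}, expected about 1")
    for o in outcomes:
        if o["tag"] not in ALLOWED_TAGS:
            problems.append(f"{o['name']}: unknown tag {o['tag']!r}")
    return problems


cpi_outcomes = [
    {"name": "Hot print", "prob": 0.3, "tag": "risk-off"},
    {"name": "In line", "prob": 0.5, "tag": "mixed"},
    {"name": "Cool print", "prob": 0.2, "tag": "risk-on"},
]
print(check_outcomes(cpi_outcomes))
```

The cascade and guide fields are left unchecked here because their shape depends on the helpers in `bin/catalyst_playbook.py`.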
The top section of the landing page (index.html) — a small, curated list of the highest‑probability,
falsifiable macro calls for the next ~7 days, blended from the scheduled calendar, analyst consensus,
prediction‑market crowd odds, the verified world‑state, and the engine's own scenarios. It is the product's
sharpest expression of the mission: peek into the coming week, with calibrated odds, not hype.
Data flow: config/predictions_week.yaml (the research‑authored weekly seed) → bin/build_predictions.py
(matches each call to the nearest library scenario for a deep‑dive link, resolves asset tickers→names,
validates the honesty contract, writes web/public/data/predictions.json) → web/public/predictions.js
renders the cards into #solid-predictions. build_predictions.py runs automatically at the tail of
build_seo_pages.py (so the scenario deep‑dive links stay synced to fresh slugs).
Four surfaces, one feed (predictions.json): (1) the landing section (index.html via predictions.js);
(2) the /news lead band (server‑rendered render_solid_predictions()); (3) the site‑wide ribbon
(web/public/predictions-ribbon.js — a STATIC, dismissible, theme‑aware strip of the top calls injected on every
page after the breaking‑bar; no auto‑scroll per WCAG 2.2.2; dismissal persists per week_of; self‑suppresses on
/predictions); and (4) the dedicated /predictions page (render_predictions_page() → predictions.html)
— the full week with per‑card "≈ N in 100", ours‑vs‑crowd edge, the fixed resolution rule (joined from
predictions_log.json), Article+ItemList JSON‑LD, an evergreen dateless URL re‑rendered in place. All four refresh
from the same weekly rebuild — no extra steps.
⛔ Honesty contract (non‑negotiable): every prediction ships with a calibrated probability, an
explicit basis (scheduled | consensus | base_rate | seasonality | crowd | regime), and real source
URLs. "scheduled" = the event WILL occur (~certain); the outcome is the prediction — never conflate them.
The section header says "Calibrated odds, not advice." Calibrated, never clairvoyant.
WEEKLY REFRESH CHECKLIST (run every Sunday/Monday for the new week):
- [ ] Re‑run the 4 research streams (macro calendar + consensus · geopolitics/energy · markets/crypto/corporate ·
prediction‑market odds) for the new Mon–Sun window; verify dates against ≥2 sources; capture the current
regime + verified world‑state. (See the deep‑research prompts in the session log / §27.)
- [ ] Rewrite config/predictions_week.yaml: week_of, window, as_of, regime, and ~6–9 predictions.
Each needs: title, claim (falsifiable), probability (0–1, calibrated), optional crowd_probability,
basis, timeline (the date), category (a VALID_CAT), what, assets (engine tickers + up|down|flat),
recommendation, optional edge, a match hint (keywords → scenario), and ≥1 sources.
- [ ] Keep the mix honest + diverse: a few scheduled‑certain anchors, the highest‑conviction outcome calls,
and 1–2 genuine edges (a seasonality base rate, an OURS‑vs‑crowd gap). Don't pad with filler.
- [ ] Write every claim / recommendation / edge in the MacroGuru voice (docs/VOICE.md):
lead with the call, the number is the confidence, one‑line caveat, cut the process‑meta. No "graded
against / according to / we track whether". B2C — sharp, not chatty.
- [ ] Run PYTHONPATH=. .venv/bin/python bin/build_predictions.py → confirm every call links to a sensible
scenario (tighten the match hint if a link is off‑theme/direction).
- [ ] Rebuild the site (build_seo_pages.py re‑runs predictions automatically) → verify the landing section,
the ribbon (top of every page; dismiss re‑shows next week), and the /predictions page all render,
are mobile‑clean, and links resolve. Commit + deploy + verify on prod (§12).
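The honesty contract can be expressed as a per-prediction check. The sketch assumes each prediction is a dict and that `sources` is a list of URL strings; the actual YAML may store sources differently, and `honesty_problems` is a hypothetical name rather than the validator inside `build_predictions.py`.

```python
ALLOWED_BASIS = {"scheduled", "consensus", "base_rate", "seasonality", "crowd", "regime"}


def honesty_problems(pred: dict) -> list:
    """Return contract violations for one weekly prediction; empty list means it passes."""
    problems = []
    prob = pred.get("probability")
    is_number = isinstance(prob, (int, float)) and not isinstance(prob, bool)
    if not is_number or not 0 <= prob <= 1:
        problems.append("probability must be a calibrated number in 0..1")
    if pred.get("basis") not in ALLOWED_BASIS:
        problems.append("basis must be one of the six explicit bases")
    sources = pred.get("sources") or []
    # Assumption: each source is a URL string.
    if not any(str(s).startswith(("http://", "https://")) for s in sources):
        problems.append("at least one real source URL is required")
    if not str(pred.get("claim", "")).strip():
        problems.append("a falsifiable claim is required")
    return problems


call = {
    "title": "Example call",
    "claim": "Headline CPI prints at or below consensus.",
    "probability": 0.55,
    "basis": "consensus",
    "sources": ["https://example.com/calendar"],
}
print(honesty_problems(call))
```

Whether a URL actually resolves, and whether the probability is well calibrated, cannot be judged from the record alone; this only catches structurally dishonest entries.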
"How close to reality are we?" is the only test our work is ultimately judged on. Every published prediction is logged with a fixed, source‑tied resolution rule and scored against what actually happens — wins and losses, in public, at
/reality-check. This is the accountability spine of the product. ⛔ Never cherry‑pick: once a call is published it stays on the record, win or lose.
Data flow: each weekly prediction (with resolves_on, resolution_criteria, scheduled_certain) auto‑syncs
from config/predictions_week.yaml into the append‑only ledger config/predictions_log.json via
bin/build_predictions.py (open entries; it NEVER overwrites a human‑set resolution). bin/build_scorecard.py
scores the ledger → web/public/data/scorecard.json → the /reality-check page renders it. Both run at the tail
of build_seo_pages.py. Methodology = proper scoring rules (one‑sided Brier in [0,1], Murphy
reliability/resolution decomposition, Brier‑skill‑score vs the base rate AND vs the prediction‑market crowd,
calibration‑by‑bucket with Wilson bands), per Tetlock/GJP + Metaculus practice.
WEEKLY RESOLUTION CHECKLIST (run every Monday for the week that just ended):
- [ ] For each now‑past forecast in config/predictions_log.json, research what ACTUALLY happened against its
pre‑registered resolution_criteria (verify with ≥2 sources).
- [ ] Set status: "resolved", outcome: true|false (or partial), resolved_on, and evidence: {text, url}.
If the premise was voided/ambiguous, set status: "annulled" (counts for nobody, stays visible).
- [ ] ⛔ Do NOT edit probability or the claim after publication — the forecast is frozen at publication. Only
add the resolution fields.
- [ ] Scheduled‑certain calls (scheduled_certain: true) stay in the ledger but are excluded from the skill
Brier/BSS — don't let calendar gimmes inflate the headline.
- [ ] Run bin/build_scorecard.py (or any build_seo_pages.py); confirm the Brier, calibration curve, and
our‑vs‑crowd edge update, and the resolved calls show ✓/✗ with their evidence. Commit + deploy + verify (§12).
- [ ] Sanity: report counts + a confidence caveat while n is small; never headline a Brier off <~15 resolved calls.
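The scoring rules named above can be sketched on a toy ledger: Brier as the mean squared error of probability against the 0/1 outcome, and Brier skill score as 1 − Brier/Brier_ref. Assumptions to note: the reference here is the in-sample base rate, entries with a `partial` outcome are skipped, and the function names are hypothetical rather than those in `bin/build_scorecard.py`.

```python
def brier(pairs: list) -> float:
    """Mean squared error between forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def skill_entries(ledger: list) -> list:
    # Only resolved, non-gimme calls with a clean true/false outcome count toward skill.
    return [
        e for e in ledger
        if e.get("status") == "resolved"
        and not e.get("scheduled_certain")
        and isinstance(e.get("outcome"), bool)
    ]


def score(ledger: list) -> dict:
    scored = skill_entries(ledger)
    ours = brier([(e["probability"], float(e["outcome"])) for e in scored])
    base_rate = sum(e["outcome"] for e in scored) / len(scored)
    reference = brier([(base_rate, float(e["outcome"])) for e in scored])
    bss = 1 - ours / reference if reference > 0 else None
    return {"n": len(scored), "brier": ours, "bss_vs_base_rate": bss}


ledger = [
    {"probability": 0.80, "status": "resolved", "outcome": True},
    {"probability": 0.30, "status": "resolved", "outcome": False},
    {"probability": 0.99, "status": "resolved", "outcome": True, "scheduled_certain": True},
    {"probability": 0.60, "status": "annulled"},
    {"probability": 0.70, "status": "resolved", "outcome": False},
]
print(score(ledger))
```

With only three scored calls the result is noise, which is why the checklist refuses to headline a Brier off fewer than about 15 resolved calls; the same `brier` function applied to `crowd_probability` would give the crowd reference.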
The path from ~5k → 100k is industrialised generation + the same per‑scenario gates on every record. Volume never lowers the bar.
jurisdiction × mechanism × magnitude × horizon (the CCAR landscape: every stress‑test body × every channel). See docs/CCAR_STRESS_TEST_LANDSCAPE.md.
data/gen/L<NN>_<theme>.json, each record = {category, probability, timeline, title, scenario, countries, roots} per docs/CCAR_SCENARIO_GENERATION_SPEC.md.
.venv/bin/python bin/merge_ccar_scenarios.py # validates vocab, dedups by normalised title, fresh ids, appends
max+1, appends YAML (minimal diff), writes roots to scenario_roots_ccar.yaml.
build_scenarios.py → cascades · build_history.py → evidence · enrich (§9: probs + headlines required for new ids) · build_seo_pages.py · test · push · verify.
config/scenarios_clean.md (bin/dump_scenarios_md.py) so the human‑readable mirror stays current.
A scenario is only indexable (first‑class, public, ranked) when it has measured events_used. So 100k
indexable scenarios requires the event library to scale with them:
- [ ] Grow config/historical_events.yaml (currently ~993) via the adversarial event‑sourcing workflow → merge_history.py, so every new lane has real analogues to match.
- [ ] Ensure new scenarios' tags/themes overlap the event library so build_history.py finds analogues (a scenario with no matchable analogue stays noindex — that's correct, not a bug).
- [ ] Track the indexable ratio in stats.json (indexable / scenarios). Driving that ratio up is the work of scaling — raw count without evidence is vanity.
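The indexable test (probability and events_used both present) and the ratio tracked in stats.json can be sketched as follows. The data shapes are assumptions: scenarios as dicts with `id` and `probability`, histories keyed by scenario id with an `events_used` list; the helper names are hypothetical.

```python
from typing import Optional


def is_indexable(scenario: dict, history: Optional[dict]) -> bool:
    """A page is indexable only with a probability and at least one measured analogue."""
    has_probability = scenario.get("probability") is not None
    has_events = bool(history) and len(history.get("events_used", [])) >= 1
    return has_probability and has_events


def indexable_ratio(scenarios: list, histories: dict) -> float:
    if not scenarios:
        return 0.0
    n_indexable = sum(is_indexable(s, histories.get(s["id"])) for s in scenarios)
    return n_indexable / len(scenarios)


scenarios = [
    {"id": 1, "probability": 0.06},
    {"id": 2, "probability": 0.20},   # history exists but matched no analogue
    {"id": 3, "probability": None},   # analogue exists but no probability yet
]
histories = {
    1: {"events_used": ["b2-trade_sanctions-436"]},
    2: {"events_used": []},
    3: {"events_used": ["b2-trade_sanctions-436"]},
}
print(indexable_ratio(scenarios, histories))
```

Adding scenarios without growing the event library pushes this ratio down even as the raw count rises, which is the vanity trap the checklist warns about.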
stats.json); deliberately commission lanes for under‑represented regions/themes.
Coverage is not only geographies and crises. The second coverage axis is the frontier of our species — the developments that re‑shape how we live: medicine, AI, technology, science journals, patents & discoveries, biology, nature. This is the home of invariant #4. The full framework (lanes, sources, vocab, ontology, the good/bad rule, worked examples) lives in docs/FRONTIER_SCENARIO_SPEC.md — read it before commissioning a frontier lane. The operational checklist:
ai_breakthrough, scientific_breakthrough, biotech_breakthrough, longevity, clean_energy, space_economy, neuro_interface, biosecurity_risk, biodiversity_loss. These are wired in macroguru/cascade/propagation.py, labelled in bin/build_scenarios.py, accepted by merge_ccar_scenarios.py (FRONTIER_FACTORS), and tagged for evidence in bin/build_history.py (root_tags). Add to all four when introducing a new frontier factor.
e4-frontier-* events in config/historical_events.yaml (AlphaFold, ChatGPT, GPT‑4, the Nvidia AI‑capex wave, mRNA/GLP‑1/CRISPR, NIF fusion, Neuralink, LK‑99, JWST, SpaceX landing, IPBES biodiversity…). A novel scenario will show agree=False vs an imperfect analogue — that is the deviation thesis, surfaced honestly, not a bug.
macroguru/data/science_news.py (Nature, Science, arXiv, MIT Tech Review, MIT News, FDA) → corroboration.TOPIC_KEYWORDS (frontier topics) → alerts.TOPIC_TO_EVENT → a frontier ontology event in config/event_ontology.yaml (ai_capability_breakthrough, ai_capex_bust, biomedical_breakthrough, biosecurity_shock, clean_energy_abundance, space_economy_milestone, scientific_discovery). A topic with no ontology event is context‑only (it never drives a cascade). After editing the ontology, recompile (compile_ontology() in bin/refresh_site.py).
\bagi\b, not agi — which matches "magic", "agitation"). After any vocab change, run the false‑positive guard before committing. The corroboration gate (2 independent classes; a lone Nature paper or FDA approval = WATCH, not ALLOW) is the second line of defence.
A scenario is strong only if it passes all of these (synthesis of scenario‑planning, superforecasting, ICD 203, and DAMA data‑quality consensus — §27):
| Dimension | For a scenario, it means |
|---|---|
| Accuracy | impacts/sources match cited reality |
| Completeness | roots + cascade + history + guidance + probability all present |
| Consistency | signs/units agree across cascade ↔ guidance ↔ measured history |
| Timeliness | probability reflects the current regime; prices/history are fresh |
| Validity | category/timeline/roots honour the controlled vocab |
| Uniqueness | unique id; no duplicate title |
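
Three of the dimensions in the table (completeness, validity, uniqueness) are mechanical enough to check in code before a human reviews the rest. The sketch below is a minimal illustration, not the project's validator: the field names in `REQUIRED_FIELDS`, the `dq_issues` helper, and the category set passed in are all hypothetical and would need to be mapped onto the real scenario schema.

```python
from collections import Counter

# Hypothetical field names; the real scenario schema may differ.
REQUIRED_FIELDS = ("roots", "cascade", "history", "guidance", "probability")


def _norm_title(scenario):
    """Normalise a title so that case and stray spaces do not hide duplicates."""
    return str(scenario.get("title", "")).strip().lower()


def dq_issues(scenarios, allowed_categories):
    """Return (scenario_id, dimension, message) tuples for mechanical DQ failures."""
    issues = []
    id_counts = Counter(s.get("id") for s in scenarios)
    title_counts = Counter(_norm_title(s) for s in scenarios)
    for s in scenarios:
        sid = s.get("id")
        # Completeness: every required block is present and non-empty.
        missing = [f for f in REQUIRED_FIELDS if s.get(f) in (None, "", [], {})]
        if missing:
            issues.append((sid, "completeness", "missing: " + ", ".join(missing)))
        # Validity: category must come from the controlled vocabulary.
        if s.get("category") not in allowed_categories:
            issues.append((sid, "validity", "category not in controlled vocab"))
        # Uniqueness: ids are never reused and titles are never duplicated.
        if id_counts[sid] > 1:
            issues.append((sid, "uniqueness", "duplicate id"))
        elif title_counts[_norm_title(s)] > 1:
            issues.append((sid, "uniqueness", "duplicate title"))
    return issues
```

An empty return value means only that the mechanical checks passed; accuracy, consistency and timeliness still need the measured history and a human read.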
- n_analogues); then adjust for current conditions.
- 0.05–0.40 · structural/tail 0.01–0.10 · developing trends 0.50–0.85. CCAR spec: severe‑tail 0.02–0.10, plausible‑cyclical 0.15–0.45.
- calibration.json holds the rolling reliability curve by confidence band; the target is that predicted ≈ realized frequency. (Human superforecasters land Brier ≈ 0.15–0.20; that's the bar to beat over time.)
- apply_probs).

Full contract: docs/CALIBRATION_METHODOLOGY.md. Engine: macroguru/calibration/derive.py. Neither the probability nor the per‑asset impact % is asserted any more — each is built from a reference class, the build is recorded so it can be audited, and after the fact it is scored against reality.
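
On the probability side, "scored against reality" comes down to the two quantities named above: the Brier score and the reliability curve (predicted vs realized frequency per band). The sketch below shows both on a plain list of resolved forecasts. It is an illustration under assumptions: `brier_score`, `reliability_curve`, and the equal-width banding are hypothetical and do not describe the layout of calibration.json.

```python
def brier_score(forecasts):
    """Mean squared error between forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)


def reliability_curve(forecasts, n_bins=5):
    """Group forecasts into equal-width probability bands.

    Returns rows of (band_index, count, mean_predicted, realized_frequency).
    A well-calibrated set has mean_predicted close to realized_frequency in every band.
    """
    bands = {}
    for p, o in forecasts:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top band
        bands.setdefault(idx, []).append((p, o))
    rows = []
    for idx in sorted(bands):
        items = bands[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(o for _, o in items) / len(items)
        rows.append((idx, len(items), mean_p, freq))
    return rows


# Each pair is (published probability, outcome: 1 if it happened, else 0).
resolved = [(0.9, 1), (0.1, 0), (0.8, 1), (0.2, 0)]
score = brier_score(resolved)
curve = reliability_curve(resolved)
```

As a reference point, a forecaster who always says 50% scores exactly 0.25 on this scale, so lower is better.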
- derive_probability() in build_history.py: shrink the assigned prior toward the reference‑class (category×timeline) base rate by precedent strength, log‑odds‑pool with the crowd, extremize mildly, reserve mass for unknown‑unknowns + Cromwell‑clamp [2%,97%], and report a credible interval that widens with thin precedent. Written to each scenario's prob_derivation and surfaced as "History‑derived X% · 90% range A–B%" on every scenario page + the Lab/What‑If reasoning panel.
- derive_impact(): shrink the measured analogue abnormal return toward 0 by reliability (sample × consistency × confidence), blend with the cascade prior, band it (fat‑tail aware). Written per market as impact; surfaced as the "hist A–B%" range next to every projected move.
- impact_accuracy.json (built by build_history.py) scores the published % vs the measured analogue move (MAE/RMSE/bias/dir‑hit/coverage/skill‑vs‑no‑move, by confidence band); build_scorecard.py folds it into scorecard.json; both show at /reality-check alongside the probability Brier/calibration.

Run bin/recalibrate.py (auto‑runs at the tail of build_seo_pages.py) → recalibration.json. It proposes (never auto‑applies):
1. Impact bias offset — if magnitude bias is materially ≠ 0, apply −bias to published moves.
2. Confidence recalibration — if directional accuracy doesn't rise with the confidence score, down‑weight/re‑fit it.
3. Probability — once forecasts resolve, fit a Platt recalibration on the resolved set; and review the scenarios
flagged where the history‑derived probability diverges ≥15pts from the assigned prior.
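
Proposal 1 can be illustrated with a few lines of arithmetic. The sketch below computes the error metrics named earlier (MAE, RMSE, bias, directional hit rate) from pairs of published and measured moves, then proposes an offset only when the bias is material. It makes several assumptions that are not in the source: bias is taken as the signed mean of published minus measured, "materially" is modelled by a hypothetical `threshold` in percentage points, and both function names are invented for illustration.

```python
import math


def impact_accuracy(pairs):
    """pairs: list of (published_pct, measured_pct) moves for resolved scenarios."""
    errors = [published - measured for published, measured in pairs]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "bias": sum(errors) / n,  # assumed convention: published minus measured
        # A directional hit means published and measured moves share a sign.
        "dir_hit": sum(1 for p, m in pairs if p * m > 0) / n,
    }


def propose_bias_offset(pairs, threshold=0.5):
    """Return the offset to ADD to published moves, or 0.0 if bias is immaterial.

    threshold is a hypothetical materiality cut-off in percentage points.
    """
    bias = impact_accuracy(pairs)["bias"]
    return -bias if abs(bias) > threshold else 0.0


pairs = [(-4.0, -3.0), (2.0, 1.0), (-1.0, 1.0)]
metrics = impact_accuracy(pairs)
offset = propose_bias_offset(pairs)
```

The function only returns a proposal; in keeping with the "never auto‑applies" rule, applying it stays a separate, reviewed step.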
Apply the accepted proposals via scenario_overrides.yaml / propagation.py (mirrors the apply_probs flow),
then rebuild. Each cycle closes the gap to reality a little more.
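
The probability derivation described above (shrink toward the base rate, log‑odds pool with the crowd, mild extremizing, reserved mass, Cromwell clamp, an interval that widens with thin precedent) can be sketched end to end. This is not the code in macroguru/calibration/derive.py: the function name, every default parameter (`k`, `crowd_weight`, `extremize`, `reserve`), the pull toward 0.5 as the "reserved mass" step, and the normal‑approximation interval are assumptions chosen only to make the sequence of steps concrete. The [2%, 97%] clamp is the one number taken from the text.

```python
import math

CROMWELL_LO, CROMWELL_HI = 0.02, 0.97  # clamp bounds stated in the text


def _logit(p):
    p = min(1 - 1e-9, max(1e-9, p))  # guard against log(0)
    return math.log(p / (1 - p))


def _inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def derive_probability_sketch(prior, base_rate, n_analogues, crowd=None,
                              k=5.0, crowd_weight=0.3, extremize=1.1, reserve=0.02):
    """Illustrative pipeline; all default parameters are hypothetical."""
    # 1. Shrink: more analogues means more weight on the reference-class base rate.
    w = n_analogues / (n_analogues + k)
    p = (1 - w) * prior + w * base_rate
    # 2. Pool with the crowd in log-odds space, if a crowd estimate exists.
    if crowd is not None:
        p = _inv_logit((1 - crowd_weight) * _logit(p) + crowd_weight * _logit(crowd))
    # 3. Extremize mildly: push the pooled estimate slightly away from 50%.
    p = _inv_logit(extremize * _logit(p))
    # 4. Reserve mass for unknown-unknowns, then apply the Cromwell clamp.
    p = (1 - reserve) * p + reserve * 0.5
    p = min(CROMWELL_HI, max(CROMWELL_LO, p))
    # 5. 90% interval (normal approximation) that widens when precedent is thin.
    half = 1.645 * math.sqrt(p * (1 - p) / (n_analogues + 1))
    interval = (max(CROMWELL_LO, p - half), min(CROMWELL_HI, p + half))
    return p, interval


thin_p, thin_ci = derive_probability_sketch(0.30, 0.10, n_analogues=2)
thick_p, thick_ci = derive_probability_sketch(0.30, 0.10, n_analogues=50)
```

With the same prior and base rate, the 50‑analogue case lands closer to the base rate and reports a narrower range than the 2‑analogue case, which is the behaviour the text asks for.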
We exist to help people peek into the future, manage their emotions, and decide better. Behavioral‑finance research is explicit about the failure modes we must counter: loss aversion, recency bias, and panic selling driven by fear of immediate loss. So every published scenario must:
Litmus test: would this scenario make an anxious reader calmer and better‑prepared, or just more scared? If the latter, it fails — add the guidance and the framing.
Finance is YMYL ("Your Money or Your Life") — Google holds it to the strictest E‑E‑A‑T bar, and so do we.
- source_url).
- noindex until they earn evidence (don't index vanity pages).

Every page (scenario, hub, week‑ahead, landing, legacy) must pass:
- [ ] Zero horizontal overflow at 390px and 360px (scrollWidth == viewport).
- [ ] Touch targets ≥ 44px (Apple HIG) / ≥ 24px min (WCAG 2.5.8); inputs ≥ 16px (no iOS zoom‑on‑focus).
- [ ] Tables scroll inside their own container (comparison tables) or stack (content tables) — never push the page.
- [ ] Marquees/animation pause on hover/focus + honour prefers-reduced-motion (WCAG 2.2.2).
- [ ] No sticky element covers content on mobile.
- [ ] Verify with cache‑busted CSS or on prod (the local preview caches CSS/JS across reloads).
- git push, never vercel deploy --archive=tgz (re‑uploads ~5k+ files → api-upload-free 429, a 24h lockout). The refresh_scenarios.py docstring is stale on this.
- cleanUrls: verifying a page with curl …/x.html returns a 15‑byte redirect stub (looks like a failed deploy). Use the clean path …/x or curl -L. .css/.js/.json are unaffected.
- READY in vercel ls is often the previous build; match the commit SHA before declaring success.
- apply_headlines.py must MERGE (setdefault().update()), never replace the headlines dict, or it wipes the existing thousands.
- build_seo_pages.py owns news.html, scenario/*, sectors/*, countries/*, assets/<slug> (hubs), risks/*, week-ahead/*. refresh_site.py renders data_quality.html + honest_limits.html from the root *.md via its _DOC_TMPL (and the periodic job re‑runs it — edit the template, not the output, or your change is wiped). Everything else (scenarios.html, monitor.html, whatif.html, world.html, assets.html, index.html, dashboards) is hand‑maintained.
- probability AND events_used — a scenario with no measured analogue renders noindex (correct, but means "not first‑class yet").
- stats.json; fill via [data-mg-count] + counts.js.
- build_history/build_scenarios write‑if‑changed (α/β quantised) — re‑running on unchanged prices writes ~0 files; a huge diff means inputs really changed (or upstream data drifted).
- ▲▼◆ and › break a plain grep in zsh (character not in range) — export LC_ALL=en_US.UTF-8 first.
- ?bust=) or verify on prod.

| Cadence | What runs | Mechanism | Cost |
|---|---|---|---|
| 5 min | alert bus, hypothesis monitor, live‑monitor feed | event_tick.py / reactive_engine.py (launchd) | free |
| Daily | prices → history → cascades → coverage → calibration → deploy | refresh_scenarios.py (launchd) | free |
| Monthly / on‑trigger | probability recalibration, deviation theses, headlines (new ids), deep event sourcing, bulk scenario generation | bin/REFRESH_RUNBOOK.md (ops/com.macroguru.intel.plist) | ~5–10M tokens |
| On new HL listing | new‑asset deep round (§14) | event‑triggered fast‑path | bounded tokens |
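
The monthly tier adds headlines for new ids, and the gotcha about apply_headlines.py (merge, never replace) is easiest to see in code. The sketch below shows the setdefault().update() pattern on a plain dict. The `merge_headlines` helper and the store layout with a top‑level "headlines" key are hypothetical stand‑ins, not the script's real structure.

```python
def merge_headlines(store, new_headlines):
    """Merge new headline entries into store in place; existing ids are preserved."""
    # setdefault creates the dict only if it is missing, so earlier entries survive.
    headlines = store.setdefault("headlines", {})
    # update adds the new ids (and overwrites only ids that are passed in again).
    headlines.update(new_headlines)
    return store


store = {"headlines": {"s1": "What if oil doubles?"}}
merge_headlines(store, {"s2": "What if the yen snaps back?"})

# The destructive alternative the gotcha warns about would be:
#   store["headlines"] = new_headlines   # wipes every existing headline
```

After the call, `store["headlines"]` holds both `s1` and `s2`; the commented‑out assignment would have left only `s2`.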
- claude -p unattended.
- pytest green + prod verified by SHA.

One scenario, idea → live. (Bulk: do §16 first, then this per record.)
PRE‑CHECK (idea)
[ ] real anchor (regulator concern / historical episode)
[ ] specific channel + magnitude; distinct; decision‑relevant; falsifiable
[ ] driver expressible in the 50‑factor root vocab; assumptions explicit
SPECIFY
[ ] next free id (never reuse) · category ∈ taxonomy · status: candidate
[ ] timeline en‑dash · title 2–6 words · one precise sentence
[ ] roots entry: 2–5 signed factors · sober probability
BUILD
[ ] build_scenarios.py → scenarios/<id>.json (cascade imm+ripple, markets ranked, HL gated, countries, guidance)
[ ] build_history.py → history/<id>.json with events_used ≥1, real sources, 20d/5d abnormal returns
ENRICH (monthly tier)
[ ] probability recalibrated + ≤18‑word rationale
[ ] deviation thesis where cascade ≠ measured history
[ ] headline "What if …?" (apply_headlines MERGES)
QUALITY GATE ⛔ (finalize)
[ ] Integrity: measured ✓ calibrated ✓ coverage (no new gap) ✓
[ ] Content: distinct, concrete, coherent cascade, deviation explained
[ ] Emotional: what‑to‑do + common‑man + steadying frame
[ ] SEO/YMYL: indexable (prob+events_used), sources resolve, JSON‑LD, disclaimer
[ ] DQ: accuracy·completeness·consistency·timeliness·validity·uniqueness
[ ] promote status: candidate → keep
PUBLISH
[ ] build_seo_pages.py · stats.json count up · search_index includes it · sitemap updated
[ ] pytest -q green
[ ] git push origin main (NOT archive=tgz)
VERIFY PROD ⛔
[ ] poll deploy by SHA → READY
[ ] curl clean URL (…/scenario/<slug>, with -L) shows oddsbar + markets + what‑to‑do + historical precedent
[ ] search returns it; mobile 390/360 zero overflow
MONITOR
[ ] coverage_report green · calibration watched · prob_history drawn · retire = status:drop (never delete)
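
The abnormal returns required in the BUILD step follow the market‑model form AR = r − (α + β·r_SPX). The sketch below fits α and β by ordinary least squares on a pre‑event estimation window and then applies them to the event window. It is a minimal illustration under assumptions: the OLS fit, the window lengths, and both function names are hypothetical, and the real mechanics are whatever docs/HISTORICAL_ENGINE.md specifies.

```python
def fit_market_model(asset_returns, spx_returns):
    """Ordinary least squares fit of asset returns on SPX returns; returns (alpha, beta)."""
    n = len(asset_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(spx_returns) / n
    cov = sum((m - mean_m) * (a - mean_a) for a, m in zip(asset_returns, spx_returns))
    var = sum((m - mean_m) ** 2 for m in spx_returns)
    if var == 0:
        raise ValueError("SPX returns have zero variance; beta is undefined")
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


def abnormal_returns(asset_event, spx_event, alpha, beta):
    """AR_t = r_t - (alpha + beta * r_SPX_t) for each day in the event window."""
    return [r - (alpha + beta * m) for r, m in zip(asset_event, spx_event)]


# Toy estimation window: the asset moves 1.5x the index plus a small drift.
spx_est = [0.01, -0.02, 0.005, 0.015, -0.01, 0.0]
asset_est = [0.001 + 1.5 * m for m in spx_est]
alpha, beta = fit_market_model(asset_est, spx_est)

# Toy event window: the same relationship plus a -2% shock on each day.
spx_evt = [-0.03, 0.01]
asset_evt = [0.001 + 1.5 * m - 0.02 for m in spx_evt]
ar = abnormal_returns(asset_evt, spx_evt, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the window
```

Because the market component is stripped out, the index falling 3% on the first event day does not inflate the measured shock; only the extra 2% per day shows up in `ar`.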
- AR = r − (α + β·r_SPX) around analogous historical events.
- historical_events.yaml) matched to a scenario by tags/theme.
- events_used — the analogues actually used to measure a scenario; required for indexability.
- probability AND events_used → first‑class public page (in sitemap).
- coverage_report.json).
- jurisdiction × mechanism × magnitude × horizon slice used for bulk generation.
- docs/CCAR_SCENARIO_GENERATION_SPEC.md and bin/news_taxonomy.py (those + merge_ccar_scenarios.py's validator are the machine‑enforced truth).

Companion docs (read alongside this one):
SELF_IMPROVING_ENGINE.md (the master loop) ·
HISTORICAL_ENGINE.md (event‑study mechanics) ·
CCAR_SCENARIO_GENERATION_SPEC.md (the generation contract) ·
CCAR_STRESS_TEST_LANDSCAPE.md (the world's stress‑test bodies = the lane map) ·
../bin/REFRESH_RUNBOOK.md (the monthly intelligence refresh) ·
the rendered operator docs DATA_QUALITY.md & HONEST_LIMITS.md.
The standards above are grounded in published best practice, not invented:
- docs/FRONTIER_SCENARIO_SPEC.md — the companion spec to this SOP.

End of SOP. If you followed every checkbox, the scenario is measured, calibrated, covered, emotionally honest, indexable, accessible, and live — and it does the one thing we exist for: help someone meet the future calmly and decide better. Now do it 100,000 times.