rendered 2026-07-03T12:35:48Z

MacroGuru — Data Source Quality Assessment

Last reviewed: 2026-05-22. Scope: every external data source MacroGuru ingests, graded A/B/C/D for production reliability with the single most important "gotcha" highlighted for each.

Grading rubric:
- A: Trust as primary. Authoritative publisher, transparent methodology, well-understood revision behaviour, fit for risk-bearing decisions.
- B: Trust with caveats. Reliable but needs awareness of cadence/quirks; usually fit for production after a sanity filter.
- C: Cross-validate before use. Useful but with structural quality issues; never the sole input behind a trade.
- D: Avoid for production. Use only for exploratory research or display.

1. FRED — `api.stlouisfed.org`

Publisher. Federal Reserve Bank of St. Louis. FRED is an aggregator, not an originator: every series carries an upstream "Source" attribution which is what actually matters for quality. The series MacroGuru consumes split as follows:

Series	True upstream publisher	Cadence (FRED)	Latency
SP500, NASDAQCOM	S&P Dow Jones Indices / Nasdaq Inc.	Daily close	T+1 business day, ~7pm CT
DCOILWTICO, DCOILBRENTEU	U.S. EIA (Cushing OK / Europe Brent spot)	Daily	T+1 business day
DEXUSEU, DEXJPUS	Federal Reserve H.10 noon buying rates	Daily	T+1, posted ~16:15 ET
DTWEXBGS	Federal Reserve H.10 broad trade-weighted USD index	Daily	T+1
CBBTCUSD, CBETHUSD	Coinbase (see §2)	Daily, 7 days a week	Posted ~19:05 CT next day
DFII10	U.S. Treasury (TIPS constant-maturity)	Daily	T+1
VIXCLS	Cboe Global Markets	Daily close	T+1
BAMLH0A0HYM2	ICE BofA US High Yield OAS	Daily	T+1 (often T+2 — ICE lag)
T10Y2Y	Derived from H.15 (DGS10 - DGS2)	Daily	T+1
FEDFUNDS	NY Fed effective fed funds rate	Monthly	Mid-following-month
DGS30	U.S. Treasury constant maturity	Daily	T+1

Revision policy. FRED preserves vintages in ALFRED. Treasury/H.10/H.15 series are revised without notice when the underlying source republishes. The H.10 noon rate was discontinued for the daily real-time feed in 2008 and is now a daily synthetic, so the FX print is not a real-time mid. ICE BofA HY OAS (BAMLH0A0HYM2) is methodology-licensed from ICE Data Indices and can be restated up to 5 business days after first print.

Gaps. All daily series follow the U.S. federal holiday calendar — no weekend or US-holiday observation. Crypto (CBBTCUSD/CBETHUSD) is the only 7-day-week series, but the print is a daily snap not a 24h average.

Grade: A. FRED itself is the gold-standard distribution layer for U.S. macro data. The only risk is mis-using a series whose true latency or revision profile MacroGuru doesn't model (e.g., treating BAMLH0A0HYM2 as same-day).

Gotcha: FRED daily series are not real-time; they post the following business day. Any code that assumes "today's value is available today" will silently produce stale signals — and the H.10 FX series in particular is a single 12:00 ET fix, not a live mid.
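A staleness guard makes the T+1 assumption explicit instead of implicit. The sketch below is a minimal illustration, not MacroGuru's actual code: the function names are hypothetical, and it only skips weekends, so U.S. federal holidays would need a real holiday calendar on top.

```python
from datetime import date, timedelta


def previous_business_day(d: date) -> date:
    """Most recent weekday strictly before d.

    Assumption: weekends only. Federal holidays are NOT handled here.
    """
    d -= timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def is_stale(last_obs: date, today: date, extra_lag_days: int = 0) -> bool:
    """True if the newest observation is older than a T+1 model allows.

    extra_lag_days widens the allowance, e.g. 1 for a series that often
    posts at T+2, or for checks that run before the evening post.
    """
    expected = previous_business_day(today)
    for _ in range(extra_lag_days):
        expected = previous_business_day(expected)
    return last_obs < expected


# Monday 2026-03-09: the newest value one can expect is Friday 2026-03-06.
print(is_stale(date(2026, 3, 6), date(2026, 3, 9)))  # False
print(is_stale(date(2026, 3, 5), date(2026, 3, 9)))  # True
```

Passing `extra_lag_days=1` for BAMLH0A0HYM2 would reflect the T+2 behaviour noted in the table above.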

Source: https://fred.stlouisfed.org/docs/api/fred — https://fred.stlouisfed.org/series/CBBTCUSD

2. Coinbase via FRED — `CBBTCUSD`, `CBETHUSD`

Venue. Coinbase Exchange (the public retail order book, formerly Coinbase Pro), not Coinbase Prime and not a multi-venue aggregate. Coinbase furnishes a daily snap to St. Louis Fed.

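Time-of-day of the print. Per FRED's own series notes: "All data is as of 5 PM PST." So CBBTCUSD for 2026-03-09 is the BTC-USD last trade on Coinbase Exchange at 17:00 America/Los_Angeles on 2026-03-09. That is 00:00 UTC on 2026-03-10, because Pacific Daylight Time (UTC-7) is already in effect on that date; while standard time (UTC-8) applies, the same snap falls at 01:00 UTC. Either way the FRED date label is the Pacific calendar date, so the print sits at or just after the end of that UTC day, not at its start.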
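The date-label conversion can be sketched with the standard library. This assumes, as the text above does, that "5 PM PST" means 17:00 wall-clock time in America/Los_Angeles year-round; the function name is hypothetical.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

# Assumption: the FRED note refers to Pacific wall-clock time, DST included.
PACIFIC = ZoneInfo("America/Los_Angeles")


def fred_coinbase_snap_utc(label: date) -> datetime:
    """UTC instant of the 17:00 Pacific snap for a given FRED date label."""
    local = datetime.combine(label, time(17, 0), tzinfo=PACIFIC)
    return local.astimezone(timezone.utc)


print(fred_coinbase_snap_utc(date(2026, 3, 9)))   # daylight time in effect
print(fred_coinbase_snap_utc(date(2026, 1, 15)))  # standard time in effect
```

Joining on this UTC instant, and not on the bare date string, avoids an off-by-one-day error when aligning the series with UTC-dated exchange bars.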

Lag vs live. FRED publishes the prior-day print at ~19:05 CT the following day. Since 17:00 PT is 19:00 CT, the wall-clock latency from the snap to availability in the API is roughly 24 hours. Useless for intraday; fine for end-of-day backtests if you treat the date label correctly.

Gaps. Effectively none — Coinbase doesn't halt — but if Coinbase Exchange has a degraded matching engine at 17:00 PT (this has happened during BTC ETF launch volatility and the May-2022 LUNA crash), the FRED print will be whatever last-trade Coinbase reported, even if it was a stale or wide quote.

Grade: B. Trustworthy as a clean end-of-day series; do not use for anything tighter than daily because the timestamp convention is non-obvious.

Gotcha: The "5 PM PST" snap is a single-venue last trade, not a TWAP or volume-weighted mean — a 30-second wick on Coinbase Exchange can permanently mark the official daily print.

Source: https://fred.stlouisfed.org/series/CBBTCUSD (series notes section)

3. Stooq — `stooq.com`

Publisher. Polish data aggregator run by Stooq sp. z o.o., free tier with no documented SLA. Stooq does not disclose upstream sources for most equities. Independent reporting (QuantStart, Politecnico di Milano big-data thesis) indicates U.S. equities are sourced from a mix of free EOD feeds; commodities from Barchart; crypto from CoinAPI; Polish/European equities are first-party from WSE/Xetra feeds.

Tier. No paid API. CSV downloads are unauthenticated, no rate limit published — but throttled in practice (a few req/sec before HTML rate-limit pages appear).

Known accuracy issues. Comparative studies (the Milano thesis especially) rank Stooq below Yahoo Finance on completeness and accuracy dimensions. Specific failure modes seen in production:
- Adjusted close is conflated with close: the "close" column is already split/dividend adjusted, which silently breaks any pipeline that expects raw close.
- Missing rows on partial-trading days (e.g., U.S. half-days after Thanksgiving, NYSE 9/11 anniversary halts): Stooq sometimes omits these, sometimes carries forward.
- Polish daylight-saving boundary glitches on intraday data (we don't use Stooq intraday in MacroGuru, so non-issue here).
- Cryptos diverge from Coinbase/Binance by 0.1–0.3% routinely because CoinAPI's underlying composite differs.

Cadence. End-of-day, typically posted 1–2 hours after each market's local close. Not real-time and not advertised as such.

Grade: C. Acceptable as a secondary cross-check for daily equities/FX/commodities but never as the only source. The undocumented adjustment behaviour alone disqualifies it from being primary.

Gotcha: Stooq's "Close" column is already adjusted for splits and dividends. Comparing it to a true close from another vendor will produce a phantom gap — and you cannot recover the raw close.
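One way to surface the phantom gap is to compare Stooq's close against a raw close from a second vendor and look at the ratio per date. The sketch below is illustrative only: the function names, the date-keyed dicts, and the 0.5% tolerance are assumptions, not part of any vendor API.

```python
def adjustment_ratio(stooq_close: dict[str, float],
                     raw_close: dict[str, float]) -> dict[str, float]:
    """Ratio stooq/raw on the dates both series share."""
    common = sorted(stooq_close.keys() & raw_close.keys())
    return {day: stooq_close[day] / raw_close[day] for day in common}


def looks_adjusted(ratios: dict[str, float], tol: float = 0.005) -> bool:
    """True if any date deviates from 1.0 by more than tol.

    A raw series should track the reference at a ratio near 1.0 on every
    date; an adjusted one drifts away from 1.0 before each corporate action.
    """
    return any(abs(r - 1.0) > tol for r in ratios.values())


# Hypothetical 2-for-1 split taking effect on 2024-01-04.
stooq = {"2024-01-02": 50.0, "2024-01-03": 51.0, "2024-01-04": 52.0}
raw = {"2024-01-02": 100.0, "2024-01-03": 102.0, "2024-01-04": 52.0}
print(looks_adjusted(adjustment_ratio(stooq, raw)))  # True
```

The check detects the conflation but does not undo it: the raw close still has to come from the other vendor.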

Source: https://stooq.com/db/ — https://www.quantstart.com/articles/an-introduction-to-stooq-pricing-data/

4. Binance Vision — `data.binance.vision`

Venue. Binance.com (global) only, not Binance.US. The S3 bucket prefixes data by product:
- data/spot/: Binance.com spot
- data/futures/um/: USDT-margined perpetuals (linear)
- data/futures/cm/: coin-margined futures (inverse)
- data/option/: options

Contents. Per binance/binance-public-data GitHub:
- klines/: OHLCV bars at 1s, 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w, 1mo
- trades/: every fill (tick data)
- aggTrades/: aggregated taker trades (matched against the same price level)
- bookDepth/: periodic L2 snapshots (futures only)
- bookTicker/: best-bid/best-ask updates
- metrics/: futures-only: open interest, long/short ratios, taker volume
- fundingRate/: historical funding (futures only)
- No L2 order book full-stream archive; only the periodic snapshots.

Cadence. Two layouts: daily/ (one zip per day) and monthly/ (rolled-up monthly zips). The previous-day daily zip lands on S3 typically within 5–30 minutes of 00:00 UTC. Monthly rollups arrive a few days into the next month. Each zip has a .CHECKSUM sibling (SHA256) — verify before ingest.
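The verify-before-ingest step can be sketched as below. It assumes the .CHECKSUM file follows the usual `sha256sum` layout (hex digest first, then the file name); the function names are hypothetical and nothing here touches the network.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large monthly zips never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_zip(zip_path: Path, checksum_path: Path) -> bool:
    """Compare the zip's SHA256 with the first token of its .CHECKSUM file.

    Assumption: the checksum file starts with the hex digest, as in
    '<digest>  <filename>'.
    """
    expected = checksum_path.read_text().split()[0].lower()
    return sha256_of(zip_path) == expected
```

A caller would skip or quarantine any zip for which `verify_zip` returns False, and re-run the check whenever the same day is fetched again.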

Quality. This is Binance's own archive of its own exchange feed, so by definition it is the canonical source for Binance prints. Caveats: data is republished from production, so any matching-engine errors are preserved verbatim (e.g., the Nov-2022 5-minute ZRX-PERP gap from a maintenance window remains in the file).

Grade: A. Best-in-class for crypto historical data, assuming you've corroborated that Binance is a venue you care about (it is — Binance is 50%+ of crypto perp volume, dominant in spot too).

Gotcha: Zips for the most recent day are not immutable. Binance occasionally re-uploads a day's zip with corrected fields hours later. Always re-fetch and re-verify the last 48 hours' checksums before treating data as final.

Source: https://github.com/binance/binance-public-data — https://data.binance.vision/

5. Yahoo Finance — `query1.finance.yahoo.com` / `yfinance`

Authoritative upstream. Yahoo Finance redistributes (mostly) NYSE/Nasdaq/CBOE direct feeds for U.S. equities/ETFs, ICE for FX, and an internal cryptocompare-style aggregate for crypto. Yahoo is downstream of the exchange direct feed plus a 15-min delay for non-subscribed users.

Terms of service. Yahoo's ToS expressly forbids automated scraping and commercial redistribution. yfinance is a third-party scraper (Ran Aroussi's package) — not a sanctioned API. Yahoo has materially tightened rate limits, added cookie/crumb auth, and (week of 2025-02-17) changed historical endpoints so most older versions of yfinance broke until updated. As of 2026, free historical data via the unofficial endpoints intermittently returns empty payloads with a "possibly delisted" message even for valid tickers. Reliable bulk historical access now effectively requires the paid Yahoo Finance Premium subscription.

Documented quirks.
- Close (raw) and Adj Close (split- and dividend-adjusted) are both delivered, but the adjustment history is rewritten on every new corporate action, so the same (ticker, date) query yields different Adj Close values over time. Backtests that re-pull data weeks later will silently drift.
- Splits and dividends are reported but with occasional date errors (ex-date vs pay-date confusion).
- Crypto pairs are quoted in the chosen FX but the underlying composite changes when Yahoo swaps exchange constituents, with no notice given.
- After-hours and pre-market bars are not consistent; the "regular hours" filter sometimes leaks 16:00 ET trades.

Grade: C. Use only as a redundancy check against authoritative sources. The combination of opaque ToS status, repeated API breakage in 2024–2025, and silently mutating adjusted history makes this unsafe as a sole input.

Gotcha: Adj Close is recomputed every time a corporate action happens — your backtest's results literally change after every split or dividend if you re-pull, even for dates years in the past.
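Drift of this kind is easy to detect if every pull is stored with its pull timestamp and later pulls are diffed against it. The sketch below is a minimal illustration; the function name, the date-keyed dicts, and the tolerance are assumptions.

```python
def adj_close_drift(old_pull: dict[str, float],
                    new_pull: dict[str, float],
                    rel_tol: float = 1e-6) -> dict[str, tuple[float, float]]:
    """Dates whose Adj Close changed between two pulls of the same ticker.

    Returns {date: (old_value, new_value)} for dates present in both pulls.
    """
    drift = {}
    for day in sorted(old_pull.keys() & new_pull.keys()):
        old, new = old_pull[day], new_pull[day]
        if abs(new - old) > rel_tol * max(abs(old), abs(new)):
            drift[day] = (old, new)
    return drift


# Hypothetical values: a later dividend rewrote one historical date.
january_pull = {"2020-01-02": 75.0, "2020-01-03": 74.5}
march_pull = {"2020-01-02": 74.8, "2020-01-03": 74.5}
print(adj_close_drift(january_pull, march_pull))
```

A non-empty result means a backtest run on the newer pull is not comparable with one run on the older pull.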

Source: https://github.com/ranaroussi/yfinance/issues/2340 — https://medium.com/@trading.dude/why-yfinance-keeps-getting-blocked-and-what-to-use-instead-92d84bb2cc01

6. Hyperliquid REST — `api.hyperliquid.xyz/info`

Endpoints we use.
- metaAndAssetCtxs: per-asset mark price, oracle price, funding, open interest.
- l2Book: L2 order book snapshot.
- fundingHistory: historical hourly funding rates.

Mark price methodology (from Hyperliquid docs, "Robust price indices"): mark = median of three components:
1. The validator-computed oracle price plus a 150-second EMA of (Hyperliquid mid − oracle).
2. The median of HL's own best-bid, best-ask, and last trade.
3. The weighted median of Binance, OKX, Bybit, Gate.io, and MEXC perp mid prices with weights 3-2-2-1-1.

The oracle itself is the weighted median of Binance, OKX, Bybit, Kraken, KuCoin, Gate.io, MEXC, and Hyperliquid spot mids, weights 3-2-2-1-1-1-1-1, published by each validator every 3 seconds. This is a real multi-venue robust index — substantially harder to manipulate than a single-venue feed, but explicitly spot-derived for the oracle, which is what JELLY exploited.
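The index construction above can be sketched in a few lines. This is an illustration under stated assumptions, not Hyperliquid's implementation: it takes the mark as the median of the three components, uses the lower weighted median (the tie rule is a guess), and expects the 150-second EMA of (mid − oracle) to be supplied by the caller. All function names are hypothetical.

```python
from statistics import median

# Weights from the text, in the order Binance, OKX, Bybit, Gate.io, MEXC.
PERP_WEIGHTS = (3, 2, 2, 1, 1)


def weighted_median(values, weights):
    """Lower weighted median: first value whose cumulative weight
    reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("weighted_median needs at least one value")


def mark_price(oracle, ema_basis, best_bid, best_ask, last_trade, cex_perp_mids):
    """Median of the three components described in the text."""
    component_1 = oracle + ema_basis  # oracle plus EMA of (HL mid - oracle)
    component_2 = median([best_bid, best_ask, last_trade])
    component_3 = weighted_median(cex_perp_mids, PERP_WEIGHTS)
    return median([component_1, component_2, component_3])


mids = [100.0, 101.0, 102.0, 103.0, 104.0]
print(mark_price(100.0, 0.5, 100.9, 101.1, 101.0, mids))  # 101.0
```

The same `weighted_median` helper applies to the oracle by passing the eight spot mids with weights 3-2-2-1-1-1-1-1.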

Funding. 8-hour rate F = Average Premium Index + clamp(interest − Premium, −0.0005, 0.0005), but paid hourly at 1/8 the rate. Funding payment converts position size using the oracle price, not mark. This means oracle drift directly affects funding cash flows.
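The funding arithmetic can be written out directly from the formula above. In this sketch the same averaged premium is used for both occurrences of the premium term, positive size means long, and a positive result is an amount the long pays; those conventions, the function names, and the sample interest value are assumptions for illustration.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def funding_rate_8h(avg_premium: float, interest_rate: float) -> float:
    """F = premium + clamp(interest - premium, -0.0005, 0.0005)."""
    return avg_premium + clamp(interest_rate - avg_premium, -0.0005, 0.0005)


def hourly_funding_payment(position_size: float, oracle_price: float,
                           avg_premium: float, interest_rate: float) -> float:
    """Hourly payment: notional at the ORACLE price times F / 8."""
    hourly_rate = funding_rate_8h(avg_premium, interest_rate) / 8.0
    return position_size * oracle_price * hourly_rate


# Premium of 0.10% with a sample 0.01% interest term: the clamp binds
# at -0.0005, so F is about 0.0005 per 8 hours.
print(funding_rate_8h(0.001, 0.0001))
print(hourly_funding_payment(10.0, 2000.0, 0.001, 0.0001))  # about 1.25
```

Because the notional uses the oracle price, a 1% oracle drift moves the payment by 1% even if the mark is unchanged.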

Known incidents.
- 2025-03-26 JELLY-PERP oracle manipulation. An attacker pumped JELLY on Bybit perp (and elsewhere on low-float spot) to drag Hyperliquid's oracle ~500% in <1 hour, triggering liquidations on a $4.5M short and force-feeding the loss into the HLP vault. Hyperliquid validators voted to override the oracle and force-settle the position at an off-market price to recover HLP's losses. Bitget's CEO publicly called this "FTX 2.0." HLP absorbed ~$12M of unrealized losses; the attacker withdrew $6.26M before the withdrawal freeze. (Source: Halborn, Kaiko, hyperliquid-co.gitbook.io wiki.)
- Several minor depeg events on low-liquidity HIP-3 perps where deployer-set oracle weights were inadequate.

HIP-3 deployer trust. HIP-3 (introduced 2025) lets third parties deploy new perp markets on Hyperliquid by staking HYPE and configuring oracle/risk parameters. Each HIP-3 perp inherits the trust assumptions of its deployer. For MacroGuru, this means treating any non-core perp (anything outside the original ~50 HL-deployed markets) as effectively a different counterparty per market.

Grade: B. Best-in-class for what it is — a transparent on-chain perp venue with public methodology — but the JELLY precedent is binding: in a tail event, Hyperliquid will override the price feed to protect HLP, and your historical funding/mark prints can be rewritten ex post. For live execution: trust. For long-horizon backtests on small caps: treat with suspicion.

Gotcha: In the JELLY incident, validators retroactively overrode the oracle. Historical perp price/funding data fetched around any future incident could likewise be rewritten later in the on-chain history. Cache historical funding aggressively and timestamp your pulls.

Source: https://hyperliquid.gitbook.io/hyperliquid-docs/trading/robust-price-indices — https://www.halborn.com/blog/post/explained-the-hyperliquid-hack-march-2025 — https://hyperliquid-co.gitbook.io/wiki/introduction/roadmap/incident/2025-26-03

7. DeFiLlama Yields — `yields.llama.fi`

Publisher. DefiLlama (the Llama Corp DAO), the dominant independent DeFi data aggregator. Yields product is a subdomain of the broader TVL service.

Methodology. DeFiLlama ingests directly from on-chain protocol contracts via per-protocol adapters (open-source: DefiLlama/yield-server on GitHub) — not by scraping protocol front-ends. Each adapter is a TypeScript file that reads pool state and emits {apyBase, apyReward, apy, tvlUsd} rows. For lending protocols, apyBase typically comes from the interest model contract; for AMM LP, from a 7-day swap-fee average; for vault strategies, from the strategy contract's reported performance.

APY definition. apy = apyBase + apyReward. Reward APY is the contentious part: it values protocol-token emissions at current spot price and annualizes the most recent emission rate. This:
- Massively overstates yields when a reward token is trending down (the rate is computed at the current price but realized in cash at a lower one).
- Understates compounding (most pools quote APR-equivalent and label it APY).
- Has no adjustment for impermanent loss in AMM pools.
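Two of these adjustments are simple arithmetic and can be sketched as follows. The 50% reward haircut is the apy_oracle policy mentioned in the summary of this document; the function names, the percent units, and the daily compounding default are assumptions for illustration, not DefiLlama's or MacroGuru's actual code.

```python
def discounted_apy(apy_base: float, apy_reward: float,
                   reward_haircut: float = 0.5) -> float:
    """Headline APY with the reward leg haircut (values in percent)."""
    return apy_base + apy_reward * (1.0 - reward_haircut)


def apr_to_apy(apr_pct: float, periods_per_year: int = 365) -> float:
    """Convert a simple APR (percent) to a compounded APY (percent)."""
    rate = apr_pct / 100.0
    return ((1.0 + rate / periods_per_year) ** periods_per_year - 1.0) * 100.0


print(discounted_apy(4.0, 10.0))  # 9.0
print(apr_to_apy(10.0))           # roughly 10.52 with daily compounding
```

Neither function addresses impermanent loss, which needs pool-level price history.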

Known accuracy issues.
- Per the project's own issue tracker (GitHub yield-server #6): APY at sub-daily timescales is ill-defined for protocols with discrete reward distributions; a poll that happens just after an epoch boundary will compute a stale APY for hours.
- Several historical reporting errors on Curve gauge pools where boosted vs base APY was conflated.
- TVL, and therefore APY denominators, can spike or crash on bridge events that double-count or zero-count assets.
- DefiLlama explicitly disclaims: "DefiLlama doesn't audit nor endorse any of the protocols listed."

Cadence. Hourly refresh of the /pools endpoint; per-pool history at hourly resolution back ~2 years.

Grade: B. Best-available free DeFi yields oracle and what every serious yield aggregator uses, but the reward-APY accounting flaw is real and well-known. MacroGuru's apy_oracle already discounts reward APY — keep that policy.

Gotcha: Reward APY is a forward extrapolation of the latest emission rate at the latest token price. When the reward token is illiquid or volatile, the headline APY can be off by 50%+ within a single day.

Source: https://github.com/DefiLlama/yield-server/issues/6 — https://api-docs.defillama.com/

8. deBridge DLN — `dln.debridge.finance`

Settlement model. Intent-based cross-chain. User signs an intent ("I want X of token A on chain 1 in exchange for at least Y of token B on chain 2"). A network of competing solvers (anyone can run a Taker node) fills the intent on the destination chain from their own inventory, then claims the source-chain assets after the lock period. Crucially: no shared liquidity pool. Each solver bears its own inventory risk.

Audits. Repeated audits by Halborn on every meaningful module: DLN Taker, EVM↔Solana serializer, DLN EVM bridge contract, DLN Solana release, CrosschainForwarder, deBridge Core Solana contracts, and DLN EVM upgrades. Most recent published audit: 2024-12-30. Full report list at github.com/debridge-finance/debridge-security.

Incidents. None financially material since inception (2022). One August-2022 phishing attempt against deBridge team via a malicious PDF was reported and contained with no user-fund loss. The absence of a liquidity pool means there is structurally nothing to drain — exploits would have to target individual solvers (their problem, not user-side).

Fee model. Variable: the protocol takes a small fixed dlnFeeBps per chain pair (typically 4–8 bps); the rest of any spread is the solver's auction profit. Quotes are firm once accepted — no slippage from quoted-to-filled rate.

Cadence/latency. Quote API responds in <500ms typical; on-chain settlement: 2–30 seconds depending on source-chain finality (Solana fastest; Ethereum slowest pre-execution-layer-finality).

Grade: A. This is the cleanest cross-chain infra MacroGuru could pick. No pool → no honeypot. Active audits. Production-grade liquidity.

Gotcha: The protocol fee is small but the solver markup is market-determined — quoted output for low-volume corridors (e.g., Avalanche → Aptos) can be visibly worse than the same pair via a larger aggregator. Always price-check against 1inch Fusion or LI.FI for any single transfer >$100k.
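The price check reduces to a shortfall measured in basis points against a reference quote. In the sketch below the $100k trigger comes from the text, while the function names and the 10 bps tolerance are hypothetical; fetching the quotes themselves is out of scope.

```python
def shortfall_bps(dln_out: float, reference_out: float) -> float:
    """How much less the DLN quote delivers than the reference, in bps."""
    return (reference_out - dln_out) / reference_out * 10_000


def should_reroute(notional_usd: float, dln_out: float, reference_out: float,
                   check_above_usd: float = 100_000,
                   max_shortfall_bps: float = 10.0) -> bool:
    """True if a transfer large enough to check is quoted too far below
    the reference. max_shortfall_bps is an illustrative tolerance."""
    if notional_usd <= check_above_usd:
        return False
    return shortfall_bps(dln_out, reference_out) > max_shortfall_bps


# A $250k transfer quoted 15 bps below the reference route.
print(should_reroute(250_000, 99_850.0, 100_000.0))  # True
```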

Source: https://github.com/debridge-finance/debridge-security — https://docs.debridge.com/

9. GDELT 2.0 — `api.gdeltproject.org`

Methodology. NLP-based event extraction from a continuously growing corpus of online news. Each event is a CAMEO-coded actor-action-actor triple with geocoding, tone scoring, and source URL. Updates every 15 minutes. Translates 65 core languages via machine translation; samples 35 more with human translation.

Known biases.
- English-language and Western overweight. Even with machine translation, the source-domain crawl skews to U.S. and U.K. media; multiple academic studies (UK ONS data-quality note; Politecnico di Milano thesis) document this. A U.S.-domestic political flare is over-reported relative to (say) an equivalent-magnitude Brazilian or Nigerian event.
- Source diversity matters more than count. GDELT's mention counts are dominated by re-syndication: AP wire stories propagate to 200+ outlets and inflate mention counts even though there is one underlying event.
- Key-field accuracy is ~55% per the Polimi thesis benchmark; data redundancy ~20%.
- GDELT itself describes the database as "experimental" and explicitly warns against treating CAMEO codes as ground truth.

Useful for. Detecting large changes in narrative volume (regime change in coverage of central banks, geopolitical hotspots, sanctions, war). Cross-source corroboration of major events.

Not useful for. Trade-trigger-grade signals on single events. Precise sentiment on specific actors (tone score is too noisy at the per-article level). Non-English emerging-markets coverage at any granularity finer than country-level.

Grade: C. Excellent for relative-change features (week-over-week volume spikes), poor as a sole signal. MacroGuru's design — treating GDELT as a regime-feature input and not a triggering input — is correct.

Gotcha: Spikes in GDELT's NumMentions field reflect wire-service syndication as much as event severity. A single AP story can drive a 10x spike that has nothing to do with the world-state — always normalize by NumSources distinct domains.
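The normalization can be sketched per event row. NumMentions and NumSources are GDELT event fields; the function names, the dict-shaped rows, and the threshold of 20 mentions per source are assumptions for illustration.

```python
def mentions_per_source(event: dict) -> float:
    """NumMentions divided by NumSources, guarding against a zero count."""
    return event["NumMentions"] / max(event["NumSources"], 1)


def flag_syndication(events: list[dict], max_ratio: float = 20.0) -> list[dict]:
    """Events whose mention count is high relative to distinct sources.

    max_ratio is an illustrative cut-off, not a GDELT recommendation.
    """
    return [e for e in events if mentions_per_source(e) > max_ratio]


rows = [
    {"id": "a", "NumMentions": 200, "NumSources": 4},   # heavy re-syndication
    {"id": "b", "NumMentions": 60, "NumSources": 30},   # broad coverage
]
print([e["id"] for e in flag_syndication(rows)])  # ['a']
```

Volume features built on NumSources, or on mentions with flagged rows down-weighted, are less sensitive to a single wire story.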

Source: https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/ — https://www.mdpi.com/2306-5729/10/10/158 — https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/methodologies/globaldatabaseofeventslanguageandtonegdeltdataqualitynote

10. RSS Feeds — Political Influencers & Central Banks

Reliability tier varies sharply by feed type.

Tier 1: first-party institutional feeds (A grade). These publish their own RSS and are authoritative:
- Federal Reserve press releases: https://www.federalreserve.gov/feeds/press_all.xml
- FOMC statements: https://www.federalreserve.gov/feeds/press_monetary.xml
- ECB press: https://www.ecb.europa.eu/rss/press.html
- Bank of England news: https://www.bankofengland.co.uk/news/rss
- Bank of Japan: https://www.boj.or.jp/en/rss/whatsnew.xml
- U.S. Treasury press releases
- White House briefings: https://www.whitehouse.gov/feed/
- Congressional GovInfo bills/votes: govinfo.gov RSS endpoints

Latency from publication to RSS: under 5 minutes for the Fed, BoE, ECB; up to 30 minutes for BoJ; near-instant for the White House.

Tier 2 — news aggregators (B grade). Reuters, Bloomberg (paid), Wall Street Journal, FT — most have RSS or RSS-like JSON; reliable if you respect their robots.txt and ToS. Bloomberg in particular requires a Terminal license for redistribution.

Tier 3 — Twitter/X "mirrors" (D grade). Since the Twitter API became paywalled (Feb 2023) and Nitter instances were largely shut down through 2024–2025, all third-party RSS mirrors of Twitter accounts (rsshub.app, twiit.app, etc.) are either dead, intermittent, or returning stale data hours late. There is no reliable free RSS path to a specific Twitter handle's feed in 2026. Anyone who needs Trump-or-Powell tweets in their trading loop must either pay for the Twitter API ($5k+/month for the Enterprise tier with reasonable rate limits) or accept multi-hour latency from web-scrape mirrors.

Tier 4 — political-influencer aggregators (C grade). Substacks with RSS, Politico Playbook, Axios — first-party RSS available, latency 5–15 minutes, but signal-to-noise is dominated by editorial selection.

Gotcha (single most important): Central-bank RSS feeds publish the press-release URL, not the full body — your parser must fetch the linked HTML/PDF for the actual policy text. Many "RSS sentiment pipelines" sentiment-score only the RSS title and miss the actual signal entirely.
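A body-fetch-on-link parser can be sketched with the standard library alone. This is a minimal illustration, not MacroGuru's pipeline: the function names are hypothetical, and it only handles plain RSS 2.0 `item` elements, so namespaced Atom or RDF feeds would need extra handling.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def release_links(rss_xml: str) -> list[tuple[str, str]]:
    """(title, link) for every RSS 2.0 item that carries a link."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            items.append((title, link))
    return items


def fetch_body(url: str, timeout: float = 10.0) -> bytes:
    """Download the linked HTML/PDF; scoring should run on this body,
    not on the RSS title."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


sample = """<rss version="2.0"><channel>
  <item><title> Policy statement </title><link>https://example.org/a.htm</link></item>
  <item><title>No link here</title></item>
</channel></rss>"""
print(release_links(sample))
```

The returned bytes still need content-type handling downstream, since central banks link to both HTML pages and PDFs.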

Grade overall: B. First-party central-bank and government RSS is A-grade; the Twitter-mirror tier is D-grade and should not be in any production path. MacroGuru's design — first-party feeds only, with body-fetch-on-link — is the correct architecture.

Source: each institution's own RSS endpoints; Twitter API pricing per developer.x.com.

Summary — How MacroGuru Should Lean

Primary inputs (A grade — trust): FRED for U.S. macro and rates; Binance Vision for crypto historical klines/funding; deBridge for cross-chain settlement; first-party central-bank and government RSS for policy events. These four pillars carry MacroGuru's risk-bearing decisions. They share three properties: authoritative publisher, transparent methodology, and either no revision behaviour or a fully-published revision policy.

Trust with caveats (B grade): Coinbase-via-FRED is fine for daily crypto reference but the 17:00 PT single-venue print must be cross-checked against Binance Vision for any backtest with $-PnL implications. Hyperliquid REST is canonical for what's happening on Hyperliquid, but the JELLY precedent means any small-cap or HIP-3 perp signal must be quorum-checked against at least two CEX feeds before triggering. DeFiLlama is the best available DeFi yields oracle but reward-APY must be discounted by the live-spot-vs-realized-emission gap; the existing apy_oracle policy of haircutting reward APY by 50% is conservative and should stay.

Downgrade or replace (C grade): Stooq's silent close-vs-adjusted-close conflation and undocumented missing-day behaviour mean it should be removed as a primary source for any equity series MacroGuru actually trades against; demote to "third opinion" only. Yahoo Finance/yfinance is structurally fragile and ToS-questionable — keep it only as a free fallback and migrate the primary path to a paid feed (Polygon.io, Tiingo, or direct exchange data) before the next strategy iteration. GDELT stays as a regime-feature input only — never as a trade trigger.

Avoid (D grade): Twitter/X RSS mirrors for political-influencer sentiment. There is no reliable free path in 2026. Either budget for the paid Twitter API or remove influencer-tweet sentiment from the engine entirely; the current architecture should not be wired to any rsshub.app-style mirror.

The recurring theme: every source has a publisher, a cadence, a revision behaviour, and a worst-case tail. The sources MacroGuru can rely on are the ones where all four are documented in writing by the publisher itself. Everything else needs a cross-source quorum or a haircut.
