How the research gets done.
Every WireSift claim ties back to a primary document. This page documents the methodology behind the AI Adoption Tracker — the source tiering, extraction approach, models used, and validation layers — so any reader can audit how a finding was produced.
The source hierarchy
WireSift uses a tiered source framework, where every claim in our research is anchored to the strongest available evidence. Tiers run from 1 (strongest) to 7 (weakest), and the tier of each source is disclosed in our claim ledger.
- Tier 1 — primary documents under regulatory or legal obligation (10-K filings, earnings call transcripts, court filings, peer-reviewed research). Earnings calls are Tier 1 because management speaks under securities-disclosure obligations.
- Tier 2 — primary documents not under formal obligation (investor day presentations, board letters, official communications).
- Tier 3 — direct journalism by reputable outlets quoting Tier 1 sources verbatim.
- Tiers 4–7 — reduced rigor, used as supporting context only and labeled accordingly.
The AI Adoption Tracker pipeline
The AI Adoption Tracker reads every S&P 500 Q1 2026 earnings call transcript through a structured extraction pipeline, producing a comparable, auditable dataset. The pipeline runs in three stages.
1. Source acquisition
Earnings call transcripts are pulled from Financial Modeling Prep (FMP), which sources directly from the call audio. Each transcript is cached locally with a SHA-256 hash so we can verify integrity and detect any upstream revisions.
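The integrity check described above can be sketched in a few lines. The function name `transcript_fingerprint` is illustrative, not the pipeline's actual API; the point is that the same bytes always hash to the same digest, so a changed digest signals an upstream revision.

```python
import hashlib

def transcript_fingerprint(text: str) -> str:
    """SHA-256 hex digest of a transcript, used to detect upstream revisions."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# If a re-fetched transcript hashes differently from the cached copy,
# the upstream source has revised it and the record is flagged for review.
cached = transcript_fingerprint("Operator: Good afternoon, and welcome...")
refetched = transcript_fingerprint("Operator: Good afternoon, and welcome...")
assert cached == refetched  # identical bytes, identical digest
```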
2. Structured extraction
Each transcript runs through a single-pass extraction using Anthropic’s Claude Sonnet 4.6 with a versioned schema. The schema captures:
- Mentions: every distinct AI claim made by management, with the verbatim quote, speaker, role, section (prepared remarks vs. Q&A), specificity score (1–5), ai_scope, time horizon, and named entities (products, partners, models, customers).
- AI revenue disclosure: whether management disclosed AI revenue, the disclosure method (GAAP segment, ARR, run-rate, bookings, qualitative), and the verbatim quote.
- Disclosure gaps: analyst questions where management declined to quantify a number — often as informative as the disclosures themselves.
- AI activities: investments (capex, opex, R&D, acquisitions), realized outcomes (productivity gains, cost savings), partnerships (model providers, hyperscalers, chip vendors).
- Specificity score on a 1–5 scale: 1 = aspirational, 2 = directional, 3 = operational, 4 = quantified, 5 = financialized (specific dollar / margin / revenue figure tied to AI).
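A mention record from the schema above might look roughly like the following. This is an illustrative sketch, not the published schema (which lives on GitHub); field names mirror the list above, and the 1–5 bound on specificity is enforced at construction.

```python
from dataclasses import dataclass, field

@dataclass
class Mention:
    """Illustrative shape of one extracted AI claim."""
    quote: str             # verbatim quote from the source transcript
    speaker: str
    role: str
    section: str           # "prepared_remarks" or "qa"
    specificity: int       # 1 = aspirational ... 5 = financialized
    ai_scope: str
    time_horizon: str
    entities: list = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= self.specificity <= 5:
            raise ValueError("specificity must be on the 1-5 scale")

m = Mention(quote="We expect AI to add $2B of run-rate revenue.",
            speaker="Jane Doe", role="CFO", section="qa",
            specificity=5, ai_scope="enterprise", time_horizon="FY2026")
```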
Every extracted claim must be backed by a verbatim quote present in the source transcript. The pipeline fails closed when a quote can’t be located in the source — extraction quality is non-negotiable.
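The fail-closed check reduces to a verbatim membership test. A minimal sketch, assuming the only tolerance is collapsed whitespace (the production check may differ):

```python
def verify_quote(quote: str, transcript: str) -> None:
    """Fail closed: raise unless the quote appears verbatim in the source.
    Whitespace is collapsed before comparison; nothing fuzzier than that."""
    if " ".join(quote.split()) not in " ".join(transcript.split()):
        raise ValueError("quote not found verbatim; extraction rejected")

transcript = "CEO: AI added two points of margin this quarter."
verify_quote("AI added two points of margin", transcript)  # passes silently
```

A paraphrased or hallucinated quote never enters the dataset, because the pipeline raises before the record is written.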
3. Quality gates
Two layers of validation run on every extraction before it enters the public dataset:
- Layer 1 (per-extraction): automated checks for quote integrity, scope coherence, schema compliance, and internal consistency. Flags surface for manual review before publishing.
- Layer 2 (cross-model): a stratified random sample (~10%) is independently re-extracted by Claude Opus 4.6 and compared field-by-field. Aggregate agreement on substantive judgments runs at 80%+; disagreements are reviewed manually.
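The Layer 2 comparison can be pictured as a field-by-field match rate between the two models' records. A hypothetical sketch (`field_agreement` is not the pipeline's actual function; disagreements below the threshold go to manual review):

```python
def field_agreement(a: dict, b: dict) -> float:
    """Share of shared fields on which two independent extractions agree."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

# Hypothetical re-extraction of the same mention by each model:
sonnet = {"specificity": 4, "section": "qa", "ai_scope": "cloud"}
opus   = {"specificity": 4, "section": "qa", "ai_scope": "search"}
score = field_agreement(sonnet, opus)  # agrees on 2 of 3 fields
```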
Editorial choices we disclose
A few editorial calls are applied at render time on the public tracker. Each is disclosed in the chart’s source line:
- Big Tech treated as Information Technology. Four companies are reclassified into the IT sector regardless of their GICS label, because their AI commentary is dominated by tech-AI use cases; leaving them in their nominal sectors would obscure the picture:
- Alphabet (GOOG) — GICS Communication Services. AI commentary is Cloud, Search, and Workspace.
- Amazon (AMZN) — GICS Consumer Discretionary. AI commentary is dominated by AWS.
- Meta (META) — GICS Communication Services. AI commentary is AI infrastructure, Llama, and AR/VR.
- Tesla (TSLA) — GICS Consumer Discretionary. AI commentary is FSD, Optimus, and robotaxi.
- Alphabet share classes consolidated. Alphabet’s Class A (GOOGL) and Class C (GOOG) shares represent the same legal entity and the same earnings call, so GOOGL is excluded from all aggregates and Alphabet appears once, under GOOG.
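Both render-time overrides amount to a small lookup applied when charts are built. A sketch under the assumption that the tracker keys companies by ticker (the constant and function names here are illustrative):

```python
# Render-time editorial overrides, disclosed in each chart's source line.
SECTOR_OVERRIDES = {
    "GOOG": "Information Technology",   # GICS: Communication Services
    "AMZN": "Information Technology",   # GICS: Consumer Discretionary
    "META": "Information Technology",   # GICS: Communication Services
    "TSLA": "Information Technology",   # GICS: Consumer Discretionary
}
EXCLUDED_TICKERS = {"GOOGL"}  # same entity and same call as GOOG

def effective_sector(ticker: str, gics_sector: str) -> str:
    """Sector used for aggregation: override if present, else GICS."""
    return SECTOR_OVERRIDES.get(ticker, gics_sector)

def tracker_universe(tickers):
    """Drop duplicate share classes before any aggregate is computed."""
    return [t for t in tickers if t not in EXCLUDED_TICKERS]
```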
Versioning and change tracking
The extraction schema is semver-versioned. Old extractions are never deleted — when the schema changes (a new field, a refined controlled vocabulary, a renamed enum), prior data stays in its original schema version and the change is logged in our public changelog. This means a finding shipped under schema v2.0 can always be reproduced from the v2.0 record.
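Under the usual semver convention, reproducibility follows from a simple compatibility rule: a reader built for one major version can replay any record sharing that major version. This is a sketch of that rule of thumb, not WireSift's stated compatibility policy (the changelog is authoritative):

```python
def is_compatible(record_version: str, reader_version: str) -> bool:
    """Semver rule of thumb: same major version => reproducible read.
    E.g. a v2.0 record can be replayed by a v2.3 reader, but not a v1.x one."""
    return record_version.split(".")[0] == reader_version.split(".")[0]
```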
What we won’t do
A few discipline points worth naming explicitly:
- No LLM-side normalization of quantifications. Numbers are extracted as raw strings; any normalization (units, currency, period) happens downstream in deterministic Python code. This keeps the audit trail clean — what the model extracted is what management said.
- No company-specific prompt engineering. The same prompt runs against every transcript. We don’t tune extraction for individual companies, which would bias comparability.
- No editorial paraphrase as a substitute for the quote. Every claim shows the verbatim quote, even when it’s long or awkward. The quote is the evidence; our framing around it is commentary.
Open methodology
The full pipeline source code, schema, prompts, and changelog are public on GitHub. Anyone can audit how a claim was produced — or fork the pipeline against a different universe of companies.
Questions
Methodology questions, data licensing inquiries, or audit requests: info@wiresift.com.