Tutorial

How to match I/B/E/S vintages without the landmines

A practical step-by-step Stata recipe for cross-vintage matching.

Why this tutorial exists

Comparing two IBES Unadjusted Detail History drops (say a 2023 vintage vs. a 2025 vintage) looks trivial. In practice, it’s where many projects quietly go off the rails. Small, harmless changes in values, currency flags, names, and time-of-day stamps can make the same forecast event look "different" across vintages. A naive merge will then "discover" millions of spurious non-matches, which in turn can lead to the mistaken conclusion that IBES anonymizes forecasts.

The fix is simple:
(1) put both vintages on the same information set (an as-of window), and
(2) merge on a stable event key that identifies the event, not its attributes.

This workflow is grounded in what we know about how IBES evolved in recent years:

  1. Analyst-ID anonymization and maintenance that primarily affects non-US forecasts after 2017Q1, with very little impact on US EPS forecasts.
  2. Backfilling and timestamp cleaning that can shift estimate dates by a day across releases.
  3. A recommendation tape that has remained relatively stable throughout.

Together, these facts justify the design choices below and explain why a careful merge often yields >99% overlap after basic hygiene. (The small residual is real and diagnosable, rather than a keying artifact.)

Identity vs. attributes

Think of each row as a library card for a forecast event. Identity should not depend on decorative details.

Identity (use in the key): firm, ANALYS, FPEDATS, FPI, MEASURE, and the estimate date (STATPERS, falling back to ANNDATS).

Attributes (never in the key): VALUE, currency flags, analyst and broker names, and time-of-day stamps.

Those attribute fields often change across drops without implying a different event. Keying on them manufactures "missing" that isn’t real. The "identity vs. attributes" distinction is the single most important idea in this tutorial. And it is consistent with the documented history of anonymization, backfilling, and the relatively stable recommendation tape.

The 2 rules that prevent a lot of headaches

  1. Shared as-of window: Restrict both datasets to the same estimate-date cutoff (prefer STATPERS; fallback ANNDATS). This ensures you’re comparing what the earlier vintage could have contained.
  2. Stable analyst-event key: { firm, ANALYS, FPEDATS, FPI, MEASURE, estimate_date }. This identifies the event. Everything else is metadata: use it for analysis, not for identity.

A quick, readable recipe (Stata, ~45 lines)

Stata code recipe for IBES vintage matching
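One way to implement the two rules is sketched below. This is a minimal template, not the paper’s exact code: the file names (vintage2023.dta, vintage2025.dta), the cutoff date, and the use of TICKER as the firm identifier are all placeholders to adapt to your own extract.

```stata
* --- Hypothetical sketch; adjust file/variable names to your extract ---

* Rule 1: a shared as-of cutoff (e.g., the earlier vintage's download date)
local cutoff = td(31dec2022)

foreach v in 2023 2025 {
    use vintage`v'.dta, clear
    * prefer STATPERS; fall back to ANNDATS when STATPERS is missing
    gen edate = cond(missing(STATPERS), ANNDATS, STATPERS)
    format edate %td
    keep if edate <= `cutoff'
    * Rule 2: the stable analyst-event key; drop exact duplicates on it
    duplicates drop TICKER ANALYS FPEDATS FPI MEASURE edate, force
    keep TICKER ANALYS FPEDATS FPI MEASURE edate VALUE
    rename VALUE value`v'
    save key`v'.dta, replace
}

* Merge on identity, never on attributes
use key2023.dta, clear
merge 1:1 TICKER ANALYS FPEDATS FPI MEASURE edate using key2025.dta

* Jaccard overlap = matched / (matched + unmatched on either side).
* After a 1:1 merge, the row count IS the size of the union.
quietly count if _merge == 3
local both = r(N)
quietly count
display "Jaccard overlap: " %6.4f `both'/r(N)

* Keep the non-matches for the attribution diagnostics
keep if _merge != 3
save nonmatches.dta, replace
```

Note that VALUE is kept only as an attribute for later analysis; it never enters the merge key.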

Let us run this.

What to expect? In large samples, the Jaccard overlap commonly exceeds 0.99 after these two steps. The remaining <1% is diagnostic gold: structured, explainable differences we should label and (when relevant) carry into robustness checks.

The three things that usually explain the residual, and how to diagnose them

1) Tiny date refinements (±1 day)

It is common for estimate dates to shift by a day across releases as timestamps are cleaned or backfilled. Test a ±1 day tolerance on the estimate date while holding {firm, ANALYS, FPEDATS, FPI, MEASURE} fixed. If a large share reconciles, you’ve found the culprit.

How to test quickly (Stata tip): Use rangejoin (SSC) after you build a "short key" without edate, then match within ±1 day. (The goal is attribution, not creating a fuzzier main result.)
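A sketch of that attribution step, under assumptions: the one-sided non-matches are saved in hypothetical files unmatched2023.dta and unmatched2025.dta, each carrying the short key plus an estimate date (named edate25 in the using file so the two dates can coexist after the join). rangejoin is a user-written command (ssc install rangejoin; it also requires rangestat).

```stata
* Hypothetical diagnostic: does a ±1-day window on the estimate date
* reconcile the one-sided non-matches? (file/variable names assumed)
* ssc install rangestat
* ssc install rangejoin

use unmatched2023.dta, clear          // 2023-only events
gen lo = edate - 1
gen hi = edate + 1
* short key without the estimate date; match within [edate-1, edate+1]
rangejoin edate25 lo hi using unmatched2025.dta, ///
    by(TICKER ANALYS FPEDATS FPI MEASURE)
* master rows with no counterpart in the window get missing edate25;
* rows with multiple counterparts are duplicated, so treat the share
* below as attribution evidence, not as a headline match rate
count if !missing(edate25)
```

If most of the residual reconciles here, label it as timestamp refinement and keep your main result on the strict key.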

2) Analyst-ID maintenance / anonymization layer

Drop ANALYS from the key (i.e., move to the consensus level) and re-merge. If non-matches collapse, the issue lives in the analyst-ID layer: anonymization or maintenance that has been documented to primarily affect non-US forecasts after 2017Q1, with very little impact on US EPS forecasts and minimal changes in recommendations. (See Figure 1 and US vs. non-US tabulations in Tables 2–3.)

This is more than just housekeeping: your paper’s findings can sometimes depend on whether the instability sits in the analyst layer or the event layer, so label it.
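A standalone sketch of the consensus-level re-merge, with the same hypothetical file and variable names as before (vintage2023.dta, vintage2025.dta, TICKER as the firm id):

```stata
* Hypothetical check: rebuild both sides without ANALYS and re-merge
foreach v in 2023 2025 {
    use vintage`v'.dta, clear
    gen edate = cond(missing(STATPERS), ANNDATS, STATPERS)
    format edate %td
    * ANALYS deliberately excluded from the key
    duplicates drop TICKER FPEDATS FPI MEASURE edate, force
    keep TICKER FPEDATS FPI MEASURE edate
    save cons`v'.dta, replace
}
use cons2023.dta, clear
merge 1:1 TICKER FPEDATS FPI MEASURE edate using cons2025.dta
tab _merge   // if non-matches collapse here, the issue is the ID layer
```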

3) Firm-ID hygiene (especially non-US)

Tickers and display names are fragile. For diagnostics, create a hybrid firm id: CUSIP8 for US, SEDOL7 for non-US, else fall back to ticker. If mismatches fall when you use the hybrid, you are seeing identifier maintenance, not content loss. Keep your main key simple (firm), but report the diagnostic result.
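A sketch of the hybrid id, assuming your extract carries CUSIP, SEDOL, and a US indicator (here called USFIRM; check the actual names in your file):

```stata
* Hypothetical hybrid firm id: CUSIP8 for US, SEDOL7 for non-US,
* else fall back to ticker (variable names are assumptions)
gen firm_id = TICKER
replace firm_id = substr(CUSIP, 1, 8) if USFIRM == 1 & !missing(CUSIP)
replace firm_id = substr(SEDOL, 1, 7) if USFIRM == 0 & !missing(SEDOL)
```

Swap firm_id for the firm identifier in the diagnostic merge only; the main key stays on the simple firm identifier.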

Why this matters: anonymization shows a US vs. non-US asymmetry, and recommendations were largely unaffected. Recommendations can therefore help triangulate identity and even recover anonymized IDs in many cases. (See Table 3 for recommendations, and Table 4 for recovery rates.)

What not to do (and why)

  1. Do not key on VALUE or currency flags: they legitimately change across drops and will manufacture false non-matches.
  2. Do not skip the shared as-of window: without it, you are comparing different information sets, not different vintages.
  3. Do not key on tickers or display names alone: they are fragile, especially outside the US.

Why editors and reviewers will care

Attribution table

After your main merge, run a three-line attribution table. Each row shows the share of non-matches resolved when you apply the step:

  1. ±1-day estimate-date tolerance (timestamp refinement)
  2. Drop ANALYS (consensus-level identity)
  3. Hybrid firm id (CUSIP/SEDOL hygiene)

Optional additions are an FPI recoding check (are non-matches concentrated in specific horizon codes?) and an "older vs. newer years" split. This triage turns vague "data drift" into labeled mechanisms and mirrors best practice on documented structural changes and their timing.
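The table itself is three lines of output. A sketch, assuming you have flagged each non-match with 0/1 indicators (resolved_day1, resolved_noanalys, resolved_hybrid; hypothetical names) produced by the three diagnostics just listed, stored in a file of non-matches:

```stata
* Hypothetical attribution table over the non-matched rows
use nonmatches.dta, clear
foreach s in day1 noanalys hybrid {
    quietly count if resolved_`s' == 1
    display "share of non-matches resolved by `s': " %5.3f r(N)/_N
}
```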

FAQ (what your RAs will ask)

Q1. Why is STATPERS preferred to ANNDATS?
Because STATPERS represents when the estimate is on the tape from the database’s perspective. It aligns with the "information set" you want to hold fixed across vintages. If STATPERS is missing, use ANNDATS and document that choice.
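The preference and its fallback fit in one line (variable names as in the IBES detail file; edate is a name of our choosing):

```stata
gen edate = cond(missing(STATPERS), ANNDATS, STATPERS)
format edate %td
label variable edate "as-of date: STATPERS, else ANNDATS"
```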

Q2. Can I do this at the consensus level?
Yes. Use { firm, FPEDATS, FPI, MEASURE, estimate_date } (drop ANALYS). Just remember: if consensus merges beautifully but analyst-level does not, your issue is the ID layer (anonymization/maintenance), not event identity. (That asymmetry is exactly what you’d expect post-2017 outside the US.)

Q3. What about recommendations and price targets?
Recommendations are largely unaffected by the anonymization pattern that hits forecasts. They’re excellent for cross-checks and (in many cases) ID recovery. Price target reassignment is rare and often looks like corrections of names, not substantive changes. (See Table 3 and the discussion around the price-target file.)

Q4. Should I ever key on VALUE or currencies?
No, not for identity across vintages. Those are attributes, not identity. They legitimately change and will inflate false non-matches. Use them for analysis after you’ve matched events.

Q5. Are there "problem vintages" I should avoid?
There is no single vintage to avoid at this moment. Be explicit about the download date you use, steer clear of the ambiguous June-September 2019 window in point-in-time comparisons, and follow the recommendations in the paper’s concluding section. (See "Recommendations" section.)

Final words

Adopt the two rules at the top. They prevent 90% of the pain when merging IBES vintages.
