Framework v0.2

Failure Mode Definitions

Taxonomy of nine failure modes for AI-assisted research.

Overview

This document defines the nine Failure Modes (FMs) that form the diagnostic layer of the Research Grounding Framework. Each FM is a distinct class of AI research quality failure — a way that AI-assisted output can be wrong, misleading, or unreliable in a manner that is not caught by ordinary review.

FMs are organized by mechanism: what goes wrong, at what point, and why. A secondary property is noted for each: whether the failure is primarily a generation failure (the AI produces incorrect or distorted content) or a retrieval failure (the AI surfaces an incomplete or biased evidence base). Several FMs are compound — they involve both.

The FM taxonomy is the abstraction layer that precedes the selection of a grounding intervention (GI). The correct sequence is: classify the claim type and its highest-risk failure mode, then select the appropriate GI. Applying GIs without FM classification produces either exhaustive checking, which is impractical, or arbitrary checking, which is no better than intuition.

This is a working version subject to revision through use. The GI definitions that map against this taxonomy, and the matrix connecting them, are maintained as companion documents.
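
The classify-then-select sequence can be sketched as a simple lookup. This is a minimal illustration, not the companion matrix: the table holds only the FM-to-GI pairings that this document itself mentions, and the names `GI_HINTS` and `select_interventions` are hypothetical.

```python
# Partial FM -> GI lookup. Only pairings mentioned in this document are
# listed; the authoritative mapping is the companion matrix.
GI_HINTS = {
    "FM-3": ["GI-5", "GI-6", "GI-12"],
    "FM-4": ["GI-3"],
    "FM-6": ["GI-14", "GI-11"],  # GI-14 targets Mechanism A; GI-11 is partial
    "FM-7": ["GI-15", "GI-7"],   # GI-7 is secondary coverage
    "FM-8": ["GI-6", "GI-7"],
    "FM-9": ["GI-13"],
}

VALID_FMS = {f"FM-{n}" for n in range(1, 10)}


def select_interventions(failure_mode: str) -> list[str]:
    """Return candidate GIs for an already-classified failure mode."""
    if failure_mode not in VALID_FMS:
        raise ValueError(f"unknown failure mode: {failure_mode}")
    # An empty list means "consult the matrix", not "no intervention needed".
    return list(GI_HINTS.get(failure_mode, []))
```

The function takes a failure mode as input rather than a raw claim, which keeps the classification step separate from, and prior to, the selection step.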

FM-1: Fabrication

Type: Generation failure

A claim is made and attributed to a source that does not exist, cannot be located, or — in a stricter form — exists but contains no material relevant to the claim. The AI has generated a plausible citation or supporting reference from parametric memory rather than from actual retrieval.

Fabrication is the most structurally dangerous failure mode because it is self-concealing: a fabricated source looks like a real source in prose. It passes the surface coherence checks that most readers apply. Detection requires independent verification of source existence, which ordinary reading does not involve.

Scope boundary: Fabrication is an existence failure, not an accuracy failure. A source that exists but whose findings are misrepresented by the claim is Attribution Drift (FM-2), not Fabrication. Fabrication applies strictly to the case where the cited source is absent, inaccessible, or entirely unrelated to the claim.

Adjacent failure modes:

FM-2 Attribution Drift: Fabrication produces a missing source; attribution drift produces a present source that doesn’t support the claim. The diagnostic step is the same (locate the source); the finding differs.
FM-4 Synthesis Validity: AI-generated synthesis can shade into fabrication when the AI presents a synthesized claim as if sourced. Distinction: fabrication attaches a specific false citation; synthesis validity involves an unsourced claim presented as derived from the literature without specifying a false source.
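
The shared diagnostic step for FM-1 and FM-2 (locate the source, then read it against the claim) can be written as a small decision function. This is a sketch under stated assumptions: the three inputs are reviewer judgments recorded by hand, and the name `classify_citation` is hypothetical.

```python
from typing import Optional


def classify_citation(source_located: bool,
                      relevant_to_claim: bool,
                      supports_claim: bool) -> Optional[str]:
    """Classify a cited claim after a manual source check.

    Returns "FM-1", "FM-2", or None when neither failure is found.
    The last two arguments are ignored when the source was not located.
    """
    if not source_located:
        return "FM-1"  # absent or inaccessible source
    if not relevant_to_claim:
        return "FM-1"  # stricter form: source exists but is unrelated
    if not supports_claim:
        return "FM-2"  # source is real, but the claim has drifted from it
    return None
```

The order of the checks mirrors the scope boundary: existence is settled first, and fidelity is only assessed for a source that has been found and is on topic.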

Notes: Fabrication rates vary substantially by model generation and domain. Quantitative figures from the corpus (GPTZero ~17% hallucinated citation rate in ICLR 2026 early submissions; Pangram Labs ~21% of ICLR 2026 peer reviews AI-generated) should be treated as time-stamped estimates against specific model versions, not stable baselines. The existence of fabrication as a failure mode is well-established across all model generations studied; the rate is not stable.

FM-2: Attribution Drift

Type: Generation failure (with retrieval component)

A source exists and is accessible, but the claim attached to it has drifted from what the source actually says. The drift may be a paraphrase that subtly overstates, a finding generalized beyond its scope, a caveat omitted, a causal claim substituted for a correlational one, or a conclusion applied to a population or context the study did not examine.

Attribution drift is a failure of fidelity between source and claim. Unlike fabrication, the citation is real and will survive an existence check. Detection requires reading the source and comparing it against the attributed claim.

Scope boundary: Attribution drift requires a directional error — the claim has moved away from what the source supports. A claim that accurately represents the source but draws an unwarranted inference beyond the source is Synthesis Validity (FM-4). A claim that accurately represents the source in isolation but omits contradicting evidence is Absence of Disconfirmation (FM-3).

Adjacent failure modes:

FM-1 Fabrication: Fabrication has no source behind the claim; attribution drift has a source that is present but misrepresented.
FM-4 Synthesis Validity: Attribution drift distorts a single source; synthesis validity constructs a claim across sources.
FM-5 Confidence Miscalibration: Attribution drift frequently produces confidence miscalibration as a downstream effect.

Notes: The source validation log found attribution drift in multiple corpus entries — most commonly as overstated universality (Sharma et al., Shapira et al.) or omitted caveats (Fernandes et al.). The pattern of AI-assisted synthesis consistently amplifying findings beyond what primary sources claim is itself a demonstration of FM-2 operating at scale via sycophantic retrieval.

FM-3: Absence of Disconfirmation

Type: Retrieval failure

The evidence base supporting a claim has been populated with confirming sources; evidence that contradicts, qualifies, or limits the claim has not been retrieved. The AI-assisted search process has returned a biased sample of the available evidence, and the claim rests on that biased sample without acknowledgment.

The failure is not that no contradicting evidence exists — it may not. The failure is that no search for contradicting evidence was conducted. A claim that has been actively tested against opposing evidence and found to stand is epistemically different from a claim that has never been tested.

Scope boundary: This failure mode applies to the retrieval process, not to the claim itself. A claim can be true and still exhibit FM-3 if the researcher has not searched for contradicting evidence. FM-3 is a procedural failure in how the evidence base was assembled.
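
Because FM-3 is procedural, it can be checked from a record of how the evidence was assembled rather than from the evidence itself. The sketch below assumes a hypothetical search log in which each query is labeled by intent; `EvidenceBase` and `exhibits_fm3` are illustrative names, not part of the framework.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceBase:
    """Record of how the evidence for one claim was assembled."""
    claim: str
    # Each entry is (query, intent); intent is "confirming" or "disconfirming".
    searches: list[tuple[str, str]] = field(default_factory=list)
    contradicting_sources: list[str] = field(default_factory=list)


def exhibits_fm3(evidence: EvidenceBase) -> bool:
    """True when no disconfirmation search was recorded.

    The result deliberately ignores contradicting_sources: a search that
    ran and found nothing is not FM-3, and an unsearched claim is FM-3
    even if it happens to be true.
    """
    return not any(intent == "disconfirming" for _, intent in evidence.searches)
```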

Adjacent failure modes:

FM-7 Omission: FM-3 is a directional bias — confirming material retrieved, disconfirming material not. FM-7 is a completeness failure — relevant material not retrieved regardless of direction. FM-3 implies a skew; FM-7 implies a gap.
FM-8 Pragmatic Distortion: FM-3 is a retrieval failure that can cause FM-8. FM-3 is corrected by conducting the disconfirmation search; FM-8 may persist even after the search if the framing of the complete evidence base remains asymmetric.

Notes: AI-assisted search exhibits a documented tendency toward sycophantic retrieval — surfacing material that confirms the framing of the query. Mandatory Disconfirmation Search (GI-5) and Contradiction Forcing (GI-6) are specifically designed to counter this tendency.

FM-4: Synthesis Validity

Type: Generation failure

The AI combines material from multiple sources — or generates from parametric memory — to produce a claim not present in any single source. The synthesis may be coherent, plausible, and even correct, but it is a constructed position rather than a documented one. It is presented, implicitly or explicitly, as if it were a finding from the literature rather than an inference made by the AI.

Synthesis validity failures are particularly difficult to detect because they do not involve a false citation (FM-1) or a misread source (FM-2). The claim simply doesn’t exist anywhere in the sourced material.

Scope boundary: Not all synthesis is invalid. Warranted synthesis — explicit, reasoned inference from clearly identified sources — is legitimate and often essential. FM-4 applies when the inferential step is unacknowledged, when the claim is presented as if directly sourced, or when the inferential step is acknowledged but unwarranted.

Adjacent failure modes:

FM-1 Fabrication: Fabrication attaches a false citation; FM-4 produces a claim without citation basis.
FM-2 Attribution Drift: FM-2 involves a single source misrepresented; FM-4 involves synthesis not present in any source.
FM-5 Confidence Miscalibration: FM-4 claims are inherently T4 (AI-generated synthesis). If treated as T1 or T2, FM-5 is the downstream effect.

Notes: The epistemic tier system in GI-3 (Epistemic Status Tagging) directly addresses FM-4 by requiring T4 designation for AI-generated synthesis.

FM-5: Confidence Miscalibration

Type: Generation failure

The epistemic status of a claim is not tagged, is incorrectly tagged, or is systematically misrepresented in the weight given to it in an argument. A practitioner observation is treated with the evidentiary weight of a randomized controlled trial. An AI-generated synthesis is presented as a primary empirical finding. A single-study result is generalized as if it were meta-analytic consensus.

Confidence miscalibration is often invisible in well-written prose — the language of certainty is consistent regardless of whether the underlying evidence warrants it.

Scope boundary: FM-5 is an epistemic-status failure, not an accuracy failure. A claim can be accurate and still exhibit FM-5 if its epistemic status is misrepresented.

Adjacent failure modes:

FM-2 Attribution Drift: Attribution drift can produce miscalibration as a downstream effect. Treat these as independent errors requiring independent correction.
FM-4 Synthesis Validity: T4 claims presented as T1/T2 are simultaneously FM-4 and FM-5. Both corrections are required.
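
The relationship between tier tagging, FM-5, and FM-4 can be illustrated with a small check. This is a sketch only: the tier definitions live in GI-3, the one tier meaning used here is T4 (AI-generated synthesis), and `tier_flags` is a hypothetical helper that compares the tier a text presents with the tier a reviewer judges to be warranted.

```python
from typing import Optional

TIERS = ("T1", "T2", "T3", "T4")  # defined in GI-3; T4 = AI-generated synthesis


def tier_flags(presented: Optional[str], warranted: str) -> set[str]:
    """Return the failure modes suggested by a tier mismatch.

    presented: tier the text tags or implies (None if untagged).
    warranted: tier a reviewer judges the evidence to support.
    """
    if warranted not in TIERS or (presented is not None and presented not in TIERS):
        raise ValueError("tiers must be one of T1-T4")
    flags: set[str] = set()
    if presented != warranted:
        flags.add("FM-5")  # untagged or incorrectly tagged
        if warranted == "T4" and presented in ("T1", "T2"):
            flags.add("FM-4")  # synthesis presented as if sourced
    return flags
```

Returning a set rather than a single label reflects the point above that a T4 claim presented as T1 or T2 needs both corrections.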

Notes: Fernandes et al. (2025) provides empirical grounding for FM-5 — AI use produces universal overestimation and AI-literate users show lower metacognitive accuracy. The working tier system (T1–T4) is defined in GI-3. FM-5 can also be produced downstream of FM-2, FM-4, and FM-6, making it both a standalone failure mode and an indicator that another failure mode has occurred upstream.

FM-6: Contextual Override

Type: Generation failure

The AI defaults to a familiar, training-data-dominant response pattern when the actual task requires processing context that differs materially from that pattern. The model responds to what the prompt resembles rather than what it says. Two mechanisms produce this failure, both resulting in outputs that are contextually inappropriate despite appearing fluent:

Mechanism A — Pattern-match override (training-pattern dominance): The model recognizes a prompt as resembling a familiar problem type and applies the trained solution for that type, overriding context-specific details that modify or contradict it. Documented by Soffer et al. (2025, npj Digital Medicine) in modified lateral thinking puzzles and medical ethics scenarios: models reverted to canonical solutions even when the scenario was explicitly altered. Error rates were 58–92% for lateral thinking tasks and 76–96% for medical ethics scenarios. Consistent with dual-process theory — System 1 (fast, pattern-based) overriding System 2 (slow, deliberative) processing.

Mechanism B — Unfaithful CoT (CoT-output mismatch): The model’s chain-of-thought trace arrives at a correct or qualified conclusion, but the final generated output contradicts or ignores it. The verbalized reasoning is not a reliable window into the computation that produced the output — it is closer to post-hoc rationalization generated to justify an answer already determined by other means. Primary source: Turpin et al. (2023), Language Models Don’t Always Say What They Think — existence check pending; cited as provisional. Post-hoc rationalization rates vary substantially by model class: GPT-4o-mini ~13%, down to ~0.04% for reasoning-optimized models. Thinking models are meaningfully more faithful but not fully faithful.

Both mechanisms produce outputs that look correct on surface inspection. Detection requires either comparison against the actual context (Mechanism A) or inspection of the reasoning trace where available (Mechanism B) — though Mechanism B makes reasoning traces unreliable as verification tools.

Scope boundary: Contextual override is distinct from fabrication (FM-1) — the content may be accurate for the canonical case; the failure is in its application to a modified context. Distinct from attribution drift (FM-2) — no source is involved. Distinct from structural drift (FM-9) — FM-6 is a within-turn failure; FM-9 is a cross-turn failure accumulating over a session.

Adjacent failure modes:

FM-1 Fabrication: Both produce incorrect outputs, but FM-1 involves false sourcing; FM-6 involves content that may be accurate for the canonical case but is applied to the wrong context.
FM-9 Structural Drift: FM-9 is session-level framing accumulation; FM-6 is single-turn pattern substitution. Session Reset (GI-13) addresses FM-9; it has limited effect on FM-6 because the override is triggered by the prompt itself.

Notes: Mechanism B (unfaithful CoT) has an important safety implication: CoT traces cannot be trusted as a verification or monitoring channel. A model can generate a plausible-looking reasoning trace while the output was determined by other means. GI-11 (Adversarial Probing) provides partial post-generation coverage but cannot rely on the reasoning trace as evidence. GI-14 (Context-Delta Marking) targets Mechanism A at the pre-generation stage; no equivalent pre-generation intervention exists for Mechanism B — this gap should be named explicitly. The matrix coverage analysis identified FM-6 as the most thinly covered failure mode in the prior version; GI-14 and expanded GI-4 designation partially address this.

Status flag (Mechanism B): Turpin et al. 2023 cited as provisional primary source, existence check pending per source-validation-log Entry 9. Remove this flag when existence check is complete.

FM-7: Omission

Type: Retrieval failure

Relevant information exists in the available evidence base but is not surfaced in the AI’s output. The output is not wrong about what it says — it is incomplete in what it includes. The failure is in coverage, not in accuracy.

Omission can be selective (consistent pattern of omitting certain types of findings) or incidental (a relevant source simply not retrieved due to query construction or training-data gaps). Both result in a materially incomplete picture of the evidence.

Scope boundary: FM-7 is a completeness failure, not a directional bias failure. Absence of disconfirmation (FM-3) is a specific, directional form of omission. FM-7 is broader: the omitted material may be confirming, disconfirming, contextualizing, or qualifying. All FM-3 failures are FM-7 failures; not all FM-7 failures are FM-3 failures.

Adjacent failure modes:

FM-3 Absence of Disconfirmation: FM-3 is a subset of FM-7. The distinction is operationally important: FM-3 has a dedicated intervention set (GI-5, GI-6, GI-12); FM-7 in its broader form is addressed by GI-15 (Scope Enumeration).
FM-8 Pragmatic Distortion: FM-7 omits material; FM-8 distorts material that is present.
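
The gap-versus-skew distinction between FM-7 and FM-3 can be illustrated with set operations. The sketch assumes a hypothetical reference list of relevant sources labeled by stance, which in practice is rarely fully known; the skew flag is only an indicator, since FM-3 itself is defined by whether a disconfirmation search was run. `omission_profile` is an illustrative name.

```python
def omission_profile(relevant: dict[str, str], retrieved: set[str]) -> dict[str, object]:
    """Summarize what a retrieval pass missed.

    relevant: source id -> stance ("confirming", "disconfirming",
              "contextualizing", or "qualifying").
    retrieved: ids of the sources that actually surfaced.
    """
    omitted = {sid: stance for sid, stance in relevant.items() if sid not in retrieved}
    retrieved_stances = {relevant[sid] for sid in retrieved if sid in relevant}
    # Skew: disconfirming material exists, but none of it was retrieved.
    skewed = ("disconfirming" in omitted.values()
              and "disconfirming" not in retrieved_stances)
    return {
        "omitted": sorted(omitted),
        "gap": bool(omitted),        # FM-7 signal
        "directional_skew": skewed,  # FM-3-style signal; always implies a gap
    }
```

A skew can only be reported when something was omitted, which matches the subset relation: every skew is a gap, but not every gap is a skew.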

Notes: The CMU/Toronto deception taxonomy (Shi et al. 2026) provides empirical grounding for FM-7’s importance — their gap analysis of 50+ benchmarks found omission to be critically under-benchmarked relative to fabrication. GI-15 (Scope Enumeration, new in v0.2) provides FM-7’s first dedicated primary intervention. GI-7 (9-Windows) provides secondary coverage by systematically expanding the frame across subsystem/supersystem/time dimensions.

FM-8: Pragmatic Distortion

Type: Generation failure

A claim is technically accurate — the source exists, the attribution is correct, the epistemic status is appropriately assigned — but the framing, emphasis, or presentation creates a misleading impression. Common forms: asymmetric emphasis (benefits foregrounded, costs buried), false balance (minority and majority positions presented as equivalent), decontextualization (a finding presented without limiting conditions), and sycophantic framing (output shaped toward what the researcher appears to want to hear).

Pragmatic distortion is the failure mode most resistant to source-based checking, because the sources are real, accurately cited, and faithfully represented. The distortion occurs at the level of selection, emphasis, and arrangement.

Scope boundary: FM-8 requires a technically accurate underlying claim. If the claim itself is false, one of FM-1 through FM-4 applies. FM-8 is specifically the case where the claim is accurate but the presentation is misleading.

Adjacent failure modes:

FM-3 Absence of Disconfirmation: FM-3 is a retrieval failure that can cause FM-8. FM-8 may persist even after the disconfirmation search if framing remains asymmetric.
FM-9 Structural Drift: FM-9 produces FM-8 as a downstream effect. FM-9 is the session-level accumulation; FM-8 is the claim-level manifestation.

Notes: The TRIZ-derived GIs (Contradiction Forcing GI-6, 9-Windows GI-7) were specifically designed to counter FM-8 by making it structurally impossible to return a one-sided output. FM-8 is the most heavily covered failure mode in the GI set (4 primary GIs), reflecting the framework’s architectural emphasis on surfacing tradeoffs. Kim et al. (2026) provides the clearest articulation of why FM-9 and FM-8 are distinct: structural drift does not require the AI to agree with the user — it operates through meaning-expansion regardless of stance.

FM-9: Structural Drift

Type: Generation failure (session-level)

Over the course of a multi-turn research session, the AI’s outputs increasingly reflect and amplify the framing established in the initial exchanges. Later outputs are not generated from a neutral starting position — they are generated from a context window that has been progressively shaped by earlier turns. The frame tightens. Confirming material is more readily produced. Qualifying or contradicting material recedes.

Two mechanisms operate at the session level:

Domain amplification: Anomaly levels or intensity within an existing concern increase across exchanges — the AI escalates the frame already present. Documented in Kim et al. (2026) across Atmosphere (d=0.46), Ipseity (d=0.31), Intersubjectivity (d=0.33), and Temporality (d=0.14) domains.

Domain expansion: New interpretive frames, concerns, or dimensions appear in AI responses that were absent from the user’s input — the AI actively introduces new dimensions rather than merely amplifying existing ones. Domain expansion occurred in 83.8% of dialogues in Kim et al. (2026), with LLM responses introducing a mean of 0.675 new domains per exchange. The divergence begins within the first 10% of normalized dialogue time and widens progressively.

Both mechanisms operate regardless of whether the AI agrees with the user — structural drift does not require sycophancy. The distortion is in how meaning is scaffolded across time, not in stance.
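
Domain expansion can be tracked in a session log by counting the domains an AI response introduces that the user has not raised. The sketch below is not the measurement procedure of Kim et al. (2026); it assumes domain labels have already been assigned to each turn by some other means, and it treats a domain as new only the first time the AI introduces it. `new_domains_per_exchange` is a hypothetical helper.

```python
def new_domains_per_exchange(exchanges: list[tuple[set[str], set[str]]]) -> list[int]:
    """Count domains the AI introduces in each exchange.

    exchanges: (user_domains, ai_domains) per exchange, in order.
    A domain counts as new if the user has not raised it in this or any
    earlier exchange and the AI has not already introduced it (assumption).
    """
    seen_from_user: set[str] = set()
    seen_from_ai: set[str] = set()
    counts = []
    for user_domains, ai_domains in exchanges:
        seen_from_user |= user_domains
        introduced = ai_domains - seen_from_user - seen_from_ai
        counts.append(len(introduced))
        seen_from_ai |= ai_domains
    return counts


dialogue = [
    ({"Atmosphere"}, {"Atmosphere", "Ipseity"}),
    ({"Atmosphere"}, {"Ipseity", "Temporality"}),
    ({"Temporality"}, {"Temporality"}),
]
counts = new_domains_per_exchange(dialogue)  # [1, 1, 0]
mean_new = sum(counts) / len(counts)
```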

Scope boundary: FM-9 is a session-level, cumulative failure. A single output that is sycophantic or framing-captured without accumulated session context is better classified as FM-8 at the instance level. FM-9 applies when the distortion is traceable to accumulated session context. Session Reset (GI-13) is partly diagnostic: if a reset produces materially different outputs on the same prompt, FM-9 was active.
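
The diagnostic use of Session Reset can be sketched as a before-and-after comparison on the same prompt. "Materially different" is not defined here, so the example substitutes a crude word-level similarity score with an arbitrary threshold; both the threshold and the name `reset_suggests_drift` are assumptions. Surface similarity does not capture meaning, and ordinary sampling variation can also lower the score.

```python
from difflib import SequenceMatcher


def reset_suggests_drift(in_session: str, after_reset: str,
                         threshold: float = 0.6) -> bool:
    """Compare outputs for the same prompt before and after a session reset.

    Returns True when word-level similarity falls below `threshold`,
    a rough stand-in for "materially different".
    """
    a, b = in_session.split(), after_reset.split()
    similarity = SequenceMatcher(None, a, b).ratio()
    return similarity < threshold
```

A True result is a prompt for the researcher to read both outputs side by side, not a verdict that FM-9 was active.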

Adjacent failure modes:

FM-8 Pragmatic Distortion: FM-9 produces FM-8 as a downstream effect, but they operate at different levels. FM-8 can occur without FM-9; FM-9 will reliably produce FM-8 if left uncorrected.
FM-6 Contextual Override: Both involve the model responding to something other than the actual prompt content. FM-6 is triggered by the prompt itself (training-pattern memory); FM-9 accumulates across the session (context window accumulation).

Notes: Session Reset (GI-13) is the primary counter to FM-9. Its operational cost — cleared context must be partially restored through a researcher-constructed grounding summary — is itself a grounding act. A poorly constructed re-grounding summary can reintroduce the drift it was intended to clear. Recommended use is as a periodic structural discipline at natural break points, not only when drift is already visible.

Structural Observations

FM-3 as a subset of FM-7

FM-7 (Omission) is the broader category; FM-3 (Absence of Disconfirmation) is a directional specialization. The two are distinguished because FM-3 has a dedicated intervention set (GI-5, GI-6, GI-12) while FM-7’s broader form is addressed by GI-15 (Scope Enumeration, new in v0.2).

FM-8 and FM-9 as a level pair

FM-8 operates at the claim level; FM-9 operates at the session level. FM-9 reliably produces FM-8 downstream, but FM-8 can occur independently of FM-9.

FM-5 as a cross-cutting property

Confidence Miscalibration can be produced downstream of FM-2, FM-4, and FM-6. It functions as both a standalone failure mode and an indicator that another failure mode has occurred upstream. Detection of FM-5 should prompt investigation of its source failure mode.
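
The instruction to trace a detected failure back to its source can be expressed as a reverse lookup over the downstream relations stated in this document. This is a minimal sketch: `DOWNSTREAM` records only those stated relations (including the FM-3 and FM-9 links to FM-8), and `upstream_candidates` is a hypothetical helper name.

```python
# Upstream failure mode -> failure modes it can produce downstream,
# as stated in this document.
DOWNSTREAM = {
    "FM-2": {"FM-5"},
    "FM-3": {"FM-8"},
    "FM-4": {"FM-5"},
    "FM-6": {"FM-5"},
    "FM-9": {"FM-8"},
}


def upstream_candidates(detected: str) -> list[str]:
    """List failure modes worth investigating when `detected` is found."""
    return sorted(up for up, downs in DOWNSTREAM.items() if detected in downs)
```

For example, `upstream_candidates("FM-5")` returns `["FM-2", "FM-4", "FM-6"]`. An empty result means the document states no upstream relation for that failure mode, not that none exists.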

FM-6 Mechanism B — provisional status

Mechanism B (unfaithful CoT) has been reinstated with Turpin et al. (2023) as provisional primary source, existence check pending. Do not cite forward as validated until existence check is complete.

Novel agentic failure modes outside this taxonomy

The QRP→AI analog mapping (questionable research practices mapped to their AI analogs; 2026-06-03 research pass) confirmed that novel agentic failure modes, namely memory poisoning, adversarial prompt injection, and cross-agent trust escalation, have no QRP analog and are not addressed by this taxonomy. This taxonomy covers single-session and multi-turn AI-assisted research failures; multi-agent pipeline failures require separate treatment.

Companion documents: grounding-interventions-definitions.md (v0.2), failure-mode-gi-matrix.md (v0.2)

Version history: v0.1: Initial definitions, 2026-06-03 | v0.2: FM-6 Mechanism B reinstated (provisional); FM-9 domain expansion added; agentic scope boundary added, 2026-06-03
