Adaptive Recovery¶

Not all gate rejections are real failures. Some happen because the LLM picked the wrong temperature, the prompt framing was slightly off, or the score landed just below the threshold. The recovery system diagnoses why a sample was rejected and attempts a targeted repair before declaring it lost.

Two mechanisms work at different points in the pipeline:

Mechanism	When it runs	What it does
Inline probe	Immediately after each gate's rejects, before the next gate	Diagnoses the failure mode, retries generation at different temperatures or with different prompts, re-evaluates
Reward refiner	Post-pipeline, after all gates	Rewrites the answer targeting the weakest quality dimension, re-evaluates with RewardGate

Enable both with:

CuratorConfig(
    enable_diagnostic_probe = True,
    enable_reward_refiner   = True,
)

How inline recovery works¶

The pipeline runner checks for an attached probe after every gate. If one exists and there are rejected samples, it runs probe.diagnose_batch(rejected) before passing samples to the next stage:

Gate N
  ├─► passed
  └─► rejected ──► probe.diagnose_batch()
                     ├─► recovered  ──► merged with passed ──► Gate N+1
                     └─► still rejected ──► final rejected list

This means: if a probe is attached to both HallucinationGate and RewardGate, HallucinationGate probe recovery fires before RewardGate sees any samples.

Inline probe routing¶

The probe uses the gate's score to decide which path to try first:

Score >= probe_score_split (default 0.5)  →  near-boundary path
  └─► Temperature sweep (configured by probe_temperatures, e.g. [0.3, 0.5])
        ├─► All pass?            → THRESHOLD_MARGINAL (score was borderline)
        ├─► First pass, last fail → GENERATOR_TEMPERATURE (lower temp fixes it)
        ├─► Mixed results?       → THRESHOLD_MARGINAL
        └─► All fail?            → proceed to prompt variants (strict_grounding, ...)

Score < probe_score_split  →  clearly-failing path
  └─► Strict grounding prompt first
        ├─► Pass? → recovered (FailureMode: GENERATOR_PARAMETRIC)
        └─► Fail? → temperature sweep → remaining prompt variants

Call budget:

Outcome	LLM calls consumed
`rejected_above_threshold` (DPO contrast failure)	0 — early exit
Temperature sweep resolves at T=0.3	1
Temperature sweep resolves at T=0.5	2
Strict grounding resolves immediately	1
All probes exhausted	5 (worst case)

Failure mode taxonomy¶

Mode	Gate	What it means
`GENERATOR_TEMPERATURE`	Hallucination	High temperature caused drift from source; lower temp fixes it
`GENERATOR_PARAMETRIC`	Hallucination	Model answered from prior knowledge, ignored source passage
`SOURCE_AMBIGUOUS`	Hallucination	Source/response relationship unclear to the judge
`THRESHOLD_MARGINAL`	Hallucination	Score just below threshold; borderline case
`INSTRUCTION_QUALITY`	Reward	Generated question is poorly formed
`RESPONSE_QUALITY`	Reward	Answer is on-topic but too shallow; or DPO contrast failure (0 calls)
`DOMAIN_MISMATCH`	Reward	Prompt framing wrong for the domain
`NEAR_DUPLICATE`	Diversity	Too similar to an accepted sample; not recoverable
`UNKNOWN`	Any	All probes failed; inconclusive

Probe configuration¶

CuratorConfig(
    enable_diagnostic_probe = True,
    probe_temperatures      = [0.3, 0.5],     # temperature sweep values
    probe_score_split       = 0.5,            # routing boundary
    probe_generator_model   = None,           # None = use llm_model
    probe_extra_templates   = {},             # override named prompt templates (see below)
)

Custom probe templates¶

The probe selects from named templates, and its routing is fixed to named paths — probe_extra_templates does not add new probe paths. What it does:

Overriding strict_grounding, domain_specific, or default replaces the prompt the probe uses on that path. This is the main use case.
Overriding generate_question has no effect — the question-regeneration path always uses the built-in template.
New keys are stored but only used when a sample's metadata["domain_prompt_key"] names one of them: the domain-variant probe then uses that template instead of domain_specific. Keys that no sample's metadata points to are never selected.

CuratorConfig(
    enable_diagnostic_probe = True,
    probe_extra_templates   = {
        "strict_grounding": (
            "Answer this question using ONLY the provided passage. "
            "Quote specific sentences where relevant.\n\n"
            "Passage:\n{source}\n\nQuestion:\n{question}"
        ),
        "domain_specific": (
            "You are a legal analyst. Answer using only the passage text. "
            "Be precise about dates, parties, and obligations.\n\n"
            "Passage:\n{source}\n\nQuestion:\n{question}"
        ),
    },
)

Built-in template keys:

Key	Used when	Overridable via `probe_extra_templates`
`strict_grounding`	GENERATOR_PARAMETRIC path — force passage-only answer	Yes
`domain_specific`	DOMAIN_MISMATCH path — domain-adapted prompt (unless `metadata["domain_prompt_key"]` selects another key)	Yes
`generate_question`	INSTRUCTION_QUALITY path — regenerate the question	No — built-in always used
`default`	Temperature-sweep regenerations	Yes

Template variables: {source} (source text) and {question} (the instruction/question).

Reward refiner¶

Runs post-pipeline on samples that RewardGate still rejected after the inline probe. It reads the weakest scoring dimension from the gate's provenance, rewrites the answer targeting that specific axis, then re-evaluates.

CuratorConfig(
    enable_reward_refiner            = True,
    reward_refine_prompt_template    = None,   # None = built-in template
    reward_instruction_refine_template = None, # None = built-in; for question rewrites
)

What it skips: Samples with rejected_above_threshold (DPO contrast failures). These cannot be fixed by rewriting — the generation contrast needs to be fixed at the source.

Refiner output metadata:

{
  "reward_refined": true,
  "refinement_type": "answer_rewrite",
  "refinement_axis": "depth"
}

For DPO pairs: the refined answer becomes the new chosen; the original adversarial response remains as rejected.

Reading diagnostics¶

When enable_diagnostic_probe=True, result.diagnostics is populated:

d = result.diagnostics.to_dict()

d["total_diagnosed"]    # how many rejected samples were diagnosed
d["probe_recovered"]    # how many were recovered by the probe
d["probe_recovery_pct"] # recovery rate as a fraction (0.0-1.0)
d["total_probe_calls"]  # total LLM calls consumed by the probe
d["mode_counts"]        # {"generator_temperature": 14, "response_quality": 9, ...}

Interpreting mode_counts: - High generator_temperature → lower llm_temperature or try probe_temperatures=[0.2, 0.4] - High generator_parametric → add a strict_grounding template override - High response_quality with 0 probe calls → rejected_above_threshold; fix in generation config - High instruction_quality → generated questions are weak; adjust the generation task's prompt_template or difficulty

When NOT to enable recovery¶

adversarial_qa: The HallucinationGate is the intended filter — injected samples should fail. Enabling the probe will attempt to repair them back to grounded answers, defeating the purpose of adversarial generation.

rejected_above_threshold: The probe always exits immediately for these (0 LLM calls, no recovery). Save the budget by inspecting mode_counts first — if all failures are response_quality, the probe is not providing value and the generation config needs fixing instead.