
Diagnostics

The adaptive recovery machinery: inline probe, failure-mode taxonomy, reward refiner.

curatorkit.diagnostic.failure_modes

Failure mode taxonomy for the CuratorKIT diagnostic loop.

FailureMode classifies WHY a sample was rejected by a quality gate (hallucination, reward, or diversity), so rejections become actionable: each mode maps to a concrete fix (lower the temperature, tighten the grounding prompt, regenerate the question, ...) instead of an opaque drop.
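To show what "actionable" means in practice, here is a minimal sketch of a mode-to-fix lookup. This page does not list the real FailureMode members, so the member names below are hypothetical stand-ins. Only the three fixes come from the text above.

```python
from enum import Enum


class FailureMode(Enum):
    # Hypothetical members; the actual CuratorKIT enum may differ.
    TEMPERATURE_SENSITIVE = "temperature_sensitive"
    WEAK_GROUNDING = "weak_grounding"
    BAD_QUESTION = "bad_question"


# Hypothetical mapping from the diagnosed cause to the concrete fix to apply.
SUGGESTED_FIX: dict[FailureMode, str] = {
    FailureMode.TEMPERATURE_SENSITIVE: "lower the temperature",
    FailureMode.WEAK_GROUNDING: "tighten the grounding prompt",
    FailureMode.BAD_QUESTION: "regenerate the question",
}


def suggest_fix(mode: FailureMode) -> str:
    """Return the fix for a mode instead of silently dropping the sample."""
    return SUGGESTED_FIX[mode]
```

Because every mode has an entry in the lookup, each rejection comes with a next step rather than being a bare drop.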

Recovery is INLINE: DiagnosticProbe attempts each probe path and, if any path succeeds, stores the passing sample in FailureDiagnosis.recovered_sample. There is no separate second pass; the pipeline routes recovered samples forward immediately.
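The inline-recovery idea can be sketched as a loop over probe paths that returns the first regeneration passing the gate, together with the number of calls it spent. In this sketch, Sample, ProbePath, Gate and recover_inline are hypothetical names, and it assumes that probing stops at the first success. The real DiagnosticProbe may order, count or stop its probes differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sample:
    # Hypothetical stand-in for DataSample.
    text: str


# A probe path regenerates a sample (e.g. at a lower temperature or with a
# stricter prompt); a gate decides whether the regeneration passes.
ProbePath = Callable[[], Sample]
Gate = Callable[[Sample], bool]


def recover_inline(paths: list[ProbePath], gate: Gate) -> tuple[Optional[Sample], int]:
    """Try each probe path in order; return the first passing sample and the call count."""
    calls = 0
    for path in paths:
        candidate = path()
        calls += 1  # each regeneration counts as one LLM call in this sketch
        if gate(candidate):
            return candidate, calls  # routed forward immediately, no second pass
    return None, calls  # all probe paths exhausted
```

If no path passes, the function returns `None`. That mirrors a `recovered_sample` of `None`.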

FailureDiagnosis dataclass

FailureDiagnosis(mode: FailureMode, evidence: list[bool] = list(), probe_calls: int = 0, notes: dict[str, Any] = dict(), recovered_sample: DataSample | None = None)

Result of DiagnosticProbe.diagnose(). Attached to RejectedSample.diagnosis.

Fields

mode : FailureMode
    The diagnosed cause.
evidence : list[bool]
    Probe 1 pass/fail pattern at [T=0.3, T=0.5].
probe_calls : int
    Total LLM calls consumed by all probes.
notes : dict
    Extra info (e.g. which prompt variant succeeded).
recovered_sample : DataSample | None
    The passing re-generation from the probe, if any probe path succeeded. The pipeline routes it back into the accepted pool inline. None means all probes were exhausted.
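In the signature above, defaults written as `list()` and `dict()` are how the docs render mutable defaults. A dataclass would normally express these with `field(default_factory=...)`. Here is a minimal equivalent sketch, using hypothetical stand-ins for FailureMode and DataSample. It assumes that `was_recovered` simply checks `recovered_sample is not None`, which is consistent with the descriptions on this page but is not confirmed by them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class FailureMode(Enum):
    # Hypothetical stand-in; real members are not listed here.
    UNKNOWN = "unknown"


@dataclass
class DataSample:
    # Hypothetical stand-in for curatorkit's DataSample.
    text: str


@dataclass
class FailureDiagnosis:
    mode: FailureMode
    # Mutable defaults use default_factory so instances don't share one list/dict.
    evidence: list[bool] = field(default_factory=list)  # Probe 1 pattern [T=0.3, T=0.5]
    probe_calls: int = 0
    notes: dict[str, Any] = field(default_factory=dict)
    recovered_sample: Optional[DataSample] = None

    @property
    def was_recovered(self) -> bool:
        # True only when a probe path produced a passing re-generation.
        return self.recovered_sample is not None
```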

Typical uses

mode and evidence aggregate into per-mode rejection breakdowns (see PipelineDiagnostics and diagnostic_summary.json). probe_calls tracks the LLM budget the probes consumed, so recovery yield can be cost-normalised. A recovered_sample that is not None marks an actual inline recovery.
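A breakdown keyed by mode and evidence pattern can be built with a plain `Counter`. This sketch assumes the diagnoses have already been reduced to `(mode, evidence)` pairs. The `breakdown` helper is hypothetical and is not necessarily how diagnostic_summary.json is produced.

```python
from collections import Counter


def breakdown(records: list[tuple[str, list[bool]]]) -> Counter:
    """Count rejections per (mode, evidence pattern) pair."""
    # Lists aren't hashable, so each evidence pattern is frozen into a tuple.
    return Counter((mode, tuple(evidence)) for mode, evidence in records)
```

Grouping by the evidence pattern as well as the mode shows whether rejections of the same mode behave consistently across the two probe temperatures.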

was_recovered property

was_recovered: bool

True when the probe produced an inline passing re-generation.

to_dict

to_dict() -> dict[str, Any]

Serialise to plain dict for rejected.jsonl output.
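Here is one plausible shape for `to_dict`, paired with the line-per-record format of a `.jsonl` file. It assumes the enum is stored by its `.value` and that the sample is already a plain dict. `Mode`, `Diagnosis` and `to_jsonl_line` are hypothetical names, and the real serialiser may choose different keys or encodings.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Optional


class Mode(Enum):
    # Hypothetical stand-in for FailureMode.
    EXAMPLE = "example"


@dataclass
class Diagnosis:
    mode: Mode
    evidence: list[bool] = field(default_factory=list)
    probe_calls: int = 0
    notes: dict[str, Any] = field(default_factory=dict)
    recovered_sample: Optional[dict[str, Any]] = None  # DataSample stand-in

    def to_dict(self) -> dict[str, Any]:
        d = asdict(self)
        d["mode"] = self.mode.value  # Enum -> plain string so json.dumps works
        return d


def to_jsonl_line(diag: Diagnosis) -> str:
    # One JSON object per line, as in a .jsonl file.
    return json.dumps(diag.to_dict()) + "\n"
```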

curatorkit.diagnostic.diagnostics

PipelineDiagnostics — run-level accumulator for failure diagnoses.

Held by the Pipeline instance when the probe is active. Passed through PipelineResult to Curator, then accessible to the caller via result.diagnostics.

Recovery is INLINE: probe_recovery_count() reports the number of samples for which DiagnosticProbe actually produced a passing re-generation. It replaces the old recovery_rate(), which counted hypothetical recoveries from RECOVERABLE dict flags.

Typical uses

mode_counts() and probe_recovery_count() feed the per-mode rejection breakdown written to diagnostic_summary.json; total_probe_calls() tracks the LLM budget the probe consumed, so recovery yield can be cost-normalised.
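Here is a minimal run-level accumulator in the spirit of PipelineDiagnostics. It provides the three query methods named above and a cost-normalised yield. The `RunDiagnostics` class, its `record` method and `recoveries_per_call` are hypothetical, and the sketch assumes the accumulator keeps one entry per diagnosis. The real class may store and aggregate its data differently.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Diagnosis:
    # Minimal stand-in for FailureDiagnosis.
    mode: str
    probe_calls: int = 0
    recovered_sample: Optional[str] = None


@dataclass
class RunDiagnostics:
    """Hypothetical run-level accumulator in the spirit of PipelineDiagnostics."""
    diagnoses: list[Diagnosis] = field(default_factory=list)

    def record(self, diag: Diagnosis) -> None:  # hypothetical method name
        self.diagnoses.append(diag)

    def mode_counts(self) -> dict[str, int]:
        return dict(Counter(d.mode for d in self.diagnoses))

    def probe_recovery_count(self) -> int:
        # Counts actual inline recoveries, not hypothetical "recoverable" flags.
        return sum(d.recovered_sample is not None for d in self.diagnoses)

    def total_probe_calls(self) -> int:
        return sum(d.probe_calls for d in self.diagnoses)

    def recoveries_per_call(self) -> float:
        # Cost-normalised recovery yield: recoveries per LLM call spent probing.
        calls = self.total_probe_calls()
        return self.probe_recovery_count() / calls if calls else 0.0
```

Dividing recoveries by probe calls lets two runs with different probe budgets be compared directly.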

PipelineDiagnostics

PipelineDiagnostics()

probe_recovery_count

probe_recovery_count() -> int

Number of samples where the probe produced an inline passing re-generation.