Skip to content

Generators

LLM generation tasks. All subclass BaseGenerationTask and require the generation extra.

curatorkit.generators

LLM-powered generation tasks: QA pairs, preference pairs, GRPO rollouts, multi-turn dialogues, chain-of-thought traces, and adversarial variants.

AdversarialPreferenceTask

AdversarialPreferenceTask(llm: BaseLLM, num_questions: int = 1, injection_rate: float = 0.5, injection_types: list[str] | None = None, seed: int | None = 42, faithful_prompt_template: str | None = None, adversarial_prompt_template: str | None = None, difficulty: str = 'medium', concurrency: int = 10)

Bases: BaseGenerationTask

Generate DPO preference pairs with adversarially hallucinated rejected responses.

For each source chunk, generates num_questions faithful QA pairs (chosen), then for injection_rate fraction of pairs generates an adversarial variant of the answer as the rejected response. The remaining pairs use a quality- degraded variant (lower temperature re-generation) as the rejected response.

Parameters

llm : BaseLLM Generator LLM (used for both faithful and adversarial generation). num_questions : int Preference pairs per source chunk. injection_rate : float Fraction of pairs to inject with adversarial rejected responses (0–1). Remaining pairs get a naive re-generation at higher temperature as rejected. injection_types : list[str] | None Which adversarial types to sample from. None = all four. Options: contradicts_source, parametric_drift, domain_mismatch, instruction_quality seed : int | None RNG seed for reproducible injection assignment. faithful_prompt_template : str | None Custom prompt for faithful answer generation. Must include {context}, {num_questions}, {difficulty} placeholders. adversarial_prompt_template : str | None Custom prompt for adversarial rejected generation. Must include {context}, {question} placeholders. difficulty : str "easy" | "medium" | "hard" — passed to faithful generation prompt. concurrency : int

AdversarialQAGenerationTask

AdversarialQAGenerationTask(llm, injection_rate: float = 0.2, injection_types: list[InjectionType] | None = None, num_questions: int = 3, seed: int | None = 42, high_temp: float = 1.4, **kwargs)

Bases: QAGenerationTask

Generate QA pairs with a controlled fraction of adversarial samples injected directly in the first pass.

Parameters

llm : BaseLLM LLM for both faithful and adversarial generation. injection_rate : float Fraction of seeds to generate adversarially (0–1). Default 0.20. injection_types : list[InjectionType] | None Which adversarial types to use. Sampled uniformly. Default: all five. num_questions : int QA pairs per seed (faithful or adversarial). Default 3. seed : int | None Random seed for reproducible injection selection. high_temp : float Temperature used for high_temperature_drift injection. Default 1.4.

BaseGenerationTask

BaseGenerationTask(llm: BaseLLM, prompt_template: str | None = None, concurrency: int = 10, max_parse_retries: int = 1)

Bases: BaseNormalizer

Abstract base for LLM generation tasks.

Subclasses must implement

_build_messages(sample) -> list[dict[str, str]] Build the prompt messages for the LLM call.

_parse_response(sample, response) -> list[DataSample] Parse the LLM output into one or more DataSamples.

Parameters

llm : BaseLLM LLM backend to use for generation. prompt_template : str | None Custom prompt template. If None, uses the task's default. concurrency : int Number of concurrent async LLM calls (used by run_async).

task_name property

task_name: str

Human-readable task name for provenance.

rejected property

rejected: list[RejectedSample]

Samples that failed generation. Flushed by pipeline after run().

flush_rejected

flush_rejected() -> list[RejectedSample]

Return and clear accumulated rejected samples.

run

run(samples: list[DataSample]) -> list[DataSample]

Synchronous generation. Calls the LLM once per input sample.

Returns enriched DataSamples. Failed generations are collected into self.rejected (not returned in the output list).

run_async async

run_async(samples: list[DataSample]) -> list[DataSample]

Async generation with concurrency control.

Fires up to self.concurrency LLM calls in parallel.

ChainOfThoughtTask

ChainOfThoughtTask(llm: BaseLLM, mode: str = 'generate', prompt_template: str | None = None, cot_marker: str = '\n\n## Answer\n', concurrency: int = 10)

Bases: BaseGenerationTask

Generate chain-of-thought reasoning for instructions.

Parameters

llm : BaseLLM LLM backend for generation. mode : str "generate": Generate both CoT and answer from instruction only. "wrap": Given instruction + existing answer, generate CoT reasoning. prompt_template : str | None Custom prompt template. cot_marker : str String used to separate reasoning from the final answer in the output.

EvolInstructTask

EvolInstructTask(llm: BaseLLM, prompt_template: str | None = None, num_evolutions: int = 1, strategies: list[str] | None = None, generate_answers: bool = True, answer_prompt_template: str | None = None, concurrency: int = 10)

Bases: BaseGenerationTask

Evolve instructions into harder variants via LLM.

Parameters

llm : BaseLLM LLM backend for generation. prompt_template : str | None Custom prompt template. Must contain {instruction} and {strategy}. num_evolutions : int Number of evolved variants per instruction (each uses a different strategy). strategies : list[str] | None Which evolution strategies to use. Defaults to all five. generate_answers : bool If True, also generate answers for evolved instructions. answer_prompt_template : str | None Custom template for answer generation.

run

run(samples: list[DataSample]) -> list[DataSample]

Run evolution with strategy cycling.

Each sample gets num_evolutions variants, cycling through strategies. If generate_answers is True, a second LLM pass generates answers.

GRPORolloutTask

GRPORolloutTask(llm: BaseLLM, num_responses: int = 4, scoring_llm: BaseLLM | None = None, score_responses: bool = True, temperature_spread: float = 0.0, temperatures: list[float] | None = None, response_prompt: str | None = None, scoring_prompt: str | None = None, concurrency: int = 10)

Bases: BaseGenerationTask

Generate N diverse responses per prompt for GRPO training.

Parameters

llm : BaseLLM LLM backend for response generation. num_responses : int Number of responses to generate per prompt. scoring_llm : BaseLLM | None Separate LLM for scoring. If None, uses the same LLM. Set to None and score_responses=False for unscored rollouts. score_responses : bool Whether to score each response. temperature_spread : float Temperature variation across responses. Responses are generated at temperatures from (base - spread/2) to (base + spread/2). response_prompt : str | None Custom template for response generation. scoring_prompt : str | None Custom template for response scoring.

run

run(samples: list[DataSample]) -> list[DataSample]

Generate N responses per prompt with optional scoring.

Each input sample produces exactly one output sample with populated responses[] and reward_scores[] lists.

BadSampleInjector

BadSampleInjector(llm: BaseLLM, injection_rate: float = 0.2, injection_types: list[InjectionType] | None = None, seed: int | None = 42)

Inject controlled failure samples into a generated QA corpus.

Parameters

llm : BaseLLM LLM used to generate adversarial answers. injection_rate : float Fraction of samples to replace with injected failures (0–1). injection_types : list[InjectionType] Which failure types to inject. Types are sampled uniformly. seed : int | None Random seed for reproducible injection selection.

inject

inject(samples: list[DataSample]) -> list[DataSample]

Replace injection_rate fraction of samples with adversarial variants.

Returns a new list (original list is not mutated). Injected samples are shuffled back into the list at their original positions so the gate cannot exploit ordering.

The returned list has metadata["injected_failure"]=True on injected samples. All other samples are unchanged.

inject_by_type

inject_by_type(samples: list[DataSample], counts: dict[InjectionType, int]) -> list[DataSample]

Inject exact counts of each failure type (useful for controlled ablations).

samples are sampled without replacement per type.

MultiTurnTask

MultiTurnTask(llm: BaseLLM, num_turns: int = 3, mode: str = 'turn_by_turn', prompt_template: str | None = None, include_context: bool = True, concurrency: int = 10)

Bases: BaseGenerationTask

Generate multi-turn conversations from prompts or text chunks.

Parameters

llm : BaseLLM LLM backend for generation. num_turns : int Number of user-assistant exchange pairs. mode : str "turn_by_turn" (default) — each turn is a separate LLM call, conditioned on all prior real turns. "single_call" — one LLM call generates the full conversation. prompt_template : str | None Custom template for single_call mode. Required vars: {num_turns}, {context_section}, {initial_question}. include_context : bool If True, include source text as grounding context.

PreferenceGenerationTask

PreferenceGenerationTask(llm: BaseLLM, prompt_template: str | None = None, mode: str = 'single_call', chosen_prompt: str | None = None, rejected_prompt: str | None = None, concurrency: int = 10)

Bases: BaseGenerationTask

Generate preference pairs (chosen/rejected) for DPO training.

Parameters

llm : BaseLLM prompt_template : str | None Custom single-call template. Must contain {instruction} and optionally {context_section}. mode : str "single_call" — one LLM call generates both chosen and rejected. "two_pass" — separate LLM calls for chosen and rejected. chosen_prompt : str | None Two-pass template for chosen (must contain {instruction}, optionally {context_section}). rejected_prompt : str | None Two-pass template for rejected.

QAGenerationTask

QAGenerationTask(llm: BaseLLM, prompt_template: str | None = None, num_questions: int = 3, table_prompt_template: str | None = None, difficulty: str = 'medium', concurrency: int = 10)

Bases: BaseGenerationTask

Generate question-answer pairs from text chunks.

Parameters

llm : BaseLLM LLM backend for generation. prompt_template : str | None Custom prompt template. Must contain {context} and {num_questions}. num_questions : int Number of QA pairs to generate per chunk. table_prompt_template : str | None Separate prompt for table-derived chunks. difficulty : str Difficulty level hint: "easy", "medium", "hard".

run_multi_passage

run_multi_passage(seed_pairs: list[tuple[DataSample, DataSample]]) -> list[DataSample]

Generate cross-passage synthesis QA from pairs of adjacent chunks.

Each pair produces num_questions QA samples whose answers require facts from both passages. The combined context (passage_1 + passage_2) is stored in DataSample.input so the hallucination gate judges against the full source.

Parameters

seed_pairs : list of (chunk_a, chunk_b) DataSample tuples