Generators¶
LLM generation tasks. All subclass BaseGenerationTask and require the generation extra.
curatorkit.generators ¶
LLM-powered generation tasks: QA pairs, preference pairs, GRPO rollouts, multi-turn dialogues, chain-of-thought traces, and adversarial variants.
AdversarialPreferenceTask ¶
AdversarialPreferenceTask(llm: BaseLLM, num_questions: int = 1, injection_rate: float = 0.5, injection_types: list[str] | None = None, seed: int | None = 42, faithful_prompt_template: str | None = None, adversarial_prompt_template: str | None = None, difficulty: str = 'medium', concurrency: int = 10)
Bases: BaseGenerationTask
Generate DPO preference pairs with adversarially hallucinated rejected responses.
For each source chunk, generates num_questions faithful QA pairs (chosen),
then for injection_rate fraction of pairs generates an adversarial variant
of the answer as the rejected response. The remaining pairs use a quality-
degraded variant (lower temperature re-generation) as the rejected response.
Parameters¶
llm : BaseLLM Generator LLM (used for both faithful and adversarial generation). num_questions : int Preference pairs per source chunk. injection_rate : float Fraction of pairs to inject with adversarial rejected responses (0–1). Remaining pairs get a naive re-generation at higher temperature as rejected. injection_types : list[str] | None Which adversarial types to sample from. None = all four. Options: contradicts_source, parametric_drift, domain_mismatch, instruction_quality seed : int | None RNG seed for reproducible injection assignment. faithful_prompt_template : str | None Custom prompt for faithful answer generation. Must include {context}, {num_questions}, {difficulty} placeholders. adversarial_prompt_template : str | None Custom prompt for adversarial rejected generation. Must include {context}, {question} placeholders. difficulty : str "easy" | "medium" | "hard" — passed to faithful generation prompt. concurrency : int
AdversarialQAGenerationTask ¶
AdversarialQAGenerationTask(llm, injection_rate: float = 0.2, injection_types: list[InjectionType] | None = None, num_questions: int = 3, seed: int | None = 42, high_temp: float = 1.4, **kwargs)
Bases: QAGenerationTask
Generate QA pairs with a controlled fraction of adversarial samples injected directly in the first pass.
Parameters¶
llm : BaseLLM LLM for both faithful and adversarial generation. injection_rate : float Fraction of seeds to generate adversarially (0–1). Default 0.20. injection_types : list[InjectionType] | None Which adversarial types to use. Sampled uniformly. Default: all five. num_questions : int QA pairs per seed (faithful or adversarial). Default 3. seed : int | None Random seed for reproducible injection selection. high_temp : float Temperature used for high_temperature_drift injection. Default 1.4.
BaseGenerationTask ¶
BaseGenerationTask(llm: BaseLLM, prompt_template: str | None = None, concurrency: int = 10, max_parse_retries: int = 1)
Bases: BaseNormalizer
Abstract base for LLM generation tasks.
Subclasses must implement
_build_messages(sample) -> list[dict[str, str]] Build the prompt messages for the LLM call.
_parse_response(sample, response) -> list[DataSample] Parse the LLM output into one or more DataSamples.
Parameters¶
llm : BaseLLM LLM backend to use for generation. prompt_template : str | None Custom prompt template. If None, uses the task's default. concurrency : int Number of concurrent async LLM calls (used by run_async).
rejected
property
¶
Samples that failed generation. Flushed by pipeline after run().
flush_rejected ¶
Return and clear accumulated rejected samples.
run ¶
Synchronous generation. Calls the LLM once per input sample.
Returns enriched DataSamples. Failed generations are collected into self.rejected (not returned in the output list).
run_async
async
¶
Async generation with concurrency control.
Fires up to self.concurrency LLM calls in parallel.
ChainOfThoughtTask ¶
ChainOfThoughtTask(llm: BaseLLM, mode: str = 'generate', prompt_template: str | None = None, cot_marker: str = '\n\n## Answer\n', concurrency: int = 10)
Bases: BaseGenerationTask
Generate chain-of-thought reasoning for instructions.
Parameters¶
llm : BaseLLM LLM backend for generation. mode : str "generate": Generate both CoT and answer from instruction only. "wrap": Given instruction + existing answer, generate CoT reasoning. prompt_template : str | None Custom prompt template. cot_marker : str String used to separate reasoning from the final answer in the output.
EvolInstructTask ¶
EvolInstructTask(llm: BaseLLM, prompt_template: str | None = None, num_evolutions: int = 1, strategies: list[str] | None = None, generate_answers: bool = True, answer_prompt_template: str | None = None, concurrency: int = 10)
Bases: BaseGenerationTask
Evolve instructions into harder variants via LLM.
Parameters¶
llm : BaseLLM LLM backend for generation. prompt_template : str | None Custom prompt template. Must contain {instruction} and {strategy}. num_evolutions : int Number of evolved variants per instruction (each uses a different strategy). strategies : list[str] | None Which evolution strategies to use. Defaults to all five. generate_answers : bool If True, also generate answers for evolved instructions. answer_prompt_template : str | None Custom template for answer generation.
run ¶
Run evolution with strategy cycling.
Each sample gets num_evolutions variants, cycling through strategies. If generate_answers is True, a second LLM pass generates answers.
GRPORolloutTask ¶
GRPORolloutTask(llm: BaseLLM, num_responses: int = 4, scoring_llm: BaseLLM | None = None, score_responses: bool = True, temperature_spread: float = 0.0, temperatures: list[float] | None = None, response_prompt: str | None = None, scoring_prompt: str | None = None, concurrency: int = 10)
Bases: BaseGenerationTask
Generate N diverse responses per prompt for GRPO training.
Parameters¶
llm : BaseLLM LLM backend for response generation. num_responses : int Number of responses to generate per prompt. scoring_llm : BaseLLM | None Separate LLM for scoring. If None, uses the same LLM. Set to None and score_responses=False for unscored rollouts. score_responses : bool Whether to score each response. temperature_spread : float Temperature variation across responses. Responses are generated at temperatures from (base - spread/2) to (base + spread/2). response_prompt : str | None Custom template for response generation. scoring_prompt : str | None Custom template for response scoring.
run ¶
Generate N responses per prompt with optional scoring.
Each input sample produces exactly one output sample with populated responses[] and reward_scores[] lists.
BadSampleInjector ¶
BadSampleInjector(llm: BaseLLM, injection_rate: float = 0.2, injection_types: list[InjectionType] | None = None, seed: int | None = 42)
Inject controlled failure samples into a generated QA corpus.
Parameters¶
llm : BaseLLM LLM used to generate adversarial answers. injection_rate : float Fraction of samples to replace with injected failures (0–1). injection_types : list[InjectionType] Which failure types to inject. Types are sampled uniformly. seed : int | None Random seed for reproducible injection selection.
inject ¶
Replace injection_rate fraction of samples with adversarial variants.
Returns a new list (original list is not mutated). Injected samples are shuffled back into the list at their original positions so the gate cannot exploit ordering.
The returned list has metadata["injected_failure"]=True on injected samples. All other samples are unchanged.
inject_by_type ¶
Inject exact counts of each failure type (useful for controlled ablations).
samples are sampled without replacement per type.
MultiTurnTask ¶
MultiTurnTask(llm: BaseLLM, num_turns: int = 3, mode: str = 'turn_by_turn', prompt_template: str | None = None, include_context: bool = True, concurrency: int = 10)
Bases: BaseGenerationTask
Generate multi-turn conversations from prompts or text chunks.
Parameters¶
llm : BaseLLM LLM backend for generation. num_turns : int Number of user-assistant exchange pairs. mode : str "turn_by_turn" (default) — each turn is a separate LLM call, conditioned on all prior real turns. "single_call" — one LLM call generates the full conversation. prompt_template : str | None Custom template for single_call mode. Required vars: {num_turns}, {context_section}, {initial_question}. include_context : bool If True, include source text as grounding context.
PreferenceGenerationTask ¶
PreferenceGenerationTask(llm: BaseLLM, prompt_template: str | None = None, mode: str = 'single_call', chosen_prompt: str | None = None, rejected_prompt: str | None = None, concurrency: int = 10)
Bases: BaseGenerationTask
Generate preference pairs (chosen/rejected) for DPO training.
Parameters¶
llm : BaseLLM prompt_template : str | None Custom single-call template. Must contain {instruction} and optionally {context_section}. mode : str "single_call" — one LLM call generates both chosen and rejected. "two_pass" — separate LLM calls for chosen and rejected. chosen_prompt : str | None Two-pass template for chosen (must contain {instruction}, optionally {context_section}). rejected_prompt : str | None Two-pass template for rejected.
QAGenerationTask ¶
QAGenerationTask(llm: BaseLLM, prompt_template: str | None = None, num_questions: int = 3, table_prompt_template: str | None = None, difficulty: str = 'medium', concurrency: int = 10)
Bases: BaseGenerationTask
Generate question-answer pairs from text chunks.
Parameters¶
llm : BaseLLM LLM backend for generation. prompt_template : str | None Custom prompt template. Must contain {context} and {num_questions}. num_questions : int Number of QA pairs to generate per chunk. table_prompt_template : str | None Separate prompt for table-derived chunks. difficulty : str Difficulty level hint: "easy", "medium", "hard".
run_multi_passage ¶
Generate cross-passage synthesis QA from pairs of adjacent chunks.
Each pair produces num_questions QA samples whose answers require facts from both passages. The combined context (passage_1 + passage_2) is stored in DataSample.input so the hallucination gate judges against the full source.
Parameters¶
seed_pairs : list of (chunk_a, chunk_b) DataSample tuples