Skip to content

Customisation

All CuratorKIT extension points are accessible through CuratorConfig — no subclassing required for prompt customisation, custom LLM backends, or custom rubrics. This page covers all the places you can plug in your own logic.


Custom prompt templates

Every generation task accepts a *_prompt_template override.

QA generation

Required variables: {context}, {num_questions}. Optional: {difficulty}.

Note: if your template omits {difficulty} and difficulty is set to anything other than "medium", a Difficulty level: ... line is appended after the template is rendered. Include the {difficulty} placeholder to control its position yourself.

CuratorConfig(
    generation_task   = "qa",
    qa_prompt_template = (
        "You are a legal analyst. Generate {num_questions} precise questions "
        "from the clause below. Each question must be answerable from the text alone.\n\n"
        "Clause:\n{context}\n\n"
        "Return JSON: [{{\"question\": \"...\", \"answer\": \"...\"}}]"
    ),
)

Preference pairs

Required variables: {instruction}, {context_section}

CuratorConfig(
    generation_task            = "preference",
    preference_prompt_template = (
        "Generate a chosen/rejected pair for DPO training.\n\n"
        "{context_section}"
        "Instruction: {instruction}\n\n"
        "chosen = comprehensive answer with all relevant details\n"
        "rejected = correct but missing the single most critical detail\n\n"
        "Return JSON: {{\"chosen\": \"...\", \"rejected\": \"...\", \"degradation_pattern\": \"...\"}}"
    ),
)

GRPO response prompt

Required variable: {instruction}

CuratorConfig(
    generation_task    = "grpo",
    grpo_prompt_template = "Answer the following concisely in under 100 words.\n\n{instruction}",
)

CoT prompt

Required variables for generate mode: {instruction} Required variables for wrap mode: {instruction}, {answer}

CuratorConfig(
    generation_task  = "cot",
    cot_mode         = "generate",
    cot_prompt_template = (
        "Solve step by step. Show your work.\n\n"
        "Problem: {instruction}\n\n"
        "## Steps\n(numbered reasoning)\n\n## Answer\n(final answer)"
    ),
)

Evol-Instruct prompt

Required variables: {instruction}, {strategy}, {context}

CuratorConfig(
    generation_task      = "evol",
    evol_prompt_template = (
        "Make this instruction harder using the '{strategy}' strategy.\n\n"
        "Original: {instruction}\nContext: {context}\n\n"
        "Return JSON: {{\"evolved_instruction\": \"...\", \"strategy_applied\": \"{strategy}\", \"complexity_notes\": \"...\"}}"
    ),
)

Multi-turn (single_call mode only)

Required variables: {num_turns}, {context_section}, {initial_question}

The template is only used in single_call mode, and CuratorConfig always builds the multi-turn task in turn_by_turn mode — so set a custom template by constructing the task directly:

from curatorkit.generators.multiturn_gen import MultiTurnTask

task = MultiTurnTask(
    llm  = my_llm,
    mode = "single_call",
    prompt_template = (
        "Generate a {num_turns}-turn Q&A conversation.\n\n"
        "{context_section}\n"
        "Opening question: \"{initial_question}\"\n\n"
        "Return JSON: {{\"turns\": [{{\"role\": \"user\", \"content\": \"...\"}}, ...]}}"
    ),
)

Custom LLM backends

Any OpenAI-compatible endpoint (vLLM, Ollama, custom servers)

CuratorConfig(
    llm_model    = "openai/Qwen/Qwen3-8B",
    llm_api_base = "http://localhost:8000/v1",
    llm_api_key  = "token-abc123",         # or set OPENAI_API_KEY env var
)

Ollama (local)

CuratorConfig(
    llm_model    = "ollama/llama3.1:8b",   # "ollama/" prefix routes to OllamaBackend
    llm_api_base = "http://localhost:11434",
)

Model-specific parameters via llm_extra_body

Pass any model-specific API parameters that LiteLLM forwards verbatim:

CuratorConfig(
    llm_extra_body = {
        "chat_template_kwargs": {"enable_thinking": True},   # Qwen3 thinking tokens
    },
    # Judge gets thinking disabled — structured output without preamble
    judge_llm_extra_body = {
        "chat_template_kwargs": {"enable_thinking": False},
    },
)

Separate generator and judge models

Using the same model as both generator and judge causes self-leniency bias — the judge scores its own outputs too generously. Always configure a separate judge model when possible:

CuratorConfig(
    llm_model            = "openai/Qwen/Qwen3-8B",   # generator
    llm_api_base         = "http://localhost:8000/v1",

    judge_llm_model      = "openai/gpt-4o-mini",      # judge — different model
    judge_llm_api_base   = None,                       # None = use standard API
    judge_llm_temperature = 0.1,                       # low temp for deterministic scoring
)

Custom reward rubric

The 7 built-in dimensions cover most cases. When they don't fit your domain, replace the entire judge prompt with reward_prompt_template. The template must produce a JSON object with "score" (float, 0–1) and "reasoning" (string). Custom dimension validation is bypassed.

Required variables: {instruction}, {response}

CuratorConfig(
    reward_threshold        = 0.7,
    reward_prompt_template  = (
        "Rate the following legal answer on a scale of 0.0 to 1.0.\n\n"
        "Criteria:\n"
        "- Cites the specific clause or article (0.4 weight)\n"
        "- States the obligation or right precisely (0.3 weight)\n"
        "- Identifies relevant exceptions or qualifications (0.3 weight)\n\n"
        "Instruction: {instruction}\n\n"
        "Response: {response}\n\n"
        "Respond with JSON only: {{\"score\": 0.XX, \"reasoning\": \"...\"}}"
    ),
)

Custom probe templates

probe_extra_templates is merged over the built-in probe templates, with your values taking precedence. There are four built-in keys, each tied to a probe path:

Key Probe path Overridable?
default Temperature-sweep re-generation Yes
strict_grounding Strict-grounding probe Yes
domain_specific Domain-grounding probe Yes
generate_question Instruction re-generation probe No — this probe always uses the built-in template; overriding the key has no effect
CuratorConfig(
    enable_diagnostic_probe = True,
    probe_extra_templates   = {
        # Override strict_grounding with a domain-specific instruction
        "strict_grounding": (
            "You are a financial analyst. Answer using ONLY the passage. "
            "Cite specific numbers and percentages.\n\n"
            "Passage:\n{source}\n\nQuestion:\n{question}"
        ),
        # Override domain_specific for a legal domain
        "domain_specific": (
            "You are a legal analyst. Answer using only the passage text. "
            "Be precise about dates, parties, and obligations.\n\n"
            "Passage:\n{source}\n\nQuestion:\n{question}"
        ),
    },
)

You can also add templates under new key names. They are never selected by the default routing, but the domain-grounding probe checks each rejected sample's metadata for a domain_prompt_key entry and uses the template with that name instead of domain_specific:

CuratorConfig(
    enable_diagnostic_probe = True,
    probe_extra_templates   = {"legal_strict": "Answer as a contracts lawyer, using only the passage.\n\nPassage:\n{source}\n\nQuestion:\n{question}"},
)
# A sample with metadata={"domain_prompt_key": "legal_strict"} is probed
# with the "legal_strict" template on the domain-grounding path.

If a sample names a key that doesn't exist, the probe falls back to the default template.

Template variables: {source} and {question}.


Custom preprocessing function

preprocessing_fn runs on every raw row before it becomes a DataSample. Return None to drop the row.

def preprocess(row: dict) -> dict | None:
    # Drop rows with very short output
    if len(row.get("response", "")) < 50:
        return None
    # Rename fields for field_mapping
    row["question"] = row.pop("user_query", "")
    row["answer"]   = row.pop("response", "")
    # Normalise whitespace
    row["answer"] = " ".join(row["answer"].split())
    return row

CuratorConfig(
    dataset          = "data/raw.jsonl",
    preprocessing_fn = preprocess,
    # {source_column: datasample_field} — keys are your columns
    field_mapping    = {"question": "instruction", "answer": "output"},
)

Building a custom generator

Subclass BaseGenerationTask from curatorkit.generators.base. Implement two methods:

from curatorkit.generators.base import BaseGenerationTask
from curatorkit.llm.base import BaseLLM, LLMResponse
from curatorkit.schema import DataSample
import uuid

class SummarisationTask(BaseGenerationTask):
    def _build_messages(self, sample: DataSample) -> list[dict]:
        source = self._get_source_context(sample)
        return [{"role": "user", "content": f"Summarise in 3 sentences:\n\n{source}"}]

    def _parse_response(self, sample: DataSample, response: LLMResponse) -> list[DataSample]:
        text = response.text.strip()
        if not text:
            return []
        return [DataSample(
            id=str(uuid.uuid4()),
            source_uri=sample.source_uri,
            instruction="Summarise the following passage.",
            input=self._get_source_context(sample),
            output=text,
            task_type="instruction_following",
            provenance_chain=list(sample.provenance_chain),
        )]

Use it directly with Pipeline (bypassing CuratorConfig):

from curatorkit.pipeline import Pipeline
from curatorkit.llm.litellm import LiteLLMBackend

llm  = LiteLLMBackend(model="openai/gpt-4o-mini")
task = SummarisationTask(llm=llm, concurrency=10)

pipeline = Pipeline([reader, schema_gate, task, alpaca_exporter], output_dir=Path("output/"))
result   = pipeline.run()

Building a custom gate

Subclass BaseGate from curatorkit.interfaces. Implement run():

from curatorkit.interfaces import BaseGate
from curatorkit.schema import DataSample, RejectedSample

class LengthGate(BaseGate):
    def __init__(self, min_output_words: int = 50):
        self.min_output_words = min_output_words

    def run(self, samples: list[DataSample]) -> tuple[list[DataSample], list[RejectedSample]]:
        passed, rejected = [], []
        for s in samples:
            word_count = len(s.output.split())
            if word_count >= self.min_output_words:
                passed.append(s)
            else:
                rejected.append(RejectedSample(
                    **s.model_dump(),
                    rejection_reason=f"output_too_short:{word_count}_words",
                    rejecting_step=type(self).__name__,
                ))
        return passed, rejected

Insert it directly into a Pipeline step list alongside built-in gates.


Next steps