
Gates

Quality gates: pass samples through or reject them with a structured reason.

curatorkit.gates

DiversityGate

DiversityGate(embedding_model: str = 'sentence-transformers/all-MiniLM-L6-v2', similarity_threshold: float = 0.92, text_field: str = 'auto', coverage_field: str | None = None, batch_size: int = 64, device: str | None = None)

Bases: BaseGate

Reject samples that are semantically too similar to existing ones.

Parameters

embedding_model : str
    Sentence-transformers model name.
similarity_threshold : float
    Samples whose cosine similarity to an existing sample exceeds this value are rejected as near-duplicates.
text_field : str
    Which DataSample field to embed. "auto" picks the field based on task_type.
coverage_field : str | None
    Metadata or DataSample field used to check for category coverage gaps.
batch_size : int
    Encoding batch size for the embedding model.
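The near-duplicate rule can be sketched as follows. This is a hypothetical, self-contained illustration, not curatorkit's implementation: `filter_near_duplicates` is an invented name, toy NumPy vectors stand in for sentence-transformers embeddings, and the sketch assumes "existing ones" means samples already kept earlier in the same batch, compared greedily in input order.

```python
from __future__ import annotations

import numpy as np


def filter_near_duplicates(
    embeddings: np.ndarray, similarity_threshold: float = 0.92
) -> tuple[list[int], list[tuple[int, int, float]]]:
    """Greedy near-duplicate filter (hypothetical helper, not the curatorkit API).

    ASSUMPTION: each sample is compared only against samples kept earlier in
    the same batch. Returns (kept indices, rejections), where each rejection is
    (index, index of the most similar kept sample, cosine similarity).
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    kept: list[int] = []
    rejected: list[tuple[int, int, float]] = []
    for i, vec in enumerate(unit):
        if kept:
            sims = unit[kept] @ vec  # cosine similarity to every kept sample
            j = int(np.argmax(sims))
            if sims[j] > similarity_threshold:
                rejected.append((i, kept[j], float(sims[j])))
                continue
        kept.append(i)
    return kept, rejected


# Toy 2-D "embeddings": rows 0 and 1 point in almost the same direction.
emb = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
kept, rejected = filter_near_duplicates(emb)
print(kept, rejected)
```

Returning the index of the closest kept sample alongside the similarity mirrors the idea of rejecting with a structured reason rather than silently dropping the sample.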

HallucinationGate

HallucinationGate(llm: BaseLLM, threshold: float = 0.7, prompt_template: str | None = None, skip_if_no_context: bool = True, concurrency: int = 16)

Bases: BaseGate

Verify generated answers are grounded in their source text.

Parameters

llm : BaseLLM
    LLM backend for grounding judgement.
threshold : float
    Minimum grounding score (0-1). Samples scoring below this are rejected.
prompt_template : str | None
    Custom grounding evaluation prompt.
skip_if_no_context : bool
    If True, samples without source context pass through; if False, they are rejected.
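The pass/reject rule set by threshold and skip_if_no_context can be sketched as a small decision function. `grounding_decision` and its reason strings are hypothetical; the sketch assumes the LLM has already produced a grounding score in [0, 1], and that a score equal to the threshold passes, since only scores below it are rejected.

```python
from __future__ import annotations


def grounding_decision(
    score: float | None,
    has_context: bool,
    threshold: float = 0.7,
    skip_if_no_context: bool = True,
) -> tuple[bool, str | None]:
    """Hypothetical helper: return (passed, rejection_reason) for one sample."""
    if not has_context:
        # Nothing to ground against, so the flag alone decides.
        if skip_if_no_context:
            return True, None
        return False, "no source context"
    if score is None:
        raise ValueError("a grounding score is required when context is present")
    # Only scores strictly below the threshold are rejected.
    if score < threshold:
        return False, f"grounding score {score:.2f} below threshold {threshold:.2f}"
    return True, None


print(grounding_decision(0.55, has_context=True))
print(grounding_decision(None, has_context=False, skip_if_no_context=False))
```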

run_async (async)

run_async(samples: list[DataSample]) -> tuple[list[DataSample], list[RejectedSample]]

Asynchronous execution: uses agenerate() with semaphore-bounded concurrency.
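Semaphore-bounded concurrency of the kind run_async describes can be sketched with plain asyncio. In this sketch `asyncio.sleep` stands in for the `agenerate()` call, and `judge_all` with its peak-tracking counter is hypothetical, added only to show that no more than `concurrency` calls run at once.

```python
import asyncio


async def judge_all(prompts: list[str], concurrency: int = 16) -> tuple[list[str], int]:
    """Hypothetical helper: judge prompts with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0  # calls currently holding the semaphore
    peak = 0       # highest value in_flight ever reached

    async def judge_one(prompt: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `concurrency` calls are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await asyncio.sleep(0.01)  # stand-in for `await llm.agenerate(...)`
                return f"judged: {prompt}"
            finally:
                in_flight -= 1

    # gather() schedules every task at once and returns results in input order.
    results = await asyncio.gather(*(judge_one(p) for p in prompts))
    return list(results), peak


results, peak = asyncio.run(judge_all([f"sample {i}" for i in range(10)], concurrency=3))
print(results[0], peak)
```

Bounding with a semaphore rather than fixed-size batches keeps up to `concurrency` requests in flight continuously, instead of waiting for the slowest call in each batch before starting the next.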

RewardGate

RewardGate(llm: BaseLLM, threshold: float = 0.7, dimensions: list[str] | None = None, prompt_template: str | None = None, store_score_in_label: bool = True, concurrency: int = 16)

Bases: BaseGate

Quality-score samples using an LLM judge and reject below threshold.

Parameters

llm : BaseLLM
    LLM backend for quality judgement.
threshold : float
    Minimum quality score (0-1). Samples scoring below this are rejected.
dimensions : list[str] | None
    Quality dimensions to evaluate. Defaults to the core UltraFeedback set when None.
prompt_template : str | None
    Custom reward evaluation prompt.
store_score_in_label : bool
    If True, store the overall score in DataSample.label.
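How threshold and store_score_in_label interact can be sketched as below. The source does not say how per-dimension scores are combined, so this sketch ASSUMES an unweighted mean; `Sample` is a minimal stand-in for DataSample, `apply_reward` is hypothetical, and the dimension names in the usage line are illustrative, not the library's defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """Minimal stand-in for DataSample (hypothetical; only what this sketch needs)."""
    text: str
    label: float | None = None


def apply_reward(
    sample: Sample,
    dimension_scores: dict[str, float],
    threshold: float = 0.7,
    store_score_in_label: bool = True,
) -> tuple[bool, float]:
    """Hypothetical helper: combine judge scores, optionally store them, apply the threshold."""
    if not dimension_scores:
        raise ValueError("need at least one dimension score")
    # ASSUMPTION: the overall score is the unweighted mean of per-dimension scores in [0, 1].
    overall = sum(dimension_scores.values()) / len(dimension_scores)
    if store_score_in_label:
        sample.label = overall  # stored whether or not the sample passes (an assumption)
    return overall >= threshold, overall


sample = Sample(text="An example answer.")
passed, score = apply_reward(sample, {"helpfulness": 0.9, "honesty": 0.6})
print(passed, score, sample.label)
```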

run_async (async)

run_async(samples: list[DataSample]) -> tuple[list[DataSample], list[RejectedSample]]

Asynchronous execution: uses agenerate() with semaphore-bounded concurrency.

SchemaGate

SchemaGate(required_fields: list[str] | None = None, min_tokens: int = 10, max_tokens: int = 2048, use_tiktoken: bool = False, enforce_task_types: list[str] | None = None)

Bases: BaseGate

Validate samples against field, token-length, and encoding constraints.

Parameters

required_fields : list[str] | None, default None
    Fields that must be non-empty. When None, the required fields are derived from each sample's task_type; setting this explicitly replaces the per-task-type check entirely.
min_tokens : int, default 10
    Minimum token count for the primary text fields.
max_tokens : int, default 2048
    Maximum token count for the primary text fields.
use_tiktoken : bool, default False
    Count tokens with the tiktoken cl100k_base encoding instead of the whitespace tokenizer.
enforce_task_types : list[str] | None, default None
    If non-empty, only samples with these task_type values pass. Useful for single-paradigm pipelines (e.g. pure DPO).
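The SchemaGate checks can be sketched as a single function that returns a rejection reason, or None when the sample passes. `schema_check` and `count_tokens` are hypothetical; the sketch takes required_fields explicitly instead of deriving them from task_type, ASSUMES the token bounds apply to each required field (the source says only "primary text fields"), and omits the encoding checks. The tiktoken path uses `tiktoken.get_encoding("cl100k_base")` and imports tiktoken only when requested.

```python
from __future__ import annotations


def count_tokens(text: str, use_tiktoken: bool = False) -> int:
    """Hypothetical helper: whitespace token count, or cl100k_base via tiktoken."""
    if use_tiktoken:
        import tiktoken  # optional dependency, imported only on this path
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    return len(text.split())  # whitespace tokenizer


def schema_check(
    fields: dict[str, str],
    task_type: str,
    required_fields: list[str],
    min_tokens: int = 10,
    max_tokens: int = 2048,
    enforce_task_types: list[str] | None = None,
    use_tiktoken: bool = False,
) -> str | None:
    """Hypothetical helper: return a rejection reason, or None if the sample passes."""
    # An empty or None enforce_task_types list disables the task_type filter.
    if enforce_task_types and task_type not in enforce_task_types:
        return f"task_type {task_type!r} not in {enforce_task_types}"
    for name in required_fields:
        if not fields.get(name, "").strip():
            return f"required field {name!r} is empty"
        # ASSUMPTION: token bounds are checked per required field.
        n = count_tokens(fields[name], use_tiktoken)
        if not min_tokens <= n <= max_tokens:
            return f"field {name!r} has {n} tokens, outside [{min_tokens}, {max_tokens}]"
    return None


fields = {"prompt": "one two three four five", "chosen": "a b c d e"}
print(schema_check(fields, "dpo", ["prompt", "chosen", "rejected"], min_tokens=3))
```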