Skip to content

Exporters

Output format writers. Each writes one JSONL file into the output directory.

curatorkit.exporters

AlpacaExporter

Bases: BaseExporter

Export to Alpaca instruction-following format.

CorpusExporter

Bases: BaseExporter

Export corpus chunks to corpus.jsonl with full chunk metadata.

DPOExporter

Bases: BaseExporter

Export preference data in TRL DPO format.

Only samples with task_type "preference" or "implicit_preference" are written. All others are skipped (not rejected — skipping is intentional when exporting a multi-task pipeline subset).

GRPOExporter

Bases: BaseExporter

Export to GRPO group rollout format.

Uses DataSample.responses and DataSample.reward_scores if populated. Falls back to empty arrays when no rollouts have been generated.

PPOExporter

Bases: BaseExporter

Export prompts in PPO training format.

ShareGPTExporter

Bases: BaseExporter

Export to ShareGPT multi-turn conversation format.