Train with AlignTune¶

AlignTune is Lexsi Labs' fine-tuning library. CuratorKIT's Alpaca and DPO exports match the dataset shapes AlignTune's SFT and RL trainers consume, so the two compose into a curate-then-train workflow.

1. Curate¶

Generate and gate a dataset, with splits:

from curatorkit import Curator, CuratorConfig

result = Curator(CuratorConfig(
    dataset                 = "handbook.pdf",
    llm_model               = "openai/gpt-4o-mini",
    generation_task         = "qa",                  # or "preference" for DPO pairs
    hallucination_threshold = 0.7,
    reward_threshold        = 0.7,
    export_formats          = ["alpaca"],
    output_split            = {"train": 0.9, "val": 0.1},
    output_dir              = "output/curated",
)).run()

This writes output/curated/train/sft_alpaca.jsonl and output/curated/val/sft_alpaca.jsonl, plus the provenance set.

2. Publish the dataset¶

AlignTune's trainers take a dataset name, so load the exported JSONL with the datasets library and push it to the HuggingFace Hub. The auto-generated dataset_card.md is a ready-made README for the dataset repository.

from datasets import load_dataset

ds = load_dataset("json", data_files={
    "train":      "output/curated/train/sft_alpaca.jsonl",
    "validation": "output/curated/val/sft_alpaca.jsonl",
})
ds.push_to_hub("your-org/handbook-qa-curated")

3. Train¶

from aligntune.core.backend_factory import create_sft_trainer

trainer = create_sft_trainer(
    model_name   = "Qwen/Qwen3-0.6B",
    dataset_name = "your-org/handbook-qa-curated",
    backend      = "trl",
    num_epochs   = 3,
    batch_size   = 4,
)
trainer.train()
print(trainer.evaluate())

For preference data, export with generation_task="preference" and export_formats=["dpo"], then use AlignTune's RL trainer:

from aligntune.core.backend_factory import create_rl_trainer

trainer = create_rl_trainer(
    model_name   = "Qwen/Qwen3-0.6B",
    dataset_name = "your-org/handbook-dpo-curated",
    algorithm    = "dpo",
    backend      = "trl",
)
trainer.train()

AlignTune's documentation covers backend selection, the other RL algorithms, and evaluation. The provenance manifest from step 1 stays valid for the published dataset: every training sample traces back to its source chunk.