LLM backends

Backends used by generation tasks and LLM-judge gates. The LiteLLM backend reaches any provider LiteLLM supports and requires the generation extra. The Ollama backend talks to a local server using only the standard library, so it works on a core install.

curatorkit.llm

LLM abstraction layer for CuratorKIT.

Provides a unified interface to 100+ LLM providers via LiteLLM, plus a dedicated Ollama backend for local models.

BaseLLM

BaseLLM(model: str, temperature: float = 0.7, max_tokens: int = 1024, api_key: str | None = None, timeout: float = 120.0, max_retries: int = 3)

Bases: ABC

Abstract base for LLM backends.

Subclasses must implement

_call(messages, **kwargs) -> LLMResponse

Subclasses may optionally implement

_acall(messages, **kwargs) -> LLMResponse (for async generation)

Parameters

model : str
    Model identifier string (format depends on backend).
temperature : float
    Default temperature for generation.
max_tokens : int
    Default maximum tokens for generation.
api_key : str | None
    API key override. Falls back to environment variable if None.
timeout : float
    Request timeout in seconds.
max_retries : int
    Number of retries on transient failures.
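The subclass contract above can be sketched as follows. So that the snippet runs without CuratorKIT installed, `BaseLLM` and `LLMResponse` here are minimal stand-ins that mirror only the documented signatures (they carry no retry logic), and `EchoLLM` is a hypothetical backend invented for illustration. With the real package, the base classes would presumably be imported from `curatorkit.llm` instead.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LLMResponse:
    """Stand-in carrying a subset of the documented LLMResponse fields."""

    text: str
    model: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class BaseLLM(ABC):
    """Stand-in that mirrors the documented constructor only."""

    def __init__(
        self,
        model: str,
        temperature: float = 0.7,
        max_tokens: int = 1024,
        api_key: str | None = None,
        timeout: float = 120.0,
        max_retries: int = 3,
    ) -> None:
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.api_key = api_key
        self.timeout = timeout
        self.max_retries = max_retries

    @abstractmethod
    def _call(self, messages: list[dict[str, str]], **kwargs: Any) -> LLMResponse:
        """Perform one request; the only method a subclass must provide."""


class EchoLLM(BaseLLM):
    """Hypothetical backend that returns the last user message unchanged."""

    def _call(self, messages: list[dict[str, str]], **kwargs: Any) -> LLMResponse:
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        return LLMResponse(text=last_user, model=self.model)


llm = EchoLLM(model="echo-1")
reply = llm._call([{"role": "user", "content": "ping"}])
print(reply.text)  # prints "ping"
```

In normal use, callers would go through `generate()` rather than `_call()` directly, since the documentation places the retry logic in `generate()`.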

generate

generate(messages: list[dict[str, str]], temperature: float | None = None, max_tokens: int | None = None, stop: list[str] | None = None, **kwargs: Any) -> LLMResponse

Synchronous generation with retry logic.

Parameters

messages : list[dict]
    OpenAI-style message list: [{"role": "user", "content": "..."}]
temperature : float | None
    Override default temperature for this call.
max_tokens : int | None
    Override default max_tokens for this call.
stop : list[str] | None
    Stop sequences.

Returns

LLMResponse
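The per-call overrides and the retry behaviour described above could look roughly like the sketch below. It is not CuratorKIT's implementation: `generate_with_retry`, `TransientError`, the `defaults` dictionary, and the exponential backoff schedule are all assumptions made for illustration, as is the reading of `max_retries` as "retries after the first attempt".

```python
from __future__ import annotations

import time
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def generate_with_retry(
    call: Callable[..., Any],
    messages: list[dict[str, str]],
    *,
    defaults: dict[str, Any],
    temperature: float | None = None,
    max_tokens: int | None = None,
    stop: list[str] | None = None,
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> Any:
    # Compare against None rather than using `or`, so an explicit
    # temperature of 0.0 is kept instead of falling back to the default.
    params = {
        "temperature": defaults["temperature"] if temperature is None else temperature,
        "max_tokens": defaults["max_tokens"] if max_tokens is None else max_tokens,
        "stop": stop,
    }
    # Assumption: max_retries counts retries, so there are max_retries + 1 attempts.
    for attempt in range(max_retries + 1):
        try:
            return call(messages, **params)
        except TransientError:
            if attempt == max_retries:
                raise
            # Assumed backoff schedule: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2**attempt)


calls: list[dict[str, Any]] = []


def flaky_call(messages: list[dict[str, str]], **params: Any) -> str:
    """Simulated backend that times out twice before succeeding."""
    calls.append(params)
    if len(calls) < 3:
        raise TransientError("simulated timeout")
    return f"ok after {len(calls)} attempts"


result = generate_with_retry(
    flaky_call,
    [{"role": "user", "content": "Hello"}],
    defaults={"temperature": 0.7, "max_tokens": 1024},
    temperature=0.0,
    base_delay=0.0,  # no waiting in this demonstration
)
print(result)  # prints "ok after 3 attempts"
```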

agenerate async

agenerate(messages: list[dict[str, str]], temperature: float | None = None, max_tokens: int | None = None, stop: list[str] | None = None, **kwargs: Any) -> LLMResponse

Async generation with retry logic.

Same interface as generate(), but returns a coroutine that must be awaited.
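Because `agenerate()` returns a coroutine, several requests can be issued concurrently with `asyncio.gather`. The sketch below uses a hypothetical `FakeAsyncLLM` stand-in that returns plain strings; a real backend would return `LLMResponse` objects, and `run_batch` is a helper name introduced only for this example.

```python
from __future__ import annotations

import asyncio
from typing import Any


class FakeAsyncLLM:
    """Hypothetical stand-in exposing an agenerate() coroutine."""

    async def agenerate(
        self,
        messages: list[dict[str, str]],
        temperature: float | None = None,
        max_tokens: int | None = None,
        stop: list[str] | None = None,
        **kwargs: Any,
    ) -> str:
        await asyncio.sleep(0.01)  # simulate network latency
        return messages[-1]["content"].upper()


async def run_batch(llm: FakeAsyncLLM, prompts: list[str]) -> list[str]:
    # Create one coroutine per prompt; gather runs them concurrently
    # and returns the results in the same order as the inputs.
    tasks = [llm.agenerate([{"role": "user", "content": p}]) for p in prompts]
    return await asyncio.gather(*tasks)


answers = asyncio.run(run_batch(FakeAsyncLLM(), ["alpha", "beta"]))
print(answers)  # prints ['ALPHA', 'BETA']
```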

config_hash

config_hash() -> str

Hash the LLM configuration for provenance tracking.
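The documentation does not say how the hash is computed. One common approach, shown here purely as an assumption, is to serialize the non-secret settings to canonical JSON and hash the result; `sketch_config_hash`, the choice of SHA-256, and the exclusion of `api_key` are illustrative and not taken from CuratorKIT.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def sketch_config_hash(config: dict[str, Any]) -> str:
    """Return a stable hex digest for a configuration dictionary."""
    # Assumption: the API key is excluded so the digest is safe to store
    # and does not change when credentials are rotated.
    public = {k: v for k, v in config.items() if k != "api_key"}
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(public, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"model": "demo-model", "temperature": 0.7, "max_tokens": 1024}
same_settings = {"max_tokens": 1024, "api_key": "secret", "temperature": 0.7, "model": "demo-model"}
hotter = {**base, "temperature": 1.0}

print(sketch_config_hash(base) == sketch_config_hash(same_settings))  # prints True
print(sketch_config_hash(base) == sketch_config_hash(hotter))         # prints False
```

A digest like this lets two generation runs be compared for identical settings without storing the full configuration alongside every record.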

LLMResponse dataclass

LLMResponse(text: str, model: str = '', prompt_tokens: int = 0, completion_tokens: int = 0, total_tokens: int = 0, latency_seconds: float = 0.0, metadata: dict[str, Any] = dict())

Structured response from an LLM call.

to_provenance_dict

to_provenance_dict() -> dict[str, Any]

Extract fields suitable for a ProvenanceRecord.notes entry.
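Which fields end up in the provenance dictionary is not specified here. The sketch below redefines a stand-in `LLMResponse` with the documented fields and assumes that the bookkeeping fields (model, token counts, latency) are kept while the generated text and free-form metadata are left out; the real method may select differently.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class LLMResponse:
    """Stand-in with the documented fields; not the CuratorKIT class."""

    text: str
    model: str = ""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    latency_seconds: float = 0.0
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_provenance_dict(self) -> dict[str, Any]:
        # Assumption: provenance keeps usage and timing, not the output itself.
        record = asdict(self)
        record.pop("text")
        record.pop("metadata")
        return record


response = LLMResponse(
    text="Hello!",
    model="demo-model",
    prompt_tokens=5,
    completion_tokens=2,
    total_tokens=7,
    latency_seconds=0.25,
)
provenance = response.to_provenance_dict()
print(provenance["total_tokens"])  # prints 7
```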