LLM backends
Backends used by generation tasks and LLM-judge gates. The LiteLLM backend reaches
any provider LiteLLM supports and requires the generation extra. The Ollama backend
talks to a local server using only the standard library, so it works on a core
install.
curatorkit.llm
LLM abstraction layer for CuratorKIT.
Provides a unified interface to 100+ LLM providers via LiteLLM, plus a dedicated Ollama backend for local models.
BaseLLM
BaseLLM(model: str, temperature: float = 0.7, max_tokens: int = 1024, api_key: str | None = None, timeout: float = 120.0, max_retries: int = 3)
Bases: ABC
Abstract base for LLM backends.
Subclasses must implement:
    _call(messages, **kwargs) -> LLMResponse
Subclasses may optionally implement:
    _acall(messages, **kwargs) -> LLMResponse (for async generation)
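The subclass contract above can be illustrated with a minimal sketch. To keep it runnable without the package installed, it defines small local stand-ins for BaseLLM and LLMResponse; these stand-ins, the EchoLLM class, and the "echo-1" model name are hypothetical, and the real generate() is assumed to add retry handling around _call().

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


# Local stand-ins (assumptions) for curatorkit.llm.LLMResponse and BaseLLM.
@dataclass
class LLMResponse:
    text: str
    model: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class BaseLLM(ABC):
    def __init__(self, model: str, temperature: float = 0.7, max_tokens: int = 1024):
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens

    @abstractmethod
    def _call(self, messages: list[dict[str, str]], **kwargs: Any) -> LLMResponse:
        """Backend-specific request; the only method a subclass must provide."""

    def generate(self, messages: list[dict[str, str]], **kwargs: Any) -> LLMResponse:
        # Simplified: the documented generate() also retries transient failures.
        return self._call(messages, **kwargs)


class EchoLLM(BaseLLM):
    """Hypothetical backend that upper-cases the last message instead of calling a model."""

    def _call(self, messages: list[dict[str, str]], **kwargs: Any) -> LLMResponse:
        last = messages[-1]["content"]
        return LLMResponse(text=last.upper(), model=self.model)


response = EchoLLM(model="echo-1").generate([{"role": "user", "content": "hello"}])
print(response.text)
```

Callers only ever use generate(); the backend-specific work stays inside _call(), which is why a new backend needs to implement just that one method.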
Parameters
model : str
    Model identifier string (format depends on backend).
temperature : float
    Default temperature for generation.
max_tokens : int
    Default maximum tokens for generation.
api_key : str | None
    API key override. Falls back to environment variable if None.
timeout : float
    Request timeout in seconds.
max_retries : int
    Number of retries on transient failures.
generate
generate(messages: list[dict[str, str]], temperature: float | None = None, max_tokens: int | None = None, stop: list[str] | None = None, **kwargs: Any) -> LLMResponse
Synchronous generation with retry logic.
Parameters
messages : list[dict]
    OpenAI-style message list: [{"role": "user", "content": "..."}]
temperature : float | None
    Override default temperature for this call.
max_tokens : int | None
    Override default max_tokens for this call.
stop : list[str] | None
    Stop sequences.
Returns
LLMResponse
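The two behaviours described for generate(), per-call overrides that fall back to the constructor defaults and retries on transient failures, could look roughly like the sketch below. This is not the library's implementation: the helper names resolve and call_with_retries are hypothetical, ConnectionError is an assumed stand-in for whatever the backend treats as transient, and any backoff between attempts is omitted.

```python
from typing import Any, Callable, Optional


def resolve(override: Optional[Any], default: Any) -> Any:
    """Return the per-call override, or the default when the override is None."""
    # An explicit `is None` check matters: temperature=0.0 is a valid override
    # that `override or default` would silently discard.
    return default if override is None else override


def call_with_retries(call: Callable[[], Any], max_retries: int = 3) -> Any:
    """Run call(), retrying up to max_retries times after a ConnectionError."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        try:
            return call()
        except ConnectionError as exc:  # assumed transient failure type
            last_error = exc
    assert last_error is not None
    raise last_error


attempts = {"count": 0}


def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, max_retries=3)
print(result, attempts["count"])
print(resolve(None, 0.7), resolve(0.0, 0.7))
```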
agenerate (async)
agenerate(messages: list[dict[str, str]], temperature: float | None = None, max_tokens: int | None = None, stop: list[str] | None = None, **kwargs: Any) -> LLMResponse
Async generation with retry logic.
Same interface as generate() but returns a coroutine.
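Because agenerate() returns a coroutine, several requests can be awaited together. The sketch below shows that pattern with asyncio.gather; FakeAsyncLLM and FakeResponse are hypothetical stand-ins that mimic the documented signature so the example runs without a provider or the package installed.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for LLMResponse."""
    text: str


class FakeAsyncLLM:
    """Hypothetical stand-in for a BaseLLM subclass with async support."""

    async def agenerate(self, messages, temperature=None, max_tokens=None, stop=None, **kwargs):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return FakeResponse(text=f"reply to: {messages[-1]['content']}")


async def main():
    llm = FakeAsyncLLM()
    prompts = ["first", "second", "third"]
    # Calling agenerate() only creates coroutines; gather() runs them concurrently.
    calls = [llm.agenerate([{"role": "user", "content": p}]) for p in prompts]
    return await asyncio.gather(*calls)


responses = asyncio.run(main())
texts = [r.text for r in responses]
print(texts)
```

asyncio.gather returns results in the order the awaitables were passed, so each response lines up with its prompt.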
LLMResponse (dataclass)
LLMResponse(text: str, model: str = '', prompt_tokens: int = 0, completion_tokens: int = 0, total_tokens: int = 0, latency_seconds: float = 0.0, metadata: dict[str, Any] = dict())
Structured response from an LLM call.
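As a rough illustration of the fields in the signature above, the following sketch defines a local mirror of the dataclass. It is an assumption that the `dict()` default shown for metadata corresponds to a per-instance default factory; the mirror class and the "finish_reason" key are illustrative only.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class LLMResponse:
    """Local mirror of the documented fields (not the library class itself)."""
    text: str
    model: str = ""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    latency_seconds: float = 0.0
    # Assumed: each instance gets its own dict rather than a shared one.
    metadata: dict[str, Any] = field(default_factory=dict)


a = LLMResponse(text="hi", prompt_tokens=12, completion_tokens=3, total_tokens=15)
b = LLMResponse(text="bye")  # only `text` is required

a.metadata["finish_reason"] = "stop"  # hypothetical key; does not leak into b
print(asdict(a))
print(b.metadata)
```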
to_provenance_dict
Extract fields suitable for a ProvenanceRecord.notes entry.