get_provider_for_objective(objective, required_capabilities=None, prefer_local=None) -> dict
Resolve the provider selected by the model orchestrator for one objective.
The SDK does not wrap the provider. It forwards the request to the core orchestrator and returns the result.
Provider resolution is intentionally lazy: it validates and returns a provider
handle, but it does not preload the model into memory. The first provider method
call can still load the runtime if the model is not already active. Use
warmup_provider(...) when you want to preload explicitly.
The returned dictionary uses this shape:
- status
- provider when resolution succeeds
- error when resolution fails
Example:
result = await sdk.ai.get_provider_for_objective(
    "chat",
    required_capabilities=["chat"],
)
if result.get("status") != "ok":
    raise RuntimeError(result.get("error", "Provider unavailable"))
provider = result["provider"]
response = await provider.generate_completion(
    messages=[
        {"role": "system", "content": "Answer briefly."},
        {"role": "user", "content": "Summarize the ticket."},
    ],
    options={"temperature": 0.2},
)
Use required_capabilities only when the feature truly depends on that capability. Use prefer_local as a routing hint, not as a guarantee.
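The status check from the example can be factored into a small helper so every call site handles failure the same way. This is a minimal sketch: `unwrap_provider` and `ProviderUnavailable` are hypothetical names, not part of the SDK, and only the `status`, `provider`, and `error` keys come from the result shape described above.

```python
from typing import Any


class ProviderUnavailable(RuntimeError):
    """Raised when a resolution result does not carry a usable provider."""


def unwrap_provider(result: dict[str, Any]) -> Any:
    """Return result["provider"], or raise with the reported error.

    Hypothetical helper; it relies only on the documented result keys.
    """
    if result.get("status") != "ok":
        raise ProviderUnavailable(result.get("error", "Provider unavailable"))
    provider = result.get("provider")
    if provider is None:
        # Defensive check: an "ok" result without a provider is treated as a failure here.
        raise ProviderUnavailable("Result reported ok but carried no provider")
    return provider
```

With such a helper, a call site could reduce to `provider = unwrap_provider(await sdk.ai.get_provider_for_objective("chat"))`.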
Provider methods are called directly:
embeddings = await provider.embed_texts(texts=["First text", "Second text"])
ranked = await provider.rerank(query="capital of France", texts=["Paris", "Berlin"])
labels = await provider.classify(texts=["I like this result."])
tokens = await provider.extract_tokens(text="Alice works at Example Corp.")
warmup_provider(provider, wait=True) -> dict
Explicitly warm up a provider returned by get_provider_for_objective(...) or
get_provider_by_model_registry_id(...).
Warmup asks the engine orchestrator to resolve and load the provider runtime without invoking an AI method such as completion, embedding, transcription, or reranking.
Use it when you want model loading to be explicit instead of hidden in the first real call.
result = await sdk.ai.get_provider_for_objective("embedding")
if result.get("status") != "ok":
    raise RuntimeError(result.get("error", "Provider unavailable"))
provider = result["provider"]
warmup = await sdk.ai.warmup_provider(provider, wait=False)
request_id = warmup["request_id"]
With wait=False, the call returns immediately after scheduling the warmup:
{
    "status": "queued",
    "request_id": "...",
    "selector_type": "objective",
    "objective": "embedding",
}
With wait=True, the call waits until the runtime is ready and includes the
measured warmup duration:
warmup = await sdk.ai.warmup_provider(provider, wait=True)
print(warmup["warmup_ms"])
Use wait=True for model test pages, calibration flows, and diagnostics where
the user expects to see warmup time separately from invocation time. Use
wait=False for chat, upload, or background flows where the UI should remain
responsive while the runtime prepares.
If a real provider method is called while a warmup for the same engine row and runtime config is still in progress, the runtime converges on the same engine handle instead of creating a duplicate model instance.
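The convergence behavior described above resembles a single-flight pattern: concurrent requests for the same key share one in-flight load. The sketch below illustrates that general pattern with plain asyncio. It is not the runtime's implementation, and `SingleFlightLoader`, `fake_load`, and the key layout are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Hashable


class SingleFlightLoader:
    """Share one in-flight load per key so concurrent callers get the same handle."""

    def __init__(self, load: Callable[[Hashable], Awaitable[Any]]) -> None:
        self._load = load
        self._tasks: dict[Hashable, "asyncio.Future[Any]"] = {}

    async def get(self, key: Hashable) -> Any:
        task = self._tasks.get(key)
        if task is None:
            # The first caller starts the load; later callers await the same task.
            # Simplification: a failed load stays cached in this sketch.
            task = asyncio.ensure_future(self._load(key))
            self._tasks[key] = task
        return await task


async def demo() -> tuple[int, bool]:
    loads = 0

    async def fake_load(key: Hashable) -> object:
        nonlocal loads
        loads += 1
        await asyncio.sleep(0.01)  # simulate a slow model load
        return object()

    loader = SingleFlightLoader(fake_load)
    key = ("engine_row_1", "config_a")  # assumed key: engine row plus runtime config
    warm, real = await asyncio.gather(loader.get(key), loader.get(key))
    return loads, warm is real
```

Running `asyncio.run(demo())` performs a single load and hands both callers the same object.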
Opt-in Knowledge Ingestion For Completion Calls
generate_completion(...) and generate_stream(...) can persist the call as
knowledge when the caller opts in with ingest=True.
The default is False. Runtime AI calls are private unless the module
explicitly asks for knowledge ingestion.
Use ingest_meta for application metadata that later needs to be used as a
retrieval filter, such as module name, chat id, thread id, workspace id, or a
feature-specific scope. Keep those values in metadata; do not pass them as
separate persistence identifiers.
Example:
response = await provider.generate_completion(
    messages=[
        {"role": "user", "content": "Summarize this discussion."},
    ],
    options={"temperature": 0.2},
    ingest=True,
    ingest_meta={
        "module_name": "chat",
        "chat_id": "chat_42",
        "thread_id": "thread_7",
    },
)
For streaming calls, ingestion happens after the stream completes successfully:
async for chunk in provider.generate_stream(
    messages=[
        {"role": "user", "content": "Continue the conversation."},
    ],
    options={"temperature": 0.2},
    ingest=True,
    ingest_meta={
        "module_name": "chat",
        "chat_id": "chat_42",
    },
):
    ...
The caller does not pass user_id, organization_id, or pipeline_id.
- user_id and organization_id are resolved from the active request context.
- pipeline_id is assigned by the AI pipeline runtime and becomes the knowledge source id.
- chat or module identifiers belong in ingest_meta so retrieval can filter on them without binding module entities to core database columns.
Only textual user/assistant/tool/task messages and the final assistant output are ingested. System messages are not persisted into knowledge.
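The ingestion rule above can be pictured as a filter over the message list. The sketch below is only an illustration of that rule, not the runtime's code. It assumes the role names are the literal strings `user`, `assistant`, `tool`, and `task`, and that "textual" means `content` is a plain string. `select_ingestible` is a hypothetical name.

```python
from typing import Any

# Assumed role names, taken from the roles listed in the text.
INGESTIBLE_ROLES = {"user", "assistant", "tool", "task"}


def select_ingestible(
    messages: list[dict[str, Any]],
    final_output: str,
) -> list[dict[str, str]]:
    """Keep textual messages with an ingestible role, then append the final output."""
    kept = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        # System messages and non-string content are dropped.
        if m.get("role") in INGESTIBLE_ROLES and isinstance(m.get("content"), str)
    ]
    kept.append({"role": "assistant", "content": final_output})
    return kept
```

Under these assumptions, a system prompt such as "Answer briefly." would never appear in the persisted knowledge, while the user message and the final answer would.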
get_provider_by_model_registry_id(model_registry_id, confirm_swap=False) -> dict
Resolve a provider for one explicit model_registry row.
Use this for model test pages, calibration flows, benchmark actions, and admin diagnostics where the user has already selected one model binding.
This method does not let objective routing choose another model.
The return value is a dictionary with:
- status
- provider when resolution succeeds
- model when resolution succeeds
- error when resolution fails
- to_unload when resource confirmation is required
Possible status values:
- ok: provider was resolved and is ready to receive calls.
- error: resolution failed; inspect error.
- need_confirmation: loading the model requires unloading other runtime instances.
Example:
result = await sdk.ai.get_provider_by_model_registry_id(42)
if result.get("status") != "ok":
    raise RuntimeError(result.get("error", "Model provider unavailable"))
provider = result["provider"]
await sdk.ai.warmup_provider(provider, wait=True)
response = await provider.generate_completion(
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    options={"max_tokens": 32},
)
The model path/reference is resolved by the inventory/media-provider flow before the provider is built. Engine implementations receive the resolved config and should not reach into core registries.