Engines

Engines turn model providers into runtime services.

An engine implements configured runtime methods such as chat, generate stream, embeddings, STT, TTS, or image analysis. The application asks for a model or objective; the runtime resolves the provider, quota, placement, and invocation path.

Provider lifecycle

Model capability is configured before application code asks for it.

The system module handles install and activation. Feature code asks for a configured objective or model id.

Module call: objective, model id, or capability

Quota gate: requests or total tokens by scope

Resolver: model capability and priority

Orchestrator: local gRPC or cluster queue

Worker: sandboxed engine process

Stream: result, usage, audit trail
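
The resolver stage can be illustrated with a small sketch. It is not the platform's actual resolver: the ModelConfig type, the resolve function, the catalog entries, and the rule that a lower priority number wins are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical configured model entry."""
    model_id: str
    capabilities: frozenset
    priority: int  # assumption: a lower number means higher priority


def resolve(models: Iterable[ModelConfig], capability: str,
            model_id: Optional[str] = None) -> ModelConfig:
    """Pick a model by capability, optionally pinned to an explicit model id."""
    candidates = [m for m in models if capability in m.capabilities]
    if model_id is not None:
        candidates = [m for m in candidates if m.model_id == model_id]
    if not candidates:
        raise LookupError(f"no configured model for {capability!r}")
    # Among the remaining candidates, the best priority wins.
    return min(candidates, key=lambda m: m.priority)


# Hypothetical catalog; the ids are placeholders, not real provider entries.
CATALOG = [
    ModelConfig("local-llm", frozenset({"chat"}), priority=20),
    ModelConfig("remote-llm", frozenset({"chat", "embeddings"}), priority=10),
]
```

Feature code would call resolve(CATALOG, "chat") to get the preferred chat model, or pass model_id to pin a specific one. A capability that no model provides raises LookupError instead of silently falling back.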

01

Install

Engines are installed and configured through the system module and provider catalog.

02

Configure

Models store capabilities, runtime methods, provider options, quotas, and performance defaults.

03

Warmup and placement

Model loading can happen outside the UI process, and cluster mode can prefer nodes with warm instances.
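
As a rough sketch of warm-instance placement, the function below prefers nodes that already hold the requested model and breaks ties by load. The Node type, its fields, and the tie-breaking order are assumptions for illustration; the real placement logic may weigh other signals.

```python
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class Node:
    """Hypothetical view of a cluster node as seen by the orchestrator."""
    name: str
    active_jobs: int = 0
    warm_models: set = field(default_factory=set)


def pick_node(nodes: Sequence[Node], model_id: str) -> Node:
    """Prefer a node with a warm instance, then the least busy, then by name."""
    if not nodes:
        raise ValueError("no nodes available")
    return min(
        nodes,
        # False sorts before True, so warm nodes come first.
        key=lambda n: (model_id not in n.warm_models, n.active_jobs, n.name),
    )
```

With this ordering, a busy warm node still beats an idle cold one. That trade-off is itself an assumption, and it only makes sense when loading the model costs more than waiting in the queue.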

Single or cluster

The orchestrator can stay local or coordinate across nodes.

Local deployments use the same engine boundary as clustered deployments. The difference is configuration: direct gRPC and in-memory streams for one node, queue-backed invocation and Redis response streams for coordinated nodes.

Single-node provider

Good for desktop, development, and compact server deployments where the runtime and orchestrator live together.

Cluster provider

Good for deployments that need shared queues, node-aware placement, Redis streams, and engine workers on multiple machines.
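
The configuration difference between the two providers can be pictured as one switch. The sketch below only names the transports described above; OrchestratorConfig, Transport, and transport_for are hypothetical names, and nothing here opens a real gRPC channel or Redis connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorConfig:
    mode: str  # assumption: "single" or "cluster"


@dataclass(frozen=True)
class Transport:
    invocation: str
    response_stream: str


def transport_for(config: OrchestratorConfig) -> Transport:
    """Map the deployment mode to the invocation and response paths."""
    if config.mode == "single":
        return Transport("direct gRPC", "in-memory stream")
    if config.mode == "cluster":
        return Transport("queue-backed invocation", "Redis response stream")
    raise ValueError(f"unknown orchestrator mode: {config.mode!r}")
```

Because only the transport changes, code above the engine boundary does not need to know which mode is active.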

Governance

Engine usage can be limited by requests or tokens.

Quota checks happen before engine invocation. Limits can target everyone, guests, organizations, roles, or individual users across rolling minute, hour, day, week, or month windows.

01

Request quotas

Limit invocation count for a configured engine over a rolling period.

02

Token quotas

Limit total token usage recorded through model usage and observability data.

03

Scoped decisions

Apply limits globally or by organization, role, user, or guest session.
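
A rolling-window quota can be sketched in a few lines. This is an in-memory illustration only: the Quota class, the scope strings, and the 30-day month are assumptions, and a real deployment would read usage from shared storage rather than a local deque.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,  # assumption: a month is treated as 30 days
}


@dataclass
class Quota:
    scope: str   # hypothetical scope key, e.g. "everyone" or "user:42"
    limit: int   # max requests or max total tokens inside the window
    window: str  # one of the WINDOW_SECONDS keys
    _events: deque = field(default_factory=deque, init=False, repr=False)

    def used(self, now: float) -> int:
        """Drop events that left the rolling window, then sum what remains."""
        cutoff = now - WINDOW_SECONDS[self.window]
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return sum(amount for _, amount in self._events)

    def allows(self, now: float) -> bool:
        """Pre-invocation gate: pass while usage is still below the limit."""
        return self.used(now) < self.limit

    def record(self, now: float, amount: int = 1) -> None:
        """Record 1 per call for request quotas, or the token count used."""
        self._events.append((now, amount))
```

The gate runs before the engine is invoked and usage is recorded afterwards. A token quota can therefore overshoot by one call, since the token count is only known once the call completes.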

Invocation model

The app calls a runtime method, not a provider-specific client.

This keeps modules away from provider SDK details and lets the engine layer normalize local and remote runtimes.

01

Runtime methods

Providers expose only the methods they support. Test pages and UI should call those same method contracts rather than provider-specific shortcuts.

02

Local and remote

OpenAI-compatible APIs, local runtimes, and specialized engines share the same high-level provider boundary.

03

Observability

Requests carry ids and context so failures, timings, and output can be traced.
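
One way to picture the boundary is an engine that advertises its supported runtime methods and tags every result with the request id. The Request, Result, and Engine types below are hypothetical and do not reflect the platform's real contracts. The echo engine stands in for a real provider.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Request:
    method: str              # runtime method name, e.g. "chat"
    payload: Dict[str, Any]
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass(frozen=True)
class Result:
    request_id: str
    engine: str
    output: Any


class Engine:
    """Hypothetical engine wrapper exposing only the methods it supports."""

    def __init__(self, name: str, methods: Dict[str, Callable[[dict], Any]]):
        self.name = name
        self._methods = dict(methods)

    def supports(self, method: str) -> bool:
        return method in self._methods

    def invoke(self, request: Request) -> Result:
        handler = self._methods.get(request.method)
        if handler is None:
            raise NotImplementedError(
                f"{self.name} does not support {request.method!r}")
        # The request id travels with the result so the call can be traced.
        return Result(request.request_id, self.name, handler(request.payload))


# Stand-in provider: upper-cases the text instead of calling a model.
echo = Engine("echo", {"chat": lambda payload: payload["text"].upper()})
```

Calling code only sees invoke and supports, so swapping the echo engine for a remote API or a local runtime would not change the caller.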

Resource behavior

Local engines need resource-aware startup.

Local model runtimes can be expensive to load. The platform should keep those costs outside UI-critical paths.

01

Background workers

Engine processes can hold model memory away from the main application/UI process.

02

Warmup requests

A provider can be resolved first, then warmed up before the first user-facing invocation.

03

Configuration limits

Context length, runtime options, and performance defaults belong to model/provider configuration rather than page code.
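
The warmup idea can be sketched as a wrapper that starts an expensive load in the background and blocks only the first invocation that actually needs the model. The platform describes separate engine processes; this sketch uses a thread as a stand-in. WarmEngine and its loader argument are hypothetical names.

```python
import threading
from typing import Callable, Optional


class WarmEngine:
    """Hypothetical wrapper that loads a model off the caller's thread."""

    def __init__(self, loader: Callable[[], Callable[[str], str]]):
        self._loader = loader
        self._lock = threading.Lock()
        self._started = False
        self._ready = threading.Event()
        self._model: Optional[Callable[[str], str]] = None
        self._error: Optional[BaseException] = None

    def warmup(self) -> None:
        """Start loading once; repeated calls are no-ops."""
        with self._lock:
            if self._started:
                return
            self._started = True
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self) -> None:
        try:
            self._model = self._loader()
        except BaseException as exc:  # surface load failures to the caller
            self._error = exc
        finally:
            self._ready.set()

    def invoke(self, prompt: str, timeout: Optional[float] = None) -> str:
        self.warmup()  # a cold call still works, it just waits longer
        if not self._ready.wait(timeout):
            raise TimeoutError("model is still loading")
        if self._error is not None:
            raise RuntimeError("model failed to load") from self._error
        return self._model(prompt)
```

A page would call warmup() as soon as the provider is resolved and invoke() only when the user submits something, so the load time overlaps with idle time instead of the first request.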

Implemented engines

Available engine implementations.

These are the engine implementations exposed through the runtime/provider layer.

Chat and reasoning

Remote APIs, OpenAI-compatible providers, and local LLM runtimes used for chat and streamed generation.

OpenAI

OpenAI-compatible

Anthropic

Gemini

Ollama

llama.cpp

vLLM

ONNX

Speech

Speech-to-text and text-to-speech providers used by composer voice input and audio generation flows.

Whisper

OpenAI Whisper

OpenAI TTS

Edge TTS

Edge

eSpeak

Parler TTS

Qwen TTS

Vision and detection

Specialized engines for visual analysis and object detection workflows.

YOLO

Next

Related pages

Use these pages to move from the concept to adjacent parts of the runtime.

A2UI

Streaming output into clients.

Knowledge

Embedding and retrieval providers.

Extensibility

How engines fit the extension model.