Extractors

Extractors convert uploaded files into usable runtime context.

Extractor providers are selected by MIME type and produce normalized output for knowledge ingestion: full markdown, chunks, metadata, and status for asynchronous processing.
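A minimal sketch of MIME-based provider selection, written in Python for illustration. The names `ExtractorProvider`, `select_provider`, and the `plain-text` provider are hypothetical and not part of the runtime's documented API; the sketch only assumes that each provider declares the MIME types it can ingest.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class ExtractorProvider:
    """Hypothetical provider record: a name, supported MIME types, and an extract function."""
    name: str
    mime_types: FrozenSet[str]
    extract: Callable[[bytes], str]  # returns extracted markdown


def select_provider(
    providers: Sequence[ExtractorProvider], mime_type: str
) -> Optional[ExtractorProvider]:
    """Return the first provider that declares the MIME type, or None."""
    # Drop parameters such as "; charset=utf-8" before matching.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    for provider in providers:
        if normalized in provider.mime_types:
            return provider
    return None


# Example registry (assumption): a single provider for plain text and markdown.
PROVIDERS = [
    ExtractorProvider(
        name="plain-text",
        mime_types=frozenset({"text/plain", "text/markdown"}),
        extract=lambda data: data.decode("utf-8"),
    ),
]
```

Returning `None` for an unmatched type, rather than raising, lets the caller decide whether an unsupported file is rejected at upload time or simply skipped.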

Upload path

Upload handling starts before the agent sees the message.

The composer should accept only MIME types that a configured extractor can ingest, and it should attach request context to each uploaded file.

Flow: Upload -> MIME match -> Extractor -> Ingestion queue
01

MIME capability

The UI accepts MIME types based on configured ingestible extractors instead of hardcoded lists.

02

Request metadata

The upload context can include module, thread, message, user, and organization information.

03

Asynchronous status

The message may reach an agent before extraction finishes, so tools must expose attachment and ingestion state.
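The three points above can be sketched as follows. This is an illustration under stated assumptions, not the runtime's actual types: `UploadContext`, `Upload`, `IngestionStatus`, the status values, and the shape of `extractor_config` (extractor name mapped to the MIME types it ingests) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set


class IngestionStatus(str, Enum):
    """Assumed status values for asynchronous processing."""
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class UploadContext:
    """Request metadata attached to an upload; every field is optional."""
    module: Optional[str] = None
    thread_id: Optional[str] = None
    message_id: Optional[str] = None
    user_id: Optional[str] = None
    organization_id: Optional[str] = None


@dataclass
class Upload:
    filename: str
    mime_type: str
    context: UploadContext
    status: IngestionStatus = IngestionStatus.QUEUED


def accepted_mime_types(extractor_config: Dict[str, Set[str]]) -> Set[str]:
    """Derive the accept list from configured extractors, not a hardcoded list."""
    accepted: Set[str] = set()
    for mime_types in extractor_config.values():
        accepted |= mime_types
    return accepted


def accept_upload(
    filename: str,
    mime_type: str,
    context: UploadContext,
    extractor_config: Dict[str, Set[str]],
) -> Upload:
    """Reject unsupported types up front; otherwise queue the file for extraction."""
    if mime_type not in accepted_mime_types(extractor_config):
        raise ValueError(f"no configured extractor ingests {mime_type!r}")
    return Upload(filename=filename, mime_type=mime_type, context=context)
```

Because the upload starts in a queued state, any tool that reads it later can tell a file that is still processing apart from one that failed or finished.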

Extractor output

The output must support both full-document and chunk retrieval.

Agents sometimes need a semantic search result and sometimes need the whole extracted document.

01

Full markdown

The complete extracted document remains available when the agent needs the whole source.

02

Chunks

Chunked content feeds embedding and retrieval stores.

03

Metadata

Source, request ID, provider metadata, and scope fields keep downstream retrieval auditable.
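One possible shape for the normalized output is sketched below. The class and field names (`ExtractorOutput`, `Chunk`, `ExtractionMetadata`) are hypothetical, and the paragraph-packing chunker is a deliberately naive stand-in; the source does not specify how chunks are produced.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    index: int
    text: str


@dataclass
class ExtractionMetadata:
    source: str
    request_id: str
    provider: Dict[str, str] = field(default_factory=dict)  # provider-specific details
    scope: Dict[str, str] = field(default_factory=dict)     # e.g. user or organization


@dataclass
class ExtractorOutput:
    markdown: str            # full extracted document
    chunks: List[Chunk]      # pieces destined for embedding and retrieval
    metadata: ExtractionMetadata


def chunk_markdown(markdown: str, max_chars: int = 500) -> List[Chunk]:
    """Naive chunker: pack blank-line-separated paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: List[Chunk] = []
    current = ""
    for paragraph in (p.strip() for p in markdown.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(Chunk(index=len(chunks), text=current))
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(Chunk(index=len(chunks), text=current))
    return chunks


def build_output(markdown: str, metadata: ExtractionMetadata, max_chars: int = 500) -> ExtractorOutput:
    """Keep the full markdown alongside its chunks so both retrieval modes work."""
    return ExtractorOutput(markdown=markdown, chunks=chunk_markdown(markdown, max_chars), metadata=metadata)
```

Storing the full markdown next to the chunks costs some duplication, but it means a whole-document read never has to reassemble text from the retrieval store.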

Agent workflow

Extraction is not assumed to be complete at message time.

Tools need to make ingestion state visible because document processing is asynchronous.

01

List attachments

The agent can inspect which files belong to the conversation or message.

02

Wait when needed

A wait tool can pause briefly when a required document is still processing.

03

Retrieve when ready

Completed documents can be searched through embeddings or read as full markdown.
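The list, wait, and retrieve steps above might look like this from a tool's point of view. Everything here is an assumption made for illustration: the in-memory `AttachmentStore`, the string status values, and the function names do not come from the runtime. Embedding search is omitted; only the full-markdown read is shown.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Attachment:
    attachment_id: str
    message_id: str
    status: str  # assumed values: "processing", "completed", "failed"
    markdown: Optional[str] = None


class AttachmentStore:
    """In-memory stand-in for wherever attachment and ingestion state is kept."""

    def __init__(self) -> None:
        self._items: Dict[str, Attachment] = {}

    def add(self, attachment: Attachment) -> None:
        self._items[attachment.attachment_id] = attachment

    def get(self, attachment_id: str) -> Attachment:
        return self._items[attachment_id]

    def list_for_message(self, message_id: str) -> List[Attachment]:
        return [a for a in self._items.values() if a.message_id == message_id]


def wait_for_attachment(
    store: AttachmentStore, attachment_id: str, timeout_s: float = 2.0, poll_s: float = 0.05
) -> Attachment:
    """Poll until the attachment leaves "processing" or the timeout elapses.

    Returns the attachment either way, so the caller must check its status.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        attachment = store.get(attachment_id)
        if attachment.status != "processing" or time.monotonic() >= deadline:
            return attachment
        time.sleep(poll_s)


def read_full_markdown(store: AttachmentStore, attachment_id: str) -> str:
    """Return the whole extracted document, refusing if extraction has not completed."""
    attachment = store.get(attachment_id)
    if attachment.status != "completed":
        raise RuntimeError(f"attachment {attachment_id} is {attachment.status}")
    return attachment.markdown or ""
```

The wait is bounded on purpose: a short timeout keeps the agent responsive, and a document that is still processing can be reported to the user instead of blocking the turn.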
