The extractor_registry core model represents registered extractors and their matching rules.
If your module needs to manage which extractors exist, how they match files, and what their runtime metadata looks like, this is the model to use.
What This Model Represents
Rows include fields such as:
- id
- name
- extractor_id
- config
- file_extensions
- mime_types
- priority
- status
- supported
This is the registry-level representation of extractors, not the execution API itself. Extraction execution belongs to sdk.extractors.
Extractor MIME routing is not derived directly from these rows at call time.
The active routing decision is stored in extractor_mime_type_binding and
managed through sdk.extractors.set_mime_type_binding(...).
- extractor_registry.mime_types defines what an extractor can support.
- extractor_mime_type_binding defines which extractor is currently selected for a MIME type.
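The split between capability and selection can be pictured with a small in-memory sketch. It does not call the SDK. The names REGISTRY_ROWS, BINDINGS, capable_extractors, and selected_extractor are hypothetical, invented here for illustration, and the "other_pdf" row is made-up sample data. In the real system the binding lives in extractor_mime_type_binding and is changed through sdk.extractors.set_mime_type_binding(...).

```python
# Illustrative only: plain Python data standing in for the two tables.
# None of these names are part of module_sdk.

# Stand-in for extractor_registry rows: what each extractor CAN handle.
REGISTRY_ROWS = [
    {"extractor_id": "docling", "mime_types": ["application/pdf"]},
    {"extractor_id": "other_pdf", "mime_types": ["application/pdf"]},  # made-up row
]

# Stand-in for extractor_mime_type_binding: which extractor IS selected.
BINDINGS = {"application/pdf": "docling"}


def capable_extractors(mime_type):
    """Return extractor ids whose registry row declares support for mime_type."""
    return [
        row["extractor_id"]
        for row in REGISTRY_ROWS
        if mime_type in row["mime_types"]
    ]


def selected_extractor(mime_type):
    """Return the bound extractor id, or None when no binding exists.

    Only the binding is consulted; registry rows are not used for routing.
    """
    return BINDINGS.get(mime_type)
```

In this sketch two extractors declare support for application/pdf, but only the bound one is returned by selected_extractor. Editing a row's mime_types would change the first answer and leave the second untouched.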
Typical Use Cases
Use module_sdk.models.extractor_registry when you need to:
- list available extractors
- register a new extractor row
- update matching rules or status
- manage extractor support metadata
Listing Extractors
Example:
listing = module_sdk.models.extractor_registry.list(
    page=0,
    page_size=100,
    filters={"status": "installed"},
)

Supported filters include:
- id
- name
- extractor_id
- status
- supported
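Because only a fixed set of filter keys is supported, a caller may want to catch a mistyped key before making the request. The helper below is a minimal client-side sketch. SUPPORTED_FILTERS and check_filters are hypothetical names, not part of module_sdk, and this page does not say how the model itself responds to an unsupported key.

```python
# Filter keys copied from the supported-filters list above.
SUPPORTED_FILTERS = frozenset({"id", "name", "extractor_id", "status", "supported"})


def check_filters(filters):
    """Return filters unchanged, or raise ValueError naming unsupported keys.

    Hypothetical client-side guard; it does not call the SDK.
    """
    unknown = sorted(set(filters) - SUPPORTED_FILTERS)
    if unknown:
        raise ValueError("unsupported filter keys: " + ", ".join(unknown))
    return filters
```

The checked dict can then be passed straight through, for example filters=check_filters({"status": "installed"}).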
Creating An Extractor Row
Example:
created = module_sdk.models.extractor_registry.create(
    {
        "name": "docling_pdf",
        "extractor_id": "docling",
        "file_extensions": [".pdf"],
        "mime_types": ["application/pdf"],
        "priority": 10,
        "status": "installed",
        "supported": True,
    }
)

What It Validates
The model:
- requires name and extractor_id
- rejects duplicate extractor names
- requires config to be an object when provided
- requires file_extensions and mime_types to be lists when provided
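The rules above can be mirrored in a local pre-flight check, which is handy in tests or before submitting a batch of rows. This is a sketch under stated assumptions, not the model's own validation code. The function name validate_extractor_row, the existing_names parameter, and the error wording are all hypothetical, and a value of None is treated here as "not provided".

```python
def validate_extractor_row(row, existing_names=()):
    """Return a list of problems found in row; an empty list means none found.

    Hypothetical local check that mirrors the documented rules only.
    """
    problems = []

    # name and extractor_id are required.
    for field in ("name", "extractor_id"):
        if not row.get(field):
            problems.append(field + " is required")

    # Duplicate extractor names are rejected.
    name = row.get("name")
    if isinstance(name, str) and name in set(existing_names):
        problems.append("duplicate extractor name: " + name)

    # config must be an object (a dict in Python) when provided.
    # Assumption: None counts as "not provided".
    config = row.get("config")
    if config is not None and not isinstance(config, dict):
        problems.append("config must be an object")

    # file_extensions and mime_types must be lists when provided.
    for field in ("file_extensions", "mime_types"):
        value = row.get(field)
        if value is not None and not isinstance(value, list):
            problems.append(field + " must be a list")

    return problems
```

Returning every problem at once, instead of raising on the first, lets a caller report all issues with a row in one pass.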
Updating And Deleting
Update and delete behave like ordinary core-model registry operations:
# row_id is the id of an existing extractor_registry row
updated = module_sdk.models.extractor_registry.update(row_id, {"priority": 20})
deleted = module_sdk.models.extractor_registry.delete(row_id)

Use them to maintain registry metadata, not to run extraction work.