This section covers the two bridge methods that expose extractor routing and execution through the knowledge facade:
- resolve_registered_extractor(...)
- extract_with_registered_extractor(...)
These methods exist for module authors who conceptually work in the knowledge domain but still need direct access to the registered-extractor path without dropping down into sdk.extractors explicitly.
resolve_registered_extractor(path=None, filename=None, mime_type=None) -> dict | None
This method resolves the extractor bound to the source MIME type through
sdk.extractors.resolve(...).
Example:
resolved = module_sdk.knowledge.resolve_registered_extractor(
    filename="report.pdf",
    mime_type="application/pdf",
)
What It Is For
Use this when your module wants to stay on the knowledge façade but still needs to know which registered extractor would be selected for a source.
Typical examples:
- deciding whether a document type is supported before starting a flow
- debugging why a document will route to a specific extractor
- surfacing extractor information in a document-oriented admin screen
What It Does
The method:
- normalizes path, filename, and mime_type
- lowercases the MIME type when present
- delegates directly to sdk.extractors.resolve(...)
It does not add extra routing logic beyond that.
The current extractor resolver is binding-based. If the MIME type is not
configured in the extractor MIME binding table, this method returns None even
when an installed extractor declares theoretical support for the MIME type.
What It Returns
The return value is the resolved extractor payload from the extractor domain, or None if no extractor matches.
That payload includes routing details such as:
- row_id
- extractor_id
- status
- priority
- mime_match
- extension_match
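A support check or admin-screen summary built on this return value might look like the following sketch. The helper name describe_route and the sample payload values are hypothetical; only the key names come from the routing details listed above, and the value types shown are assumptions.

```python
from typing import Optional


def describe_route(resolved: Optional[dict]) -> str:
    """Summarize a resolved extractor payload, or report the unsupported case."""
    if resolved is None:
        return "unsupported: no registered extractor matched"
    # .get() is used because the exact set of keys present is an assumption.
    fields = ("extractor_id", "status", "priority", "mime_match", "extension_match")
    return " ".join("{}={}".format(name, resolved.get(name)) for name in fields)


# Hypothetical payload; real values come from the extractor domain.
sample = {
    "row_id": 7,
    "extractor_id": "pdf-basic",
    "status": "active",
    "priority": 10,
    "mime_match": True,
    "extension_match": True,
}
summary = describe_route(sample)
missing = describe_route(None)
```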
Why This Method Exists
Strictly speaking, a module could call module_sdk.extractors.resolve(...) directly.
This helper exists to keep document-oriented code readable when the surrounding flow is already written in terms of the knowledge domain.
It is a convenience bridge, not a separate subsystem.
extract_with_registered_extractor(path=None, data=None, filename=None, mime_type=None, config=None) -> dict | None
This method runs the active registered extractor through sdk.extractors.extract(...).
Example with bytes:
payload = module_sdk.knowledge.extract_with_registered_extractor(
    data=file_bytes,
    filename="contract.pdf",
    mime_type="application/pdf",
    config={"ocr_enabled": True},
)
Example with a path:
payload = module_sdk.knowledge.extract_with_registered_extractor(
    path="/tmp/contract.pdf",
)
What It Is For
Use this when you want the knowledge-domain naming but the extraction semantics of the registered extractor flow.
This is most useful when:
- the feature is knowledge-oriented
- you still want the lower-level extractor behavior of returning None instead of raising
- you want to pass raw bytes directly
What It Does
The method:
- normalizes path, filename, mime_type, and config
- converts data to bytes when provided
- delegates directly to sdk.extractors.extract(...)
Like resolve_registered_extractor(...), this is a bridge method, not a separate runtime.
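The three steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: FakeExtractors is a hypothetical stand-in for sdk.extractors, and the specific normalization choices (empty strings become None, config is copied into a plain dict) are guesses, not the SDK's confirmed behavior.

```python
class FakeExtractors:
    """Hypothetical stand-in for sdk.extractors; echoes what it receives."""

    def extract(self, path=None, data=None, filename=None, mime_type=None, config=None):
        if mime_type != "application/pdf":
            return None  # mirrors the documented "no match returns None" behavior
        return {
            "size": len(data or b""),
            "data_type": type(data).__name__,
            "config": config,
        }


def extract_with_registered_extractor(
    extractors, path=None, data=None, filename=None, mime_type=None, config=None
):
    """Sketch of the bridge: normalize inputs, coerce data to bytes, delegate."""
    # bytes() accepts bytes-like objects such as bytearray and memoryview.
    raw = bytes(data) if data is not None else None
    return extractors.extract(
        path=path or None,
        data=raw,
        filename=filename or None,
        mime_type=mime_type or None,
        config=dict(config) if config else {},  # assumed: always hand over a dict
    )


extractors = FakeExtractors()
matched = extract_with_registered_extractor(
    extractors,
    data=bytearray(b"%PDF-1.7"),
    filename="contract.pdf",
    mime_type="application/pdf",
    config={"ocr_enabled": True},
)
unmatched = extract_with_registered_extractor(
    extractors, data=b"hello", filename="notes.txt", mime_type="text/plain"
)
```

The bytearray input reaches the stand-in as bytes, and the unmatched MIME type produces None rather than an exception.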
Important Difference From extract_document_data(...)
This method returns None when no extractor matches, because it follows the lower-level extractor-domain behavior.
That makes it different from extract_document_data(...), which raises a RuntimeError when no registered extractor is found.
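The difference shows up in how calling code has to be written. The sketch below uses hypothetical stubs in place of the real methods: the stub bodies only reproduce the documented no-match outcomes (RuntimeError versus None), and the single-argument signature given to extract_document_data is an assumption. The helper names strict_flow and lenient_flow are illustrative.

```python
def extract_document_data(path):
    """Hypothetical stub: raises when no registered extractor is found."""
    raise RuntimeError("no registered extractor for " + path)


def extract_with_registered_extractor(
    path=None, data=None, filename=None, mime_type=None, config=None
):
    """Hypothetical stub: returns None when no extractor matches."""
    return None


def strict_flow(path):
    # The failure arrives as an exception, so it must be caught to be handled.
    try:
        return extract_document_data(path)
    except RuntimeError as exc:
        return {"error": str(exc)}


def lenient_flow(data, filename):
    # The failure arrives as None, so an explicit check is required.
    payload = extract_with_registered_extractor(data=data, filename=filename)
    if payload is None:
        return {"skipped": filename}
    return payload


strict_result = strict_flow("/tmp/unknown.bin")
lenient_result = lenient_flow(b"\x00\x01", "unknown.bin")
```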
When To Choose Which
Use extract_document_data(...) when:
- you have a path
- the operation is document-centric
- missing extractor support should fail explicitly
Use extract_with_registered_extractor(...) when:
- you want to handle the None case yourself
- you are working with raw bytes
- you want a thinner bridge to the extractor facade
Why These Methods Matter
It may seem redundant to expose extractor behavior through the knowledge domain, but in practice these bridge methods help keep module code cohesive.
Sometimes a feature is clearly about document knowledge, not extractor administration, yet it still needs just enough control to:
- inspect the resolved extractor
- run extraction without the stricter error behavior of extract_document_data(...)
These two methods solve that problem cleanly.