Retrieving Knowledge

This section covers scoped read, search, summary, delete, and retrieval methods exposed by the knowledge façade:

retrieve(...)
read_document_blocks(...)
list_items_for_summary(...)
store_item_summary(...)
search_extracted_items(...)
delete_source(...)
delete_by_metadata(...)

Retrieval is scoped automatically. Module code does not provide user_id, organization_id, or access_level; those values come from the active request context.

Shared Preconditions

Retrieval requires:

the Knowledge Query Service to be reachable
an active request context
an authenticated user id in that request context

The query service is started by the core runtime on the core network loop. It is a service boundary for engine/tool runtimes, not a separate process that opens knowledge storage independently.

If the query service is unavailable, the method raises one of the following runtime errors from the query client:

RuntimeError("knowledge_query_not_ready")  # startup/health check
RuntimeError("knowledge_query_retrieve_failed")  # retrieval RPC failure

If no request context is active, the method raises:

RuntimeError("knowledge_retrieval_missing_request_context")

If the request context does not contain a user id, the method raises:

RuntimeError("knowledge_retrieval_missing_user_id")
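The four precondition errors above all arrive as RuntimeError, so a caller that wants to react differently to each one has to inspect the message. The following is a minimal sketch of that idea. It assumes the exception message is exactly the documented string, which the source does not guarantee, and the category names and the helper classify_knowledge_error are hypothetical.

```python
# Documented error messages mapped to a coarse category.
# The category names are hypothetical, chosen only for this sketch.
_KNOWLEDGE_ERROR_CATEGORIES = {
    "knowledge_query_not_ready": "service_unavailable",
    "knowledge_query_retrieve_failed": "service_unavailable",
    "knowledge_retrieval_missing_request_context": "missing_context",
    "knowledge_retrieval_missing_user_id": "missing_user",
}


def classify_knowledge_error(exc: RuntimeError) -> str:
    """Return a category for a precondition error, or "unknown".

    Assumes str(exc) equals one of the documented messages exactly.
    """
    return _KNOWLEDGE_ERROR_CATEGORIES.get(str(exc), "unknown")
```

A module could call this helper inside an except RuntimeError block to decide, for example, whether to retry later or to report a misconfigured request.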

`await retrieve(query_text, query_vector=None, top_k=8, lexical_limit=8, graph_neighbors_limit=4, metadata_filters=None)`

This method asks the application Knowledge Query Service to run hybrid retrieval for the current request user.

Example:

result = await module_sdk.knowledge.retrieve(
    query_text="budget planning",
    top_k=5,
    metadata_filters={"module_name": "docs"},
)

for match in result.matches:
    print(match.title, match.score)

What It Is For

Use this when your module needs knowledge matches visible to the current user.

Typical cases:

contextual search inside a module page or action
retrieving supporting documents for an AI workflow
narrowing knowledge results with metadata filters
showing graph-neighbor context alongside retrieved items
reconciling chat attachment context for a known AI pipeline

What It Does

The method:

resolves user_id, organization_id, and access_level from the request context
validates that query_text is non-empty
builds a KnowledgeRetrieveRequest
sends the request to the Knowledge Query Service

The core retrieval service handles lexical search, vector search, optional reranking, graph expansion, and graph-neighbor enrichment according to the active runtime configuration.

When metadata_filters["pipeline_id"] is provided, retrieval can merge chat attachment context for media-derived knowledge items linked to that pipeline. Without pipeline_id, that attachment context is not merged.
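Because attachment context depends on whether pipeline_id is present, it can help to build the filter dictionary in one place. The sketch below is illustrative only: build_metadata_filters is a hypothetical helper, and treating pipeline_id as a string is an assumption.

```python
from typing import Any, Dict, Optional


def build_metadata_filters(
    module_name: str,
    pipeline_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a metadata_filters dict for retrieve(...).

    pipeline_id is added only when known, so attachment context is
    requested only for a known AI pipeline. The value type (str) is an
    assumption of this sketch.
    """
    filters: Dict[str, Any] = {"module_name": module_name}
    if pipeline_id:
        filters["pipeline_id"] = pipeline_id
    return filters
```

The returned dictionary can then be passed as metadata_filters to module_sdk.knowledge.retrieve(...).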

Scope Behavior

Retrieval applies the knowledge visibility rules in the backend.

For a normal module call, the practical rule is:

the current user's own knowledge is visible
public knowledge that falls within the request scope is visible
private knowledge owned by other users is not visible
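The practical rule above can be modelled as a small predicate. This is a deliberately simplified sketch, not the backend's implementation: it ignores organization_id and access_level, and the function name and its parameters are hypothetical.

```python
def is_visible_sketch(owner_id: str, is_public: bool, current_user_id: str) -> bool:
    """Simplified model of the practical visibility rule.

    Assumption: "public" is a single boolean. The real backend also
    considers organization and access level, which this sketch omits.
    """
    if owner_id == current_user_id:
        # The current user's own knowledge is visible.
        return True
    # Other users' knowledge is visible only when it is public.
    return is_public
```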

The SDK does not expose parameters to override this scope. If a module needs retrieval, it should call module_sdk.knowledge.retrieve(...) and let the runtime apply the current request context.

Validation Behavior

If query_text is empty after normalization, the method raises:

ValueError("query_text is required")

top_k, lexical_limit, and graph_neighbors_limit are normalized to integers before the request reaches the knowledge service.
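The validation described above might look roughly like the following. This is a sketch under stated assumptions: "normalization" of query_text is taken to mean stripping surrounding whitespace, the limits are coerced with int(), and normalize_retrieve_args is a hypothetical name rather than an SDK function.

```python
from typing import Any, Dict


def normalize_retrieve_args(
    query_text: str,
    top_k: Any = 8,
    lexical_limit: Any = 8,
    graph_neighbors_limit: Any = 4,
) -> Dict[str, Any]:
    """Validate query_text and coerce the numeric limits to int.

    Assumption: normalization means stripping whitespace; the SDK's
    actual rules may differ.
    """
    text = (query_text or "").strip()
    if not text:
        raise ValueError("query_text is required")
    return {
        "query_text": text,
        "top_k": int(top_k),
        "lexical_limit": int(lexical_limit),
        "graph_neighbors_limit": int(graph_neighbors_limit),
    }
```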

Return Value

The method returns the result from the knowledge service.

The intended return shape is KnowledgeRetrieveResult, which contains:

matches

Each match is a KnowledgeRetrieveMatch with:

item_id
source_id
kind
title
content
summary
score
metadata
graph_neighbors
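To make the return shape concrete, the field lists above can be mirrored with stand-in dataclasses. The class names carry a Sketch suffix to mark them as illustrative. The field types are assumptions, since the source lists only field names, and group_titles_by_kind is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class KnowledgeRetrieveMatchSketch:
    # Field names follow the documented list; all types are assumptions.
    item_id: str
    source_id: str
    kind: str
    title: str
    content: str
    summary: Optional[str]
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)
    graph_neighbors: List[Any] = field(default_factory=list)


@dataclass
class KnowledgeRetrieveResultSketch:
    matches: List[KnowledgeRetrieveMatchSketch] = field(default_factory=list)


def group_titles_by_kind(result: KnowledgeRetrieveResultSketch) -> Dict[str, List[str]]:
    """Group match titles by their kind, preserving result order."""
    grouped: Dict[str, List[str]] = {}
    for match in result.matches:
        grouped.setdefault(match.kind, []).append(match.title)
    return grouped
```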

Notes For Module Authors

retrieve(...) is async because the Knowledge Query Service can call async vector and graph providers.

Do not read knowledge repository records directly from a module. Repository methods are internal to the application layer and do not define the SDK scope contract.

Extracted Document Readers

Use read_document_blocks(...) when a module needs bounded access to extracted document content without loading the whole markdown document.

blocks = module_sdk.knowledge.read_document_blocks(
    request_id=request_id,
    max_chars=12000,
    kind="chunk",
)

If the blocks argument is omitted, the method returns either the whole document, when it fits the character budget, or an index of available items for the selected kind. If blocks is provided, the method returns those ordinals up to max_chars.
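The budget behavior can be pictured with a small local model. Everything in this sketch is an assumption made for illustration: the returned dictionary shape, zero-based ordinals, the join separator, and the choice to stop before a block that would exceed max_chars rather than truncate it. It does not call the SDK.

```python
from typing import Dict, List, Optional


def read_blocks_sketch(
    items: List[str],
    max_chars: int,
    blocks: Optional[List[int]] = None,
) -> Dict[str, object]:
    """Local model of the documented character-budget behavior."""
    if blocks is None:
        document = "\n\n".join(items)
        if len(document) <= max_chars:
            # The whole document fits the budget.
            return {"mode": "document", "content": document}
        # Too large: return an index of available ordinals instead.
        return {"mode": "index", "ordinals": list(range(len(items)))}

    selected = []
    used = 0
    for ordinal in blocks:
        text = items[ordinal]
        if used + len(text) > max_chars:
            break  # Assumed policy: stop before exceeding the budget.
        selected.append({"ordinal": ordinal, "content": text})
        used += len(text)
    return {"mode": "blocks", "blocks": selected}
```

Under this model, a module would first call without blocks, and only request specific ordinals when it receives an index.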

Supported kind values are the extraction item kinds produced by the runtime, including chunk, table, formula, and image.

Extracted Item Summary Helpers

Use list_items_for_summary(...) to read extracted items with their cached summary state:

items = module_sdk.knowledge.list_items_for_summary(
    request_id=request_id,
    item_type="chunk",
)

Use store_item_summary(...) to persist a summary for one extracted item:

saved = module_sdk.knowledge.store_item_summary(
    item_id=item_id,
    summary=summary,
)

These helpers are scoped to the current request user and organization. The summary cache is owned by the extraction runtime; module code should not update repository rows directly.
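The two helpers combine naturally into a loop that fills in only the missing summaries. The sketch below assumes each listed item behaves like a dictionary with item_id, content, and summary keys. The source does not specify that shape, so treat the key names as hypothetical. The knowledge argument stands for module_sdk.knowledge, and summarize is any caller-supplied function.

```python
from typing import Any, Callable, List


def fill_missing_summaries(
    knowledge: Any,
    request_id: str,
    summarize: Callable[[str], str],
    item_type: str = "chunk",
) -> List[str]:
    """Store a summary for each item that has none; return their ids.

    Assumption: items are dict-like with "item_id", "content", and
    "summary" keys. Adjust to the real item shape.
    """
    stored: List[str] = []
    items = knowledge.list_items_for_summary(
        request_id=request_id,
        item_type=item_type,
    )
    for item in items:
        if item.get("summary"):
            continue  # A cached summary already exists.
        knowledge.store_item_summary(
            item_id=item["item_id"],
            summary=summarize(item["content"]),
        )
        stored.append(item["item_id"])
    return stored
```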

Searching Extracted Items

Use search_extracted_items(...) when the module needs lexical search across extracted markdown or chunks visible to the current request scope.

matches = module_sdk.knowledge.search_extracted_items(
    query_text="payment terms",
    limit=8,
    extraction_request_ids=[request_id],
)

The query text is required. extraction_request_ids can narrow the search to a known set of extraction requests.

Deleting Knowledge Data

Use delete_source(source_id) to soft-delete one owned knowledge source and enqueue projection cleanup:

deleted_item_ids = module_sdk.knowledge.delete_source(source_id)

Use delete_by_metadata(...) to delete records matching module-scoped metadata:

result = module_sdk.knowledge.delete_by_metadata(
    {"conversation_id": str(conversation_id)},
    force=True,
)

The SDK injects the current module name into metadata filters. Callers must not include module_name themselves. Ownership scope comes from the active request context.
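The module_name injection can be pictured as a guard that runs before the delete request is built. This is only a sketch of the idea. The source does not say what happens when a caller passes module_name, so raising ValueError is an assumption, and scoped_delete_filters is a hypothetical helper.

```python
from typing import Any, Dict


def scoped_delete_filters(module_name: str, filters: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of filters with the module name injected.

    Assumption: a caller-supplied module_name is rejected. The SDK's
    real behavior in that case is not documented here.
    """
    if "module_name" in filters:
        raise ValueError("module_name is injected by the SDK")
    # Copy first so the caller's dictionary is left unchanged.
    return {**filters, "module_name": module_name}
```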