Uploads and Stored Files

This page covers the storage side of the media contract:

add(...)
add_model(...)
add_model_from_source(...)
view(...)
get_path(...)
move(...)
delete(...)

These methods are the ones a module author should use when dealing with persisted media assets.

The Core Rule

When a file matters beyond the current request, your module should persist it through module_sdk.media and keep the resulting media path in its own state.

That means:

do not keep /tmp/... paths in your database
do not pass local filesystem paths around as if they were stable asset references
do not make the UI reason about provider paths or local files

The module should keep one stable thing: the stored media path.

`add(path, payload) -> str`

This is the write method.

Use it when your module has bytes and wants to persist them into media storage.

Example:

report_path = module_sdk.media.add(
    "monthly-summary.pdf",
    pdf_bytes,
)

What It Is For

Use add(...) for the normal module case:

an export was generated
a processed image was produced
a document was normalized
a binary artifact has to be kept for later

The important thing is that the file is now a persisted media asset, not just a transient file that happened to exist on disk once.

The path you pass here is only a filename or filename hint. The final storage path is computed by the runtime.

Why Use It

This method keeps the write path simple.

The module provides the file name it wants to preserve, the media layer computes the canonical storage path, persists the bytes through the active provider, and returns the stored path your module can save in the database or pass to later flows.

That computed path includes the runtime scope information that matters operationally:

module
user
organization when present
date partition
generated file id

What It Validates

add(...) is intentionally strict:

if path is empty, it raises ValueError("path is required")
if payload is empty, it raises ValueError("media payload is empty")

That strictness is useful because it prevents “successful” writes of meaningless data.
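
The two checks above can be mirrored in caller code so that bad input is caught before it reaches the media layer. The sketch below is a minimal stand-in, not the SDK implementation: `check_add_arguments` and `describe_failure` are hypothetical helpers, and only the two error messages are taken from this page.

```python
def check_add_arguments(path: str, payload: bytes) -> None:
    """Hypothetical pre-flight check mirroring the documented add(...) rules."""
    if not path:
        raise ValueError("path is required")
    if not payload:
        raise ValueError("media payload is empty")


def describe_failure(path: str, payload: bytes) -> str:
    """Return the validation message, or "ok" when both arguments pass."""
    try:
        check_add_arguments(path, payload)
    except ValueError as exc:
        return str(exc)
    return "ok"
```

A caller that generates exports can run a check like this before calling add(...) and report the message to the user instead of persisting an empty file.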

What The Returned Path Looks Like

The exact value depends on the current session, but the shape is intentionally scoped.

A typical stored path looks like:

media/<module>/org_<organization>/uid_<user>/<yyyy>/<mm>/<dd>/<file_id>_<filename>

That is why the caller should persist the returned value, not try to predict it.
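
To make the shape concrete, the following sketch assembles a path with the same segments. It is illustrative only: `sketch_media_path` is a hypothetical function, the id format is an assumption, and dropping the organization segment when no organization is present is also an assumption. The runtime computes the real path.

```python
from datetime import date
from typing import Optional
from uuid import uuid4


def sketch_media_path(module: str, user: str, filename: str,
                      organization: Optional[str] = None,
                      today: Optional[date] = None,
                      file_id: Optional[str] = None) -> str:
    """Illustrative only: build a path in the documented shape."""
    today = today or date.today()
    # Assumption: the real generated id format is not documented on this page.
    file_id = file_id or uuid4().hex
    parts = ["media", module]
    if organization:
        # Assumption: the org segment is simply omitted when absent.
        parts.append(f"org_{organization}")
    parts += [
        f"uid_{user}",
        f"{today:%Y}",
        f"{today:%m}",
        f"{today:%d}",
        f"{file_id}_{filename}",
    ]
    return "/".join(parts)
```

Because the date partition and the generated id change from call to call, two writes of the same filename produce different paths, which is the practical reason to store the returned value.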

Real Usage Pattern

Imagine a module that receives an uploaded PDF, extracts some pages, and stores the cleaned result:

cleaned_path = module_sdk.media.add(
    "cleaned.pdf",
    cleaned_pdf_bytes,
)

module_sdk.database.update(
    ContractDocument,
    document_id,
    cleaned_media_path=cleaned_path,
)

That is the right kind of persistence flow for the SDK: the module keeps the resulting media path, not a local temp path and not a provider-specific URL.

`add_model(model_id, *, payload=None, source_path=None, filename=None) -> str`

This is the shared model write method.

Use it when the artifact is not a normal user-scoped media file, but a model that must live in shared storage so that other nodes or runtimes can see it too.

Unlike add(...), this method does not write under the module/user/organization tree. It always writes under the shared model prefix, where <id> is the model_id argument:

models/<id>/...

That distinction matters because model artifacts are runtime assets, not user uploads.

What It Is For

Use add_model(...) when you need to persist:

one model file
one already-stored model artifact that must be copied into the shared model namespace
one whole model directory

This is the SDK method for the case where model storage must be shared across multi-node deployments and must not be partitioned by user.

Supported Input Modes

The method supports three practical input forms:

raw bytes plus an explicit filename
a source_path pointing to a file
a source_path pointing to a directory

It also accepts storage-backed sources using the stored media path returned by add(...) and copies them into the shared model namespace.

Example: Save One Model File From Bytes

model_path = module_sdk.media.add_model(
    "llama3-8b",
    payload=weights_bytes,
    filename="model.gguf",
)

The returned path is:

models/llama3-8b/model.gguf

Example: Save One Local Model Directory

model_prefix = module_sdk.media.add_model(
    "mistral-7b-instruct",
    source_path="/tmp/exported-model",
)

If the source is a directory, the SDK writes the whole tree under:

models/mistral-7b-instruct/...

and also writes a manifest file so the stored directory is explicit rather than implicit.

Example: Copy An Already Stored Artifact Into Shared Model Storage

shared_path = module_sdk.media.add_model(
    "qwen2.5",
    source_path=row.uploaded_media_path,
    filename="weights.safetensors",
)

This is useful when a module first receives an uploaded artifact through normal media flows, but the final canonical destination must be shared model storage.

Validation Rules

add_model(...) is strict on purpose:

if neither payload nor source_path is provided, it raises ValueError("model source is required")
if both are provided, it raises ValueError("provide either payload or source_path")
if bytes mode is used without filename, it raises ValueError("filename is required when saving model bytes")
if the local source path is unreadable, it raises ValueError("model source path is not readable")
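
The argument rules above can be read as a small decision procedure. The sketch below mirrors the first three rules in caller code. `check_add_model_arguments` is a hypothetical helper, the order of the checks is an assumption, and the readability rule is left out because it depends on the filesystem and on whether the source is a local or a stored path.

```python
from typing import Optional


def check_add_model_arguments(payload: Optional[bytes] = None,
                              source_path: Optional[str] = None,
                              filename: Optional[str] = None) -> str:
    """Hypothetical mirror of the documented rules; returns the input mode."""
    if payload is None and source_path is None:
        raise ValueError("model source is required")
    if payload is not None and source_path is not None:
        raise ValueError("provide either payload or source_path")
    if payload is not None:
        if not filename:
            raise ValueError("filename is required when saving model bytes")
        return "bytes"
    # A source_path may point to a file, a directory, or a stored media path.
    return "source_path"
```

Returning the detected mode is a convenience of this sketch; it makes the three supported input forms explicit at the call site.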

Why This Method Exists Separately

It would be a mistake to overload add(...) with “sometimes user-scoped, sometimes shared model-scoped” behavior.

Keeping add_model(...) separate makes the storage intent obvious:

add(...) is for normal module media, scoped by module and user
add_model(...) is for shared model artifacts under models/<id>/...

`add_model_from_source(model_id, *, source, filename=None, progress_callback=None) -> str`

This is the remote model import method.

Use it when the model source is remote and the final artifact must be persisted under shared model storage.

The caller describes the source. The SDK handles the download mechanics, temporary staging path, Hugging Face token handling, and final write through the active media provider.

This method is part of the public media SDK because model provisioning flows need one stable path for remote model materialization. It is not a general HTTP download helper for arbitrary module data. Normal module files should still use add(...); already-local or already-uploaded model artifacts should use add_model(...).

Do not call provider download libraries such as huggingface_hub.snapshot_download directly from modules or engines. Also do not call low-level HTTP helpers from module or engine code just to assemble model files locally.

Large downloads are streamed to a temporary file. The SDK does not read whole model shards into memory. When the active media provider supports direct file persistence, the staged file is also copied or uploaded without loading the whole file into memory.
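
The streaming behaviour described above follows a common pattern: read a bounded chunk, write it, repeat. The sketch below shows that pattern with the standard library only. It is not the SDK's code; `stream_to_temp_file` is a hypothetical name, the chunk size is an assumption, and `io.BytesIO` stands in for a remote response body.

```python
import io
import os
import tempfile

# Assumption: 1 MiB chunks. The SDK's real chunk size is not documented here.
CHUNK_SIZE = 1024 * 1024


def stream_to_temp_file(response, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy a readable binary stream to a temp file chunk by chunk."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".part") as staged:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            staged.write(chunk)
        return staged.name


# io.BytesIO stands in for the body of a remote download.
fake_response = io.BytesIO(b"x" * 2500)
staged_path = stream_to_temp_file(fake_response, chunk_size=1024)
staged_size = os.path.getsize(staged_path)
os.remove(staged_path)  # the caller owns cleanup of the staged file
```

At no point does the loop hold more than one chunk in memory, which is what keeps multi-gigabyte shards from exhausting the process.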

Hugging Face Snapshot Example

storage_ref = module_sdk.media.add_model_from_source(
    "qwen3-tts-tokenizer-12hz",
    source={
        "type": "huggingface",
        "repo": "Qwen/Qwen3-TTS-Tokenizer-12Hz",
        "snapshot": True,
    },
)

The SDK reads HF_TOKEN or HUGGING_FACE_HUB_TOKEN from the managed environment when present and sends it explicitly. It does not rely on user-home credential files such as ~/.netrc.

The returned value is a media-provider storage reference, usually:

models/qwen3-tts-tokenizer-12hz

Hugging Face File Example

storage_ref = module_sdk.media.add_model_from_source(
    "llama-3.2-1b-gguf",
    source={
        "type": "huggingface",
        "repo": "bartowski/Llama-3.2-1B-Instruct-GGUF",
        "files": ["Llama-3.2-1B-Instruct-Q4_K_M.gguf"],
    },
)

If exactly one Hugging Face file is provided and snapshot is not true, the SDK stores it as one model file under:

models/llama-3.2-1b-gguf/Llama-3.2-1B-Instruct-Q4_K_M.gguf

URL File Example

storage_ref = module_sdk.media.add_model_from_source(
    "custom-model",
    source={
        "type": "url",
        "url": "https://example.test/model.bin",
    },
    filename="model.bin",
)

Supported Source Shapes

The supported source types are:

{"type": "huggingface", "repo": "...", "revision": "main", "files": [...], "snapshot": True}
{"type": "url", "url": "..."}

For Hugging Face sources:

repo is required
revision defaults to main
files may be omitted when snapshot is true
one file without snapshot is stored as a single model file
snapshot sources are staged under the application temp directory and then persisted with add_model(...)

The final persistence path always goes through the active media provider.
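
The Hugging Face rules listed above can be summarized as a normalization step applied to the source dictionary. The sketch below is one way to express them. `normalize_source` is a hypothetical helper, the error messages are illustrative, and rejecting a non-snapshot source with no files is an assumption rather than documented behaviour.

```python
from typing import Any, Dict


def normalize_source(source: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper applying the documented defaults to a source."""
    kind = source.get("type")
    if kind == "url":
        if not source.get("url"):
            raise ValueError("url is required")  # illustrative message
        return {"type": "url", "url": source["url"]}
    if kind == "huggingface":
        if not source.get("repo"):
            raise ValueError("repo is required")  # illustrative message
        snapshot = bool(source.get("snapshot", False))
        files = list(source.get("files") or [])
        if not snapshot and not files:
            # Assumption: without snapshot, at least one file must be named.
            raise ValueError("files are required unless snapshot is true")
        return {
            "type": "huggingface",
            "repo": source["repo"],
            "revision": source.get("revision", "main"),
            "files": files,
            "snapshot": snapshot,
            # One file without snapshot is stored as a single model file.
            "single_file": len(files) == 1 and not snapshot,
        }
    raise ValueError(f"unsupported source type: {kind!r}")
```

Normalizing once, before any download starts, keeps a malformed source from failing halfway through a long provisioning task.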

Progress Callback

progress_callback is optional. It is intended for long-running provisioning flows that already run inside a background task.

The callback is synchronous and receives a dictionary shaped like:

{
    "label": "model-00002-of-00004.safetensors",
    "downloaded_bytes": 2147483648,
    "total_bytes": 3998751275,
    "file_index": 2,
    "total_files": 4,
}

total_bytes may be 0 when the remote server does not provide a content length. Callers should treat progress as advisory UI feedback, not as a source of persisted model metadata.

When calling from async module code, bridge the callback back to the event loop instead of awaiting inside it. A typical catalog/task flow does this by calling asyncio.run_coroutine_threadsafe(module_sdk.tasks.update_progress(...), loop) from the callback.
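
The bridging pattern described above can be sketched with the standard library alone. In this example `update_progress` is a stand-in coroutine (the real arguments of module_sdk.tasks.update_progress are not shown on this page, so its signature here is an assumption), and `fake_import` stands in for a blocking import that invokes the callback synchronously from a worker thread.

```python
import asyncio

updates = []  # stand-in for whatever the task system would record


async def update_progress(percent, message):
    """Hypothetical stand-in for module_sdk.tasks.update_progress(...)."""
    updates.append((percent, message))


def fake_import(progress_callback):
    """Stand-in for a blocking import that reports progress synchronously."""
    for done in range(1, 5):
        progress_callback({
            "label": "model.bin",
            "downloaded_bytes": done * 256,
            "total_bytes": 1024,
            "file_index": 1,
            "total_files": 1,
        })
    return "models/custom-model/model.bin"


def make_callback(loop):
    def on_progress(event):
        total = event["total_bytes"]
        # total_bytes may be 0 when no content length is known.
        percent = int(100 * event["downloaded_bytes"] / total) if total else None
        future = asyncio.run_coroutine_threadsafe(
            update_progress(percent, event["label"]), loop
        )
        # Blocks only the worker thread, never the event loop.
        future.result(timeout=5)
    return on_progress


async def provision():
    loop = asyncio.get_running_loop()
    # Run the blocking import off the event loop thread.
    return await loop.run_in_executor(None, fake_import, make_callback(loop))


storage_ref = asyncio.run(provision())
```

Waiting on the future inside the callback is a choice made here to keep updates ordered; a fire-and-forget variant would drop the `future.result(...)` call.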

`view(path) -> bytes`

This is the read method.

Use it when you already have a persisted media path and need the asset bytes again.

Example:

payload = module_sdk.media.view(document.media_path)

What It Is For

Typical cases:

reading a stored PDF before sending it to a parser
loading a generated CSV before attaching it to an email flow
reopening a stored image before resizing it

The important distinction is that view(...) is for persisted assets. If your code still depends on a transient local file path from the upload phase, you have not finished normalizing the flow yet.

Why Use It

The caller should not care whether storage is local, remote, or provider-backed in some other way. view(...) keeps the read path focused on the one thing the module actually needs: bytes.
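
Working from bytes usually means no local file is needed at all. As a minimal sketch, the CSV case mentioned earlier can be handled fully in memory. `rows_from_csv_bytes` is a hypothetical helper, and the literal `payload` below stands in for the value that view(...) would return.

```python
import csv
import io
from typing import List


def rows_from_csv_bytes(payload: bytes, encoding: str = "utf-8") -> List[List[str]]:
    """Parse CSV bytes in memory, without writing a local file first."""
    text = payload.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


# Stand-in for the bytes that view(...) would return for a stored CSV.
payload = b"invoice,total\nA-1,10.50\nA-2,7.25\n"
rows = rows_from_csv_bytes(payload)
```

The same approach applies to any parser that accepts bytes or a file-like object, which covers most PDF, image, and archive libraries.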

Accepted Path Form

view(...) accepts the stored media path returned by the SDK or runtime media provider. Keep that storage path in backend state, and convert it to a public URL only when rendering UI.

`get_path(path, *, destination_dir=None) -> MaterializedMedia`

Use get_path(...) only when the next library needs a local filesystem path and cannot consume bytes from view(...).

Example:

materialized = module_sdk.media.get_path(model.storage_path)
try:
    run_path_only_library(materialized.path)
finally:
    materialized.cleanup()

The active media provider decides how to satisfy the request:

local storage returns the existing path under the media root
distributed storage downloads the file or directory into a temporary local path

The returned object exposes:

path: local filesystem path
temporary: whether the path is a temporary materialization
cleanup(): releases temporary files when needed

Do not store the returned path in database rows. Store the media storage reference and call get_path(...) again when a path-only consumer needs it.
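
One way to make the cleanup step hard to forget is to wrap the try/finally pattern in a context manager. The sketch below assumes only the three documented members (path, temporary, cleanup()). `local_path`, `FakeMedia`, and `FakeMaterializedMedia` are hypothetical names used to exercise the helper without the SDK.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class FakeMaterializedMedia:
    """Stand-in with the documented members: path, temporary, cleanup()."""
    path: str
    temporary: bool
    on_cleanup: Callable[[], None]

    def cleanup(self) -> None:
        self.on_cleanup()


@contextmanager
def local_path(media, storage_path: str) -> Iterator[str]:
    """Yield a local path for storage_path and always release it afterwards."""
    materialized = media.get_path(storage_path)
    try:
        yield materialized.path
    finally:
        materialized.cleanup()


class FakeMedia:
    """Hypothetical media object used only to exercise the helper."""

    def __init__(self) -> None:
        self.cleaned = []

    def get_path(self, storage_path: str) -> FakeMaterializedMedia:
        return FakeMaterializedMedia(
            path=f"/tmp/materialized/{storage_path}",
            temporary=True,
            on_cleanup=lambda: self.cleaned.append(storage_path),
        )


media = FakeMedia()
with local_path(media, "models/demo/model.gguf") as path:
    seen_path = path  # a path-only library would be called here
```

Since the path is only valid inside the `with` block, this shape also discourages saving it anywhere durable.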

`move(source_path, destination_path) -> str`

This is the rename or relocation method.

Use it when an asset already exists in storage and your module wants that same asset to live under a new path.

Example:

final_path = module_sdk.media.move(
    "drafts/reports/run-42.pdf",
    "reports/2026/04/final-report.pdf",
)

What It Is For

This is useful when the module workflow has clear stages:

a draft becomes a final artifact
a temporary namespace becomes a canonical namespace
a user-owned path becomes an organization-owned path

The important thing is that the identity of the stored file changes at the storage-path level without forcing the module to manually read, re-add, and clean up the old object.

What It Does

At runtime the method:

reads the source bytes
writes them to the destination
deletes the old object
updates the upload metadata row when one exists

That last detail matters because the module should not have to remember to keep media metadata aligned after a move.
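
The four steps above can be traced against a toy in-memory store. This is a sketch of the sequence, not the provider implementation: `InMemoryMedia` is hypothetical, the metadata row layout is invented for illustration, and the early return for identical paths is an assumption.

```python
from typing import Dict


class InMemoryMedia:
    """Toy store: objects maps path to bytes, metadata maps path to a row."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.metadata: Dict[str, dict] = {}

    def move(self, source_path: str, destination_path: str) -> str:
        if source_path == destination_path:
            # Assumption: moving a path onto itself is a no-op.
            return destination_path
        payload = self.objects[source_path]        # 1. read the source bytes
        self.objects[destination_path] = payload   # 2. write to the destination
        del self.objects[source_path]              # 3. delete the old object
        if source_path in self.metadata:           # 4. realign tracked metadata
            row = self.metadata.pop(source_path)
            row["path"] = destination_path
            self.metadata[destination_path] = row
        return destination_path


store = InMemoryMedia()
store.objects["drafts/run-42.pdf"] = b"%PDF-draft"
store.metadata["drafts/run-42.pdf"] = {"path": "drafts/run-42.pdf", "owner": "u1"}
final_path = store.move("drafts/run-42.pdf", "reports/final-report.pdf")
```

Writing the destination before deleting the source means a failure between the two steps leaves a duplicate rather than a lost file.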

Why Use It

Without move(...), modules tend to reimplement the same fragile sequence in slightly different ways. Centralizing it in the SDK keeps both the stored object and the tracked metadata aligned.

`delete(path) -> None`

This is the removal method.

Use it when a persisted asset is no longer needed.

Example:

module_sdk.media.delete(document.media_path)

What It Is For

Typical cases:

a user deletes an attachment
a generated artifact is replaced and the old one should disappear
a cleanup action removes obsolete exports

What It Does

The method deletes:

the stored object itself
the matching upload metadata row when the path belongs to a tracked upload

That means the module does not need to remember two separate cleanup steps.
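
On the module side, the remaining job is to pair the asset deletion with removal of the application record. The sketch below shows one possible ordering. `remove_attachment` and `RecordingMedia` are hypothetical, and the dictionary of attachments stands in for the module's own database rows.

```python
from typing import Dict, List


class RecordingMedia:
    """Stand-in that only records which paths were deleted."""

    def __init__(self) -> None:
        self.deleted: List[str] = []

    def delete(self, path: str) -> None:
        self.deleted.append(path)


def remove_attachment(media, attachments: Dict[str, dict], attachment_id: str) -> None:
    """Delete the stored asset first, then drop the application record."""
    record = attachments[attachment_id]
    media.delete(record["media_path"])
    del attachments[attachment_id]


media = RecordingMedia()
attachments = {
    "att-1": {"media_path": "media/docs/uid_1/2026/04/09/abc_a.pdf"},
    "att-2": {"media_path": "media/docs/uid_1/2026/04/09/def_b.pdf"},
}
remove_attachment(media, attachments, "att-1")
```

Deleting the asset before the record is a design choice of this sketch: if the media call fails, the record still points at the asset and the cleanup can be retried.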

Why Use It

If the module removes the application record but leaves the stored asset behind, media storage slowly accumulates junk. delete(...) is the explicit cleanup point for the asset itself.

Practical End-To-End Example

A realistic module flow usually looks like this:

receive bytes from an upload or a generated artifact
persist them with add(...) or add_model(...) depending on whether the artifact is normal media or shared model storage
keep the returned media path in the module model
later reopen the asset with view(...)
if the asset changes namespace, use move(...)
if the asset is no longer needed, use delete(...)

Example:

stored_path = module_sdk.media.add(
    "original.pdf",
    uploaded_bytes,
)

module_sdk.database.update(
    Invoice,
    invoice_id,
    original_media_path=stored_path,
)

payload = module_sdk.media.view(stored_path)
signed_payload = sign_invoice_pdf(payload)

signed_path = module_sdk.media.add(
    "signed.pdf",
    signed_payload,
)

module_sdk.database.update(
    Invoice,
    invoice_id,
    signed_media_path=signed_path,
)

That is the level of abstraction module authors should have: persist, read, move, delete. Not provider plumbing.
