The Python-level sandbox is the semantic layer: it knows the active subject, subject chain, session, and approvals.

On Linux, Democr.ai can add OS-level enforcement underneath it. This layer is designed to block bypasses that leave Python through native code, child processes, or direct sockets.

OS-level enforcement is not a replacement for structured manifest access declarations or access-policy approvals. It is a second line of defense for deployments that need kernel-level controls.

Controls

The Linux OS-level layer is made of three independent controls.

| Control | Purpose | Enforcement scope |
| --- | --- | --- |
| Landlock | restrict filesystem access to read-only and read-write path allowlists | current process and future children after restriction |
| seccomp BPF | block dangerous syscalls such as process execution, ptrace, kernel module loading, filesystem root pivots, UID changes, BPF loading, and common exploit primitives | current process, with TSYNC attempted for all threads |
| iptables/ip6tables + cgroup v2 | restrict outbound network egress to allowed IP/port pairs for the application process cgroup | process-level network egress |

Landlock and seccomp are applied directly by the application process when enabled and supported. Network egress enforcement is applied by a privileged local helper because it needs access to cgroup and packet-filtering operations.

SDK subprocess execution has an additional launcher path. When sandbox.os.enabled is true, sdk.tasks.run_subprocess(...) starts a small sandbox launcher instead of executing the target command directly. The launcher receives the current parent guard policy, applies the available OS-level filesystem restrictions to the child process, and then executes the requested command.

Configuration

Process restrictions (seccomp and Landlock) are each controlled by their own keys:

sandbox:
  os:
    seccomp:
      enabled: true
    landlock:
      enabled: true
      extra_read_paths:
        - /opt/democrai/read-only
      extra_write_paths:
        - /srv/democrai/work

Network allowlist enforcement uses these keys:

sandbox:
  os:
    enabled: true
    policy_file: /var/lib/democrai/os_sandbox_allowlist.json
    helper_socket: /run/democrai/os_sandbox_helper.sock
    refresh_seconds: 60

If policy_file and helper_socket are not configured, the runtime uses per-process local defaults under the application data and state directories. The default names include the application process identifier:

  • <state_dir>/os_sandbox_helper_<pid>.sock
  • <data_dir>/os_sandbox_allowlist_<pid>.json

This lets multiple application instances on the same machine run separate helpers and policy files, each applying the same logical allowlist to its own process ID.
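
As an illustration of the naming scheme above, the following is a minimal sketch of how the per-process defaults could be derived. The function name default_os_sandbox_paths and its parameters are hypothetical and are not taken from the Democr.ai code base; only the two file name patterns come from this page.

```python
import os
from pathlib import Path


def default_os_sandbox_paths(state_dir, data_dir, pid=None):
    """Return (helper_socket, policy_file) defaults for one application process.

    Hypothetical helper: only the file name patterns are documented.
    """
    pid = os.getpid() if pid is None else pid
    helper_socket = Path(state_dir) / f"os_sandbox_helper_{pid}.sock"
    policy_file = Path(data_dir) / f"os_sandbox_allowlist_{pid}.json"
    return helper_socket, policy_file
```

Because the PID is part of both names, two instances started with the same state and data directories still end up with distinct sockets and policy files.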

Runtime Requirements

OS-level enforcement is Linux-only.

Landlock requires a kernel with Landlock ABI support. The implementation requires ABI version 1 or newer, which starts with Linux 5.13.

seccomp BPF support currently depends on:

  • Linux
  • libc availability
  • a supported architecture, currently x86_64 or aarch64

Network egress enforcement depends on:

  • cgroup v2 mounted at /sys/fs/cgroup
  • iptables and ip6tables
  • permission to create cgroups and install packet-filtering rules

If the application process already has the required privileges, the helper can start directly. Otherwise, desktop mode uses pkexec when available, and non-desktop mode uses sudo when available. Server deployments that enable OS network enforcement should provide a deliberate elevation path, such as a dedicated sudoers rule for the helper command.

Landlock Filesystem Rules

Landlock builds two path lists:

  • read-only paths for system and runtime dependencies
  • read-write paths for application data, configuration, cache, state, logs, user module directories, configured extra module/engine/extractor directories, temp, and local media storage

Configured sandbox.os.landlock.extra_read_paths and sandbox.os.landlock.extra_write_paths are appended to those lists.

Only paths that exist when Landlock is applied can be added, and the restriction is irrevocable for the process once it is in place. Landlock is therefore applied after setup storage has been initialized, so that the storage paths already exist.

The implementation checks the Landlock ABI before applying rules.
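
One way to perform such an ABI check from Python is to call landlock_create_ruleset with the LANDLOCK_CREATE_RULESET_VERSION flag, which returns the ABI version instead of creating a ruleset. The sketch below assumes x86_64 or aarch64 (where the syscall number is 444) and is not the Democr.ai implementation; the function names are illustrative.

```python
import ctypes
import sys

SYS_LANDLOCK_CREATE_RULESET = 444         # same number on x86_64 and aarch64
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0  # flag: return the ABI version


def landlock_abi_version():
    """Return the kernel's Landlock ABI version, or 0 if it is unavailable."""
    if not sys.platform.startswith("linux"):
        return 0
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    version = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),   # no ruleset attributes when querying
        ctypes.c_size_t(0),
        ctypes.c_uint32(LANDLOCK_CREATE_RULESET_VERSION),
    )
    # A negative result means the syscall is missing or Landlock is disabled.
    return version if version > 0 else 0


def landlock_usable(abi_version, minimum=1):
    """True when the reported ABI meets the minimum required version."""
    return abi_version >= minimum
```

A caller would skip the Landlock step, and report it as unsupported, when landlock_usable(landlock_abi_version()) is false.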

For SDK subprocesses, the launcher derives the child Landlock rules from the filesystem access rules already active in the parent guard. read and execute entries become read-only paths. create, modify, and delete entries become read-write paths. If the platform is not Linux or Landlock is not supported, the launcher does not apply filesystem restrictions and the subprocess continues through the normal SDK execution path.
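
The mapping from guard access entries to Landlock path lists can be sketched as follows. The (path, access) tuple shape and the function name are assumptions made for illustration; the real guard policy format is not documented here. The sketch also assumes that a read-write path does not need a second read-only entry.

```python
import os

READ_ONLY_ACCESS = {"read", "execute"}
READ_WRITE_ACCESS = {"create", "modify", "delete"}


def landlock_path_lists(guard_rules, path_exists=os.path.exists):
    """Split (path, access) pairs into read-only and read-write path lists."""
    read_only, read_write = [], []
    for path, access in guard_rules:
        if not path_exists(path):
            continue  # only paths that exist can be added to a ruleset
        if access in READ_WRITE_ACCESS:
            if path not in read_write:
                read_write.append(path)
        elif access in READ_ONLY_ACCESS:
            if path not in read_only:
                read_only.append(path)
    # Assumption: a read-write path does not also need a read-only entry.
    read_only = [p for p in read_only if p not in read_write]
    return read_only, read_write
```

The path_exists parameter is injectable only so the function can be exercised without touching the real filesystem.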

seccomp BPF

The seccomp filter is a blocklist for high-risk syscalls. It currently supports known syscall numbers for x86_64 and aarch64.

The blocked groups include:

  • execve and execveat
  • ptrace
  • kexec
  • kernel module loading and unloading
  • pivot_root and chroot
  • UID/GID changing syscalls
  • bpf
  • perf_event_open
  • userfaultfd

The filter uses SECCOMP_RET_KILL_PROCESS for blocked syscalls. It sets PR_SET_NO_NEW_PRIVS, attempts SECCOMP_FILTER_FLAG_TSYNC to cover all threads, and falls back to applying the filter to the current thread if TSYNC is rejected.
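
To make the blocklist structure concrete, the sketch below assembles a classic BPF program of the kind described above: check the architecture, load the syscall number, kill on a match, allow otherwise. It only builds the instruction bytes and does not install the filter. Killing on an architecture mismatch is an assumption of this sketch, not a documented Democr.ai behavior, and the sketch does not handle the x32 ABI on x86_64.

```python
import struct

BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06      # BPF_RET | BPF_K
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
OFFSET_NR, OFFSET_ARCH = 0, 4  # field offsets in struct seccomp_data


def insn(code, jt, jf, k):
    """Pack one struct sock_filter: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("=HBBI", code, jt, jf, k)


def build_blocklist_filter(audit_arch, blocked_syscalls):
    """Return BPF bytes that kill on blocked syscalls and allow the rest."""
    blocked = list(blocked_syscalls)
    if len(blocked) > 255:
        raise ValueError("jump offsets are 8-bit; split the blocklist")
    prog = [
        insn(BPF_LD_W_ABS, 0, 0, OFFSET_ARCH),
        insn(BPF_JEQ_K, 1, 0, audit_arch),        # expected arch: skip the kill
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        insn(BPF_LD_W_ABS, 0, 0, OFFSET_NR),
    ]
    for index, number in enumerate(blocked):
        # On a match, jump over the remaining checks and the ALLOW return.
        prog.append(insn(BPF_JEQ_K, len(blocked) - index, 0, number))
    prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS))
    return b"".join(prog)
```

For example, build_blocklist_filter(AUDIT_ARCH_X86_64, [59, 322, 101]) covers execve, execveat, and ptrace on x86_64. Installing the result would still require PR_SET_NO_NEW_PRIVS and the seccomp syscall, as described above.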

Network Egress Allowlist

The network allowlist is built from runtime sources:

  • remote service endpoints from configuration
  • module manifests
  • engine manifests
  • extractor manifests
  • MCP server registry entries
  • permanent access-policy approvals
  • session access-policy approvals
  • observed runtime endpoints recorded after Python-level checks

The allowlist is serialized to a local JSON policy file. The helper reads that file, resolves hostnames to IP addresses, creates a cgroup for the target process, and installs iptables and ip6tables OUTPUT rules that allow only matching destination IP/port pairs for that cgroup.
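
The last step, turning resolved IP/port pairs into packet-filter rules, might look roughly like the sketch below. The endpoint tuple shape and the function name are hypothetical, and the sketch omits both the cgroup match on the jump rule and the final rule that rejects everything else. It only builds argument lists and does not run any command.

```python
import ipaddress


def egress_rule_commands(chain, endpoints):
    """Build iptables/ip6tables argument lists that accept each endpoint.

    endpoints is an iterable of (ip, port, protocol) tuples (assumed shape).
    """
    commands = []
    for ip, port, protocol in endpoints:
        address = ipaddress.ip_address(ip)  # raises ValueError on bad input
        tool = "ip6tables" if address.version == 6 else "iptables"
        commands.append([
            tool, "-A", chain,
            "-p", protocol,
            "-d", str(address),
            "--dport", str(port),
            "-j", "ACCEPT",
        ])
    return commands
```

Passing argument lists rather than shell strings avoids quoting problems if a helper later executes them with subprocess.run.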

The cgroup and packet-filtering names are per process:

  • cgroup suffix: democrai_os_sandbox_<pid>
  • chain base: DEMOCRAI_OS_<pid>
  • active/pending chains: DEMOCRAI_OS_<pid>_A and DEMOCRAI_OS_<pid>_B

The helper builds the pending chain, inserts the jump for the process cgroup, and then removes the previous chain for that same process. Different application instances on the same host therefore use distinct cgroups and distinct iptables chains by default.
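
The two-chain rotation can be pictured with a small sketch. The function names are illustrative; only the chain name patterns come from the list above.

```python
def chain_names(pid):
    """Return (base, chain_a, chain_b) for one application process."""
    base = f"DEMOCRAI_OS_{pid}"
    return base, f"{base}_A", f"{base}_B"


def pending_chain(pid, active):
    """Pick the chain to build next: whichever of A/B is not active."""
    _, chain_a, chain_b = chain_names(pid)
    return chain_b if active == chain_a else chain_a
```

Building the new rules in the inactive chain and only then switching the jump means the process is never left without a complete rule set during a refresh.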

System DNS resolvers from /etc/resolv.conf are added automatically on UDP and TCP port 53.
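
A minimal sketch of how resolver entries could be extracted is shown below. It takes the file contents as text so that it can be tried without reading the real /etc/resolv.conf; the function name and the (ip, port, protocol) tuple shape are assumptions.

```python
import ipaddress


def resolver_endpoints(resolv_conf_text):
    """Return (ip, 53, protocol) tuples for each nameserver line."""
    endpoints = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue  # skips comments, search/options lines, and blanks
        try:
            # Drop an IPv6 zone suffix such as "%eth0" before parsing.
            address = ipaddress.ip_address(fields[1].split("%", 1)[0])
        except ValueError:
            continue  # ignore malformed addresses
        for protocol in ("udp", "tcp"):
            endpoints.append((str(address), 53, protocol))
    return endpoints
```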

The helper periodically reapplies the policy when refresh_seconds is greater than zero. This keeps DNS-derived IP rules aligned with changing DNS answers.

Same-machine multi-instance deployment and multi-node deployment solve different problems:

  • same-machine instances need per-process helper sockets, policy files, cgroups, and iptables chains; the defaults provide this isolation
  • multi-node instances need approval refresh events to reach processes on other hosts; use a stream provider that crosses hosts, such as Redis

With an in-memory stream provider, refresh events stay within one process. Same-host instances receive each other's events only if they all share the same host-local IPC stream provider. Multi-node deployments need a networked provider.

When an instance observes a session or permanent approval during an access check and OS allowlist enforcement is active, it also refreshes and applies its local allowlist. This is a local safety net for delayed refresh events; the normal propagation mechanism is still the distributed stream.

Privileged Helper

The helper is intentionally small. It receives:

  • socket path
  • policy file path
  • refresh interval
  • parent PID

It does not read application config, module manifests, engine manifests, extractor manifests, or database approvals. The application builds the policy file; the helper only validates and applies it.

The Unix socket client is authenticated with SO_PEERCRED. The helper accepts requests only when:

  • the client UID matches the expected application user
  • the client process is the parent process or a descendant of it
  • the requested target PID, when provided, is also the parent process or a descendant
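
The three checks above can be sketched in Python as follows. SO_PEERCRED is Linux-specific and returns the peer's PID, UID, and GID as three C integers. The function names and the injectable parent_of lookup are assumptions for illustration; a real helper would typically read parent PIDs from /proc.

```python
import socket
import struct


def peer_credentials(conn):
    """Return (pid, uid, gid) of the peer on a connected Unix socket."""
    size = struct.calcsize("3i")
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack("3i", raw)


def is_same_or_descendant(pid, ancestor, parent_of):
    """Walk parent links from pid; parent_of(pid) returns the parent or 0."""
    while pid > 0:
        if pid == ancestor:
            return True
        pid = parent_of(pid)
    return False


def accept_request(peer_pid, peer_uid, expected_uid, parent_pid,
                   parent_of, target_pid=None):
    """Apply the UID, client-ancestry, and target-ancestry checks in order."""
    if peer_uid != expected_uid:
        return False
    if not is_same_or_descendant(peer_pid, parent_pid, parent_of):
        return False
    if target_pid is None:
        return True
    return is_same_or_descendant(target_pid, parent_pid, parent_of)
```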

The helper also validates the policy file owner and permissions before reading it. The policy file must be owned by the expected client user and must not be group- or world-writable.
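
An owner and permission check of this kind can be written with os.stat. The sketch below is illustrative rather than the helper's actual code; in particular, refusing symlinks and non-regular files is an extra assumption that the text above does not state.

```python
import os
import stat


def policy_file_is_trusted(path, expected_uid):
    """Check owner and write bits before the policy file is read."""
    info = os.stat(path, follow_symlinks=False)
    if not stat.S_ISREG(info.st_mode):
        return False  # assumption: symlinks and special files are refused
    if info.st_uid != expected_uid:
        return False  # must be owned by the expected client user
    # Must not be writable by group or by others.
    return (info.st_mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0
```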

The helper can be started:

  • directly when the process already has the privileges needed for cgroup and iptables operations
  • through pkexec in desktop mode
  • through sudo outside desktop mode

It runs a watchdog for the parent process. It uses pidfd_open when available (Linux 5.3 or newer); otherwise it falls back to polling with kill(pid, 0). If the parent exits, the helper exits.
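
A watchdog with this fallback behavior could be sketched as below, assuming Python 3.9 or newer for os.pidfd_open. The function names and the polling interval are illustrative and not taken from the helper.

```python
import os
import select
import time


def process_exists(pid):
    """Probe a PID with signal 0, which checks existence without signaling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, poll_interval=1.0):
    """Block until pid exits, preferring a pidfd over polling."""
    fd = None
    if hasattr(os, "pidfd_open"):
        try:
            fd = os.pidfd_open(pid)
        except OSError:
            fd = None  # old kernel, or the process is already gone
    if fd is not None:
        try:
            select.select([fd], [], [])  # readable once the process exits
        finally:
            os.close(fd)
        return
    while process_exists(pid):
        time.sleep(poll_interval)
```

A pidfd refers to one specific process, so it avoids the PID-reuse race that polling with kill(pid, 0) can hit if the parent exits and its PID is recycled between polls.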

Limits and Tradeoffs

The OS-level network policy is process-level, not per session. A session approval opens the target for the whole process at the egress layer, while the Python-level guard still enforces subject and session semantics.

The network policy filters OUTPUT. It does not filter INPUT, FORWARD, or NAT.

The network policy is IP-and-port based after DNS resolution. It does not enforce hostname, HTTP path, TLS SNI, or HTTP method.

Host patterns that cannot be resolved to concrete hostnames are not directly materialized as iptables rules. Concrete runtime endpoints that pass the Python-level policy can be recorded and included in later OS allowlist refreshes.

Landlock and seccomp are Linux-only and depend on kernel and architecture support. Unsupported controls are skipped or reported as unsupported by status helpers rather than silently treated as applied.

Why This Design

This design is different from a virtual environment, restricted Python, or a container-only model:

| Alternative | What it does not solve alone | Democr.ai sandbox layer |
| --- | --- | --- |
| Python virtual environment | does not restrict filesystem or network access | runtime guard plus optional kernel controls |
| restricted Python style | does not reliably control native libraries or direct syscalls | Python-level policy backed by OS-level enforcement |
| subprocess/container isolation only | can be expensive or coarse for in-process module execution | in-process guard for normal runtime plus kernel controls where available |

The result is a layered model: subject-aware policy in Python, administrator approval state in the application, and Linux kernel enforcement for deployments that require it.