
Documents

The Documents module stores plain-text documents with embedding vectors for semantic search across project content.

Overview

A Document IS a File — it always uses .txt format and is associated with a project. When a document is created, its text content is passed to a configured embedding provider (currently Ollama only), and the resulting vector is stored alongside the text. This allows cosine-similarity search at query time without an external vector database.
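As a rough illustration of what cosine-similarity search over stored vectors involves, here is a minimal in-memory sketch. It is not the server's implementation (which, given the pgvector column described under Data Model, presumably runs inside the database); the function names `cosine_similarity` and `search` and the toy vectors are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def search(query_vector, documents, limit=5):
    """Rank (doc_id, vector) pairs by similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-dimensional vectors; real embeddings have EMBEDDING_DIMENSIONS entries.
docs = [
    ("doc_A", [1.0, 0.0, 0.0]),
    ("doc_B", [0.0, 1.0, 0.0]),
    ("doc_C", [0.9, 0.1, 0.0]),
]
results = search([1.0, 0.0, 0.0], docs, limit=2)
```

Here `results` lists doc_A first and doc_C second, because their vectors point most nearly in the same direction as the query.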

Documents are identified by an id prefixed with doc_. The internal database primary key is never returned.

See the Permissions Reference for the IAM action strings for this module.

Data Model

| Field | Type | Description |
| --- | --- | --- |
| id | string | Public identifier prefixed with doc_ |
| file_id | string | ID of the underlying File record |
| project_id | string | ID of the owning project |
| path | string \| null | Logical path within the project (e.g. /reports/q1.txt). Also used as the resource ID segment in path-based SRNs. |
| filename | string | Original filename (.txt extension) |
| size | number | File size in bytes |
| content | string | Text content; only present in GET /documents/:id responses |
| created_at | string | ISO 8601 creation timestamp |
| updated_at | string | ISO 8601 last-updated timestamp |

The embedding column (pgvector vector(N)) is stored in the database but never returned via the API.

Path Field

The path field is a logical, project-scoped identifier for a document, similar to a file path in a filesystem. It is optional at creation time; if omitted, the server defaults to /<filename>. Paths must be absolute (start with /) and are normalized (. and .. segments are resolved). Each path is unique within its project, so no two documents in the same project can share a path.

Path examples:

/reports/q1.txt
/datasets/raw/2024-01-01.txt
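The defaulting and normalization rules described above can be sketched as follows. This is an assumption about reasonable behavior, not the server's code: the function name `normalize_document_path` is hypothetical, and the exact handling of edge cases (such as duplicate slashes) is not stated in this document.

```python
import posixpath


def normalize_document_path(path, filename):
    """Return a normalized absolute logical path, defaulting to /<filename>."""
    if path is None:
        path = "/" + filename  # documented default when path is omitted
    if not path.startswith("/"):
        raise ValueError("path must be absolute (start with /)")
    # normpath resolves "." and ".." segments and collapses repeated slashes.
    normalized = posixpath.normpath(path)
    # normpath keeps a leading "//" (a POSIX special case); collapse it here.
    if normalized.startswith("//"):
        normalized = "/" + normalized.lstrip("/")
    return normalized


default_path = normalize_document_path(None, "q1.txt")
moved_path = normalize_document_path("/reports/../datasets/raw/./2024-01-01.txt", "2024-01-01.txt")
```

With these inputs, `default_path` is /q1.txt and `moved_path` is /datasets/raw/2024-01-01.txt.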

PATCH /documents/:id accepts a path field to move a document to a new logical path.

Key Concepts

Path-Based SRNs

Policies can target documents by their logical path rather than their id. When a document has a path set, the server evaluates both the id-based SRN and the path-based SRN:

| SRN form | Matches |
| --- | --- |
| soat:proj_ABC:document:doc_XYZ | Specific document by ID |
| soat:proj_ABC:document:/reports/q1.txt | Document at the exact path /reports/q1.txt |
| soat:proj_ABC:document:/reports/* | All documents under /reports/ |
| soat:proj_ABC:document:* | All documents in the project (id wildcard) |
| * | All resources in the project |
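To make the dual evaluation concrete, the sketch below builds both SRNs for a document and tests them against a list of policy patterns. It is illustrative only: the helper names are hypothetical, and it assumes glob-style matching in which * also matches / characters, which this document does not confirm. Python's fnmatch additionally treats ? and [ ] as special, which the real matcher may not.

```python
from fnmatch import fnmatchcase


def candidate_srns(project_id, doc_id, path=None):
    """Build the id-based SRN and, when a path is set, the path-based SRN."""
    srns = [f"soat:{project_id}:document:{doc_id}"]
    if path is not None:
        srns.append(f"soat:{project_id}:document:{path}")
    return srns


def matches_any(patterns, project_id, doc_id, path=None):
    """True if any policy pattern matches either SRN form of the document."""
    candidates = candidate_srns(project_id, doc_id, path)
    return any(fnmatchcase(srn, pattern) for pattern in patterns for srn in candidates)


reports_only = ["soat:proj_ABC:document:/reports/*"]
in_reports = matches_any(reports_only, "proj_ABC", "doc_XYZ", "/reports/q1.txt")
elsewhere = matches_any(reports_only, "proj_ABC", "doc_XYZ", "/datasets/raw/2024-01-01.txt")
```

In this sketch `in_reports` is true and `elsewhere` is false: the pattern never matches the id-based SRN, so access depends entirely on the document's path.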

List and search endpoints apply policy filters at the SQL level — the database returns only rows the caller is permitted to see, so pagination counts are always accurate.

See the IAM Reference for full SRN syntax and policy authoring guidance.

Project ID Resolution

For endpoints that accept project_id, the field is optional. When omitted, the server resolves accessible projects based on the caller's identity:

| Caller type | Behavior when project_id is omitted |
| --- | --- |
| project key | Infers the project from the key's own scope (single project) |
| JWT admin | No project filter; returns results across all projects |
| JWT user | Enumerates all projects the user is a member of with the required permission |

Regular users can only access documents in projects they are members of. Even if a user's policy contains resource: ["*"], the server checks project membership before evaluating policies — access is limited to the user's own projects. See IAM — Authorization Model for the full evaluation flow.

If project_id is supplied but the caller lacks permission for that project, the request returns 403 Forbidden.
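The resolution rules in this section can be sketched as a single function. Everything here is hypothetical scaffolding for illustration: the `Caller` type, the `Forbidden` exception (standing in for a 403 response), and the assumption that a JWT admin may name any project explicitly are not taken from the server's code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


class Forbidden(Exception):
    """Stands in for an HTTP 403 Forbidden response."""


@dataclass
class Caller:
    kind: str  # "project_key", "jwt_admin", or "jwt_user"
    key_project: Optional[str] = None  # set for project keys only
    # Projects where a JWT user is a member and holds the required permission.
    permitted_projects: Set[str] = field(default_factory=set)


def can_access(caller, project_id):
    if caller.kind == "jwt_admin":
        return True  # assumption: admins may target any project
    if caller.kind == "project_key":
        return project_id == caller.key_project
    return project_id in caller.permitted_projects


def resolve_projects(caller, project_id=None):
    """Return the project IDs to query, or None to mean 'no project filter'."""
    if project_id is not None:
        if not can_access(caller, project_id):
            raise Forbidden(project_id)
        return {project_id}
    if caller.kind == "project_key":
        return {caller.key_project}
    if caller.kind == "jwt_admin":
        return None
    if caller.kind == "jwt_user":
        return set(caller.permitted_projects)
    raise ValueError(f"unknown caller kind: {caller.kind}")


user = Caller("jwt_user", permitted_projects={"proj_A", "proj_B"})
user_scope = resolve_projects(user)
admin_scope = resolve_projects(Caller("jwt_admin"))
```

Returning None for the admin case keeps "no filter" distinct from an empty set, which would mean "no accessible projects".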

Configuration

| Environment Variable | Required | Description |
| --- | --- | --- |
| FILES_STORAGE_DIR | Yes | Directory where .txt files are written (shared with Files) |
| EMBEDDING_PROVIDER | Yes | Embedding backend; only ollama is supported |
| EMBEDDING_MODEL | Yes | Model name, e.g. qwen3-embedding:0.6b |
| EMBEDDING_DIMENSIONS | Yes | Vector dimensions; must match the model output, e.g. 1024 |
| OLLAMA_BASE_URL | No | Ollama server URL, defaults to http://localhost:11434 |

Ollama setup example

# Pull the embedding model
ollama pull qwen3-embedding:0.6b

# Verify it's running
ollama list

Set the server environment variables:

EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=qwen3-embedding:0.6b
EMBEDDING_DIMENSIONS=1024
OLLAMA_BASE_URL=http://localhost:11434
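Before starting the server, it can help to confirm that these variables are consistent with what the model actually returns. The sketch below is a standalone check, not part of the product: `load_embedding_config` and `embed` are hypothetical helpers, and the request shape assumes Ollama's /api/embed endpoint, which takes `model` and `input` and returns an `embeddings` list.

```python
import json
import os
import urllib.request

REQUIRED = ["FILES_STORAGE_DIR", "EMBEDDING_PROVIDER", "EMBEDDING_MODEL", "EMBEDDING_DIMENSIONS"]


def load_embedding_config(env=os.environ):
    """Validate the documented variables and apply the OLLAMA_BASE_URL default."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    if env["EMBEDDING_PROVIDER"] != "ollama":
        raise ValueError("only the ollama provider is supported")
    return {
        "storage_dir": env["FILES_STORAGE_DIR"],
        "model": env["EMBEDDING_MODEL"],
        "dimensions": int(env["EMBEDDING_DIMENSIONS"]),
        "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    }


def embed(text, config):
    """Request one embedding from Ollama and check its length against the config."""
    body = json.dumps({"model": config["model"], "input": text}).encode("utf-8")
    request = urllib.request.Request(
        config["base_url"].rstrip("/") + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        vector = json.load(response)["embeddings"][0]
    if len(vector) != config["dimensions"]:
        raise ValueError(
            f"model returned {len(vector)} dimensions, expected {config['dimensions']}"
        )
    return vector


# Example with an explicit mapping instead of the real process environment.
config = load_embedding_config({
    "FILES_STORAGE_DIR": "/var/lib/soat/files",  # hypothetical directory
    "EMBEDDING_PROVIDER": "ollama",
    "EMBEDDING_MODEL": "qwen3-embedding:0.6b",
    "EMBEDDING_DIMENSIONS": "1024",
})
```

Calling `embed("hello", config)` against a running Ollama instance then raises an error if EMBEDDING_DIMENSIONS does not match the model's output length.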

Examples

Create a document

soat create-document \
  --project-id proj_ABC \
  --filename q1-report.txt \
  --path /reports/q1-report.txt \
  --content "Q1 revenue was \$1.2M..."