
Synced from docs/llm-agent-reference.md in the Computalot monorepo.

Complete reference with tutorials and examples. For a compact version, use GET /llms.txt.

Computalot is a distributed compute platform. Submit jobs, get structured JSON results. GPU and CPU workers.

Private beta — running jobs requires an admin-issued API key or admin-whitelisted wallet session. Discovery is public. We’re actively building and want your feedback.

Base URL: https://dev.computalot.com

  • https://dev.computalot.com/skill.md — install this skill to get started
  • https://dev.computalot.com/llms.txt — compact reference
  • https://dev.computalot.com/llms-full.txt — this full reference
  • https://dev.computalot.com/api/v1/docs — machine-readable JSON index
  • https://dev.computalot.com/docs — human docs

Report Bugs & Request Features

This is beta software. Please report bugs, request features, and share ideas:

curl -sS -X POST https://dev.computalot.com/api/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{"type": "bug", "title": "Brief summary", "description": "What happened, what you expected"}'

Types: bug, feature_request, provisioning, job_type_request. No auth required. Agents: when you hit an issue or see room for improvement, submit feedback automatically.

Two Paths

Sealed Recipes — Platform-provided compute primitives. No project setup needed. Send typed payloads, get results. Best for evaluation, training, fuzzing, and optimization against fixed runtimes.

Sandboxed Projects — Bring your own code. Create a project, push a tarball, submit custom jobs with runner_command.

Why Agents Use Computalot

Computalot is built so an agent can go from discovery to results without a human doing setup in the middle:

  • discover the service through /llms.txt
  • authenticate with an admin-issued API key or an admin-whitelisted wallet session
  • top up credits with x402 once that wallet is allowlisted
  • call a sealed recipe directly, or create a project and push code
  • retrieve structured results and decide what to do next

That wallet-auth + x402 loop is still a core product feature, but during private beta it only works for admin-whitelisted wallets. Human users can also use admin-issued API keys. If you are not already in beta, start at https://dev.computalot.com/ and join the waitlist.

Job Types

  • structured_runner — run a script with JSON in/out, optional fan-out. Key fields: runner_command, payload, fan_out, merge_strategy
  • sweep — grid search over parameter combinations. Key fields: runner_command, parameters, fixed_payload, rank_by
  • map_reduce — chunked parallelism with reduce operators. Key fields: runner_command, split, reduce, payload
  • benchmark — compare named candidates with replicas. Key fields: runner_command, candidates, shared_payload, replicas, rank_by

Default to structured_runner unless another type clearly fits. Prefer a sealed recipe when the public catalog already exposes the compute primitive you need.

Auth

Two bearer-token paths; both resolve to the same account model:

  • API key: flk_... (admin-issued)
  • Wallet session: fls_... (challenge/verify for admin-whitelisted wallets during private beta)

Wallet auth flow

  1. POST /api/v1/auth/wallet/challenge with {"wallet_address":"0x...","chain":"base"}
  2. Sign the returned challenge.message with your wallet
  3. POST /api/v1/auth/wallet/verify with {"challenge_id":"wch_...","wallet_address":"0x...","signature":"0x..."}
  4. Use the returned token as Authorization: Bearer fls_...

Wallet auth creates or reuses an account linked to chain + wallet_address. That account owns all projects, jobs, results, and credits.

API keys

API keys (flk_...) work on all endpoints. Admin-issued during private beta.

No auth required: /health, /docs, /llms.txt, /llms-full.txt, /api/v1/docs/*, POST /api/v1/feedback, POST /api/v1/auth/register (returns 403 with beta guidance), POST /api/v1/auth/wallet/challenge, POST /api/v1/auth/wallet/verify. GET /metrics is operator-gated: local requests, admin auth, or a dedicated metrics token only.

Sealed Recipes

Sealed recipes are platform-owned compute primitives with:

  • a fixed tarball-backed runtime bundle
  • a fixed entrypoint
  • a typed payload schema
  • optional artifact input slots

Use a recipe when you want a published evaluator or scorer and do not want to upload or run your own command.

Use a project when you need to bring your own code, setup, and runner_command.

Discovery endpoints:

  • GET /api/v1/recipes
  • GET /api/v1/recipes/:name
  • GET /api/v1/docs/recipes

Current public recipes:

  • prop_amm
    • operations: eval, eval_chunk, validate, validate_full, bpf_eval, bpf_eval_chunk, build, concavity_check
    • note: validate is the fast default and skips native/BPF parity; use validate_full when you need the slower parity check
    • note: build publishes compiled native/BPF artifacts; eval_chunk / bpf_eval_chunk expose deterministic seed windows
    • typical inputs: strategy_ref or rs_source_b64, plus seed/range fields
  • packing
    • operations: eval, eval_batch, feasible_optimize, basin_hopping, differential_evolution
    • typical inputs: inline candidate arrays or candidate_ref / candidates_ref, plus bounded search knobs for the optimization operations
  • lightgbm_train
    • operations: train, cross_validate
    • note: shared tabular LightGBM recipe; uploads model/metrics artifacts for train and cross-validation metrics for cross_validate
    • typical inputs: immutable dataset_ref, target_column, and bounded LightGBM hyperparameters such as seed, n_estimators, num_leaves, and folds
  • echidna
    • operations: foundry_prebuilt, hardhat_prebuilt
    • typical inputs: immutable prebuilt project_ref, contract, and bounded fuzzing fields such as test_mode, test_limit, and seed

Recipe jobs do not take a user-controlled runner_command. The recipe determines the runtime and command. Do not send type for recipe submissions unless you have a specific reason to override it. When recipe is set or you use POST /api/v1/recipes/:name/jobs, the API infers type: "structured_runner" automatically.

Example:

POST /api/v1/recipes/prop_amm/jobs
{
  "payload": { "operation": "validate", "strategy_ref": "art_123" },
  "timeout_s": 900
}

Billing

Computalot uses account credits. Jobs reserve a hold on submission and settle to actual usage on completion.

Supported beta access today is either an admin-issued API key or an admin-whitelisted wallet session. Both authenticate the same account model and the same billing surfaces.

  • GET /api/v1/account/balance — check credits
  • GET /api/v1/account/ledger — transaction history
  • GET /api/v1/account/holds — active holds
  • Project init is free but requires $5 available balance
  • Fund via x402: POST /api/v1/account/quotes/topup → pay → POST /api/v1/account/quotes/:id/pay/x402

Account billing endpoints

Authenticated callers can inspect billing state with:

  • GET /api/v1/account/balance
  • GET /api/v1/account/ledger
  • GET /api/v1/account/holds
  • GET /api/v1/account/quotes

Billing truth lives on GET /api/v1/account/balance, GET /api/v1/account/holds, GET /api/v1/account/ledger, and GET /api/v1/account/quotes.

GET /api/v1/account/balance returns the main numbers a client should care about:

  • ledger_balance_usd: total credited minus settled debits
  • held_usd: funds currently reserved for active jobs
  • available_usd: spendable balance after holds
  • active_hold_count
  • open_quote_count
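The three headline numbers are related, which a client can sanity-check locally. A minimal sketch, assuming available_usd equals ledger_balance_usd minus held_usd (consistent with the field descriptions above; the platform's exact rounding behavior is not documented):

```python
def check_balance(balance: dict, tolerance: float = 0.01) -> bool:
    """Return True when the three headline balance numbers are mutually consistent."""
    expected_available = balance["ledger_balance_usd"] - balance["held_usd"]
    return abs(balance["available_usd"] - expected_available) <= tolerance

# Illustrative response shape (values are made up)
sample = {"ledger_balance_usd": 25.0, "held_usd": 7.5, "available_usd": 17.5}
```

A mismatch here usually means the client is looking at a stale snapshot and should re-fetch before making admission decisions.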

How pricing works in practice

Users should think in terms of quotes and holds, not hidden infrastructure details.

Before a job is admitted, Computalot derives a submit-time estimate from:

  • the requested job type
  • planned task count and fan-out shape
  • requested requirements
  • requested timeout_s
  • resolved reliability_mode

That estimate becomes the hold. If the account cannot cover it, the job is rejected before it starts. Once admitted, the job is allowed to finish inside that reserved exposure instead of being killed mid-flight for routine billing reasons.

Submit-time billing summary

Job submit responses can include:

  • summary.billing_estimate
  • summary.billing_admission
  • summary.billing_hold

Important fields include:

  • inferred resource_class
  • inferred runtime_class
  • resolved reliability_mode
  • estimated_hold_usd
  • whether the account had enough available balance to admit the job

For long-running or expensive jobs, treat the submit response as the authoritative pricing signal for that exact request.

x402 funding flow

x402 is the public funding rail for autonomous wallets.

  1. POST /api/v1/account/quotes/topup with a requested amount such as { "amount_usd": 5.0 } — the per-top-up cap is $10,000; submit multiple smaller top-ups for larger funding
  2. Computalot returns 402 Payment Required
  3. The response includes a PAYMENT-REQUIRED header and a top-up quote
  4. The client pays and retries POST /api/v1/account/quotes/:quote_id/pay/x402
  5. On success, the internal account balance increases and the server returns PAYMENT-RESPONSE

If a job submit or project-init gate fails for insufficient balance, Computalot can return the same 402 shape with a shortfall quote so the client can fund the exact gap and retry.

  • If POST /api/v1/projects/:name/init returns that shortfall quote, fund the account and retry POST /api/v1/projects/:name/init.
  • If POST /api/v1/jobs returns that shortfall quote, fund the account and retry the same submit request.

In practice, this means an agent does not need:

  • a subscription
  • a stored credit card
  • a pre-issued public API key

It can discover, fund, and use compute through the API itself.

Reliability mode

reliability_mode is a public submission field:

  • best_effort
  • strict_complete

Use strict_complete for research-sensitive fan-out work, sweeps, benchmarks, CMA generations, and training/evaluation batches where missing one task would invalidate the outcome.

Execution policy and placement

Jobs land in one of two execution modes, each with two placement options:

  • sandboxed (default) — your uploaded project code runs inside a gVisor sandbox.
    • placement_policy = "shared" (default): reuses warm workers with other sandboxed tenants.
    • placement_policy = "dedicated": holds a worker exclusively for your project for the reservation window.
  • sealed — only available for platform recipes (see GET /api/v1/recipes). Runs on sealed workers using the recipe’s prebuilt runtime.
    • placement_policy defaults to shared. dedicated is available for workloads that need warm reuse.
    • You cannot submit execution_policy = "sealed" with a user project — sealed is a recipe-only mode.
    • You cannot override a platform recipe’s execution_policy. Submitting any value other than sealed on a platform recipe returns 422.

Both modes use the same submission surface. The fields are optional — leave them out to accept the defaults (sandboxed + shared for user projects, sealed + shared for platform recipes).

Core Model

  • A job is the user-visible unit of work you submit
  • Tasks are the parallel execution units Computalot creates from your job
  • You do not target infrastructure directly — submit jobs with a project and optional resource requirements, and Computalot handles placement
  • Terminal jobs and results are queryable for 30 days

Resource Requirements & Guaranteed Capacity

Submit minimum resource needs with your job — Computalot places work on matching capacity:

{
  "type": "structured_runner",
  "project": "my-ml-project",
  "runner_command": ["python", "train.py"],
  "payload": {"epochs": 3},
  "requirements": {
    "cpu": 8,
    "memory_mb": 16384,
    "storage_gb": 40,
    "gpu_count": 1,
    "gpu_memory_mb": 12288,
    "profile": "gpu"
  },
  "reservation": {
    "mode": "guaranteed",
    "parallelism": 2,
    "guaranteed_for_s": 1800,
    "max_wait_s": 0
  }
}
  • requirements are minimums. Computalot may place on larger machines.
  • profile: "cpu" or "gpu". CPU jobs can spill onto idle GPU capacity.
  • reservation.mode = "best_effort" is default.
  • reservation.mode = "guaranteed" is immediate admit-or-reject. Returns 409 if capacity unavailable.
  • Inspect reservation state: GET /api/v1/leases/:job_id.

How to request capacity well:

  • Ask for minimum real requirements. Oversized requests shrink eligible capacity and increase queue time.
  • Use profile: "gpu" only when the task truly needs GPU compute.
  • Use guaranteed only for short-window reserved parallelism. For normal batch work, best_effort is usually enough; the first job on a newly pushed revision may simply pay a cold-start cost.
  • Use POST /api/v1/projects/:name/init only when you explicitly want to prepare currently available workers ahead of time before a burst.
  • Use reliability_mode: "strict_complete" when missing one task would corrupt the final result set.
  • Treat the submit response as the pricing signal for that exact request. The hold estimate is derived from this shape.

Journey 1: Sign Up → First Job → Results

This walks through the public end-to-end path from zero to a completed job using wallet auth and account credits.

This is the canonical beta onboarding path once your wallet is allowlisted or you already have an admin-issued API key: wallet sign-in, billing checks, x402 funding, project setup, job submit, and result retrieval.

Prerequisites

# Base URL
export BASE_URL="https://dev.computalot.com"
# Your wallet address
export WALLET_ADDRESS="0x1234567890abcdef1234567890abcdef12345678"

If you already have an admin-issued API key, you can skip the wallet flow and set:

export TOKEN="flk_..."

Otherwise, use wallet auth.

Step 1: Authenticate with your wallet

Ask for a challenge:

CHALLENGE_JSON=$(curl -sS "$BASE_URL/api/v1/auth/wallet/challenge" \
  -X POST \
  -H "Content-Type: application/json" \
  -d "{\"wallet_address\":\"$WALLET_ADDRESS\",\"chain\":\"base\"}")
echo "$CHALLENGE_JSON"

Sign challenge.message with your wallet provider or SDK, then verify:

export CHALLENGE_ID="wch_..."
export SIGNATURE="0xSIGNED_MESSAGE"
VERIFY_JSON=$(curl -sS "$BASE_URL/api/v1/auth/wallet/verify" \
  -X POST \
  -H "Content-Type: application/json" \
  -d "{
    \"challenge_id\":\"$CHALLENGE_ID\",
    \"wallet_address\":\"$WALLET_ADDRESS\",
    \"signature\":\"$SIGNATURE\"
  }")
echo "$VERIFY_JSON"

Extract the returned session token and use it for the remaining steps:

export TOKEN="fls_..."

Step 2: Fund your account if needed

Check balance and inspect the same billing truth surfaces if you need more detail:

curl -sS "$BASE_URL/api/v1/account/balance" \
  -H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/holds" \
  -H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/ledger" \
  -H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/quotes" \
  -H "Authorization: Bearer $TOKEN"

If available_usd is below the minimum funded floor, request a top-up quote:

curl -sS "$BASE_URL/api/v1/account/quotes/topup" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "amount_usd": 5.0,
    "description": "initial project setup and first job"
  }'

That returns 402 Payment Required. An x402-capable client should then pay and retry:

curl -sS "$BASE_URL/api/v1/account/quotes/<quote_id>/pay/x402" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "PAYMENT-SIGNATURE: <x402 payment payload>"

If project init or job submit later returns a shortfall quote instead of admitting the request, fund the account and retry the same blocked request.

Step 3: Register your project

curl -sS "$BASE_URL/api/v1/projects" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-ml-project",
    "remote_dir": "/root/my-ml-project",
    "env": {"CLICKHOUSE_HOST": "db.internal"},
    "setup_timeout_s": 1200
  }'

remote_dir is where Computalot extracts your code. env is an optional map of runtime env vars. setup_timeout_s overrides the default 600s setup timeout.

Step 4: Create your project files

Your project needs a Dockerfile, computalot.project.json, and your code:

my-ml-project/
├── Dockerfile
├── computalot.project.json
└── job.py

Projects run as sandboxed OCI containers. See the Project Manifest docs for the full manifest schema.

FROM python:3.11-slim
WORKDIR /workspace
COPY . .

# job.py
import json, os

payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
result = {"status": "ok", "source": "getting-started"}
json.dump(result, open(os.environ["COMPUTALOT_TASK_RESULT"], "w"))

Minimal manifest:

{
  "version": 1,
  "runtime": { "kind": "oci", "sandbox": "gvisor", "workdir": "/workspace" },
  "entrypoint": { "command": ["python", "job.py"] }
}

Full schema and examples: https://dev.computalot.com/docs/projects/project-manifest
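The payload/result file contract used by job.py can be smoke-tested locally before pushing. A minimal harness sketch — the run_locally helper and file names are illustrative; only the COMPUTALOT_TASK_PAYLOAD and COMPUTALOT_TASK_RESULT env vars come from the contract above:

```python
import json, os, subprocess, sys, tempfile

def run_locally(script: str, payload: dict) -> dict:
    """Write the payload file, run the script with the contract env vars set,
    and read back the JSON result the script produced."""
    with tempfile.TemporaryDirectory() as tmp:
        payload_path = os.path.join(tmp, "payload.json")
        result_path = os.path.join(tmp, "result.json")
        with open(payload_path, "w") as f:
            json.dump(payload, f)
        env = dict(os.environ,
                   COMPUTALOT_TASK_PAYLOAD=payload_path,
                   COMPUTALOT_TASK_RESULT=result_path)
        # Run the runner exactly as Computalot would invoke it, minus the sandbox
        subprocess.run([sys.executable, script], env=env, check=True)
        with open(result_path) as f:
            return json.load(f)
```

Running run_locally("job.py", {"test_case": "getting-started"}) against the job.py above should return its result dict; a crash here will also fail remotely.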

Step 5: Upload the project

cd my-ml-project
tar czf ../code.tar.gz .
# Upload the tarball as the raw request body
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/push" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  --data-binary @../code.tar.gz

After a successful push, the latest revision is published immediately. You can submit jobs right away; the first one may take longer while Computalot prepares runtime on demand.

Optional: if you want to prepare currently available workers ahead of time, call init manually:

curl -sS "$BASE_URL/api/v1/projects/my-ml-project/init" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

Check status when you want to inspect published vs warm state:

curl -sS "$BASE_URL/api/v1/projects/my-ml-project/status" \
  -H "Authorization: Bearer $TOKEN"

can_accept_new_jobs: true means the latest revision is published and can be submitted immediately. ready_for_jobs: true means Computalot finished platform-side runtime preparation. Neither field proves your application-level imports or credentials are valid — use manifest validation checks and run one small smoke job after setup changes.

GET /api/v1/projects/:name/status is the top-level readiness truth for the active revision. After a successful push, the new content hash is visible immediately and jobs can be submitted right away, even if that revision is still warming.

If you want the already-warm signal before a burst, wait for:

  • can_accept_new_jobs: true
  • init_state: "ready"
  • ready_for_jobs: true
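A polling loop for that already-warm signal can be sketched with a pluggable fetcher, so the logic is testable without the network. The three field names come from the status surface above; the attempt limit and delay are illustrative:

```python
import time

def wait_until_warm(fetch_status, max_attempts: int = 30, delay_s: float = 2.0):
    """fetch_status() returns a status dict for the project; loop until the
    warm signal appears or give up."""
    for _ in range(max_attempts):
        status = fetch_status()
        if (status.get("can_accept_new_jobs")
                and status.get("init_state") == "ready"
                and status.get("ready_for_jobs")):
            return status
        time.sleep(delay_s)
    raise TimeoutError("project did not become warm in time")
```

In practice fetch_status would wrap GET /api/v1/projects/:name/status; for normal batch work you can skip this loop entirely and just submit.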

Step 6: Submit a job

curl -sS "$BASE_URL/api/v1/jobs" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python3", "job.py"],
    "payload": {"test_case": "getting-started"},
    "project": "my-ml-project",
    "timeout_s": 120,
    "requirements": { "cpu": 1, "memory_mb": 256, "profile": "cpu" },
    "reliability_mode": "strict_complete"
  }'

The submit response can include:

  • summary.billing_estimate
  • summary.billing_admission
  • summary.billing_hold

Treat that response as the authoritative estimate for that exact request.

Step 7: Read results and billing state

# Job status
curl -sS "$BASE_URL/api/v1/jobs/<job_id>" \
  -H "Authorization: Bearer $TOKEN"
# Structured results
curl -sS "$BASE_URL/api/v1/results/<job_id>" \
  -H "Authorization: Bearer $TOKEN"
# Aggregated stdout/stderr
curl -sS "$BASE_URL/api/v1/jobs/<job_id>/output" \
  -H "Authorization: Bearer $TOKEN"
# Billing state after the run
curl -sS "$BASE_URL/api/v1/account/balance" \
  -H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/holds" \
  -H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/ledger" \
  -H "Authorization: Bearer $TOKEN"

GET /api/v1/results/<job_id> is the canonical result surface. It returns raw per-task results plus top-level summary, aggregate_result, aggregate_aliases, completeness, result_persisted, and output_persisted. For weighted fan-out jobs, summary also carries alias fields like avg_edge directly plus coverage fields such as weight_field, expected_weight, completed_weight, and pending_weight. The response also preserves public submission metadata like meta and variant, and each task may include project_content_hash so you can confirm which project version produced it.

GET /api/v1/jobs/<job_id>/output is the live aggregated diagnostics surface. During auto-retry, it preserves the most recent failed attempt’s output and error until the next attempt emits its own diagnostics, so jobs do not go blank between retries. If a worker/runtime failure happens before your command starts, that visible text can be platform preflight stderr rather than user-process stdout.

Step 8: Update your project

Use PUT /api/v1/projects/:name for metadata only. For code changes, push a new tarball, invalidate, then init again:

tar czf ../code.tar.gz .
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/push" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  --data-binary @../code.tar.gz
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/invalidate" \
  -X POST \
  -H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/init" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

If POST /api/v1/projects/:name/push returns 409, Computalot is already initializing or refreshing that project. Poll GET /api/v1/projects/:name/status until init_state is no longer initializing or refreshing, then retry the push.

If the uploaded body is gzip but not a valid tarball, POST /api/v1/projects/:name/push now returns 422 with error: "invalid tarball" instead of surfacing a generic server error.

If computalot.project.json references project files, command working directories, build inputs, or named cache mounts that do not exist, POST /api/v1/projects/:name/push now returns 422 before the new version is accepted.

When Computalot can read the previous tarball locally, a successful push response also includes tarball_diff with added_files, removed_files, and changed_files so clients can catch incomplete uploads immediately.

That same push response can already show the active-revision transition for the new code via fields such as ready_for_jobs: false, status_message, next_action, and init_status.init_state: "refreshing".


Journey 2: Fan-Out Parallelism

Use when: You want to run the same script on many different inputs in parallel — model evaluation, agent swarms, batch processing, CMA generations.

Runner script

Your script reads $COMPUTALOT_TASK_PAYLOAD and writes to $COMPUTALOT_TASK_RESULT:

# evaluate.py
import json, os

payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
model_name = payload["model"]
score = run_evaluation(model_name, payload["dataset"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
    json.dump({"model": model_name, "score": score}, f)

Option A: Fan-out by values

Split a list into one task per item:

curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "evaluate.py"],
    "payload": {
      "models": ["model-a", "model-b", "model-c", "model-d"],
      "config": {"n_trials": 100}
    },
    "fan_out": {"by": "models", "batch_size": 2},
    "merge_strategy": "keyed",
    "project": "my-ml-project",
    "timeout_s": 600
  }'

Without batching, this would create 4 tasks, one per model, each receiving {"models": "model-a", "config": {"n_trials": 100}}. With batch_size (or batch_per_task) set — here batch_size: 2, which yields 2 tasks of 2 models each — Computalot groups multiple fan-out items into one dispatched task and adds payload._batch metadata. For fan_out.by, the split field becomes that task’s sub-list. Resolved shared-state aliases land in payload._shared.values, and scalar aliases are also exposed to the runner as COMPUTALOT_SHARED_<NAME> env vars.
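The by/batch_size expansion described above can be modeled client-side to predict how many tasks a submission creates. A local illustration of the documented behavior, not the server implementation; the _batch metadata fields here are simplified assumptions:

```python
def expand_fan_out_by(payload: dict, by: str, batch_size: int = 1):
    """Predict the per-task payloads that fan_out.by would produce."""
    items = payload[by]
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    tasks = []
    for idx, batch in enumerate(batches):
        task = dict(payload)
        # With batch_size 1 the field becomes the single item; otherwise the sub-list
        task[by] = batch[0] if batch_size == 1 else batch
        if batch_size > 1:
            task["_batch"] = {"index": idx, "size": len(batch)}
        tasks.append(task)
    return tasks
```

This kind of pre-flight expansion is useful for estimating task counts (and therefore holds) before submitting.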

Option B: Fan-out by explicit items (CMA / evolutionary)

One explicit payload per candidate — you control the payloads exactly:

{
  "type": "structured_runner",
  "runner_command": ["python", "evaluate.py"],
  "fan_out": {"items": [
    {"params": [0.1, 0.5, 0.3], "generation": 12},
    {"params": [0.2, 0.4, 0.6], "generation": 12},
    {"params": [0.3, 0.3, 0.1], "generation": 12}
  ]},
  "project": "my-proj"
}

Creates 3 tasks, one per item. The client (your optimizer) owns state between generations.

Option C: Fan-out by chunks

Split a numeric range into N chunks:

curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "simulate.py"],
    "payload": {"total_seeds": 10000},
    "fan_out": {"chunks": 20, "range_field": "total_seeds", "total": 10000},
    "merge_strategy": "collect",
    "project": "my-ml-project",
    "timeout_s": 1800
  }'

Creates 20 tasks, each receiving a consecutive window such as {"start": 0, "count": 500}, then {"start": 500, "count": 500}, and so on.
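The start/count windows can be sketched locally. This mirrors the documented shape; assigning any uneven remainder to the last chunk is an assumption, not documented server behavior:

```python
def chunk_windows(total: int, chunks: int):
    """Split a numeric range into consecutive {"start", "count"} windows."""
    base = total // chunks
    windows = []
    start = 0
    for i in range(chunks):
        # Last window absorbs the remainder when total % chunks != 0 (assumption)
        count = base if i < chunks - 1 else total - start
        windows.append({"start": start, "count": count})
        start += count
    return windows
```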

Fan-out contract

Supported public fan_out shapes are:

  • {"fan_out": {"by": "field"}}
  • {"fan_out": {"items": [{...}, {...}]}}
  • {"fan_out": {"chunks": N, "total": N, ...}}

These shapes are mutually exclusive. Mixing by, items, or chunks + total in one request returns 422; choose exactly one shape before retrying the submit.

Merge strategies

  • "collect" (default) — all results in a list
  • "keyed" — results indexed by a key from each task’s payload (requires fan_out.by)
  • "weighted_avg" — weighted average of a numeric field (set both value_field and weight_field)
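The "weighted_avg" strategy can be modeled locally to check what the merged number should be. The value_field / weight_field names match the option above; raising on zero total weight is an assumption, since the server's handling of that edge case is not documented:

```python
def weighted_avg(results, value_field: str, weight_field: str) -> float:
    """Weighted average of value_field across task results, weighted by weight_field."""
    total_weight = sum(r[weight_field] for r in results)
    if total_weight == 0:
        raise ValueError("total weight is zero")
    return sum(r[value_field] * r[weight_field] for r in results) / total_weight
```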

Result quality

Each task result includes result_quality (0.0–1.0) and result_warnings. You can define custom validation with result_schema:

{
  "result_schema": {
    "required_fields": ["score"],
    "field_types": {"score": "number"},
    "field_ranges": {"score": [0.0, 1.0]}
  }
}
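A local validator mirroring those three checks can catch bad results before submitting a schema. The warning strings and the "number" type mapping here are assumptions, not the platform's exact output:

```python
def validate_result(result: dict, schema: dict):
    """Return a list of warning strings for required_fields, field_types,
    and field_ranges violations."""
    warnings = []
    for field in schema.get("required_fields", []):
        if field not in result:
            warnings.append(f"missing required field: {field}")
    type_map = {"number": (int, float), "string": str, "boolean": bool}
    for field, expected in schema.get("field_types", {}).items():
        if field in result and not isinstance(result[field], type_map[expected]):
            warnings.append(f"wrong type for {field}: expected {expected}")
    for field, (lo, hi) in schema.get("field_ranges", {}).items():
        if field in result and not (lo <= result[field] <= hi):
            warnings.append(f"{field} out of range [{lo}, {hi}]")
    return warnings
```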

Journey 3: Parameter Search (Sweep)

Use when: You want to try every combination of parameters and rank results. Grid search, hyperparameter tuning.

curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "sweep",
    "runner_command": ["python", "evaluate.py"],
    "project": "my-ml-project",
    "parameters": {
      "learning_rate": [0.001, 0.01, 0.1],
      "batch_size": [32, 64, 128]
    },
    "fixed_payload": {"dataset": "cifar10", "epochs": 5},
    "rank_by": "accuracy",
    "rank_order": "desc",
    "timeout_s": 3600
  }'

Creates 9 tasks (3x3 cartesian product). Each task’s payload:

{
  "learning_rate": 0.001,
  "batch_size": 32,
  "dataset": "cifar10",
  "epochs": 5,
  "_sweep_idx": 0,
  "_sweep_params": {"learning_rate": 0.001, "batch_size": 32}
}

Your runner writes the rank_by field to $COMPUTALOT_TASK_RESULT:

import json, os

payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
accuracy = train_and_evaluate(payload["learning_rate"], payload["batch_size"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
    json.dump({"accuracy": accuracy}, f)

Result: a ranked leaderboard in the job summary:

{
  "results": [
    {"params": {"learning_rate": 0.01, "batch_size": 64}, "result": {"accuracy": 0.95}, "rank": 1},
    {"params": {"learning_rate": 0.001, "batch_size": 128}, "result": {"accuracy": 0.92}, "rank": 2}
  ],
  "best": {"params": {"learning_rate": 0.01, "batch_size": 64}, "result": {"accuracy": 0.95}, "rank": 1}
}

Key fields: parameters (map → value lists, max 1000 combos), fixed_payload, rank_by (required), rank_order ("desc" default or "asc").
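The cartesian expansion above can be modeled locally, which is also a cheap way to verify a sweep stays under the 1000-combination cap before submitting. Combination order and the _sweep_idx / _sweep_params fields follow the example payload; the server's actual iteration order is an assumption:

```python
from itertools import product

def expand_sweep(parameters: dict, fixed_payload: dict):
    """Predict per-task payloads for a sweep over a parameter grid."""
    names = list(parameters)
    payloads = []
    for idx, values in enumerate(product(*(parameters[n] for n in names))):
        params = dict(zip(names, values))
        payloads.append({**params, **fixed_payload,
                         "_sweep_idx": idx, "_sweep_params": params})
    return payloads
```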


Journey 4: GPU Training with Live Progress

Use when: You’re running a long training job and want real-time progress updates plus resumable checkpoints.

JOB_ID=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "train.py"],
    "payload": {"epochs": 100, "batch_size": 32},
    "project": "my-ml-project",
    "timeout_s": 7200,
    "requirements": {"profile": "gpu", "gpu_count": 1, "gpu_memory_mb": 16384},
    "checkpointing": {"enabled": true, "resume_from_latest": true}
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

# Stream progress (SSE)
curl -sS -N -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/jobs/$JOB_ID/stream

The stream starts with a snapshot, then emits job, task, and event deltas. Running-task frames include live_feedback.output_tail, so you can show rolling stdout/stderr before the job finishes.

Report progress from your training script:

import json, os

payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
resume_state = payload.get("_resume") or {}
start_epoch = resume_state.get("epoch", 0)
for epoch in range(start_epoch, 100):
    loss = train_one_epoch()
    # Computalot captures COMPUTALOT_PROGRESS lines and streams them to the SSE endpoint
    print(
        f"COMPUTALOT_PROGRESS:{json.dumps({'epoch': epoch, 'loss': loss, 'percent': epoch})}",
        flush=True,
    )

Save model artifacts to $COMPUTALOT_ARTIFACT_DIR:

import os

model_path = os.path.join(os.environ['COMPUTALOT_ARTIFACT_DIR'], 'model.pt')
torch.save(model.state_dict(), model_path)

Artifact IDs appear in the task result. Download: GET /api/v1/artifacts/:id.

If you include a checkpoint object in progress or result payloads, Computalot persists the latest checkpoint and injects it back into _resume on retry when checkpointing.resume_from_latest is enabled. When the checkpoint can be durably published as an artifact, task state also records fields like artifact_id, artifact_source, publish_status, and published_at, and retries rewrite _resume.checkpoint.path to the downloaded local checkpoint file automatically.
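That resume contract can be sketched as a runner loop that reads _resume and attaches a checkpoint object to every progress line. train_one_epoch and the injectable emit callback are stand-ins for illustration; only the _resume / checkpoint / COMPUTALOT_PROGRESS names come from the docs above:

```python
import json

def run_epochs(payload: dict, total_epochs: int, train_one_epoch, emit=print):
    """Resume from the injected checkpoint (if any) and emit progress lines
    carrying the next resume point."""
    resume = (payload.get("_resume") or {}).get("checkpoint") or {}
    start_epoch = resume.get("epoch", 0)
    for epoch in range(start_epoch, total_epochs):
        loss = train_one_epoch(epoch)
        progress = {
            "epoch": epoch,
            "loss": loss,
            "percent": round(100 * (epoch + 1) / total_epochs, 1),
            "checkpoint": {"epoch": epoch + 1},  # where a retry should resume
        }
        emit(f"COMPUTALOT_PROGRESS:{json.dumps(progress)}")
    return start_epoch
```

On a retry with resume_from_latest enabled, the loop above would skip the already-completed epochs instead of starting over.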

For live UIs, combine:

  • GET /api/v1/jobs/:id/stream — SSE updates for one job
  • GET /api/v1/jobs/watch?ids=id1,id2,... — one SSE connection for 2-100 jobs with terminal summaries, aggregate fields, and persistence flags
  • GET /api/v1/jobs/:id/tasks — per-task live_feedback, latest_progress, checkpoint, resume_state, runtime_s, health_status, plus the preserved last failed attempt while a retry is queued or running
  • GET /api/v1/results/:job_id — the canonical terminal result surface with per-task payload/result/output presence, completeness, and artifact IDs
  • GET /api/v1/jobs/:id/output — aggregated stdout/stderr that preserves the most recent failed attempt until the current attempt emits new diagnostics
  • GET /api/v1/jobs/:id — job-level feedback_summary and checkpoint summary

Public job/task/watch/result payloads keep submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests.


Journey 5: Multi-Stage Pipelines

Use when: Step 2 depends on step 1’s output.

# Step 1: Train
JOB1=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "train.py"],
    "payload": {"epochs": 50, "lr": 0.001},
    "project": "my-ml-project",
    "timeout_s": 7200,
    "gpu_required": true
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

# Step 2: Evaluate (waits for step 1)
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d "{
    \"type\": \"structured_runner\",
    \"runner_command\": [\"python\", \"evaluate.py\"],
    \"payload\": {\"model_path\": \"/root/my-ml-project/model.pt\"},
    \"depends_on\": [\"$JOB1\"],
    \"project\": \"my-ml-project\",
    \"timeout_s\": 600
  }"

Step 2 stays queued until step 1 completes. If step 1 fails, step 2 auto-cancels.

Passing files between stages (Artifacts)

# Upload artifact after step 1
ART_ID=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/octet-stream" \
  -H "X-Artifact-Filename: model.pt" \
  --data-binary @model.pt \
  https://dev.computalot.com/api/v1/artifacts | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
# Reference in step 2's payload

Or use _artifacts.download in the payload for automatic pre-task download:

{ "payload": { "_artifacts": {"download": {"dataset": "art_abc123"}}, "model_type": "base" } }

Computalot resolves downloads before task execution (env var COMPUTALOT_ARTIFACT_<key> points to local path). For staged pipelines, downstream jobs can reference a named artifact produced by an upstream dependency instead of hardcoding an artifact ID:

{ "depends_on": ["job_setup_123"], "payload": { "_artifacts": { "download": { "dataset": {"job_id": "job_setup_123", "artifact": "dataset"} } } } }

For long-running ML or evaluation jobs, do not stop at _artifacts.download alone:

  • use manifest data_sources for immutable remote inputs that should be prepared before launch
  • use manifest cache_mounts for writable caches your code creates at runtime
  • if the input lives on Hugging Face and should stay read-only, prefer data_sources[].source = "huggingface" with delivery = "mount" so the worker uses hf-mount
  • if the runner downloads from Hugging Face itself, add a huggingface cache mount so HF_HOME and TRANSFORMERS_CACHE persist per worker; ad hoc runtime downloads do not use hf-mount automatically

For smaller coordination data, use project-scoped shared state plus dispatch-time injection:

curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/projects/my-project/kv/dataset_ready \
  -d '{"value": {"status": "ready"}}'

{ "depends_on": ["job_setup_123"], "payload": { "_shared": { "resolve": { "dataset_ready": {"key": "dataset_ready"}, "best_score": {"job_id": "job_setup_123", "path": "results.0.score"} } } } }
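The `"results.0.score"` path in the example suggests dot-separated keys where numeric segments index into lists. The helper below illustrates that lookup locally under that assumption; it is not Computalot's resolver:

```python
def resolve_path(doc, path: str):
    """Walk a dot-separated path; numeric segments index into lists."""
    cur = doc
    for seg in path.split("."):
        if isinstance(cur, list):
            cur = cur[int(seg)]  # "0" in "results.0.score" → list index 0
        else:
            cur = cur[seg]
    return cur

upstream_result = {"results": [{"score": 0.91}, {"score": 0.87}]}
print(resolve_path(upstream_result, "results.0.score"))  # → 0.91
```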

Journey 6: Comparing Strategies (Benchmark)

Use when: You want to compare 2+ named strategies with replicas for statistical significance.

curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "benchmark",
    "runner_command": ["python", "evaluate.py"],
    "project": "my-project",
    "candidates": {
      "strategy_a": {"model": "gpt4", "temperature": 0.7},
      "strategy_b": {"model": "claude", "temperature": 0.5},
      "baseline": {"model": "random"}
    },
    "shared_payload": {"dataset": "test_set_v3", "n_trials": 100},
    "replicas": 3,
    "rank_by": "score",
    "timeout_s": 1800
  }'

Creates 9 tasks (3 candidates x 3 replicas). Each payload:

{"dataset": "test_set_v3", "n_trials": 100, "model": "gpt4", "temperature": 0.7, "_candidate": "strategy_a", "_replica": 1}

To vary a field per replica: "replica_vary": {"field": "seed_base", "stride": 1000}.
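The exact arithmetic behind replica_vary is not specified here. One plausible reading, with the offset formula as a loud assumption (replica numbers are 1-based in the example payloads, so replica 1 keeps the base value):

```python
def vary_replica(payload: dict, field: str, stride: int, replica: int) -> dict:
    """Hypothetical sketch: offset `field` by stride per replica.
    The (replica - 1) * stride formula is an assumption, not documented behavior."""
    out = dict(payload)
    out[field] = out.get(field, 0) + (replica - 1) * stride
    return out

print(vary_replica({"seed_base": 100}, "seed_base", 1000, 3))  # {'seed_base': 2100}
```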

Result: leaderboard with per-candidate statistics:

{ "leaderboard": [ {"candidate": "strategy_a", "mean": 0.92, "std": 0.03, "min": 0.89, "max": 0.95, "count": 3, "rank": 1}, {"candidate": "strategy_b", "mean": 0.85, "std": 0.02, "min": 0.83, "max": 0.87, "count": 3, "rank": 2}, {"candidate": "baseline", "mean": 0.50, "std": 0.05, "min": 0.45, "max": 0.55, "count": 3, "rank": 3} ] }

Key fields: candidates (map, min 2), shared_payload, replicas (default 1, max 100), rank_by (required), rank_order.

Sweep vs Benchmark: Use sweep for exploring a parameter grid. Use benchmark for comparing named alternatives with replicas for statistical confidence.


Journey 7: Monte Carlo / Simulations (Map-Reduce)

Use when: You want to split a range into chunks, process in parallel, and aggregate with operators.

curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "map_reduce",
    "runner_command": ["python", "evaluate_seeds.py"],
    "project": "my-project",
    "payload": {"strategy": "momentum"},
    "split": {"field": "seed", "start": 0, "total": 10000, "chunks": 50},
    "reduce": {
      "total_pnl": "sum",
      "sharpe_ratio": "weighted_avg:sample_count",
      "max_drawdown": "max"
    },
    "timeout_s": 7200
  }'

Creates 50 tasks. Each gets: {"strategy": "momentum", "seed_start": 0, "seed_count": 200}.
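The chunk boundaries are easy to reproduce locally (10000 seeds over 50 chunks gives 200 per task). `split_range` and its even-remainder handling are illustrative, not the platform's implementation:

```python
def split_range(field: str, start: int, total: int, chunks: int):
    """Split [start, start+total) into contiguous per-task fragments,
    mirroring the {seed_start, seed_count} fields from the example."""
    per, rem = divmod(total, chunks)
    out, cursor = [], start
    for i in range(chunks):
        count = per + (1 if i < rem else 0)  # remainder handling is an assumption
        out.append({f"{field}_start": cursor, f"{field}_count": count})
        cursor += count
    return out

tasks = split_range("seed", 0, 10000, 50)
print(len(tasks), tasks[0], tasks[-1])
```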

For non-contiguous ranges, use explicit split.ranges instead of start + total + chunks:

{ "type": "map_reduce", "runner_command": ["python", "evaluate_seeds.py"], "project": "my-project", "payload": {"strategy": "momentum"}, "split": { "field": "seed", "ranges": [ {"start": 860791000, "count": 1000}, {"start": 200000000, "count": 1000}, {"start": 500000000, "count": 1000} ] }, "reduce": { "avg_edge": "mean" } }

Your runner:

import json, os

payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
results = run_simulation(payload["strategy"], payload["seed_start"], payload["seed_count"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
    json.dump({"total_pnl": results.pnl,
               "sharpe_ratio": results.sharpe,
               "max_drawdown": results.drawdown,
               "sample_count": payload["seed_count"]}, f)

Result: reduced values in the job summary:

{ "reduced": { "total_pnl": 15234.50, "sharpe_ratio": 1.87, "max_drawdown": 0.23 } }

Key fields: split ({field, start, total, chunks}), reduce (map of field → operator).

Reduce operators: sum, mean, max, min, weighted_avg:<weight_field>, concat, count, collect.
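A few of these operators can be sketched locally to make the semantics concrete; edge cases such as missing fields are assumptions here, not documented platform behavior:

```python
def reduce_results(results, reduce_spec: dict):
    """Local illustration of sum/mean/max/min/weighted_avg over task results."""
    out = {}
    for field, op in reduce_spec.items():
        vals = [r[field] for r in results]
        if op == "sum":
            out[field] = sum(vals)
        elif op == "mean":
            out[field] = sum(vals) / len(vals)
        elif op == "max":
            out[field] = max(vals)
        elif op == "min":
            out[field] = min(vals)
        elif op.startswith("weighted_avg:"):
            weight_field = op.split(":", 1)[1]
            weights = [r[weight_field] for r in results]
            out[field] = sum(v * w for v, w in zip(vals, weights)) / sum(weights)
    return out

rows = [
    {"pnl": 10.0, "sharpe": 2.0, "n": 100},
    {"pnl": 20.0, "sharpe": 1.0, "n": 300},
]
print(reduce_results(rows, {"pnl": "sum", "sharpe": "weighted_avg:n"}))
```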


Heavy Jobs (Large Data, Checkpoints, Training)

For GB-scale datasets, large checkpoints, and long training runs:

Inputs:

  • Keep payload small — do not embed large data in JSON.
  • Use _artifacts.download for large inputs. Workers download and cache before launch.
  • Resolved paths: payload._artifacts.local_paths. Single-file downloads also get COMPUTALOT_ARTIFACT_<NAME> env vars.

Outputs:

  • Write checkpoints and files to $COMPUTALOT_ARTIFACT_DIR.
  • Use _artifacts.upload for named uploads or direct upload to external storage. Computalot-managed uploads now prefer direct or multipart object-store transfer and only fall back to controller relay if the direct path is unavailable.
  • If a JSON result is too large, Computalot spills it to an artifact and returns result_spilled, result_artifact_id, result_filename; those spill uploads also prefer the direct object-store path.

Operational:

  • Set timeout_s above expected runtime with margin.
  • Submit jobs after push; use POST /api/v1/projects/:name/init only if you want to prepare currently available workers ahead of time.
  • Write checkpoints and outputs to $COMPUTALOT_ARTIFACT_DIR. Use $COMPUTALOT_TASK_SCRATCH_DIR or $TMPDIR for temp files.
  • Use external/object storage for multi-GB datasets and model bundles.

Project Setup

Projects run as sandboxed OCI containers. The lifecycle is:

  1. POST /api/v1/projects — register
  2. POST /api/v1/projects/:name/push — upload tarball with Dockerfile + computalot.project.json + your code
  3. POST /api/v1/jobs — submit work against the published revision
  4. Optional: POST /api/v1/projects/:name/init — prepare currently available workers
  5. GET /api/v1/projects/:name/status — inspect published vs warm state
  • Project init is free but requires $5 available balance
  • Init is asynchronous
  • After a push, can_accept_new_jobs can already be true even while ready_for_jobs is still false
  • After a code change, push the new tarball; use invalidate only if you want to discard old prepared runtimes

Project structure

my-project/
├── Dockerfile
├── computalot.project.json
├── requirements.txt
└── job.py

Dockerfile

Install dependencies in your Dockerfile:

FROM python:3.11-slim
WORKDIR /workspace
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

computalot.project.json

See the Project Manifest docs for the full schema.

Tips

  1. Run one small smoke job after setup changes before submitting a large batch
  2. Use manifest validation section for runtime checks (executables, files, commands)

Public project endpoints

  • POST /api/v1/projects
  • PUT /api/v1/projects/:name for metadata-only updates
  • POST /api/v1/projects/:name/push
  • POST /api/v1/projects/:name/init
  • POST /api/v1/projects/:name/invalidate
  • GET /api/v1/projects/:name/status
  • GET /api/v1/projects/:name/status/details
  • GET /api/v1/projects

Debugging failed setup

curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/status
curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/status/details

Fix your Dockerfile/manifest, push, invalidate, re-init:

tar czf code.tar.gz . && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" --data-binary @code.tar.gz \
  https://dev.computalot.com/api/v1/projects/my-project/push && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/invalidate && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/projects/my-project/init -d '{}'

Runner Protocol (All Job Types)

All runner-based types use this contract:

  1. Computalot launches your runner_command with task-specific payload
  2. Payload written to temp file; path in $COMPUTALOT_TASK_PAYLOAD
  3. Result file path in $COMPUTALOT_TASK_RESULT
  4. Your script writes JSON result to $COMPUTALOT_TASK_RESULT
  5. Progress: print COMPUTALOT_PROGRESS:{json} to stdout

COMPUTALOT_* env vars are the Computalot runtime protocol.

Normal stdout/stderr is surfaced live through live_feedback.output_tail on task APIs and SSE streams. If your runner wraps another process, keep the child unbuffered or flush explicitly so Computalot can forward logs promptly.

Task env order: base runtime → project env files (.computalot.env, computalot.env, .env) → project env map → meta.env overrides. If .venv/bin/python exists, Computalot prepends .venv/bin to PATH.

Exit codes: 0 = success, non-zero = failure. Last ~1000 chars captured as error (tail, not head — preserves tracebacks). Full output (up to 10KB) stored per task.
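Tail truncation can be illustrated in one line; the 1000-character limit is approximate per the text above:

```python
def error_tail(output: str, limit: int = 1000) -> str:
    """Keep the last `limit` characters — the tail holds the traceback,
    which is why truncation drops the head rather than the end."""
    return output[-limit:]

long_output = "setup noise\n" * 500 + "Traceback (most recent call last):\nValueError: bad input"
tail = error_tail(long_output)
print(len(tail), tail.endswith("ValueError: bad input"))
```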

Payload varies by type:

  • structured_runner: the payload field. Chunk fan-out adds {start, count}.
  • sweep: fixed_payload + parameter combination + _sweep_idx + _sweep_params
  • map_reduce: payload + {field_start, field_count} chunk boundaries
  • benchmark: shared_payload + candidate config + {_candidate, _replica}
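The sweep composition can be sketched as a grid expansion. Field names come from the list above, while the combination order (itertools.product over insertion-ordered keys) is an assumption:

```python
from itertools import product

def sweep_payloads(fixed_payload: dict, parameters: dict):
    """Illustrative expansion of a parameter grid into per-task payloads."""
    keys = list(parameters)
    out = []
    for idx, combo in enumerate(product(*(parameters[k] for k in keys))):
        params = dict(zip(keys, combo))
        out.append({**fixed_payload, **params,
                    "_sweep_idx": idx, "_sweep_params": params})
    return out

tasks = sweep_payloads({"dataset": "v3"}, {"lr": [0.001, 0.01], "batch": [16, 32]})
print(len(tasks), tasks[0])
```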

Allowed executables: python, python3, node, deno, bun, ruby, julia, Rscript, uv, pip, npm, npx, cargo, rustc. Shell executables (bash, sh, zsh) are blocked.


Job Lifecycle

Statuses: planning → queued → running → completed | partial | failed | cancelled

  • Terminal states: completed, partial, failed, cancelled
  • Poll: GET /api/v1/jobs/:id every 2-5s until terminal
  • Stream: GET /api/v1/jobs/:id/stream for SSE updates
  • Multi-job watch: GET /api/v1/jobs/watch?ids=id1,id2,... for one SSE stream covering 2-100 jobs
  • Canonical terminal results: GET /api/v1/results/:job_id
  • Per-task progress + retry continuity: GET /api/v1/jobs/:id/tasks
  • Aggregated output continuity: GET /api/v1/jobs/:id/output
  • Cancel: PUT /api/v1/jobs/:id/cancel with {"reason": "..."}
  • Auto-retry: set max_retries on submission
  • Jobs stuck in running with 0 active tasks auto-recover every 5 minutes
  • Jobs queued > 2x timeout_s (min 30 min) auto-cancel only when Computalot has live capacity; fleet-wide outages do not trigger queue-timeout cancellation by themselves
  • partial = some tasks failed/cancelled, OR all completed but some have low quality (< 0.5)
  • strict_complete is the recommended mode for research-sensitive runs where partial completion is not acceptable
  • Optional priority: "high" | "normal" | "low" biases scheduling between otherwise comparable jobs. Guaranteed reservations still take precedence.
  • Submit responses can include hold and admission metadata so the client can reason about cost before execution starts
  • Public job/task/watch/result payloads keep submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests

Result Quality Validation

Computalot validates results and computes quality scores (0.0-1.0) per task.

  • Per-task: result_quality, result_warnings, result_present, output_present in GET /api/v1/results/:job_id
  • Per-job result surface: summary.aggregate_result, summary.aggregate_aliases, summary.completeness, and top-level aggregate_result / aggregate_aliases / completeness in GET /api/v1/results/:job_id
  • Per-job: summary.quality in GET /api/v1/jobs/:id with mean_quality, suspect_count, suspect_task_ids
  • Default schemas: sweep/benchmark require rank_by field as number; map_reduce requires all reduce fields
  • Custom: Add result_schema to job submission

Quality is advisory — all results are stored regardless.

Job Tags & Filtering

{"type": "sweep", "tags": ["experiment_42", "lr_search"], ...}

Filter: GET /api/v1/jobs?tag=experiment_42.

Job Priority

{"priority": "high"}

Use high for latency-sensitive fan-out or evaluation jobs that should start ahead of ordinary background work. Use low for background submissions. Jobs default to normal.

Batch Submission

Up to 200 jobs in one request:

curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs/batch \
  -d '{"jobs": [
    {"type": "structured_runner", "runner_command": ["python", "eval.py"], "project": "my-proj", "payload": {"lr": 0.001}, "tags": ["sweep_72"]},
    {"type": "structured_runner", "runner_command": ["python", "eval.py"], "project": "my-proj", "payload": {"lr": 0.01}, "tags": ["sweep_72"]}
  ]}'

Response: 201 (all ok) or 207 (partial): {jobs, submitted, errors, error_count}.
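A response handler for the two documented status codes might look like the sketch below; the per-error shape ({index, error}) is an assumption, only the top-level fields come from the text:

```python
def summarize_batch(status_code: int, body: dict):
    """Interpret the documented batch response: 201 = all accepted, 207 = partial."""
    ids = [j.get("id") for j in body.get("jobs", [])]
    if status_code == 201:
        return {"ok": True, "job_ids": ids, "errors": []}
    return {"ok": False, "job_ids": ids, "errors": body.get("errors", [])}

# Hypothetical 207 body: one job accepted, one rejected.
resp = {"jobs": [{"id": "job_a"}], "submitted": 1,
        "errors": [{"index": 1, "error": "invalid payload"}], "error_count": 1}
print(summarize_batch(207, resp))
```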

Webhook Callbacks

{"type": "structured_runner", "callback_url": "https://your-server.com/webhook", ...}

Computalot POSTs {event, job_id, status, project, type, output, error, tags, progress, completed_at} with 2 retries. The callback originates from the Computalot controller — localhost URLs only work if routable from the server.

Job Dependencies (DAG)

{"depends_on": ["job_20260312_143000_abc123"]}

Tasks not dispatched until all dependencies reach completed. If a dependency fails, dependent jobs auto-cancel.

Streaming Progress

SSE for one job

curl -sS -N -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/jobs/<job_id>/stream

Starts with snapshot, then incremental job, task, event frames. Ends with done.

Running-task frames include live_feedback.output_tail, which is the fastest public surface for live stdout/stderr.

SSE for multiple jobs

curl -sS -N -H "Authorization: Bearer $TOKEN" \
  "https://dev.computalot.com/api/v1/jobs/watch?ids=<id1>,<id2>,<id3>"

Max 100 jobs. Idle periods emit ping. Ends with done when all terminal.

Terminal job frames include client_ref, tags, meta, variant, summary, aggregate_result, aggregate_aliases, completeness, and result_persisted / output_persisted when available, so a client can often avoid a follow-up result fetch. For weighted fan-out jobs that means fields like avg_edge can be present directly in the terminal SSE payload.

SSE for a whole project

curl -sS -N -H "Authorization: Bearer $TOKEN" \
  "https://dev.computalot.com/api/v1/projects/<project>/stream"

One connection for all jobs in a project. Reconnect after 1h timeout.

Common Agent Patterns

Poll for completion:

while true; do
  STATUS=$(curl -sS -H "Authorization: Bearer $TOKEN" \
    https://dev.computalot.com/api/v1/jobs/<job_id> \
    | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
  case $STATUS in
    completed|partial|failed|cancelled) break ;;
  esac
  sleep 5
done
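The same poll loop in Python, with the status fetch injected as a callable so the sketch stays transport-agnostic (wire it to GET /api/v1/jobs/:id with your HTTP client of choice):

```python
import time

TERMINAL = {"completed", "partial", "failed", "cancelled"}

def wait_for_job(fetch_status, interval_s: float = 5.0, poll=time.sleep):
    """Poll until a terminal status; `fetch_status` returns the current status string."""
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        poll(interval_s)

# Example with a fake status source standing in for the HTTP call:
states = iter(["queued", "running", "completed"])
print(wait_for_job(lambda: next(states), poll=lambda _: None))  # → completed
```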

Read structured results:

curl -sS -H "Authorization: Bearer $TOKEN" https://dev.computalot.com/api/v1/results/<job_id>

Cancel a job:

curl -sS -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs/<job_id>/cancel \
  -d '{"reason":"no longer needed"}'

Update project code:

tar czf code.tar.gz . && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" --data-binary @code.tar.gz \
  https://dev.computalot.com/api/v1/projects/my-project/push && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/invalidate && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/projects/my-project/init -d '{}'

Debugging Failed Jobs

  1. Job error: GET /api/v1/jobs/:id — error and recommended_action
  2. Per-task details: GET /api/v1/jobs/:id/tasks — error, output (up to 10KB), structured failure result with failure_kind, exit_code, plus latest_progress, checkpoint, and resume_state for long-running jobs. During auto-retry, queued/running tasks keep the most recent failed attempt’s diagnostics visible until the current attempt emits its own output.
  3. Live stream: GET /api/v1/jobs/:id/stream — SSE updates
  4. Timeline: GET /api/v1/jobs/:id/events — state change events
  5. Project readiness: GET /api/v1/projects/:name/status
  6. Diagnostics: GET /api/v1/projects/:name/status/details
  7. Billing state: GET /api/v1/account/balance, GET /api/v1/account/holds, GET /api/v1/account/ledger
Symptom | Cause | Fix
402 Payment Required on top-up or shortfall flow | account needs more credits | pay the returned x402 quote and retry
project init rejected before setup starts | available balance below funded floor | top up to at least $5, then retry init
"No native library found" | missing system library | fix Dockerfile, push, invalidate, re-init
exit_code_1, useless error | truncated error | check the per-task output field (full 10KB)
task looks blank while retrying | current attempt has not emitted anything yet | check GET /api/v1/jobs/:id/output or GET /api/v1/jobs/:id/tasks for the preserved last-failed-attempt diagnostics
task failed before user code printed anything | worker/runtime preflight failed first | the visible output / error can be platform stderr rather than user stdout
Cargo/Rust toolchain broken | Computalot worker issue | wait for auto-recovery; not your code
tasks stuck in queued | cold start or capacity catch-up | check project status and job diagnostics; the first job may be waiting while runtime preparation happens on demand
project ready but tasks fail | Dockerfile missing deps or imports not checked | fix Dockerfile, add manifest validation checks
401 / DB timeout after warmup | credentials issue | add auth check to manifest validation

Artifact API

Content-addressed store for passing files between jobs. Supports controller-streamed local uploads (up to 2GB), presigned direct-to-object-store uploads, resumable multipart uploads for very large files, and external URL references.

# Upload through the controller (streaming, max 2GB)
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/octet-stream" \
  -H "X-Artifact-Filename: dataset.parquet" \
  --data-binary @dataset.parquet \
  https://dev.computalot.com/api/v1/artifacts

# Request a presigned direct upload URL for a large file
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/direct \
  -d '{"sha256":"<lowercase_sha256>","size":123456789,"filename":"dataset.parquet","content_type":"application/octet-stream"}'

# Upload bytes directly to the returned upload.url, then finalize registration
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/direct/complete \
  -d '{"sha256":"<lowercase_sha256>","size":123456789,"filename":"dataset.parquet","content_type":"application/octet-stream"}'

# Start a resumable multipart upload for a very large file
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/multipart \
  -d '{"sha256":"<lowercase_sha256>","size":9876543210,"filename":"checkpoint.safetensors","content_type":"application/octet-stream"}'

# After uploading parts to the returned presigned part URLs, finalize the multipart upload
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/multipart/complete \
  -d '{"sha256":"<lowercase_sha256>","size":9876543210,"filename":"checkpoint.safetensors","content_type":"application/octet-stream","upload_id":"<upload_id>","parts":[{"part_number":1,"etag":"etag-1"}]}'

# Register external URL (no upload)
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/external \
  -d '{"url": "https://s3.example.com/data.parquet", "filename": "data.parquet"}'

# Download
curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/artifacts/<id> -o output.bin

# List
curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/artifacts

Artifacts are retained 7 days by default. Download important ones promptly. Authenticated GET /api/v1/artifacts/:id downloads stay on the public API URL and stream bytes through the controller; if you need object-store details, GET /api/v1/artifacts/:id/meta may also expose a signed object-store URL.
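For the presigned direct-upload flow above, the registration body needs a lowercase hex sha256 and the byte size, both of which can be computed locally before calling POST /api/v1/artifacts/direct:

```python
import hashlib

def direct_upload_request(data: bytes, filename: str) -> dict:
    """Build the body for POST /api/v1/artifacts/direct."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),  # hexdigest() is already lowercase
        "size": len(data),
        "filename": filename,
        "content_type": "application/octet-stream",
    }

req = direct_upload_request(b"hello", "dataset.parquet")
print(req["size"], req["sha256"][:8])
```

For real files, hash in chunks (`hashlib.sha256` accepts incremental `update()` calls) instead of reading the whole file into memory.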

Feedback

Computalot is in private beta. Your feedback directly shapes the product. Please report bugs, confusing behavior, missing features, and ideas. No auth required.

# Bug report
curl -sS -X POST -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/feedback \
  -d '{"type": "bug", "title": "Brief summary", "description": "What happened, what you expected, steps to reproduce"}'

# Feature request
curl -sS -X POST -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/feedback \
  -d '{"type": "feature_request", "title": "Brief summary", "description": "What you need and why"}'

Types: bug | feature_request | provisioning | job_type_request. You can optionally include "job_id" or "metadata" for additional context. Agents: when you encounter an issue or see room for improvement while using Computalot, submit feedback automatically.

Python SDK & CLI

Install the published beta wheel from a public URL:

python3 -m pip install --user --break-system-packages \
  https://dev.computalot.com/docs/downloads/computalot-0.2.0-py3-none-any.whl
export PATH="$HOME/.local/bin:$PATH"

First authenticated probe after you receive an API key or wallet session:

from computalot import ComputalotClient

client = ComputalotClient(
    controller_url="https://dev.computalot.com",
    token="YOUR_TOKEN",
)
docs = client.docs_index()
recipes = client.list_recipes()
jobs = client.list_jobs(limit=5)
print(docs["status"])
print(len(recipes.get("recipes", [])))
print(len(jobs.get("jobs", [])))

export COMPUTALOT_CONTROLLER_URL="https://dev.computalot.com"
export COMPUTALOT_API_TOKEN="YOUR_TOKEN"
computalot docs --llm
computalot jobs --limit 5
computalot job <job_id>

Once a project is ready, use the CLI submit helpers or the SDK submit_structured() / submit_job() methods shown elsewhere in this reference.

Endpoint Reference

Public Docs

Method | Path | Purpose
GET | /docs | Human docs landing page
GET | /api/v1/docs | JSON docs index
GET | /llms.txt | Compact reference
GET | /llms-full.txt | Full reference with tutorials
GET | /api/v1/docs/python-sdk | Python SDK guide
GET | /api/v1/docs/workflows | Workflow recipes
POST | /api/v1/auth/wallet/challenge | Create wallet auth challenge (no auth)
POST | /api/v1/auth/wallet/verify | Verify wallet challenge and mint session token (no auth)
POST | /api/v1/feedback | Submit feedback (no auth)

Ops (operator-facing)

Method | Path | Purpose
GET | /health | Liveness probe (no auth; same body as /live)
GET | /live | Liveness probe (no auth)
GET | /ready | Readiness probe (no auth; 503 until controller core is up)
GET | /metrics | Prometheus metrics (admin auth, dedicated metrics token, or local request)

Account & Billing

Method | Path | Purpose
GET | /api/v1/account/balance | Account credit summary
GET | /api/v1/account/ledger | Settled ledger entries
GET | /api/v1/account/holds | Active and historical holds
GET | /api/v1/account/quotes | Funding and shortfall quotes
POST | /api/v1/account/quotes/topup | Create x402 top-up quote (402 Payment Required)
POST | /api/v1/account/quotes/:quote_id/pay/x402 | Settle x402 quote and credit the account

Job API

Method | Path | Purpose
POST | /api/v1/jobs | Submit a job
POST | /api/v1/jobs/batch | Submit up to 200 jobs
GET | /api/v1/jobs | List jobs (?status=&project=&tag=&limit=50&offset=0)
GET | /api/v1/jobs/:id | Full job state with feedback_summary and checkpoint summary
GET | /api/v1/jobs/:id/output | Stdout/stderr, including preserved last-failed-attempt diagnostics during retries
GET | /api/v1/jobs/:id/tasks | Per-task details, errors, live feedback, checkpoint state, and preserved retry diagnostics
GET | /api/v1/jobs/:id/events | Lifecycle events
GET | /api/v1/jobs/:id/metrics | Aggregate metrics
GET | /api/v1/jobs/:id/stream | SSE stream for one job, including task live_feedback.output_tail deltas
GET | /api/v1/jobs/watch?ids=a,b,c | SSE stream for multiple jobs (max 100, with ping keepalives, metadata, and persistence flags)
GET | /api/v1/projects/:name/stream | SSE stream for all jobs in a project
PUT | /api/v1/jobs/:id/cancel | Cancel a job

Results & Artifacts

Method | Path | Purpose
GET | /api/v1/results/:job_id | Per-task results plus metadata, aggregate_result, aggregate_aliases, completeness, and persistence flags; use job/task endpoints for live retry-loop diagnostics
GET | /api/v1/results | List recent terminal results (?limit=20&offset=0, paginated). Filters: job_id, ids, project, client_ref, tag, user_id, recipe_cache_*, group_by, include_tasks. Malformed limit/offset → 422.
POST | /api/v1/artifacts | Upload artifact (raw body, max 2GB)
POST | /api/v1/artifacts/direct | Get a presigned direct-upload URL for object storage
POST | /api/v1/artifacts/direct/complete | Finalize a direct-uploaded object-store artifact
POST | /api/v1/artifacts/multipart | Start a resumable multipart direct upload
POST | /api/v1/artifacts/multipart/part | Get a presigned PUT URL for one multipart chunk
GET | /api/v1/artifacts/multipart/parts | List uploaded multipart chunks for resume
POST | /api/v1/artifacts/multipart/complete | Finalize a multipart direct upload and register the artifact
POST | /api/v1/artifacts/multipart/abort | Abort an in-flight multipart upload
POST | /api/v1/artifacts/external | Register external URL artifact
GET | /api/v1/artifacts | List artifacts
GET | /api/v1/artifacts/:id | Download artifact (authenticated requests stay on the public API URL; metadata may expose a signed object-store URL)
GET | /api/v1/artifacts/:id/meta | Artifact metadata
DELETE | /api/v1/artifacts/:id | Delete artifact

Project API

Method | Path | Purpose
POST | /api/v1/projects | Register project (name, remote_dir; optional env, setup_timeout_s)
GET | /api/v1/projects | List projects
GET | /api/v1/projects/:name | Project config + readiness status
PUT | /api/v1/projects/:name | Update project metadata only (owner only)
DELETE | /api/v1/projects/:name | Delete project (owner only)
POST | /api/v1/projects/:name/push | Upload tarball (raw gzip, not multipart, max 100MB; returns 409 during active refresh, 422 for malformed tarballs or invalid manifest references, and may include tarball_diff on success)
PUT | /api/v1/projects/:name/cancel-queued | Cancel queued/planning jobs for one project (optional tag filter)
POST | /api/v1/projects/:name/init | Prepare currently available workers (async)
POST | /api/v1/projects/:name/invalidate | Force re-init
GET | /api/v1/projects/:name/status | Project readiness
GET | /api/v1/projects/:name/status/details | Readiness + diagnostics
GET | /api/v1/projects/:name/kv | List project-scoped shared state entries
PUT | /api/v1/projects/:name/kv/:key | Write project-scoped shared state value
GET | /api/v1/projects/:name/kv/:key | Read project-scoped shared state value
DELETE | /api/v1/projects/:name/kv/:key | Delete project-scoped shared state value
GET | /api/v1/projects/:name/stream | SSE stream for project jobs

Other

Method | Path | Purpose
POST | /api/v1/auth/register | Disabled self-service API-key issuance (403) with waitlist + beta-access guidance
GET | /api/v1/leases/:job_id | Reservation status
GET | /api/v1/presets | Resource presets