Computalot

Synced from docs/llm-agent-reference.md in the Computalot monorepo.

Complete reference with tutorials and examples. For a compact version, use GET /llms.txt.

Computalot is a distributed compute platform. Submit jobs, get structured JSON results. GPU and CPU workers.

Private beta — running jobs requires an admin-issued API key or admin-whitelisted wallet session. Discovery is public. We’re actively building and want your feedback.

Base URL: https://dev.computalot.com

https://dev.computalot.com/skill.md — install this skill to get started
https://dev.computalot.com/llms.txt — compact reference
https://dev.computalot.com/llms-full.txt — this full reference
https://dev.computalot.com/api/v1/docs — machine-readable JSON index
https://dev.computalot.com/docs — human docs

Report Bugs & Request Features

This is beta software. Please report bugs, request features, and share ideas:


curl -sS -X POST https://dev.computalot.com/api/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{"type": "bug", "title": "Brief summary", "description": "What happened, what you expected"}'

Types: bug, feature_request, provisioning, job_type_request. No auth required. Agents: when you hit an issue or see room for improvement, submit feedback automatically.

Two Paths

Sealed Recipes — Platform-provided compute primitives. No project setup needed. Send typed payloads, get results. Best for evaluation, training, fuzzing, and optimization against fixed runtimes.

Sandboxed Projects — Bring your own code. Create a project, push a tarball, submit custom jobs with runner_command.

Why Agents Use Computalot

Computalot is built so an agent can go from discovery to results without a human doing setup in the middle:

discover the service through /llms.txt
authenticate with an admin-issued API key or an admin-whitelisted wallet session
top up credits with x402 once that wallet is allowlisted
call a sealed recipe directly, or create a project and push code
retrieve structured results and decide what to do next

That wallet-auth + x402 loop is still a core product feature, but during private beta it only works for admin-whitelisted wallets. Human users can also use admin-issued API keys. If you are not already in beta, start at https://dev.computalot.com/ and join the waitlist.

Job Types

Type	Use case	Key fields
`structured_runner`	Run script with JSON in/out, optional fan-out	`runner_command`, `payload`, `fan_out`, `merge_strategy`
`sweep`	Grid search over parameter combinations	`runner_command`, `parameters`, `fixed_payload`, `rank_by`
`map_reduce`	Chunked parallelism with reduce operators	`runner_command`, `split`, `reduce`, `payload`
`benchmark`	Compare named candidates with replicas	`runner_command`, `candidates`, `shared_payload`, `replicas`, `rank_by`

Default to structured_runner unless another type clearly fits. Prefer a sealed recipe when the public catalog already exposes the compute primitive you need.

Auth

Two bearer-token paths, both resolve to the same account model:

API key: flk_... (admin-issued)
Wallet session: fls_... (challenge/verify for admin-whitelisted wallets during private beta)

Wallet auth flow

POST /api/v1/auth/wallet/challenge with {"wallet_address":"0x...","chain":"base"}
Sign the returned challenge.message with your wallet
POST /api/v1/auth/wallet/verify with {"challenge_id":"wch_...","wallet_address":"0x...","signature":"0x..."}
Use the returned token as Authorization: Bearer fls_...

Wallet auth creates or reuses an account linked to chain + wallet_address. That account owns all projects, jobs, results, and credits.

API keys

API keys (flk_...) work on all endpoints. Admin-issued during private beta.

No auth required: /health, /docs, /llms.txt, /llms-full.txt, /api/v1/docs/*, POST /api/v1/feedback, POST /api/v1/auth/register (returns 403 with beta guidance), POST /api/v1/auth/wallet/challenge, POST /api/v1/auth/wallet/verify. GET /metrics is operator-gated: local requests, admin auth, or a dedicated metrics token only.

Sealed Recipes

Sealed recipes are platform-owned compute primitives with:

a fixed tarball-backed runtime bundle
a fixed entrypoint
a typed payload schema
optional artifact input slots

Use a recipe when you want a published evaluator or scorer and do not want to upload or run your own command.

Use a project when you need to bring your own code, setup, and runner_command.

Discovery endpoints:

GET /api/v1/recipes
GET /api/v1/recipes/:name
GET /api/v1/docs/recipes

Current public recipes:

prop_amm
- operations: eval, eval_chunk, validate, validate_full, bpf_eval, bpf_eval_chunk, build, concavity_check
- note: validate is the fast default and skips native/BPF parity; use validate_full when you need the slower parity check
- note: build publishes compiled native/BPF artifacts; eval_chunk / bpf_eval_chunk expose deterministic seed windows
- typical inputs: strategy_ref or rs_source_b64, plus seed/range fields
packing
- operations: eval, eval_batch, feasible_optimize, basin_hopping, differential_evolution
- typical inputs: inline candidate arrays or candidate_ref / candidates_ref, plus bounded search knobs for the optimization operations
lightgbm_train
- operations: train, cross_validate
- note: shared tabular LightGBM recipe; uploads model/metrics artifacts for train and cross-validation metrics for cross_validate
- typical inputs: immutable dataset_ref, target_column, and bounded LightGBM hyperparameters such as seed, n_estimators, num_leaves, and folds
echidna
- operations: foundry_prebuilt, hardhat_prebuilt
- typical inputs: immutable prebuilt project_ref, contract, and bounded fuzzing fields such as test_mode, test_limit, and seed

Recipe jobs do not take a user-controlled runner_command. The recipe determines the runtime and command. Do not send type for recipe submissions unless you have a specific reason to override it. When recipe is set or you use POST /api/v1/recipes/:name/jobs, the API infers type: "structured_runner" automatically.

Example:


POST /api/v1/recipes/prop_amm/jobs
{
  "payload": {
    "operation": "validate",
    "strategy_ref": "art_123"
  },
  "timeout_s": 900
}

Billing

Computalot uses account credits. Jobs reserve a hold on submission and settle to actual usage on completion.

Supported beta access today is either an admin-issued API key or an admin-whitelisted wallet session. Both authenticate the same account model and the same billing surfaces.

GET /api/v1/account/balance — check credits
GET /api/v1/account/ledger — transaction history
GET /api/v1/account/holds — active holds
Project init is free but requires $5 available balance
Fund via x402: POST /api/v1/account/quotes/topup → pay → POST /api/v1/account/quotes/:id/pay/x402

Account billing endpoints

Authenticated callers can inspect billing state with:

GET /api/v1/account/balance
GET /api/v1/account/ledger
GET /api/v1/account/holds
GET /api/v1/account/quotes

Billing truth lives on GET /api/v1/account/balance, GET /api/v1/account/holds, GET /api/v1/account/ledger, and GET /api/v1/account/quotes.

GET /api/v1/account/balance returns the main numbers a client should care about:

ledger_balance_usd: total credited minus settled debits
held_usd: funds currently reserved for active jobs
available_usd: spendable balance after holds
active_hold_count
open_quote_count

How pricing works in practice

Users should think in terms of quotes and holds, not hidden infrastructure details.

Before a job is admitted, Computalot derives a submit-time estimate from:

the requested job type
planned task count and fan-out shape
requested requirements
requested timeout_s
resolved reliability_mode

That estimate becomes the hold. If the account cannot cover it, the job is rejected before it starts. Once admitted, the job is allowed to finish inside that reserved exposure instead of being killed mid-flight for routine billing reasons.

Submit-time billing summary

Job submit responses can include:

summary.billing_estimate
summary.billing_admission
summary.billing_hold

Important fields include:

inferred resource_class
inferred runtime_class
resolved reliability_mode
estimated_hold_usd
whether the account had enough available balance to admit the job

For long-running or expensive jobs, treat the submit response as the authoritative pricing signal for that exact request.

x402 funding flow

x402 is the public funding rail for autonomous wallets.

POST /api/v1/account/quotes/topup with a requested amount such as { "amount_usd": 5.0 } — the per-top-up cap is $10,000, submit multiple smaller top-ups for larger funding
Computalot returns 402 Payment Required
The response includes a PAYMENT-REQUIRED header and a top-up quote
The client pays and retries POST /api/v1/account/quotes/:quote_id/pay/x402
On success, the internal account balance increases and the server returns PAYMENT-RESPONSE

If a job submit or project-init gate fails for insufficient balance, Computalot can return the same 402 shape with a shortfall quote so the client can fund the exact gap and retry.

If POST /api/v1/projects/:name/init returns that shortfall quote, fund the account and retry POST /api/v1/projects/:name/init.
If POST /api/v1/jobs returns that shortfall quote, fund the account and retry the same submit request.

In practice, this means an agent does not need:

a subscription
a stored credit card
a pre-issued public API key

It can discover, fund, and use compute through the API itself.

Reliability mode

reliability_mode is a public submission field:

best_effort
strict_complete

Use strict_complete for research-sensitive fan-out work, sweeps, benchmarks, CMA generations, and training/evaluation batches where missing one task would invalidate the outcome.

Execution policy and placement

Jobs land in one of two execution modes, each with two placement options:

sandboxed (default) — your uploaded project code runs inside a gVisor sandbox.
- placement_policy = "shared" (default): reuses warm workers with other sandboxed tenants.
- placement_policy = "dedicated": holds a worker exclusively for your project for the reservation window.
sealed — only available for platform recipes (see GET /api/v1/recipes). Runs on sealed workers using the recipe’s prebuilt runtime.
- placement_policy defaults to shared. dedicated is available for workloads that need warm reuse.
- You cannot submit execution_policy = "sealed" with a user project — sealed is a recipe-only mode.
- You cannot override a platform recipe’s execution_policy. Submitting any value other than sealed on a platform recipe returns 422.

Both modes use the same submission surface. The fields are optional — leave them out to accept the defaults (sandboxed + shared for user projects, sealed + shared for platform recipes).

Core Model

A job is the user-visible unit of work you submit
Tasks are the parallel execution units Computalot creates from your job
You do not target infrastructure directly — submit jobs with a project and optional resource requirements, and Computalot handles placement
Terminal jobs and results are queryable for 30 days

Resource Requirements & Guaranteed Capacity

Submit minimum resource needs with your job — Computalot places work on matching capacity:


{
  "type": "structured_runner",
  "project": "my-ml-project",
  "runner_command": ["python", "train.py"],
  "payload": {"epochs": 3},
  "requirements": {
    "cpu": 8,
    "memory_mb": 16384,
    "storage_gb": 40,
    "gpu_count": 1,
    "gpu_memory_mb": 12288,
    "profile": "gpu"
  },
  "reservation": {
    "mode": "guaranteed",
    "parallelism": 2,
    "guaranteed_for_s": 1800,
    "max_wait_s": 0
  }
}

requirements are minimums. Computalot may place on larger machines.
profile: "cpu" or "gpu". CPU jobs can spill onto idle GPU capacity.
reservation.mode = "best_effort" is default.
reservation.mode = "guaranteed" is immediate admit-or-reject. Returns 409 if capacity unavailable.
Inspect reservation state: GET /api/v1/leases/:job_id.

How to request capacity well:

Ask for minimum real requirements. Oversized requests shrink eligible capacity and increase queue time.
Use profile: "gpu" only when the task truly needs GPU compute.
Use guaranteed only for short-window reserved parallelism. For normal batch work, best_effort is usually enough; the first job on a newly pushed revision may simply pay a cold-start cost.
Use POST /api/v1/projects/:name/init only when you explicitly want to prepare currently available workers ahead of time before a burst.
Use reliability_mode: "strict_complete" when missing one task would corrupt the final result set.
Treat the submit response as the pricing signal for that exact request. The hold estimate is derived from this shape.

This walks through the public end-to-end path from zero to a completed job using wallet auth and account credits.

This is the canonical beta onboarding path once your wallet is allowlisted or you already have an admin-issued API key: wallet sign-in, billing checks, x402 funding, project setup, job submit, and result retrieval.

Prerequisites


# Base URL
export BASE_URL="https://dev.computalot.com"
 
# Your wallet address
export WALLET_ADDRESS="0x1234567890abcdef1234567890abcdef12345678"

If you already have an admin-issued API key, you can skip the wallet flow and set:


export TOKEN="flk_..."

Otherwise, use wallet auth.

Step 1: Authenticate with your wallet

Ask for a challenge:


CHALLENGE_JSON=$(curl -sS "$BASE_URL/api/v1/auth/wallet/challenge" \
  -X POST \
  -H "Content-Type: application/json" \
  -d "{\"wallet_address\":\"$WALLET_ADDRESS\",\"chain\":\"base\"}")
echo "$CHALLENGE_JSON"

Sign challenge.message with your wallet provider or SDK, then verify:


export CHALLENGE_ID="wch_..."
export SIGNATURE="0xSIGNED_MESSAGE"
 
VERIFY_JSON=$(curl -sS "$BASE_URL/api/v1/auth/wallet/verify" \
  -X POST \
  -H "Content-Type: application/json" \
  -d "{
    \"challenge_id\":\"$CHALLENGE_ID\",
    \"wallet_address\":\"$WALLET_ADDRESS\",
    \"signature\":\"$SIGNATURE\"
  }")
echo "$VERIFY_JSON"

Extract the returned session token and use it for the remaining steps:


export TOKEN="fls_..."

Step 2: Fund your account if needed

Check balance and inspect the same billing truth surfaces if you need more detail:


curl -sS "$BASE_URL/api/v1/account/balance" \
  -H "Authorization: Bearer $TOKEN"
 
curl -sS "$BASE_URL/api/v1/account/holds" \
  -H "Authorization: Bearer $TOKEN"
 
curl -sS "$BASE_URL/api/v1/account/ledger" \
  -H "Authorization: Bearer $TOKEN"
 
curl -sS "$BASE_URL/api/v1/account/quotes" \
  -H "Authorization: Bearer $TOKEN"

If available_usd is below the minimum funded floor, request a top-up quote:


curl -sS "$BASE_URL/api/v1/account/quotes/topup" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "amount_usd": 5.0,
    "description": "initial project setup and first job"
  }'

That returns 402 Payment Required. An x402-capable client should then pay and retry:


curl -sS "$BASE_URL/api/v1/account/quotes/<quote_id>/pay/x402" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "PAYMENT-SIGNATURE: <x402 payment payload>"

If project init or job submit later returns a shortfall quote instead of admitting the request, fund the account and retry the same blocked request.

Step 3: Register your project


curl -sS "$BASE_URL/api/v1/projects" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-ml-project",
    "remote_dir": "/root/my-ml-project",
    "env": {"CLICKHOUSE_HOST": "db.internal"},
    "setup_timeout_s": 1200
  }'

remote_dir is where Computalot extracts your code. env is an optional map of runtime env vars. setup_timeout_s overrides the default 600s setup timeout.

Step 4: Create your project files

Your project needs a Dockerfile, computalot.project.json, and your code:


my-ml-project/
├── Dockerfile
├── computalot.project.json
└── job.py

Projects run as sandboxed OCI containers. See the Project Manifest docs for the full manifest schema.


FROM python:3.11-slim
WORKDIR /workspace
COPY . .


# job.py
import json, os
 
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
result = {"status": "ok", "source": "getting-started"}
json.dump(result, open(os.environ["COMPUTALOT_TASK_RESULT"], "w"))

Minimal manifest:


{
  "version": 1,
  "runtime": {
    "kind": "oci",
    "sandbox": "gvisor",
    "workdir": "/workspace"
  },
  "entrypoint": {
    "command": ["python", "job.py"]
  }
}

Full schema and examples: https://dev.computalot.com/docs/projects/project-manifest

Step 5: Upload the project


cd my-ml-project
tar czf ../code.tar.gz .
 
# Upload the tarball as the raw request body
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/push" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  --data-binary @../code.tar.gz

After a successful push, the latest revision is published immediately. You can submit jobs right away; the first one may take longer while Computalot prepares runtime on demand.

Optional: if you want to prepare currently available workers ahead of time, call init manually:


curl -sS "$BASE_URL/api/v1/projects/my-ml-project/init" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

Check status when you want to inspect published vs warm state:


curl -sS "$BASE_URL/api/v1/projects/my-ml-project/status" \
  -H "Authorization: Bearer $TOKEN"

can_accept_new_jobs: true means the latest revision is published and can be submitted immediately. ready_for_jobs: true means Computalot finished platform-side runtime preparation. Neither field proves your application-level imports or credentials are valid — use manifest validation checks and run one small smoke job after setup changes.

GET /api/v1/projects/:name/status is the top-level readiness truth for the active revision. After a successful push, the new content hash is visible immediately and jobs can be submitted right away, even if that revision is still warming.

If you want the already-warm signal before a burst, wait for:

can_accept_new_jobs: true
init_state: "ready"
ready_for_jobs: true

Step 6: Submit a job


curl -sS "$BASE_URL/api/v1/jobs" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python3", "job.py"],
    "payload": {"test_case": "getting-started"},
    "project": "my-ml-project",
    "timeout_s": 120,
    "requirements": {
      "cpu": 1,
      "memory_mb": 256,
      "profile": "cpu"
    },
    "reliability_mode": "strict_complete"
  }'

The submit response can include:

summary.billing_estimate
summary.billing_admission
summary.billing_hold

Treat that response as the authoritative estimate for that exact request.

Step 7: Read results and billing state


# Job status
curl -sS "$BASE_URL/api/v1/jobs/<job_id>" \
  -H "Authorization: Bearer $TOKEN"
 
# Structured results
curl -sS "$BASE_URL/api/v1/results/<job_id>" \
  -H "Authorization: Bearer $TOKEN"
 
# Aggregated stdout/stderr
curl -sS "$BASE_URL/api/v1/jobs/<job_id>/output" \
  -H "Authorization: Bearer $TOKEN"
 
# Billing state after the run
curl -sS "$BASE_URL/api/v1/account/balance" \
  -H "Authorization: Bearer $TOKEN"
 
curl -sS "$BASE_URL/api/v1/account/holds" \
  -H "Authorization: Bearer $TOKEN"
 
curl -sS "$BASE_URL/api/v1/account/ledger" \
  -H "Authorization: Bearer $TOKEN"

GET /api/v1/results/<job_id> is the canonical result surface. It returns raw per-task results plus top-level summary, aggregate_result, aggregate_aliases, completeness, result_persisted, and output_persisted. For weighted fan-out jobs, summary also carries alias fields like avg_edge directly plus coverage fields such as weight_field, expected_weight, completed_weight, and pending_weight. The response also preserves public submission metadata like meta and variant, and each task may include project_content_hash so you can confirm which project version produced it.

GET /api/v1/jobs/<job_id>/output is the live aggregated diagnostics surface. During auto-retry, it preserves the most recent failed attempt’s output and error until the next attempt emits its own diagnostics, so jobs do not go blank between retries. If a worker/runtime failure happens before your command starts, that visible text can be platform preflight stderr rather than user-process stdout.

Step 8: Update your project

Use PUT /api/v1/projects/:name for metadata only. For code changes, push a new tarball, invalidate, then init again:


tar czf ../code.tar.gz .
 
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/push" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  --data-binary @../code.tar.gz
 
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/invalidate" \
  -X POST \
  -H "Authorization: Bearer $TOKEN"
 
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/init" \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

If POST /api/v1/projects/:name/push returns 409, Computalot is already initializing or refreshing that project. Poll GET /api/v1/projects/:name/status until init_state is no longer initializing or refreshing, then retry the push.

If the uploaded body is gzip but not a valid tarball, POST /api/v1/projects/:name/push now returns 422 with error: "invalid tarball" instead of surfacing a generic server error.

If computalot.project.json references project files, command working directories, build inputs, or named cache mounts that do not exist, POST /api/v1/projects/:name/push now returns 422 before the new version is accepted.

When Computalot can read the previous tarball locally, a successful push response also includes tarball_diff with added_files, removed_files, and changed_files so clients can catch incomplete uploads immediately.

That same push response can already show the active-revision transition for the new code via fields such as ready_for_jobs: false, status_message, next_action, and init_status.init_state: "refreshing".

Journey 2: Fan-Out Parallelism

Use when: You want to run the same script on many different inputs in parallel — model evaluation, agent swarms, batch processing, CMA generations.

Runner script

Your script reads $COMPUTALOT_TASK_PAYLOAD and writes to $COMPUTALOT_TASK_RESULT:


# evaluate.py
import json, os
 
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
model_name = payload["model"]
score = run_evaluation(model_name, payload["dataset"])
 
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
    json.dump({"model": model_name, "score": score}, f)

Option A: Fan-out by values

Split a list into one task per item:


curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "evaluate.py"],
    "payload": {
      "models": ["model-a", "model-b", "model-c", "model-d"],
      "config": {"n_trials": 100}
    },
    "fan_out": {"by": "models", "batch_size": 2},
    "merge_strategy": "keyed",
    "project": "my-ml-project",
    "timeout_s": 600
  }'

Creates 4 tasks. Each gets: {"models": "model-a", "config": {"n_trials": 100}}. With batch_size (or batch_per_task), Computalot groups multiple fan-out items into one dispatched task and adds payload._batch metadata. For fan_out.by, the split field becomes that task’s sub-list. Resolved shared-state aliases land in payload._shared.values, and scalar aliases are also exposed to the runner as COMPUTALOT_SHARED_<NAME> env vars.

Option B: Fan-out by explicit items (CMA / evolutionary)

One explicit payload per candidate — you control the payloads exactly:


{
  "type": "structured_runner",
  "runner_command": ["python", "evaluate.py"],
  "fan_out": {"items": [
    {"params": [0.1, 0.5, 0.3], "generation": 12},
    {"params": [0.2, 0.4, 0.6], "generation": 12},
    {"params": [0.3, 0.3, 0.1], "generation": 12}
  ]},
  "project": "my-proj"
}

Creates 3 tasks, one per item. The client (your optimizer) owns state between generations.

Option C: Fan-out by chunks

Split a numeric range into N chunks:


curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "simulate.py"],
    "payload": {"total_seeds": 10000},
    "fan_out": {"chunks": 20, "range_field": "total_seeds", "total": 10000},
    "merge_strategy": "collect",
    "project": "my-ml-project",
    "timeout_s": 1800
  }'

Creates 20 tasks, each with {"start": 0, "count": 500}, etc.

Fan-out contract

Supported public fan_out shapes are:

{"fan_out": {"by": "field"}}
{"fan_out": {"items": [{...}, {...}]}}
{"fan_out": {"chunks": N, "total": N, ...}}

These shapes are mutually exclusive. Mixing by, items, or chunks + total in one request returns 422; Choose exactly one shape before retrying the submit.

Merge strategies

"collect" (default) — all results in a list
"keyed" — results indexed by a key from each task’s payload (requires fan_out.by)
"weighted_avg" — weighted average of a numeric field (set both value_field and weight_field)

Result quality

Each task result includes result_quality (0.0–1.0) and result_warnings. You can define custom validation with result_schema:


{
  "result_schema": {
    "required_fields": ["score"],
    "field_types": {"score": "number"},
    "field_ranges": {"score": [0.0, 1.0]}
  }
}

Journey 3: Parameter Search (Sweep)

Use when: You want to try every combination of parameters and rank results. Grid search, hyperparameter tuning.


curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "sweep",
    "runner_command": ["python", "evaluate.py"],
    "project": "my-ml-project",
    "parameters": {
      "learning_rate": [0.001, 0.01, 0.1],
      "batch_size": [32, 64, 128]
    },
    "fixed_payload": {"dataset": "cifar10", "epochs": 5},
    "rank_by": "accuracy",
    "rank_order": "desc",
    "timeout_s": 3600
  }'

Creates 9 tasks (3x3 cartesian product). Each task’s payload:


{"learning_rate": 0.001, "batch_size": 32, "dataset": "cifar10", "epochs": 5, "_sweep_idx": 0, "_sweep_params": {"learning_rate": 0.001, "batch_size": 32}}

Your runner writes the rank_by field to $COMPUTALOT_TASK_RESULT:


import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
accuracy = train_and_evaluate(payload["learning_rate"], payload["batch_size"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
    json.dump({"accuracy": accuracy}, f)

Result: a ranked leaderboard in the job summary:


{
  "results": [
    {"params": {"learning_rate": 0.01, "batch_size": 64}, "result": {"accuracy": 0.95}, "rank": 1},
    {"params": {"learning_rate": 0.001, "batch_size": 128}, "result": {"accuracy": 0.92}, "rank": 2}
  ],
  "best": {"params": {"learning_rate": 0.01, "batch_size": 64}, "result": {"accuracy": 0.95}, "rank": 1}
}

Key fields: parameters (map → value lists, max 1000 combos), fixed_payload, rank_by (required), rank_order ("desc" default or "asc").

Journey 4: GPU Training with Live Progress

Use when: You’re running a long training job and want real-time progress updates plus resumable checkpoints.


JOB_ID=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "train.py"],
    "payload": {"epochs": 100, "batch_size": 32},
    "project": "my-ml-project",
    "timeout_s": 7200,
    "requirements": {"profile": "gpu", "gpu_count": 1, "gpu_memory_mb": 16384},
    "checkpointing": {"enabled": true, "resume_from_latest": true}
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
 
# Stream progress (SSE)
curl -sS -N -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/jobs/$JOB_ID/stream

The stream starts with a snapshot, then emits job, task, and event deltas. Running-task frames include live_feedback.output_tail, so you can show rolling stdout/stderr before the job finishes.

Report progress from your training script:


import json, os
 
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
resume_state = payload.get("_resume") or {}
start_epoch = resume_state.get("epoch", 0)
 
for epoch in range(start_epoch, 100):
    loss = train_one_epoch()
    # Computalot captures COMPUTALOT_PROGRESS lines and streams them to the SSE endpoint
    print(
        f"COMPUTALOT_PROGRESS:{json.dumps({'epoch': epoch, 'loss': loss, 'percent': epoch})}",
        flush=True,
    )

Save model artifacts to $COMPUTALOT_ARTIFACT_DIR:


import os, shutil
model_path = os.path.join(os.environ['COMPUTALOT_ARTIFACT_DIR'], 'model.pt')
torch.save(model.state_dict(), model_path)

Artifact IDs appear in the task result. Download: GET /api/v1/artifacts/:id.

If you include a checkpoint object in progress or result payloads, Computalot persists the latest checkpoint and injects it back into _resume on retry when checkpointing.resume_from_latest is enabled. When the checkpoint can be durably published as an artifact, task state also records fields like artifact_id, artifact_source, publish_status, and published_at, and retries rewrite _resume.checkpoint.path to the downloaded local checkpoint file automatically.

For live UIs, combine:

GET /api/v1/jobs/:id/stream — SSE updates for one job
GET /api/v1/jobs/watch?ids=id1,id2,... — one SSE connection for 2-100 jobs with terminal summaries, aggregate fields, and persistence flags
GET /api/v1/jobs/:id/tasks — per-task live_feedback, latest_progress, checkpoint, resume_state, runtime_s, health_status, plus the preserved last failed attempt while a retry is queued or running
GET /api/v1/results/:job_id — the canonical terminal result surface with per-task payload/result/output presence, completeness, and artifact IDs
GET /api/v1/jobs/:id/output — aggregated stdout/stderr that preserves the most recent failed attempt until the current attempt emits new diagnostics
GET /api/v1/jobs/:id — job-level feedback_summary and checkpoint summary

Public job/task/watch/result payloads keep submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests.

Journey 5: Multi-Stage Pipelines

Use when: Step 2 depends on step 1’s output.


# Step 1: Train
JOB1=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "train.py"],
    "payload": {"epochs": 50, "lr": 0.001},
    "project": "my-ml-project",
    "timeout_s": 7200,
    "gpu_required": true
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
 
# Step 2: Evaluate (waits for step 1)
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d "{
    \"type\": \"structured_runner\",
    \"runner_command\": [\"python\", \"evaluate.py\"],
    \"payload\": {\"model_path\": \"/root/my-ml-project/model.pt\"},
    \"depends_on\": [\"$JOB1\"],
    \"project\": \"my-ml-project\",
    \"timeout_s\": 600
  }"

Step 2 stays queued until step 1 completes. If step 1 fails, step 2 auto-cancels.

Passing files between stages (Artifacts)


# Upload artifact after step 1
ART_ID=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/octet-stream" \
  -H "X-Artifact-Filename: model.pt" \
  --data-binary @model.pt \
  https://dev.computalot.com/api/v1/artifacts | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
 
# Reference in step 2's payload

Or use _artifacts.download in the payload for automatic pre-task download:


{
  "payload": {
    "_artifacts": {"download": {"dataset": "art_abc123"}},
    "model_type": "base"
  }
}

Computalot resolves downloads before task execution (env var COMPUTALOT_ARTIFACT_<key> points to local path). For staged pipelines, downstream jobs can reference a named artifact produced by an upstream dependency instead of hardcoding an artifact ID:


{
  "depends_on": ["job_setup_123"],
  "payload": {
    "_artifacts": {
      "download": {
        "dataset": {"job_id": "job_setup_123", "artifact": "dataset"}
      }
    }
  }
}

For long-running ML or evaluation jobs, do not stop at _artifacts.download alone:

use manifest data_sources for immutable remote inputs that should be prepared before launch
use manifest cache_mounts for writable caches your code creates at runtime
if the input lives on Hugging Face and should stay read-only, prefer data_sources[].source = "huggingface" with delivery = "mount" so the worker uses hf-mount
if the runner downloads from Hugging Face itself, add a huggingface cache mount so HF_HOME and TRANSFORMERS_CACHE persist per worker; ad hoc runtime downloads do not use hf-mount automatically

For smaller coordination data, use project-scoped shared state plus dispatch-time injection:


curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/projects/my-project/kv/dataset_ready \
  -d '{"value": {"status": "ready"}}'


{
  "depends_on": ["job_setup_123"],
  "payload": {
    "_shared": {
      "resolve": {
        "dataset_ready": {"key": "dataset_ready"},
        "best_score": {"job_id": "job_setup_123", "path": "results.0.score"}
      }
    }
  }
}

Journey 6: Comparing Strategies (Benchmark)

Use when: You want to compare 2+ named strategies with replicas for statistical significance.


curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "benchmark",
    "runner_command": ["python", "evaluate.py"],
    "project": "my-project",
    "candidates": {
      "strategy_a": {"model": "gpt4", "temperature": 0.7},
      "strategy_b": {"model": "claude", "temperature": 0.5},
      "baseline": {"model": "random"}
    },
    "shared_payload": {"dataset": "test_set_v3", "n_trials": 100},
    "replicas": 3,
    "rank_by": "score",
    "timeout_s": 1800
  }'

Creates 9 tasks (3 candidates x 3 replicas). Each payload:


{"dataset": "test_set_v3", "n_trials": 100, "model": "gpt4", "temperature": 0.7, "_candidate": "strategy_a", "_replica": 1}

To vary a field per replica: "replica_vary": {"field": "seed_base", "stride": 1000}.

Result: leaderboard with per-candidate statistics:


{
  "leaderboard": [
    {"candidate": "strategy_a", "mean": 0.92, "std": 0.03, "min": 0.89, "max": 0.95, "count": 3, "rank": 1},
    {"candidate": "strategy_b", "mean": 0.85, "std": 0.02, "min": 0.83, "max": 0.87, "count": 3, "rank": 2},
    {"candidate": "baseline", "mean": 0.50, "std": 0.05, "min": 0.45, "max": 0.55, "count": 3, "rank": 3}
  ]
}

Key fields: candidates (map, min 2), shared_payload, replicas (default 1, max 100), rank_by (required), rank_order.

Sweep vs Benchmark: Use sweep for exploring a parameter grid. Use benchmark for comparing named alternatives with replicas for statistical confidence.

Journey 7: Monte Carlo / Simulations (Map-Reduce)

Use when: You want to split a range into chunks, process in parallel, and aggregate with operators.


curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs \
  -d '{
    "type": "map_reduce",
    "runner_command": ["python", "evaluate_seeds.py"],
    "project": "my-project",
    "payload": {"strategy": "momentum"},
    "split": {"field": "seed", "start": 0, "total": 10000, "chunks": 50},
    "reduce": {
      "total_pnl": "sum",
      "sharpe_ratio": "weighted_avg:sample_count",
      "max_drawdown": "max"
    },
    "timeout_s": 7200
  }'

Creates 50 tasks. Each gets: {"strategy": "momentum", "seed_start": 0, "seed_count": 200}.

For non-contiguous ranges, use explicit split.ranges instead of start + total + chunks:


{
  "type": "map_reduce",
  "runner_command": ["python", "evaluate_seeds.py"],
  "project": "my-project",
  "payload": {"strategy": "momentum"},
  "split": {
    "field": "seed",
    "ranges": [
      {"start": 860791000, "count": 1000},
      {"start": 200000000, "count": 1000},
      {"start": 500000000, "count": 1000}
    ]
  },
  "reduce": {
    "avg_edge": "mean"
  }
}

Your runner:


import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
results = run_simulation(payload["strategy"], payload["seed_start"], payload["seed_count"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
    json.dump({"total_pnl": results.pnl, "sharpe_ratio": results.sharpe, "max_drawdown": results.drawdown, "sample_count": payload["seed_count"]}, f)

Result: reduced values in the job summary:


{
  "reduced": {
    "total_pnl": 15234.50,
    "sharpe_ratio": 1.87,
    "max_drawdown": 0.23
  }
}

Key fields: split ({field, start, total, chunks}), reduce (map of field → operator).

Reduce operators: sum, mean, max, min, weighted_avg:<weight_field>, concat, count, collect.

Heavy Jobs (Large Data, Checkpoints, Training)

For GB-scale datasets, large checkpoints, and long training runs:

Inputs:

Keep payload small — do not embed large data in JSON.
Use _artifacts.download for large inputs. Workers download and cache before launch.
Resolved paths: payload._artifacts.local_paths. Single-file downloads also get COMPUTALOT_ARTIFACT_<NAME> env vars.

Outputs:

Write checkpoints and files to $COMPUTALOT_ARTIFACT_DIR.
Use _artifacts.upload for named uploads or direct upload to external storage. Computalot-managed uploads now prefer direct or multipart object-store transfer and only fall back to controller relay if the direct path is unavailable.
If a JSON result is too large, Computalot spills it to an artifact and returns result_spilled, result_artifact_id, result_filename; those spill uploads also prefer the direct object-store path.

Operational:

Set timeout_s above expected runtime with margin.
Submit jobs after push; use POST /api/v1/projects/:name/init only if you want to prepare currently available workers ahead of time.
Write checkpoints and outputs to $COMPUTALOT_ARTIFACT_DIR. Use $COMPUTALOT_TASK_SCRATCH_DIR or $TMPDIR for temp files.
Use external/object storage for multi-GB datasets and model bundles.

Project Setup

Projects run as sandboxed OCI containers. The lifecycle is:

POST /api/v1/projects — register
POST /api/v1/projects/:name/push — upload tarball with Dockerfile + computalot.project.json + your code
POST /api/v1/jobs — submit work against the published revision
Optional: POST /api/v1/projects/:name/init — prepare currently available workers
GET /api/v1/projects/:name/status — inspect published vs warm state

Project init is free but requires $5 available balance
Init is asynchronous
After a push, can_accept_new_jobs can already be true even while ready_for_jobs is still false
After a code change, push the new tarball; use invalidate only if you want to discard old prepared runtimes

Project structure


my-project/
├── Dockerfile
├── computalot.project.json
├── requirements.txt
└── job.py

Dockerfile

Install dependencies in your Dockerfile:


FROM python:3.11-slim
WORKDIR /workspace
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

computalot.project.json

See the Project Manifest docs for the full schema.

Tips

Run one small smoke job after setup changes before submitting a large batch
Use manifest validation section for runtime checks (executables, files, commands)

Public project endpoints

POST /api/v1/projects
PUT /api/v1/projects/:name for metadata-only updates
POST /api/v1/projects/:name/push
POST /api/v1/projects/:name/init
POST /api/v1/projects/:name/invalidate
GET /api/v1/projects/:name/status
GET /api/v1/projects/:name/status/details
GET /api/v1/projects

Debugging failed setup


curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/status
 
curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/status/details

Fix your Dockerfile/manifest, push, invalidate, re-init:


tar czf code.tar.gz . && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" --data-binary @code.tar.gz \
  https://dev.computalot.com/api/v1/projects/my-project/push && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/invalidate && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/projects/my-project/init -d '{}'

Runner Protocol (All Job Types)

All runner-based types use this contract:

Computalot launches your runner_command with task-specific payload
Payload written to temp file; path in $COMPUTALOT_TASK_PAYLOAD
Result file path in $COMPUTALOT_TASK_RESULT
Your script writes JSON result to $COMPUTALOT_TASK_RESULT
Progress: print COMPUTALOT_PROGRESS:{json} to stdout

COMPUTALOT_* env vars are the Computalot runtime protocol.

Normal stdout/stderr is surfaced live through live_feedback.output_tail on task APIs and SSE streams. If your runner wraps another process, keep the child unbuffered or flush explicitly so Computalot can forward logs promptly.

Task env order: base runtime → project env files (.computalot.env, computalot.env, .env) → project env map → meta.env overrides. If .venv/bin/python exists, Computalot prepends .venv/bin to PATH.

Exit codes: 0 = success, non-zero = failure. Last ~1000 chars captured as error (tail, not head — preserves tracebacks). Full output (up to 10KB) stored per task.

Payload varies by type:

structured_runner: the payload field. Chunk fan-out adds {start, count}.
sweep: fixed_payload + parameter combination + _sweep_idx + _sweep_params
map_reduce: payload + {field_start, field_count} chunk boundaries
benchmark: shared_payload + candidate config + {_candidate, _replica}

Allowed executables: python, python3, node, deno, bun, ruby, julia, Rscript, uv, pip, npm, npx, cargo, rustc. Shell executables (bash, sh, zsh) are blocked.

Job Lifecycle

Statuses: planning → queued → running → completed | partial | failed | cancelled

Terminal states: completed, partial, failed, cancelled
Poll: GET /api/v1/jobs/:id every 2-5s until terminal
Stream: GET /api/v1/jobs/:id/stream for SSE updates
Multi-job watch: GET /api/v1/jobs/watch?ids=id1,id2,... for one SSE stream covering 2-100 jobs
Canonical terminal results: GET /api/v1/results/:job_id
Per-task progress + retry continuity: GET /api/v1/jobs/:id/tasks
Aggregated output continuity: GET /api/v1/jobs/:id/output
Cancel: PUT /api/v1/jobs/:id/cancel with {"reason": "..."}
Auto-retry: set max_retries on submission
Jobs stuck in running with 0 active tasks auto-recover every 5 minutes
Jobs queued > 2x timeout_s (min 30 min) auto-cancel only when Computalot has live capacity; fleet-wide outages do not trigger queue-timeout cancellation by themselves
partial = some tasks failed/cancelled, OR all completed but some have low quality (< 0.5)
strict_complete is the recommended mode for research-sensitive runs where partial completion is not acceptable
Optional priority: "high" | "normal" | "low" biases scheduling between otherwise comparable jobs. Guaranteed reservations still take precedence.
Submit responses can include hold and admission metadata so the client can reason about cost before execution starts
Public job/task/watch/result payloads keep submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests

Result Quality Validation

Computalot validates results and computes quality scores (0.0-1.0) per task.

Per-task: result_quality, result_warnings, result_present, output_present in GET /api/v1/results/:job_id
Per-job result surface: summary.aggregate_result, summary.aggregate_aliases, summary.completeness, and top-level aggregate_result / aggregate_aliases / completeness in GET /api/v1/results/:job_id
Per-job: summary.quality in GET /api/v1/jobs/:id with mean_quality, suspect_count, suspect_task_ids
Default schemas: sweep/benchmark require rank_by field as number; map_reduce requires all reduce fields
Custom: Add result_schema to job submission

Quality is advisory — all results are stored regardless.

Job Tags & Filtering


{"type": "sweep", "tags": ["experiment_42", "lr_search"], ...}

Filter: GET /api/v1/jobs?tag=experiment_42.

Job Priority


{"priority": "high"}

Use high for latency-sensitive fan-out or evaluation jobs that should start ahead of ordinary background work. Use low for background submissions. Jobs default to normal.

Batch Submission

Up to 200 jobs in one request:


curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs/batch \
  -d '{"jobs": [
    {"type": "structured_runner", "runner_command": ["python", "eval.py"], "project": "my-proj", "payload": {"lr": 0.001}, "tags": ["sweep_72"]},
    {"type": "structured_runner", "runner_command": ["python", "eval.py"], "project": "my-proj", "payload": {"lr": 0.01}, "tags": ["sweep_72"]}
  ]}'

Response: 201 (all ok) or 207 (partial): {jobs, submitted, errors, error_count}.

Webhook Callbacks


{"type": "structured_runner", "callback_url": "https://your-server.com/webhook", ...}

Computalot POSTs {event, job_id, status, project, type, output, error, tags, progress, completed_at} with 2 retries. The callback originates from the Computalot controller — localhost URLs only work if routable from the server.

Job Dependencies (DAG)


{"depends_on": ["job_20260312_143000_abc123"]}

Tasks not dispatched until all dependencies reach completed. If a dependency fails, dependent jobs auto-cancel.

Streaming Progress

SSE for one job


curl -sS -N -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/jobs/<job_id>/stream

Starts with snapshot, then incremental job, task, event frames. Ends with done.

Running-task frames include live_feedback.output_tail, which is the fastest public surface for live stdout/stderr.

SSE for multiple jobs


curl -sS -N -H "Authorization: Bearer $TOKEN" \
  "https://dev.computalot.com/api/v1/jobs/watch?ids=<id1>,<id2>,<id3>"

Max 100 jobs. Idle periods emit ping. Ends with done when all terminal.

Terminal job frames include client_ref, tags, meta, variant, summary, aggregate_result, aggregate_aliases, completeness, and result_persisted / output_persisted when available, so a client can often avoid a follow-up result fetch. For weighted fan-out jobs that means fields like avg_edge can be present directly in the terminal SSE payload.

SSE for a whole project


curl -sS -N -H "Authorization: Bearer $TOKEN" \
  "https://dev.computalot.com/api/v1/projects/<project>/stream"

One connection for all jobs in a project. Reconnect after 1h timeout.

Common Agent Patterns

Poll for completion:


while true; do
  STATUS=$(curl -sS -H "Authorization: Bearer $TOKEN" \
    https://dev.computalot.com/api/v1/jobs/<job_id> | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
  case $STATUS in
    completed|partial|failed|cancelled) break ;;
  esac
  sleep 5
done

Read structured results:


curl -sS -H "Authorization: Bearer $TOKEN" https://dev.computalot.com/api/v1/results/<job_id>

Cancel a job:


curl -sS -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/jobs/<job_id>/cancel \
  -d '{"reason":"no longer needed"}'

Update project code:


tar czf code.tar.gz . && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" --data-binary @code.tar.gz \
  https://dev.computalot.com/api/v1/projects/my-project/push && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/projects/my-project/invalidate && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/projects/my-project/init -d '{}'

Debugging Failed Jobs

Job error: GET /api/v1/jobs/:id — error and recommended_action
Per-task details: GET /api/v1/jobs/:id/tasks — error, output (up to 10KB), structured failure result with failure_kind, exit_code, plus latest_progress, checkpoint, and resume_state for long-running jobs. During auto-retry, queued/running tasks keep the most recent failed attempt’s diagnostics visible until the current attempt emits its own output.
Live stream: GET /api/v1/jobs/:id/stream — SSE updates
Timeline: GET /api/v1/jobs/:id/events — state change events
Project readiness: GET /api/v1/projects/:name/status
Diagnostics: GET /api/v1/projects/:name/status/details
Billing state: GET /api/v1/account/balance, GET /api/v1/account/holds, GET /api/v1/account/ledger

Symptom	Cause	Fix
`402 Payment Required` on top-up or shortfall flow	account needs more credits	pay the returned x402 quote and retry
project init rejected before setup starts	available balance below funded floor	top up to at least `$5`, then retry init
”No native library found”	Missing system library	Fix Dockerfile, push, invalidate, re-init
exit_code_1, useless error	Truncated error	Check per-task `output` field (full 10KB).
task looks blank while retrying	current attempt has not emitted anything yet	Check `GET /api/v1/jobs/:id/output` or `GET /api/v1/jobs/:id/tasks` for the preserved last failed attempt diagnostics.
task failed before user code printed anything	worker/runtime preflight failed first	The visible `output` / `error` can be platform stderr rather than user stdout.
Cargo/Rust toolchain broken	Computalot worker issue	Wait for auto-recovery, not your code
Tasks stuck in queued	Cold start or capacity catch-up	Check project status and job diagnostics; the first job may be waiting while runtime preparation happens on demand
Project ready but tasks fail	Dockerfile missing deps or imports not checked	Fix Dockerfile, add manifest validation checks
401 / DB timeout after warmup	Credentials issue	Add auth check to manifest validation

Artifact API

Content-addressed store for passing files between jobs. Supports controller-streamed local uploads (up to 2GB), presigned direct-to-object-store uploads, resumable multipart uploads for very large files, and external URL references.


# Upload through the controller (streaming, max 2GB)
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/octet-stream" \
  -H "X-Artifact-Filename: dataset.parquet" \
  --data-binary @dataset.parquet \
  https://dev.computalot.com/api/v1/artifacts
 
# Request a presigned direct upload URL for a large file
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/direct \
  -d '{"sha256":"<lowercase_sha256>","size":123456789,"filename":"dataset.parquet","content_type":"application/octet-stream"}'
 
# Upload bytes directly to the returned upload.url, then finalize registration
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/direct/complete \
  -d '{"sha256":"<lowercase_sha256>","size":123456789,"filename":"dataset.parquet","content_type":"application/octet-stream"}'
 
# Start a resumable multipart upload for a very large file
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/multipart \
  -d '{"sha256":"<lowercase_sha256>","size":9876543210,"filename":"checkpoint.safetensors","content_type":"application/octet-stream"}'
 
# After uploading parts to the returned presigned part URLs, finalize the multipart upload
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/multipart/complete \
  -d '{"sha256":"<lowercase_sha256>","size":9876543210,"filename":"checkpoint.safetensors","content_type":"application/octet-stream","upload_id":"<upload_id>","parts":[{"part_number":1,"etag":"etag-1"}]}'
 
# Register external URL (no upload)
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/artifacts/external \
  -d '{"url": "https://s3.example.com/data.parquet", "filename": "data.parquet"}'
 
# Download
curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/artifacts/<id> -o output.bin
 
# List
curl -sS -H "Authorization: Bearer $TOKEN" \
  https://dev.computalot.com/api/v1/artifacts

Artifacts are retained 7 days by default. Download important ones promptly. Authenticated GET /api/v1/artifacts/:id downloads stay on the public API URL and stream bytes through the controller; if you need object-store details, GET /api/v1/artifacts/:id/meta may also expose a signed object-store URL.

Feedback

Computalot is in private beta. Your feedback directly shapes the product. Please report bugs, confusing behavior, missing features, and ideas. No auth required.


# Bug report
curl -sS -X POST -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/feedback \
  -d '{"type": "bug", "title": "Brief summary", "description": "What happened, what you expected, steps to reproduce"}'
 
# Feature request
curl -sS -X POST -H "Content-Type: application/json" \
  https://dev.computalot.com/api/v1/feedback \
  -d '{"type": "feature_request", "title": "Brief summary", "description": "What you need and why"}'

Types: bug | feature_request | provisioning | job_type_request. You can optionally include "job_id" or "metadata" for additional context. Agents: when you encounter an issue or see room for improvement while using Computalot, submit feedback automatically.

Python SDK & CLI

Install the published beta wheel from a public URL:


python3 -m pip install --user --break-system-packages \
  https://dev.computalot.com/docs/downloads/computalot-0.2.0-py3-none-any.whl
export PATH="$HOME/.local/bin:$PATH"

First authenticated probe after you receive an API key or wallet session:


from computalot import ComputalotClient
 
client = ComputalotClient(
    controller_url="https://dev.computalot.com",
    token="YOUR_TOKEN",
)
 
docs = client.docs_index()
recipes = client.list_recipes()
jobs = client.list_jobs(limit=5)
 
print(docs["status"])
print(len(recipes.get("recipes", [])))
print(len(jobs.get("jobs", [])))


export COMPUTALOT_CONTROLLER_URL="https://dev.computalot.com"
export COMPUTALOT_API_TOKEN="YOUR_TOKEN"
 
computalot docs --llm
computalot jobs --limit 5
computalot job <job_id>

Once a project is ready, use the CLI submit helpers or the SDK submit_structured() / submit_job() methods shown elsewhere in this reference.

Endpoint Reference

Public Docs

Method	Path	Purpose
GET	`/docs`	Human docs landing page
GET	`/api/v1/docs`	JSON docs index
GET	`/llms.txt`	Compact reference
GET	`/llms-full.txt`	Full reference with tutorials
GET	`/api/v1/docs/python-sdk`	Python SDK guide
GET	`/api/v1/docs/workflows`	Workflow recipes
POST	`/api/v1/auth/wallet/challenge`	Create wallet auth challenge (no auth)
POST	`/api/v1/auth/wallet/verify`	Verify wallet challenge and mint session token (no auth)
POST	`/api/v1/feedback`	Submit feedback (no auth)

Ops (operator-facing)

Method	Path	Purpose
GET	`/health`	Liveness probe (no auth; same body as `/live`)
GET	`/live`	Liveness probe (no auth)
GET	`/ready`	Readiness probe (no auth; `503` until controller core is up)
GET	`/metrics`	Prometheus metrics (admin auth, dedicated metrics token, or local request)

Account & Billing

Method	Path	Purpose
GET	`/api/v1/account/balance`	Account credit summary
GET	`/api/v1/account/ledger`	Settled ledger entries
GET	`/api/v1/account/holds`	Active and historical holds
GET	`/api/v1/account/quotes`	Funding and shortfall quotes
POST	`/api/v1/account/quotes/topup`	Create x402 top-up quote (`402 Payment Required`)
POST	`/api/v1/account/quotes/:quote_id/pay/x402`	Settle x402 quote and credit the account

Job API

Method	Path	Purpose
POST	`/api/v1/jobs`	Submit a job
POST	`/api/v1/jobs/batch`	Submit up to 200 jobs
GET	`/api/v1/jobs`	List jobs (`?status=&project=&tag=&limit=50&offset=0`)
GET	`/api/v1/jobs/:id`	Full job state with `feedback_summary` and checkpoint summary
GET	`/api/v1/jobs/:id/output`	Stdout/stderr, including preserved last-failed-attempt diagnostics during retries
GET	`/api/v1/jobs/:id/tasks`	Per-task details, errors, live feedback, checkpoint state, and preserved retry diagnostics
GET	`/api/v1/jobs/:id/events`	Lifecycle events
GET	`/api/v1/jobs/:id/metrics`	Aggregate metrics
GET	`/api/v1/jobs/:id/stream`	SSE stream for one job, including task `live_feedback.output_tail` deltas
GET	`/api/v1/jobs/watch?ids=a,b,c`	SSE stream for multiple jobs (max 100, with ping keepalives, metadata, and persistence flags)
GET	`/api/v1/projects/:name/stream`	SSE stream for all jobs in a project
PUT	`/api/v1/jobs/:id/cancel`	Cancel a job

Results & Artifacts

Method	Path	Purpose
GET	`/api/v1/results/:job_id`	Per-task results plus metadata, aggregate_result, aggregate_aliases, completeness, and persistence flags; use job/task endpoints for live retry-loop diagnostics
GET	`/api/v1/results`	List recent terminal results (`?limit=20&offset=0`, paginated). Filters: `job_id`, `ids`, `project`, `client_ref`, `tag`, `user_id`, `recipe_cache_*`, `group_by`, `include_tasks`. Malformed limit/offset → 422.
POST	`/api/v1/artifacts`	Upload artifact (raw body, max 2GB)
POST	`/api/v1/artifacts/direct`	Get a presigned direct-upload URL for object storage
POST	`/api/v1/artifacts/direct/complete`	Finalize a direct-uploaded object-store artifact
POST	`/api/v1/artifacts/multipart`	Start a resumable multipart direct upload
POST	`/api/v1/artifacts/multipart/part`	Get a presigned PUT URL for one multipart chunk
GET	`/api/v1/artifacts/multipart/parts`	List uploaded multipart chunks for resume
POST	`/api/v1/artifacts/multipart/complete`	Finalize a multipart direct upload and register the artifact
POST	`/api/v1/artifacts/multipart/abort`	Abort an in-flight multipart upload
POST	`/api/v1/artifacts/external`	Register external URL artifact
GET	`/api/v1/artifacts`	List artifacts
GET	`/api/v1/artifacts/:id`	Download artifact (authenticated requests stay on the public API URL; metadata may expose a signed object-store URL)
GET	`/api/v1/artifacts/:id/meta`	Artifact metadata
DELETE	`/api/v1/artifacts/:id`	Delete artifact

Project API

Method	Path	Purpose
POST	`/api/v1/projects`	Register project (`name`, `remote_dir`; optional `env`, `setup_timeout_s`)
GET	`/api/v1/projects`	List projects
GET	`/api/v1/projects/:name`	Project config + readiness status
PUT	`/api/v1/projects/:name`	Update project metadata only (owner only)
DELETE	`/api/v1/projects/:name`	Delete project (owner only)
POST	`/api/v1/projects/:name/push`	Upload tarball (raw gzip, not multipart, max 100MB; returns 409 during active refresh, 422 for malformed tarballs or invalid manifest references, and may include tarball_diff on success)
PUT	`/api/v1/projects/:name/cancel-queued`	Cancel queued/planning jobs for one project (optional tag filter)
POST	`/api/v1/projects/:name/init`	Prepare currently available workers (async)
POST	`/api/v1/projects/:name/invalidate`	Force re-init
GET	`/api/v1/projects/:name/status`	Project readiness
GET	`/api/v1/projects/:name/status/details`	Readiness + diagnostics
GET	`/api/v1/projects/:name/kv`	List project-scoped shared state entries
PUT	`/api/v1/projects/:name/kv/:key`	Write project-scoped shared state value
GET	`/api/v1/projects/:name/kv/:key`	Read project-scoped shared state value
DELETE	`/api/v1/projects/:name/kv/:key`	Delete project-scoped shared state value
GET	`/api/v1/projects/:name/stream`	SSE stream for project jobs

Other

Method	Path	Purpose
POST	`/api/v1/auth/register`	Disabled self-service API-key issuance (`403`) with waitlist + beta-access guidance
GET	`/api/v1/leases/:job_id`	Reservation status
GET	`/api/v1/presets`	Resource presets