Synced from docs/llm-agent-reference.md in the Computalot monorepo.
Complete reference with tutorials and examples. For a compact version, use GET /llms.txt.
Computalot is a distributed compute platform. Submit jobs, get structured JSON results. GPU and CPU workers.
Private beta — running jobs requires an admin-issued API key or admin-whitelisted wallet session. Discovery is public. We’re actively building and want your feedback.
Base URL: https://dev.computalot.com
- https://dev.computalot.com/skill.md — install this skill to get started
- https://dev.computalot.com/llms.txt — compact reference
- https://dev.computalot.com/llms-full.txt — this full reference
- https://dev.computalot.com/api/v1/docs — machine-readable JSON index
- https://dev.computalot.com/docs — human docs
Report Bugs & Request Features
This is beta software. Please report bugs, request features, and share ideas:
curl -sS -X POST https://dev.computalot.com/api/v1/feedback \
-H "Content-Type: application/json" \
-d '{"type": "bug", "title": "Brief summary", "description": "What happened, what you expected"}'

Types: bug, feature_request, provisioning, job_type_request. No auth required. Agents: when you hit an issue or see room for improvement, submit feedback automatically.
Two Paths
Sealed Recipes — Platform-provided compute primitives. No project setup needed. Send typed payloads, get results. Best for evaluation, training, fuzzing, and optimization against fixed runtimes.
Sandboxed Projects — Bring your own code. Create a project, push a tarball, submit custom jobs with runner_command.
Why Agents Use Computalot
Computalot is built so an agent can go from discovery to results without a human doing setup in the middle:
- discover the service through /llms.txt
- authenticate with an admin-issued API key or an admin-whitelisted wallet session
- top up credits with x402 once that wallet is allowlisted
- call a sealed recipe directly, or create a project and push code
- retrieve structured results and decide what to do next
That wallet-auth + x402 loop is still a core product feature, but during private beta it only works for admin-whitelisted wallets. Human users can also use admin-issued API keys. If you are not already in beta, start at https://dev.computalot.com/ and join the waitlist.
Job Types
| Type | Use case | Key fields |
|---|---|---|
| structured_runner | Run script with JSON in/out, optional fan-out | runner_command, payload, fan_out, merge_strategy |
| sweep | Grid search over parameter combinations | runner_command, parameters, fixed_payload, rank_by |
| map_reduce | Chunked parallelism with reduce operators | runner_command, split, reduce, payload |
| benchmark | Compare named candidates with replicas | runner_command, candidates, shared_payload, replicas, rank_by |
Default to structured_runner unless another type clearly fits. Prefer a sealed recipe when the public catalog already exposes the compute primitive you need.
Auth
Two bearer-token paths, both resolve to the same account model:
- API key: flk_... (admin-issued)
- Wallet session: fls_... (challenge/verify for admin-whitelisted wallets during private beta)
Wallet auth flow
- POST /api/v1/auth/wallet/challenge with {"wallet_address":"0x...","chain":"base"}
- Sign the returned challenge.message with your wallet
- POST /api/v1/auth/wallet/verify with {"challenge_id":"wch_...","wallet_address":"0x...","signature":"0x..."}
- Use the returned token as Authorization: Bearer fls_...
Wallet auth creates or reuses an account linked to chain + wallet_address. That account owns all projects, jobs, results, and credits.
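The handshake above can be sketched as a small client helper. This is a sketch only: the HTTP transport and wallet signing are injected as callables, and the exact location of challenge_id in the challenge response is an assumption, not confirmed by this reference.

```python
def wallet_login(post, wallet_address, sign_message, chain="base"):
    """Run the wallet challenge/verify handshake.

    post(path, body) -> dict is any HTTP helper (e.g. wrapping requests.post
    against https://dev.computalot.com); sign_message(message) -> "0x..."
    comes from your wallet SDK. Response field paths are assumptions.
    """
    ch = post("/api/v1/auth/wallet/challenge",
              {"wallet_address": wallet_address, "chain": chain})
    signature = sign_message(ch["challenge"]["message"])
    verified = post("/api/v1/auth/wallet/verify",
                    {"challenge_id": ch["challenge_id"],
                     "wallet_address": wallet_address,
                     "signature": signature})
    # Use the result as: Authorization: Bearer fls_...
    return verified["token"]

# Demo with a scripted fake transport (no network involved).
_fake = {
    "/api/v1/auth/wallet/challenge": {"challenge_id": "wch_demo",
                                      "challenge": {"message": "sign me"}},
    "/api/v1/auth/wallet/verify": {"token": "fls_demo"},
}
demo_token = wallet_login(lambda path, body: _fake[path],
                          "0xabc", sign_message=lambda m: "0x" + m.encode().hex())
```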
API keys
API keys (flk_...) work on all endpoints. Admin-issued during private beta.
No auth required: /health, /docs, /llms.txt, /llms-full.txt, /api/v1/docs/*, POST /api/v1/feedback, POST /api/v1/auth/register (returns 403 with beta guidance), POST /api/v1/auth/wallet/challenge, POST /api/v1/auth/wallet/verify.
GET /metrics is operator-gated: local requests, admin auth, or a dedicated metrics token only.
Sealed Recipes
Sealed recipes are platform-owned compute primitives with:
- a fixed tarball-backed runtime bundle
- a fixed entrypoint
- a typed payload schema
- optional artifact input slots
Use a recipe when you want a published evaluator or scorer and do not want to upload or run your own command.
Use a project when you need to bring your own code, setup, and runner_command.
Discovery endpoints:
- GET /api/v1/recipes
- GET /api/v1/recipes/:name
- GET /api/v1/docs/recipes
Current public recipes:
- prop_amm
  - operations: eval, eval_chunk, validate, validate_full, bpf_eval, bpf_eval_chunk, build, concavity_check
  - note: validate is the fast default and skips native/BPF parity; use validate_full when you need the slower parity check
  - note: build publishes compiled native/BPF artifacts; eval_chunk/bpf_eval_chunk expose deterministic seed windows
  - typical inputs: strategy_ref or rs_source_b64, plus seed/range fields
- packing
  - operations: eval, eval_batch, feasible_optimize, basin_hopping, differential_evolution
  - typical inputs: inline candidate arrays or candidate_ref/candidates_ref, plus bounded search knobs for the optimization operations
- lightgbm_train
  - operations: train, cross_validate
  - note: shared tabular LightGBM recipe; uploads model/metrics artifacts for train and cross-validation metrics for cross_validate
  - typical inputs: immutable dataset_ref, target_column, and bounded LightGBM hyperparameters such as seed, n_estimators, num_leaves, and folds
- echidna
  - operations: foundry_prebuilt, hardhat_prebuilt
  - typical inputs: immutable prebuilt project_ref, contract, and bounded fuzzing fields such as test_mode, test_limit, and seed
Recipe jobs do not take a user-controlled runner_command. The recipe determines the runtime and command.
Do not send type for recipe submissions unless you have a specific reason to override it. When recipe is set or you use POST /api/v1/recipes/:name/jobs, the API infers type: "structured_runner" automatically.
Example:
POST /api/v1/recipes/prop_amm/jobs
{
"payload": {
"operation": "validate",
"strategy_ref": "art_123"
},
"timeout_s": 900
}

Billing
Computalot uses account credits. Jobs reserve a hold on submission and settle to actual usage on completion.
Supported beta access today is either an admin-issued API key or an admin-whitelisted wallet session. Both authenticate the same account model and the same billing surfaces.
- GET /api/v1/account/balance — check credits
- GET /api/v1/account/ledger — transaction history
- GET /api/v1/account/holds — active holds
- Project init is free but requires $5 available balance
- Fund via x402: POST /api/v1/account/quotes/topup → pay → POST /api/v1/account/quotes/:id/pay/x402
Account billing endpoints
Authenticated callers can inspect billing state with:
- GET /api/v1/account/balance
- GET /api/v1/account/ledger
- GET /api/v1/account/holds
- GET /api/v1/account/quotes
Billing truth lives on GET /api/v1/account/balance, GET /api/v1/account/holds, GET /api/v1/account/ledger, and GET /api/v1/account/quotes.
GET /api/v1/account/balance returns the main numbers a client should care about:
- ledger_balance_usd: total credited minus settled debits
- held_usd: funds currently reserved for active jobs
- available_usd: spendable balance after holds
- active_hold_count
- open_quote_count
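A client can reason about these fields before spending. The sketch below assumes the identity available_usd = ledger_balance_usd − held_usd, which is inferred from the field descriptions above rather than stated by the API.

```python
def shortfall(balance: dict, needed_usd: float) -> float:
    """Return how much funding is missing to cover needed_usd,
    using the GET /api/v1/account/balance fields described above."""
    return max(0.0, needed_usd - balance["available_usd"])

# Example balance payload (illustrative numbers, not real API output).
balance = {"ledger_balance_usd": 12.0, "held_usd": 9.5, "available_usd": 2.5,
           "active_hold_count": 2, "open_quote_count": 0}
# Project init requires $5 available balance, so this account is $2.50 short.
missing = shortfall(balance, 5.0)
```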
How pricing works in practice
Users should think in terms of quotes and holds, not hidden infrastructure details.
Before a job is admitted, Computalot derives a submit-time estimate from:
- the requested job type
- planned task count and fan-out shape
- requested requirements
- requested timeout_s
- resolved reliability_mode
That estimate becomes the hold. If the account cannot cover it, the job is rejected before it starts. Once admitted, the job is allowed to finish inside that reserved exposure instead of being killed mid-flight for routine billing reasons.
Submit-time billing summary
Job submit responses can include:
- summary.billing_estimate
- summary.billing_admission
- summary.billing_hold
Important fields include:
- inferred resource_class
- inferred runtime_class
- resolved reliability_mode
- estimated_hold_usd
- whether the account had enough available balance to admit the job
For long-running or expensive jobs, treat the submit response as the authoritative pricing signal for that exact request.
x402 funding flow
x402 is the public funding rail for autonomous wallets.
- POST /api/v1/account/quotes/topup with a requested amount such as { "amount_usd": 5.0 } — the per-top-up cap is $10,000; submit multiple smaller top-ups for larger funding
- Computalot returns 402 Payment Required
- The response includes a PAYMENT-REQUIRED header and a top-up quote
- The client pays and retries POST /api/v1/account/quotes/:quote_id/pay/x402
- On success, the internal account balance increases and the server returns PAYMENT-RESPONSE
If a job submit or project-init gate fails for insufficient balance, Computalot can return the same 402 shape with a shortfall quote so the client can fund the exact gap and retry.
- If POST /api/v1/projects/:name/init returns that shortfall quote, fund the account and retry POST /api/v1/projects/:name/init.
- If POST /api/v1/jobs returns that shortfall quote, fund the account and retry the same submit request.
In practice, this means an agent does not need:
- a subscription
- a stored credit card
- a pre-issued public API key
It can discover, fund, and use compute through the API itself.
Reliability mode
reliability_mode is a public submission field:
- best_effort
- strict_complete
Use strict_complete for research-sensitive fan-out work, sweeps, benchmarks, CMA generations, and training/evaluation batches where missing one task would invalidate the outcome.
Execution policy and placement
Jobs land in one of two execution modes, each with two placement options:
- sandboxed (default) — your uploaded project code runs inside a gVisor sandbox.
  - placement_policy = "shared" (default): reuses warm workers with other sandboxed tenants.
  - placement_policy = "dedicated": holds a worker exclusively for your project for the reservation window.
- sealed — only available for platform recipes (see GET /api/v1/recipes). Runs on sealed workers using the recipe’s prebuilt runtime. placement_policy defaults to shared; dedicated is available for workloads that need warm reuse.
- You cannot submit execution_policy = "sealed" with a user project — sealed is a recipe-only mode.
- You cannot override a platform recipe’s execution_policy. Submitting any value other than sealed on a platform recipe returns 422.
Both modes use the same submission surface. The fields are optional — leave them out to accept the defaults (sandboxed + shared for user projects, sealed + shared for platform recipes).
Core Model
- A job is the user-visible unit of work you submit
- Tasks are the parallel execution units Computalot creates from your job
- You do not target infrastructure directly — submit jobs with a project and optional resource requirements, and Computalot handles placement
- Terminal jobs and results are queryable for 30 days
Resource Requirements & Guaranteed Capacity
Submit minimum resource needs with your job — Computalot places work on matching capacity:
{
"type": "structured_runner",
"project": "my-ml-project",
"runner_command": ["python", "train.py"],
"payload": {"epochs": 3},
"requirements": {
"cpu": 8,
"memory_mb": 16384,
"storage_gb": 40,
"gpu_count": 1,
"gpu_memory_mb": 12288,
"profile": "gpu"
},
"reservation": {
"mode": "guaranteed",
"parallelism": 2,
"guaranteed_for_s": 1800,
"max_wait_s": 0
}
}

- requirements are minimums. Computalot may place on larger machines.
- profile: "cpu" or "gpu". CPU jobs can spill onto idle GPU capacity.
- reservation.mode = "best_effort" is default.
- reservation.mode = "guaranteed" is immediate admit-or-reject. Returns 409 if capacity unavailable.
- Inspect reservation state: GET /api/v1/leases/:job_id.
How to request capacity well:
- Ask for minimum real requirements. Oversized requests shrink eligible capacity and increase queue time.
- Use profile: "gpu" only when the task truly needs GPU compute.
- Use guaranteed only for short-window reserved parallelism. For normal batch work, best_effort is usually enough; the first job on a newly pushed revision may simply pay a cold-start cost.
- Use POST /api/v1/projects/:name/init only when you explicitly want to prepare currently available workers ahead of time before a burst.
- Use reliability_mode: "strict_complete" when missing one task would corrupt the final result set.
- Treat the submit response as the pricing signal for that exact request. The hold estimate is derived from this shape.
Journey 1: Sign Up → First Job → Results
This walks through the public end-to-end path from zero to a completed job using wallet auth and account credits.
This is the canonical beta onboarding path once your wallet is allowlisted or you already have an admin-issued API key: wallet sign-in, billing checks, x402 funding, project setup, job submit, and result retrieval.
Prerequisites
# Base URL
export BASE_URL="https://dev.computalot.com"
# Your wallet address
export WALLET_ADDRESS="0x1234567890abcdef1234567890abcdef12345678"

If you already have an admin-issued API key, you can skip the wallet flow and set:

export TOKEN="flk_..."

Otherwise, use wallet auth.
Step 1: Authenticate with your wallet
Ask for a challenge:
CHALLENGE_JSON=$(curl -sS "$BASE_URL/api/v1/auth/wallet/challenge" \
-X POST \
-H "Content-Type: application/json" \
-d "{\"wallet_address\":\"$WALLET_ADDRESS\",\"chain\":\"base\"}")
echo "$CHALLENGE_JSON"

Sign challenge.message with your wallet provider or SDK, then verify:
export CHALLENGE_ID="wch_..."
export SIGNATURE="0xSIGNED_MESSAGE"
VERIFY_JSON=$(curl -sS "$BASE_URL/api/v1/auth/wallet/verify" \
-X POST \
-H "Content-Type: application/json" \
-d "{
\"challenge_id\":\"$CHALLENGE_ID\",
\"wallet_address\":\"$WALLET_ADDRESS\",
\"signature\":\"$SIGNATURE\"
}")
echo "$VERIFY_JSON"

Extract the returned session token and use it for the remaining steps:

export TOKEN="fls_..."

Step 2: Fund your account if needed
Check balance and inspect the same billing truth surfaces if you need more detail:
curl -sS "$BASE_URL/api/v1/account/balance" \
-H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/holds" \
-H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/ledger" \
-H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/quotes" \
-H "Authorization: Bearer $TOKEN"

If available_usd is below the minimum funded floor, request a top-up quote:
curl -sS "$BASE_URL/api/v1/account/quotes/topup" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"amount_usd": 5.0,
"description": "initial project setup and first job"
}'

That returns 402 Payment Required. An x402-capable client should then pay and retry:
curl -sS "$BASE_URL/api/v1/account/quotes/<quote_id>/pay/x402" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "PAYMENT-SIGNATURE: <x402 payment payload>"

If project init or job submit later returns a shortfall quote instead of admitting the request, fund the account and retry the same blocked request.
Step 3: Register your project
curl -sS "$BASE_URL/api/v1/projects" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-ml-project",
"remote_dir": "/root/my-ml-project",
"env": {"CLICKHOUSE_HOST": "db.internal"},
"setup_timeout_s": 1200
}'

remote_dir is where Computalot extracts your code. env is an optional map of runtime env vars. setup_timeout_s overrides the default 600s setup timeout.
Step 4: Create your project files
Your project needs a Dockerfile, computalot.project.json, and your code:
my-ml-project/
├── Dockerfile
├── computalot.project.json
└── job.py

Projects run as sandboxed OCI containers. See the Project Manifest docs for the full manifest schema.
FROM python:3.11-slim
WORKDIR /workspace
COPY . .

# job.py
import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
result = {"status": "ok", "source": "getting-started"}
json.dump(result, open(os.environ["COMPUTALOT_TASK_RESULT"], "w"))

Minimal manifest:
{
"version": 1,
"runtime": {
"kind": "oci",
"sandbox": "gvisor",
"workdir": "/workspace"
},
"entrypoint": {
"command": ["python", "job.py"]
}
}

Full schema and examples: https://dev.computalot.com/docs/projects/project-manifest
Step 5: Upload the project
cd my-ml-project
tar czf ../code.tar.gz .
# Upload the tarball as the raw request body
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/push" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
--data-binary @../code.tar.gz

After a successful push, the latest revision is published immediately. You can submit jobs right away; the first one may take longer while Computalot prepares the runtime on demand.
Optional: if you want to prepare currently available workers ahead of time, call init manually:
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/init" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{}'

Check status when you want to inspect published vs warm state:
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/status" \
-H "Authorization: Bearer $TOKEN"

can_accept_new_jobs: true means the latest revision is published and can be submitted immediately. ready_for_jobs: true means Computalot finished platform-side runtime preparation. Neither field proves your application-level imports or credentials are valid — use manifest validation checks and run one small smoke job after setup changes.
GET /api/v1/projects/:name/status is the top-level readiness truth for the active revision. After a successful push, the new content hash is visible immediately and jobs can be submitted right away, even if that revision is still warming.
If you want the already-warm signal before a burst, wait for:
- can_accept_new_jobs: true
- init_state: "ready"
- ready_for_jobs: true
Step 6: Submit a job
curl -sS "$BASE_URL/api/v1/jobs" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "structured_runner",
"runner_command": ["python3", "job.py"],
"payload": {"test_case": "getting-started"},
"project": "my-ml-project",
"timeout_s": 120,
"requirements": {
"cpu": 1,
"memory_mb": 256,
"profile": "cpu"
},
"reliability_mode": "strict_complete"
}'

The submit response can include:

- summary.billing_estimate
- summary.billing_admission
- summary.billing_hold
Treat that response as the authoritative estimate for that exact request.
Step 7: Read results and billing state
# Job status
curl -sS "$BASE_URL/api/v1/jobs/<job_id>" \
-H "Authorization: Bearer $TOKEN"
# Structured results
curl -sS "$BASE_URL/api/v1/results/<job_id>" \
-H "Authorization: Bearer $TOKEN"
# Aggregated stdout/stderr
curl -sS "$BASE_URL/api/v1/jobs/<job_id>/output" \
-H "Authorization: Bearer $TOKEN"
# Billing state after the run
curl -sS "$BASE_URL/api/v1/account/balance" \
-H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/holds" \
-H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/account/ledger" \
-H "Authorization: Bearer $TOKEN"

GET /api/v1/results/<job_id> is the canonical result surface. It returns raw per-task results plus top-level summary, aggregate_result, aggregate_aliases, completeness, result_persisted, and output_persisted. For weighted fan-out jobs, summary also carries alias fields like avg_edge directly, plus coverage fields such as weight_field, expected_weight, completed_weight, and pending_weight. The response also preserves public submission metadata like meta and variant, and each task may include project_content_hash so you can confirm which project version produced it.
GET /api/v1/jobs/<job_id>/output is the live aggregated diagnostics surface. During auto-retry, it preserves the most recent failed attempt’s output and error until the next attempt emits its own diagnostics, so jobs do not go blank between retries. If a worker/runtime failure happens before your command starts, that visible text can be platform preflight stderr rather than user-process stdout.
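The status-then-results sequence above can be wrapped in a small polling helper. This is a transport-agnostic sketch: the HTTP helper is injected, and the terminal status names in TERMINAL_STATUSES are assumptions not confirmed by this reference.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}  # assumed names

def wait_for_job(get, job_id, poll_s=5.0, timeout_s=600.0):
    """Poll GET /api/v1/jobs/<job_id> until the job looks terminal, then
    fetch GET /api/v1/results/<job_id>. get(path) -> dict is any HTTP helper
    (e.g. wrapping requests.get with the Authorization header)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = get(f"/api/v1/jobs/{job_id}")
        if job.get("status") in TERMINAL_STATUSES:
            return get(f"/api/v1/results/{job_id}")
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} not terminal after {timeout_s}s")

# Demo with a scripted fake transport: queued, then completed, then results.
_responses = iter([{"status": "queued"},
                   {"status": "completed"},
                   {"summary": {}, "completeness": 1.0}])
demo = wait_for_job(lambda path: next(_responses), "job_demo", poll_s=0)
```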
Step 8: Update your project
Use PUT /api/v1/projects/:name for metadata only. For code changes, push a new tarball, invalidate, then init again:
tar czf ../code.tar.gz .
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/push" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
--data-binary @../code.tar.gz
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/invalidate" \
-X POST \
-H "Authorization: Bearer $TOKEN"
curl -sS "$BASE_URL/api/v1/projects/my-ml-project/init" \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{}'

If POST /api/v1/projects/:name/push returns 409, Computalot is already initializing or refreshing that project. Poll GET /api/v1/projects/:name/status until init_state is no longer initializing or refreshing, then retry the push.
If the uploaded body is gzip but not a valid tarball, POST /api/v1/projects/:name/push now returns 422 with error: "invalid tarball" instead of surfacing a generic server error.
If computalot.project.json references project files, command working directories, build inputs, or named cache mounts that do not exist, POST /api/v1/projects/:name/push now returns 422 before the new version is accepted.
When Computalot can read the previous tarball locally, a successful push response also includes tarball_diff with added_files, removed_files, and changed_files so clients can catch incomplete uploads immediately.
That same push response can already show the active-revision transition for the new code via fields such as ready_for_jobs: false, status_message, next_action, and init_status.init_state: "refreshing".
Journey 2: Fan-Out Parallelism
Use when: You want to run the same script on many different inputs in parallel — model evaluation, agent swarms, batch processing, CMA generations.
Runner script
Your script reads $COMPUTALOT_TASK_PAYLOAD and writes to $COMPUTALOT_TASK_RESULT:
# evaluate.py
import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
model_name = payload["model"]
score = run_evaluation(model_name, payload["dataset"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
json.dump({"model": model_name, "score": score}, f)

Option A: Fan-out by values
Split a list into one task per item:
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "structured_runner",
"runner_command": ["python", "evaluate.py"],
"payload": {
"models": ["model-a", "model-b", "model-c", "model-d"],
"config": {"n_trials": 100}
},
"fan_out": {"by": "models", "batch_size": 2},
"merge_strategy": "keyed",
"project": "my-ml-project",
"timeout_s": 600
}'

Creates 4 fan-out items, one per model; each gets {"models": "model-a", "config": {"n_trials": 100}} and so on. With batch_size: 2 as shown, those items are grouped into dispatched tasks as described next.
With batch_size (or batch_per_task), Computalot groups multiple fan-out items into one dispatched task and adds payload._batch metadata. For fan_out.by, the split field becomes that task’s sub-list.
Resolved shared-state aliases land in payload._shared.values, and scalar aliases are also exposed to the runner as COMPUTALOT_SHARED_<NAME> env vars.
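The grouping effect of batch_size can be previewed client-side. The sketch below only predicts task counts and sub-lists from the description above; it is not the server implementation.

```python
def batch_items(items, batch_size):
    """Group fan-out items the way batch_size is described above:
    each group becomes one dispatched task carrying a sub-list."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

models = ["model-a", "model-b", "model-c", "model-d"]
# batch_size 2 -> 2 dispatched tasks, each carrying a sub-list of models
tasks = batch_items(models, 2)
```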
Option B: Fan-out by explicit items (CMA / evolutionary)
One explicit payload per candidate — you control the payloads exactly:
{
"type": "structured_runner",
"runner_command": ["python", "evaluate.py"],
"fan_out": {"items": [
{"params": [0.1, 0.5, 0.3], "generation": 12},
{"params": [0.2, 0.4, 0.6], "generation": 12},
{"params": [0.3, 0.3, 0.1], "generation": 12}
]},
"project": "my-proj"
}

Creates 3 tasks, one per item. The client (your optimizer) owns state between generations.
Option C: Fan-out by chunks
Split a numeric range into N chunks:
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "structured_runner",
"runner_command": ["python", "simulate.py"],
"payload": {"total_seeds": 10000},
"fan_out": {"chunks": 20, "range_field": "total_seeds", "total": 10000},
"merge_strategy": "collect",
"project": "my-ml-project",
"timeout_s": 1800
}'

Creates 20 tasks, each with a 500-seed window: {"start": 0, "count": 500}, {"start": 500, "count": 500}, and so on.
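The per-task windows can be predicted client-side. Even splits like 10000/20 are unambiguous; how the server distributes a remainder for uneven splits is an assumption in this sketch.

```python
def chunk_windows(total: int, chunks: int):
    """Predict the {"start", "count"} window each chunked task receives.
    Remainder items are assigned to the earliest chunks (an assumption)."""
    base, extra = divmod(total, chunks)
    windows, start = [], 0
    for i in range(chunks):
        count = base + (1 if i < extra else 0)
        windows.append({"start": start, "count": count})
        start += count
    return windows

# 10000 seeds over 20 chunks -> 20 windows of 500 seeds each.
windows = chunk_windows(10000, 20)
```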
Fan-out contract
Supported public fan_out shapes are:
- {"fan_out": {"by": "field"}}
- {"fan_out": {"items": [{...}, {...}]}}
- {"fan_out": {"chunks": N, "total": N, ...}}
These shapes are mutually exclusive. Mixing by, items, or chunks + total in one request returns 422. Choose exactly one shape before retrying the submit.
Merge strategies
- "collect" (default) — all results in a list
- "keyed" — results indexed by a key from each task’s payload (requires fan_out.by)
- "weighted_avg" — weighted average of a numeric field (set both value_field and weight_field)
Result quality
Each task result includes result_quality (0.0–1.0) and result_warnings. You can define custom validation with result_schema:
{
"result_schema": {
"required_fields": ["score"],
"field_types": {"score": "number"},
"field_ranges": {"score": [0.0, 1.0]}
}
}

Journey 3: Parameter Search (Sweep)
Use when: You want to try every combination of parameters and rank results. Grid search, hyperparameter tuning.
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "sweep",
"runner_command": ["python", "evaluate.py"],
"project": "my-ml-project",
"parameters": {
"learning_rate": [0.001, 0.01, 0.1],
"batch_size": [32, 64, 128]
},
"fixed_payload": {"dataset": "cifar10", "epochs": 5},
"rank_by": "accuracy",
"rank_order": "desc",
"timeout_s": 3600
}'

Creates 9 tasks (3x3 cartesian product). Each task’s payload:

{"learning_rate": 0.001, "batch_size": 32, "dataset": "cifar10", "epochs": 5, "_sweep_idx": 0, "_sweep_params": {"learning_rate": 0.001, "batch_size": 32}}

Your runner writes the rank_by field to $COMPUTALOT_TASK_RESULT:
import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
accuracy = train_and_evaluate(payload["learning_rate"], payload["batch_size"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
json.dump({"accuracy": accuracy}, f)

Result: a ranked leaderboard in the job summary:
{
"results": [
{"params": {"learning_rate": 0.01, "batch_size": 64}, "result": {"accuracy": 0.95}, "rank": 1},
{"params": {"learning_rate": 0.001, "batch_size": 128}, "result": {"accuracy": 0.92}, "rank": 2}
],
"best": {"params": {"learning_rate": 0.01, "batch_size": 64}, "result": {"accuracy": 0.95}, "rank": 1}
}

Key fields: parameters (map → value lists, max 1000 combos), fixed_payload, rank_by (required), rank_order ("desc" default or "asc").
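The 3x3 expansion generalizes. This sketch predicts sweep task payloads, mirroring the _sweep_idx and _sweep_params fields shown above, and enforces the documented 1000-combination cap client-side before submitting.

```python
import itertools

def expand_sweep(parameters: dict, fixed_payload: dict):
    """Predict the per-task payloads of a sweep: the cartesian product of
    parameter value lists merged with fixed_payload (a client-side sketch)."""
    names = list(parameters)
    combos = list(itertools.product(*(parameters[n] for n in names)))
    if len(combos) > 1000:
        raise ValueError("sweep exceeds the 1000-combination limit")
    payloads = []
    for idx, values in enumerate(combos):
        params = dict(zip(names, values))
        payloads.append({**fixed_payload, **params,
                         "_sweep_idx": idx, "_sweep_params": params})
    return payloads

# The 3x3 grid from the example above -> 9 predicted payloads.
payloads = expand_sweep(
    {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]},
    {"dataset": "cifar10", "epochs": 5},
)
```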
Journey 4: GPU Training with Live Progress
Use when: You’re running a long training job and want real-time progress updates plus resumable checkpoints.
JOB_ID=$(curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "structured_runner",
"runner_command": ["python", "train.py"],
"payload": {"epochs": 100, "batch_size": 32},
"project": "my-ml-project",
"timeout_s": 7200,
"requirements": {"profile": "gpu", "gpu_count": 1, "gpu_memory_mb": 16384},
"checkpointing": {"enabled": true, "resume_from_latest": true}
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
# Stream progress (SSE)
curl -sS -N -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/jobs/$JOB_ID/stream

The stream starts with a snapshot, then emits job, task, and event deltas. Running-task frames include live_feedback.output_tail, so you can show rolling stdout/stderr before the job finishes.
Report progress from your training script:
import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
resume_state = payload.get("_resume") or {}
start_epoch = resume_state.get("epoch", 0)
for epoch in range(start_epoch, 100):
loss = train_one_epoch()
# Computalot captures COMPUTALOT_PROGRESS lines and streams them to the SSE endpoint
print(
f"COMPUTALOT_PROGRESS:{json.dumps({'epoch': epoch, 'loss': loss, 'percent': epoch})}",
flush=True,
)

Save model artifacts to $COMPUTALOT_ARTIFACT_DIR:
import os, shutil
model_path = os.path.join(os.environ['COMPUTALOT_ARTIFACT_DIR'], 'model.pt')
torch.save(model.state_dict(), model_path)

Artifact IDs appear in the task result. Download: GET /api/v1/artifacts/:id.
If you include a checkpoint object in progress or result payloads, Computalot persists the latest checkpoint and injects it back into _resume on retry when checkpointing.resume_from_latest is enabled. When the checkpoint can be durably published as an artifact, task state also records fields like artifact_id, artifact_source, publish_status, and published_at, and retries rewrite _resume.checkpoint.path to the downloaded local checkpoint file automatically.
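COMPUTALOT_PROGRESS lines like the ones emitted above can be recovered from any aggregated output surface with a small parser. This is a client-side sketch; lines that are not valid progress JSON are simply ignored.

```python
import json

PREFIX = "COMPUTALOT_PROGRESS:"

def parse_progress(output_tail: str):
    """Extract structured progress events from aggregated stdout text,
    e.g. live_feedback.output_tail or the /output surface."""
    events = []
    for line in output_tail.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            try:
                events.append(json.loads(line[len(PREFIX):]))
            except json.JSONDecodeError:
                pass  # non-progress noise on the same stream
    return events

# Demo on a mixed stdout tail (illustrative text, not real job output).
events = parse_progress(
    'setup done\nCOMPUTALOT_PROGRESS:{"epoch": 3, "loss": 0.5}\nother noise'
)
```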
For live UIs, combine:
- GET /api/v1/jobs/:id/stream — SSE updates for one job
- GET /api/v1/jobs/watch?ids=id1,id2,... — one SSE connection for 2-100 jobs with terminal summaries, aggregate fields, and persistence flags
- GET /api/v1/jobs/:id/tasks — per-task live_feedback, latest_progress, checkpoint, resume_state, runtime_s, health_status, plus the preserved last failed attempt while a retry is queued or running
- GET /api/v1/results/:job_id — the canonical terminal result surface with per-task payload/result/output presence, completeness, and artifact IDs
- GET /api/v1/jobs/:id/output — aggregated stdout/stderr that preserves the most recent failed attempt until the current attempt emits new diagnostics
- GET /api/v1/jobs/:id — job-level feedback_summary and checkpoint summary
Public job/task/watch/result payloads keep submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests.
Journey 5: Multi-Stage Pipelines
Use when: Step 2 depends on step 1’s output.
# Step 1: Train
JOB1=$(curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "structured_runner",
"runner_command": ["python", "train.py"],
"payload": {"epochs": 50, "lr": 0.001},
"project": "my-ml-project",
"timeout_s": 7200,
"gpu_required": true
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
# Step 2: Evaluate (waits for step 1)
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d "{
\"type\": \"structured_runner\",
\"runner_command\": [\"python\", \"evaluate.py\"],
\"payload\": {\"model_path\": \"/root/my-ml-project/model.pt\"},
\"depends_on\": [\"$JOB1\"],
\"project\": \"my-ml-project\",
\"timeout_s\": 600
}"

Step 2 stays queued until step 1 completes. If step 1 fails, step 2 auto-cancels.
Passing files between stages (Artifacts)
# Upload artifact after step 1
ART_ID=$(curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/octet-stream" \
-H "X-Artifact-Filename: model.pt" \
--data-binary @model.pt \
https://dev.computalot.com/api/v1/artifacts | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
# Reference in step 2's payload

Or use _artifacts.download in the payload for automatic pre-task download:
{
"payload": {
"_artifacts": {"download": {"dataset": "art_abc123"}},
"model_type": "base"
}
}

Computalot resolves downloads before task execution (env var COMPUTALOT_ARTIFACT_<key> points to the local path).
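Inside the runner, a resolved download can be located like this. A sketch only: the exact casing of <key> in the env-var name is an assumption, so check what your task actually receives.

```python
import os

def resolved_artifact_path(key: str) -> str:
    """Locate a pre-downloaded _artifacts.download input via the
    COMPUTALOT_ARTIFACT_<key> env var described above."""
    env_name = f"COMPUTALOT_ARTIFACT_{key}"
    path = os.environ.get(env_name)
    if path is None:
        raise KeyError(f"artifact {key!r} was not resolved ({env_name} unset)")
    return path

# Demo only: simulate the env var a worker would set for key "dataset".
os.environ["COMPUTALOT_ARTIFACT_dataset"] = "/tmp/dataset.bin"
demo_path = resolved_artifact_path("dataset")
```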
For staged pipelines, downstream jobs can reference a named artifact produced by an upstream dependency instead of hardcoding an artifact ID:
{
"depends_on": ["job_setup_123"],
"payload": {
"_artifacts": {
"download": {
"dataset": {"job_id": "job_setup_123", "artifact": "dataset"}
}
}
}
}

For long-running ML or evaluation jobs, do not stop at _artifacts.download alone:
- use manifest data_sources for immutable remote inputs that should be prepared before launch
- use manifest cache_mounts for writable caches your code creates at runtime
- if the input lives on Hugging Face and should stay read-only, prefer data_sources[].source = "huggingface" with delivery = "mount" so the worker uses hf-mount
- if the runner downloads from Hugging Face itself, add a huggingface cache mount so HF_HOME and TRANSFORMERS_CACHE persist per worker; ad hoc runtime downloads do not use hf-mount automatically
For smaller coordination data, use project-scoped shared state plus dispatch-time injection:
curl -sS -X PUT \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/projects/my-project/kv/dataset_ready \
-d '{"value": {"status": "ready"}}'{
"depends_on": ["job_setup_123"],
"payload": {
"_shared": {
"resolve": {
"dataset_ready": {"key": "dataset_ready"},
"best_score": {"job_id": "job_setup_123", "path": "results.0.score"}
}
}
}
}

Journey 6: Comparing Strategies (Benchmark)
Use when: You want to compare 2+ named strategies with replicas for statistical significance.
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "benchmark",
"runner_command": ["python", "evaluate.py"],
"project": "my-project",
"candidates": {
"strategy_a": {"model": "gpt4", "temperature": 0.7},
"strategy_b": {"model": "claude", "temperature": 0.5},
"baseline": {"model": "random"}
},
"shared_payload": {"dataset": "test_set_v3", "n_trials": 100},
"replicas": 3,
"rank_by": "score",
"timeout_s": 1800
}'

Creates 9 tasks (3 candidates x 3 replicas). Each payload:
{"dataset": "test_set_v3", "n_trials": 100, "model": "gpt4", "temperature": 0.7, "_candidate": "strategy_a", "_replica": 1}To vary a field per replica: "replica_vary": {"field": "seed_base", "stride": 1000}.
Result: leaderboard with per-candidate statistics:
{
"leaderboard": [
{"candidate": "strategy_a", "mean": 0.92, "std": 0.03, "min": 0.89, "max": 0.95, "count": 3, "rank": 1},
{"candidate": "strategy_b", "mean": 0.85, "std": 0.02, "min": 0.83, "max": 0.87, "count": 3, "rank": 2},
{"candidate": "baseline", "mean": 0.50, "std": 0.05, "min": 0.45, "max": 0.55, "count": 3, "rank": 3}
]
}

Key fields: candidates (map, min 2), shared_payload, replicas (default 1, max 100), rank_by (required), rank_order.
Sweep vs Benchmark: Use sweep for exploring a parameter grid. Use benchmark for comparing named alternatives with replicas for statistical confidence.
Journey 7: Monte Carlo / Simulations (Map-Reduce)
Use when: You want to split a range into chunks, process in parallel, and aggregate with operators.
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs \
-d '{
"type": "map_reduce",
"runner_command": ["python", "evaluate_seeds.py"],
"project": "my-project",
"payload": {"strategy": "momentum"},
"split": {"field": "seed", "start": 0, "total": 10000, "chunks": 50},
"reduce": {
"total_pnl": "sum",
"sharpe_ratio": "weighted_avg:sample_count",
"max_drawdown": "max"
},
"timeout_s": 7200
}'

Creates 50 tasks. Each gets: {"strategy": "momentum", "seed_start": 0, "seed_count": 200}.
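A sketch of how we read the contiguous split. The 10000/50 example divides evenly, so every task gets 200 seeds; how the platform distributes a non-zero remainder is an assumption here (spread over the first tasks):

```python
def split_chunks(start: int, total: int, chunks: int):
    """Model the contiguous split: start + total + chunks -> per-task ranges.

    Remainder handling is an assumption; with total divisible by chunks
    (as in the example) every task gets exactly total // chunks values.
    """
    base, rem = divmod(total, chunks)
    out = []
    cursor = start
    for i in range(chunks):
        count = base + (1 if i < rem else 0)  # spread remainder over first tasks
        out.append({"seed_start": cursor, "seed_count": count})
        cursor += count
    return out
```

split_chunks(0, 10000, 50) yields the 50 ranges the example describes, starting with {"seed_start": 0, "seed_count": 200}.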
For non-contiguous ranges, use explicit split.ranges instead of start + total + chunks:
{
"type": "map_reduce",
"runner_command": ["python", "evaluate_seeds.py"],
"project": "my-project",
"payload": {"strategy": "momentum"},
"split": {
"field": "seed",
"ranges": [
{"start": 860791000, "count": 1000},
{"start": 200000000, "count": 1000},
{"start": 500000000, "count": 1000}
]
},
"reduce": {
"avg_edge": "mean"
}
}

Your runner:
import json, os
payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))
results = run_simulation(payload["strategy"], payload["seed_start"], payload["seed_count"])
with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
json.dump({"total_pnl": results.pnl, "sharpe_ratio": results.sharpe, "max_drawdown": results.drawdown, "sample_count": payload["seed_count"]}, f)Result: reduced values in the job summary:
{
"reduced": {
"total_pnl": 15234.50,
"sharpe_ratio": 1.87,
"max_drawdown": 0.23
}
}

Key fields: split ({field, start, total, chunks} or {field, ranges}), reduce (map of field → operator).
Reduce operators: sum, mean, max, min, weighted_avg:<weight_field>, concat, count, collect.
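As an illustration of the weighted_avg:<weight_field> semantics as we understand them (sum of value times weight over sum of weights), using the sharpe_ratio / sample_count pairing from the example above:

```python
def weighted_avg(results, value_field, weight_field):
    """weighted_avg:<weight_field> as we read it: sum(v * w) / sum(w)."""
    num = sum(r[value_field] * r[weight_field] for r in results)
    den = sum(r[weight_field] for r in results)
    return num / den

# Example task results: sharpe per chunk, weighted by sample_count.
tasks = [
    {"sharpe_ratio": 2.0, "sample_count": 100},
    {"sharpe_ratio": 1.0, "sample_count": 300},
]
# (2.0*100 + 1.0*300) / 400 = 1.25
```

This is why each task's result in the example includes sample_count: the weight field must be present in every task result that participates in the reduction.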
Heavy Jobs (Large Data, Checkpoints, Training)
For GB-scale datasets, large checkpoints, and long training runs:
Inputs:
- Keep payload small — do not embed large data in JSON.
- Use _artifacts.download for large inputs. Workers download and cache before launch.
- Resolved paths: payload._artifacts.local_paths. Single-file downloads also get COMPUTALOT_ARTIFACT_<NAME> env vars.
Outputs:
- Write checkpoints and files to $COMPUTALOT_ARTIFACT_DIR.
- Use _artifacts.upload for named uploads or direct upload to external storage. Computalot-managed uploads now prefer direct or multipart object-store transfer and only fall back to controller relay if the direct path is unavailable.
- If a JSON result is too large, Computalot spills it to an artifact and returns result_spilled, result_artifact_id, result_filename; those spill uploads also prefer the direct object-store path.
Operational:
- Set timeout_s above expected runtime with margin.
- Submit jobs after push; use POST /api/v1/projects/:name/init only if you want to prepare currently available workers ahead of time.
- Write checkpoints and outputs to $COMPUTALOT_ARTIFACT_DIR. Use $COMPUTALOT_TASK_SCRATCH_DIR or $TMPDIR for temp files.
- Use external/object storage for multi-GB datasets and model bundles.
Project Setup
Projects run as sandboxed OCI containers. The lifecycle is:
- POST /api/v1/projects — register
- POST /api/v1/projects/:name/push — upload tarball with Dockerfile + computalot.project.json + your code
- POST /api/v1/jobs — submit work against the published revision
- Optional: POST /api/v1/projects/:name/init — prepare currently available workers
- GET /api/v1/projects/:name/status — inspect published vs warm state
- Project init is free but requires $5 available balance
- Init is asynchronous
- After a push, can_accept_new_jobs can already be true even while ready_for_jobs is still false
- After a code change, push the new tarball; use invalidate only if you want to discard old prepared runtimes
Project structure
my-project/
├── Dockerfile
├── computalot.project.json
├── requirements.txt
└── job.py

Dockerfile
Install dependencies in your Dockerfile:
FROM python:3.11-slim
WORKDIR /workspace
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

computalot.project.json
See the Project Manifest docs for the full schema.
Tips
- Run one small smoke job after setup changes before submitting a large batch
- Use the manifest validation section for runtime checks (executables, files, commands)
Public project endpoints
- POST /api/v1/projects
- PUT /api/v1/projects/:name for metadata-only updates
- POST /api/v1/projects/:name/push
- POST /api/v1/projects/:name/init
- POST /api/v1/projects/:name/invalidate
- GET /api/v1/projects/:name/status
- GET /api/v1/projects/:name/status/details
- GET /api/v1/projects
Debugging failed setup
curl -sS -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/projects/my-project/status
curl -sS -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/projects/my-project/status/details

Fix your Dockerfile/manifest, push, invalidate, re-init:
tar czf code.tar.gz . && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" --data-binary @code.tar.gz \
https://dev.computalot.com/api/v1/projects/my-project/push && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/projects/my-project/invalidate && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/projects/my-project/init -d '{}'

Runner Protocol (All Job Types)
All runner-based types use this contract:
- Computalot launches your runner_command with the task-specific payload
- Payload written to a temp file; path in $COMPUTALOT_TASK_PAYLOAD
- Result file path in $COMPUTALOT_TASK_RESULT
- Your script writes a JSON result to $COMPUTALOT_TASK_RESULT
- Progress: print COMPUTALOT_PROGRESS:{json} to stdout
COMPUTALOT_* env vars are the Computalot runtime protocol.
Normal stdout/stderr is surfaced live through live_feedback.output_tail on task APIs and SSE streams. If your runner wraps another process, keep the child unbuffered or flush explicitly so Computalot can forward logs promptly.
Task env order: base runtime → project env files (.computalot.env, computalot.env, .env) → project env map → meta.env overrides. If .venv/bin/python exists, Computalot prepends .venv/bin to PATH.
Exit codes: 0 = success, non-zero = failure. Last ~1000 chars captured as error (tail, not head — preserves tracebacks). Full output (up to 10KB) stored per task.
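A minimal runner template following this contract; the run function is a placeholder for your actual work:

```python
import json, os

def run(payload):
    # Replace with real work; must return a JSON-serializable dict.
    return {"ok": True, "echo": payload}

def main():
    payload = json.load(open(os.environ["COMPUTALOT_TASK_PAYLOAD"]))

    # Optional progress frame, printed to stdout and flushed so the
    # controller can forward it promptly.
    print("COMPUTALOT_PROGRESS:" + json.dumps({"stage": "start"}), flush=True)

    result = run(payload)

    with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
        json.dump(result, f)

# Exit code 0 signals success; an uncaught exception exits non-zero,
# which Computalot records as a failure. The env guard lets the file
# be imported outside the runner without side effects.
if os.environ.get("COMPUTALOT_TASK_PAYLOAD"):
    main()
```

Any uncaught exception's traceback lands in the captured error tail, which is why the tail-not-head capture matters.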
Payload varies by type:
- structured_runner: the payload field. Chunk fan-out adds {start, count}.
- sweep: fixed_payload + parameter combination + _sweep_idx + _sweep_params
- map_reduce: payload + {field_start, field_count} chunk boundaries
- benchmark: shared_payload + candidate config + {_candidate, _replica}
Allowed executables: python, python3, node, deno, bun, ruby, julia, Rscript, uv, pip, npm, npx, cargo, rustc. Shell executables (bash, sh, zsh) are blocked.
Job Lifecycle
Statuses: planning → queued → running → completed | partial | failed | cancelled
- Terminal states: completed, partial, failed, cancelled
- Poll: GET /api/v1/jobs/:id every 2-5s until terminal
- Stream: GET /api/v1/jobs/:id/stream for SSE updates
- Multi-job watch: GET /api/v1/jobs/watch?ids=id1,id2,... for one SSE stream covering 2-100 jobs
- Canonical terminal results: GET /api/v1/results/:job_id
- Per-task progress + retry continuity: GET /api/v1/jobs/:id/tasks
- Aggregated output continuity: GET /api/v1/jobs/:id/output
- Cancel: PUT /api/v1/jobs/:id/cancel with {"reason": "..."}
- Auto-retry: set max_retries on submission
- Jobs stuck in running with 0 active tasks auto-recover every 5 minutes
- Jobs queued > 2x timeout_s (min 30 min) auto-cancel only when Computalot has live capacity; fleet-wide outages do not trigger queue-timeout cancellation by themselves
- partial = some tasks failed/cancelled, OR all completed but some have low quality (< 0.5)
- strict_complete is the recommended mode for research-sensitive runs where partial completion is not acceptable
- Optional priority: "high" | "normal" | "low" biases scheduling between otherwise comparable jobs. Guaranteed reservations still take precedence.
- Submit responses can include hold and admission metadata so the client can reason about cost before execution starts
- Public job/task/watch/result payloads keep submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests
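The 2-5s poll guidance can be wrapped in a small client-side helper. fetch_status is any callable you supply that returns the job's current status string (for example, a wrapper around GET /api/v1/jobs/:id); the timeout is a client-side safety net, not a platform parameter:

```python
import time

TERMINAL = {"completed", "partial", "failed", "cancelled"}

def wait_for_job(fetch_status, interval_s=2.0, max_interval_s=5.0, timeout_s=3600.0):
    """Poll fetch_status() until the job reaches a terminal state.

    Backs off from interval_s toward max_interval_s, matching the
    recommended 2-5s polling cadence. Raises TimeoutError if the job
    has not gone terminal within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    interval = interval_s
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        time.sleep(interval)
        interval = min(interval * 1.5, max_interval_s)
```

For long-running jobs, prefer the SSE stream endpoints over polling; this helper is the fallback when a persistent connection is not practical.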
Result Quality Validation
Computalot validates results and computes quality scores (0.0-1.0) per task.
- Per-task: result_quality, result_warnings, result_present, output_present in GET /api/v1/results/:job_id
- Per-job result surface: summary.aggregate_result, summary.aggregate_aliases, summary.completeness, and top-level aggregate_result / aggregate_aliases / completeness in GET /api/v1/results/:job_id
- Per-job: summary.quality in GET /api/v1/jobs/:id with mean_quality, suspect_count, suspect_task_ids
- Default schemas: sweep/benchmark require the rank_by field as a number; map_reduce requires all reduce fields
- Custom: add result_schema to job submission
Quality is advisory — all results are stored regardless.
Job Tags & Filtering
{"type": "sweep", "tags": ["experiment_42", "lr_search"], ...}Filter: GET /api/v1/jobs?tag=experiment_42.
Job Priority
{"priority": "high"}Use high for latency-sensitive fan-out or evaluation jobs that should start ahead of ordinary background work. Use low for background submissions. Jobs default to normal.
Batch Submission
Up to 200 jobs in one request:
curl -sS -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs/batch \
-d '{"jobs": [
{"type": "structured_runner", "runner_command": ["python", "eval.py"], "project": "my-proj", "payload": {"lr": 0.001}, "tags": ["sweep_72"]},
{"type": "structured_runner", "runner_command": ["python", "eval.py"], "project": "my-proj", "payload": {"lr": 0.01}, "tags": ["sweep_72"]}
]}'

Response: 201 (all ok) or 207 (partial): {jobs, submitted, errors, error_count}.
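A sketch for generating such a batch body programmatically; eval.py, my-proj, and the learning rates mirror the curl example above:

```python
def sweep_batch(lrs, project, tag):
    """Build a POST /api/v1/jobs/batch body: one structured_runner job
    per learning rate. Serialize with json.dumps when sending; the
    endpoint accepts up to 200 jobs per request."""
    return {
        "jobs": [
            {
                "type": "structured_runner",
                "runner_command": ["python", "eval.py"],
                "project": project,
                "payload": {"lr": lr},
                "tags": [tag],
            }
            for lr in lrs
        ]
    }

body = sweep_batch([0.001, 0.01, 0.1], "my-proj", "sweep_72")
```

For a regular parameter grid, the sweep job type is usually simpler than hand-building a batch; batch submission shines when the jobs differ in more than one field.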
Webhook Callbacks
{"type": "structured_runner", "callback_url": "https://your-server.com/webhook", ...}Computalot POSTs {event, job_id, status, project, type, output, error, tags, progress, completed_at} with 2 retries. The callback originates from the Computalot controller — localhost URLs only work if routable from the server.
Job Dependencies (DAG)
{"depends_on": ["job_20260312_143000_abc123"]}Tasks not dispatched until all dependencies reach completed. If a dependency fails, dependent jobs auto-cancel.
Streaming Progress
SSE for one job
curl -sS -N -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/jobs/<job_id>/stream

Starts with snapshot, then incremental job, task, event frames. Ends with done.
Running-task frames include live_feedback.output_tail, which is the fastest public surface for live stdout/stderr.
SSE for multiple jobs
curl -sS -N -H "Authorization: Bearer $TOKEN" \
"https://dev.computalot.com/api/v1/jobs/watch?ids=<id1>,<id2>,<id3>"Max 100 jobs. Idle periods emit ping. Ends with done when all terminal.
Terminal job frames include client_ref, tags, meta, variant, summary, aggregate_result, aggregate_aliases, completeness, and result_persisted / output_persisted when available, so a client can often avoid a follow-up result fetch. For weighted fan-out jobs that means fields like avg_edge can be present directly in the terminal SSE payload.
SSE for a whole project
curl -sS -N -H "Authorization: Bearer $TOKEN" \
"https://dev.computalot.com/api/v1/projects/<project>/stream"One connection for all jobs in a project. Reconnect after 1h timeout.
Common Agent Patterns
Poll for completion:
while true; do
STATUS=$(curl -sS -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/jobs/<job_id> | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
case $STATUS in
completed|partial|failed|cancelled) break ;;
esac
sleep 5
done

Read structured results:
curl -sS -H "Authorization: Bearer $TOKEN" https://dev.computalot.com/api/v1/results/<job_id>Cancel a job:
curl -sS -X PUT -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/jobs/<job_id>/cancel \
-d '{"reason":"no longer needed"}'Update project code:
tar czf code.tar.gz . && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" --data-binary @code.tar.gz \
https://dev.computalot.com/api/v1/projects/my-project/push && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/projects/my-project/invalidate && \
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/projects/my-project/init -d '{}'

Debugging Failed Jobs
- Job error: GET /api/v1/jobs/:id — error and recommended_action
- Per-task details: GET /api/v1/jobs/:id/tasks — error, output (up to 10KB), structured failure result with failure_kind, exit_code, plus latest_progress, checkpoint, and resume_state for long-running jobs. During auto-retry, queued/running tasks keep the most recent failed attempt's diagnostics visible until the current attempt emits its own output.
- Live stream: GET /api/v1/jobs/:id/stream — SSE updates
- Timeline: GET /api/v1/jobs/:id/events — state change events
- Project readiness: GET /api/v1/projects/:name/status
- Diagnostics: GET /api/v1/projects/:name/status/details
- Billing state: GET /api/v1/account/balance, GET /api/v1/account/holds, GET /api/v1/account/ledger
| Symptom | Cause | Fix |
|---|---|---|
| 402 Payment Required on top-up or shortfall flow | account needs more credits | pay the returned x402 quote and retry |
| project init rejected before setup starts | available balance below funded floor | top up to at least $5, then retry init |
| "No native library found" | Missing system library | Fix Dockerfile, push, invalidate, re-init |
| exit_code_1, useless error | Truncated error | Check per-task output field (full 10KB). |
| task looks blank while retrying | current attempt has not emitted anything yet | Check GET /api/v1/jobs/:id/output or GET /api/v1/jobs/:id/tasks for the preserved last failed attempt diagnostics. |
| task failed before user code printed anything | worker/runtime preflight failed first | The visible output / error can be platform stderr rather than user stdout. |
| Cargo/Rust toolchain broken | Computalot worker issue | Wait for auto-recovery, not your code |
| Tasks stuck in queued | Cold start or capacity catch-up | Check project status and job diagnostics; the first job may be waiting while runtime preparation happens on demand |
| Project ready but tasks fail | Dockerfile missing deps or imports not checked | Fix Dockerfile, add manifest validation checks |
| 401 / DB timeout after warmup | Credentials issue | Add auth check to manifest validation |
Artifact API
Content-addressed store for passing files between jobs. Supports controller-streamed local uploads (up to 2GB), presigned direct-to-object-store uploads, resumable multipart uploads for very large files, and external URL references.
# Upload through the controller (streaming, max 2GB)
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/octet-stream" \
-H "X-Artifact-Filename: dataset.parquet" \
--data-binary @dataset.parquet \
https://dev.computalot.com/api/v1/artifacts
# Request a presigned direct upload URL for a large file
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/artifacts/direct \
-d '{"sha256":"<lowercase_sha256>","size":123456789,"filename":"dataset.parquet","content_type":"application/octet-stream"}'
# Upload bytes directly to the returned upload.url, then finalize registration
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/artifacts/direct/complete \
-d '{"sha256":"<lowercase_sha256>","size":123456789,"filename":"dataset.parquet","content_type":"application/octet-stream"}'
# Start a resumable multipart upload for a very large file
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/artifacts/multipart \
-d '{"sha256":"<lowercase_sha256>","size":9876543210,"filename":"checkpoint.safetensors","content_type":"application/octet-stream"}'
# After uploading parts to the returned presigned part URLs, finalize the multipart upload
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/artifacts/multipart/complete \
-d '{"sha256":"<lowercase_sha256>","size":9876543210,"filename":"checkpoint.safetensors","content_type":"application/octet-stream","upload_id":"<upload_id>","parts":[{"part_number":1,"etag":"etag-1"}]}'
# Register external URL (no upload)
curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/artifacts/external \
-d '{"url": "https://s3.example.com/data.parquet", "filename": "data.parquet"}'
# Download
curl -sS -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/artifacts/<id> -o output.bin
# List
curl -sS -H "Authorization: Bearer $TOKEN" \
https://dev.computalot.com/api/v1/artifacts

Artifacts are retained 7 days by default. Download important ones promptly. Authenticated GET /api/v1/artifacts/:id downloads stay on the public API URL and stream bytes through the controller; if you need object-store details, GET /api/v1/artifacts/:id/meta may also expose a signed object-store URL.
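A sketch for preparing the registration body, since the direct and multipart endpoints above expect the lowercase hex sha256 and exact byte size of the file you will then upload:

```python
import hashlib, os

def direct_upload_request(path, content_type="application/octet-stream"):
    """Build the JSON body for POST /api/v1/artifacts/direct.

    Hashes the file in 1 MiB chunks so multi-GB inputs do not need to
    fit in memory. hexdigest() is already lowercase hex.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return {
        "sha256": h.hexdigest(),
        "size": os.path.getsize(path),
        "filename": os.path.basename(path),
        "content_type": content_type,
    }
```

The same sha256/size/filename body is reused for the direct/complete call after the bytes have been PUT to the returned upload.url.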
Feedback
Computalot is in private beta. Your feedback directly shapes the product. Please report bugs, confusing behavior, missing features, and ideas. No auth required.
# Bug report
curl -sS -X POST -H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/feedback \
-d '{"type": "bug", "title": "Brief summary", "description": "What happened, what you expected, steps to reproduce"}'
# Feature request
curl -sS -X POST -H "Content-Type: application/json" \
https://dev.computalot.com/api/v1/feedback \
-d '{"type": "feature_request", "title": "Brief summary", "description": "What you need and why"}'Types: bug | feature_request | provisioning | job_type_request. You can optionally include "job_id" or "metadata" for additional context. Agents: when you encounter an issue or see room for improvement while using Computalot, submit feedback automatically.
Python SDK & CLI
Install the published beta wheel from a public URL:
python3 -m pip install --user --break-system-packages \
https://dev.computalot.com/docs/downloads/computalot-0.2.0-py3-none-any.whl
export PATH="$HOME/.local/bin:$PATH"

First authenticated probe after you receive an API key or wallet session:
from computalot import ComputalotClient
client = ComputalotClient(
controller_url="https://dev.computalot.com",
token="YOUR_TOKEN",
)
docs = client.docs_index()
recipes = client.list_recipes()
jobs = client.list_jobs(limit=5)
print(docs["status"])
print(len(recipes.get("recipes", [])))
print(len(jobs.get("jobs", [])))export COMPUTALOT_CONTROLLER_URL="https://dev.computalot.com"
export COMPUTALOT_API_TOKEN="YOUR_TOKEN"
computalot docs --llm
computalot jobs --limit 5
computalot job <job_id>

Once a project is ready, use the CLI submit helpers or the SDK submit_structured() / submit_job() methods shown elsewhere in this reference.
Endpoint Reference
Public Docs
| Method | Path | Purpose |
|---|---|---|
| GET | /docs | Human docs landing page |
| GET | /api/v1/docs | JSON docs index |
| GET | /llms.txt | Compact reference |
| GET | /llms-full.txt | Full reference with tutorials |
| GET | /api/v1/docs/python-sdk | Python SDK guide |
| GET | /api/v1/docs/workflows | Workflow recipes |
| POST | /api/v1/auth/wallet/challenge | Create wallet auth challenge (no auth) |
| POST | /api/v1/auth/wallet/verify | Verify wallet challenge and mint session token (no auth) |
| POST | /api/v1/feedback | Submit feedback (no auth) |
Ops (operator-facing)
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness probe (no auth; same body as /live) |
| GET | /live | Liveness probe (no auth) |
| GET | /ready | Readiness probe (no auth; 503 until controller core is up) |
| GET | /metrics | Prometheus metrics (admin auth, dedicated metrics token, or local request) |
Account & Billing
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/account/balance | Account credit summary |
| GET | /api/v1/account/ledger | Settled ledger entries |
| GET | /api/v1/account/holds | Active and historical holds |
| GET | /api/v1/account/quotes | Funding and shortfall quotes |
| POST | /api/v1/account/quotes/topup | Create x402 top-up quote (402 Payment Required) |
| POST | /api/v1/account/quotes/:quote_id/pay/x402 | Settle x402 quote and credit the account |
Job API
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/jobs | Submit a job |
| POST | /api/v1/jobs/batch | Submit up to 200 jobs |
| GET | /api/v1/jobs | List jobs (?status=&project=&tag=&limit=50&offset=0) |
| GET | /api/v1/jobs/:id | Full job state with feedback_summary and checkpoint summary |
| GET | /api/v1/jobs/:id/output | Stdout/stderr, including preserved last-failed-attempt diagnostics during retries |
| GET | /api/v1/jobs/:id/tasks | Per-task details, errors, live feedback, checkpoint state, and preserved retry diagnostics |
| GET | /api/v1/jobs/:id/events | Lifecycle events |
| GET | /api/v1/jobs/:id/metrics | Aggregate metrics |
| GET | /api/v1/jobs/:id/stream | SSE stream for one job, including task live_feedback.output_tail deltas |
| GET | /api/v1/jobs/watch?ids=a,b,c | SSE stream for multiple jobs (max 100, with ping keepalives, metadata, and persistence flags) |
| GET | /api/v1/projects/:name/stream | SSE stream for all jobs in a project |
| PUT | /api/v1/jobs/:id/cancel | Cancel a job |
Results & Artifacts
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/results/:job_id | Per-task results plus metadata, aggregate_result, aggregate_aliases, completeness, and persistence flags; use job/task endpoints for live retry-loop diagnostics |
| GET | /api/v1/results | List recent terminal results (?limit=20&offset=0, paginated). Filters: job_id, ids, project, client_ref, tag, user_id, recipe_cache_*, group_by, include_tasks. Malformed limit/offset → 422. |
| POST | /api/v1/artifacts | Upload artifact (raw body, max 2GB) |
| POST | /api/v1/artifacts/direct | Get a presigned direct-upload URL for object storage |
| POST | /api/v1/artifacts/direct/complete | Finalize a direct-uploaded object-store artifact |
| POST | /api/v1/artifacts/multipart | Start a resumable multipart direct upload |
| POST | /api/v1/artifacts/multipart/part | Get a presigned PUT URL for one multipart chunk |
| GET | /api/v1/artifacts/multipart/parts | List uploaded multipart chunks for resume |
| POST | /api/v1/artifacts/multipart/complete | Finalize a multipart direct upload and register the artifact |
| POST | /api/v1/artifacts/multipart/abort | Abort an in-flight multipart upload |
| POST | /api/v1/artifacts/external | Register external URL artifact |
| GET | /api/v1/artifacts | List artifacts |
| GET | /api/v1/artifacts/:id | Download artifact (authenticated requests stay on the public API URL; metadata may expose a signed object-store URL) |
| GET | /api/v1/artifacts/:id/meta | Artifact metadata |
| DELETE | /api/v1/artifacts/:id | Delete artifact |
Project API
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/projects | Register project (name, remote_dir; optional env, setup_timeout_s) |
| GET | /api/v1/projects | List projects |
| GET | /api/v1/projects/:name | Project config + readiness status |
| PUT | /api/v1/projects/:name | Update project metadata only (owner only) |
| DELETE | /api/v1/projects/:name | Delete project (owner only) |
| POST | /api/v1/projects/:name/push | Upload tarball (raw gzip, not multipart, max 100MB; returns 409 during active refresh, 422 for malformed tarballs or invalid manifest references, and may include tarball_diff on success) |
| PUT | /api/v1/projects/:name/cancel-queued | Cancel queued/planning jobs for one project (optional tag filter) |
| POST | /api/v1/projects/:name/init | Prepare currently available workers (async) |
| POST | /api/v1/projects/:name/invalidate | Force re-init |
| GET | /api/v1/projects/:name/status | Project readiness |
| GET | /api/v1/projects/:name/status/details | Readiness + diagnostics |
| GET | /api/v1/projects/:name/kv | List project-scoped shared state entries |
| PUT | /api/v1/projects/:name/kv/:key | Write project-scoped shared state value |
| GET | /api/v1/projects/:name/kv/:key | Read project-scoped shared state value |
| DELETE | /api/v1/projects/:name/kv/:key | Delete project-scoped shared state value |
| GET | /api/v1/projects/:name/stream | SSE stream for project jobs |
Other
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/auth/register | Disabled self-service API-key issuance (403) with waitlist + beta-access guidance |
| GET | /api/v1/leases/:job_id | Reservation status |
| GET | /api/v1/presets | Resource presets |