Jobs
A job is a unit of work you submit. Computalot breaks it into tasks scheduled across workers.
For sealed recipes, submit jobs via POST /api/v1/recipes/:name/jobs with a typed payload — no project or runner_command needed.
For projects, submit jobs via POST /api/v1/jobs with a project name, runner command, and payload.
Choosing a job type
| I want to… | Use |
|---|---|
| Run one script with JSON input/output | structured_runner |
| Run code across a list of inputs | structured_runner + fan_out.by |
| Evaluate many tiny inputs per worker task | structured_runner + fan_out.by / fan_out.items + batch_size |
| Evaluate CMA/evolutionary candidates | structured_runner + fan_out.items |
| Train a model on a GPU | structured_runner + profile: "gpu" |
| Search a parameter grid | sweep |
| Run simulations and reduce results | map_reduce |
| Compare named strategies | benchmark |
| Submit many jobs at once | POST /api/v1/jobs/batch |
When in doubt, use structured_runner.
Submitting a job
curl -sS https://computalot.com/api/v1/jobs \
-X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "structured_runner",
"runner_command": ["python3", "evaluate.py"],
"payload": {"model": "gpt-4", "dataset": "test_v3"},
"project": "my-project",
"timeout_s": 600
}'

Sizing requirements
Requirements are minimums, not exact machine picks. `storage_gb` should cover more than dataset size: include your runtime footprint, writable caches, temp files, checkpoints, and sandbox/runtime overhead on the worker. Heavy ML runtimes often need tens of gigabytes of free disk before model weights are even downloaded.
If one project revision runs on both lightweight CPU jobs and heavyweight GPU training jobs, split those runtimes when possible instead of sending one oversized environment everywhere. Smaller runtime footprints improve placement and reduce disk pressure on workers.
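A rough sizing pass before submission avoids under-provisioned workers. This is a minimal sketch of the arithmetic described above; the helper name, the component breakdown, and the 20% headroom factor are illustrative assumptions, not part of the documented API.

```python
import math

def estimate_storage_gb(dataset_gb, runtime_gb, checkpoint_gb, overhead_gb=10):
    """Sum the disk consumers named in the sizing guidance (dataset,
    runtime footprint, checkpoints, sandbox overhead) and add headroom
    for temp files and writable caches."""
    total = dataset_gb + runtime_gb + checkpoint_gb + overhead_gb
    return math.ceil(total * 1.2)  # 20% headroom, rounded up to whole GB

# A 40 GB dataset with a 15 GB ML runtime and 20 GB of checkpoints
# needs well over 40 GB of storage:
requirements = {"storage_gb": estimate_storage_gb(40, 15, 20)}
```

The point is that `storage_gb` is derived from everything the task writes or unpacks on the worker, not from the dataset alone.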
Runner protocol
- Computalot writes the task payload to a temp file
- Your script reads `$COMPUTALOT_TASK_PAYLOAD` (JSON input)
- Your script writes JSON to `$COMPUTALOT_TASK_RESULT` (output)
- Exit 0 = success, non-zero = failure

Report progress by printing to stdout:

print(f"COMPUTALOT_PROGRESS:{json.dumps({'step': 42, 'loss': 0.05})}")

Normal stdout/stderr feeds the task's `live_feedback.output_tail` and the SSE job stream promptly while the task is still running, rather than waiting for the 30s heartbeat window. If your runner wraps another process, keep that child unbuffered or flush explicitly so Computalot can forward logs as they are produced.
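Putting the protocol steps together, a minimal runner script looks like this. The environment variable names, progress prefix, and exit convention come from the protocol above; the actual work (echoing the model name back) is a placeholder, and in a real script you would end with sys.exit(main()).

```python
import json
import os

def main():
    # Computalot writes the task payload to a temp file and passes its path.
    with open(os.environ["COMPUTALOT_TASK_PAYLOAD"]) as f:
        payload = json.load(f)

    # Progress lines on stdout are picked up by Computalot; flush so they
    # reach live_feedback promptly even when stdout is block-buffered.
    print(f"COMPUTALOT_PROGRESS:{json.dumps({'step': 1})}", flush=True)

    # Do the actual work (placeholder: echo a payload field back).
    result = {"ok": True, "model": payload.get("model")}

    # Write the JSON result where Computalot expects it.
    with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
        json.dump(result, f)

    return 0  # exit 0 = success, non-zero = failure
```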
Job lifecycle
planning → queued → running → completed | partial | failed | cancelled

- Poll: `GET /api/v1/jobs/:id` every 2-5s
- Stream: `GET /api/v1/jobs/:id/stream` for SSE updates, including running-task output tails via `task.live_feedback.output_tail`
- Batch watch: `GET /api/v1/jobs/watch?ids=...` for one SSE stream carrying `client_ref`, `tags`, `meta`, `variant`, aggregate summary fields, and persistence flags
- Project stream: `GET /api/v1/projects/:name/stream` for one project-wide feed instead of per-job polling
- Per-task detail: `GET /api/v1/jobs/:id/tasks` for `live_feedback`, `latest_progress`, checkpoint/resume state, and preserved last-failed-attempt diagnostics while a retry is queued or running
- Canonical terminal results: `GET /api/v1/results/:job_id` for per-task results, aggregate fields, completeness, and artifact IDs
- Output continuity: `GET /api/v1/jobs/:id/output` for aggregated stdout/stderr; during retries it preserves the most recent failed attempt until the current attempt emits new diagnostics
- Artifacts: `GET /api/v1/artifacts` to list files from your jobs, then `GET /api/v1/artifacts/:id` to download them
- Cancel: `PUT /api/v1/jobs/:id/cancel`
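The polling pattern can be sketched as a small loop. The terminal status names come from the lifecycle line above; the `fetch` callable stands in for an authenticated HTTP GET returning decoded JSON, so wiring in a real client (and your bearer token) is left to the caller.

```python
import time

# Terminal states from the job lifecycle.
TERMINAL = {"completed", "partial", "failed", "cancelled"}

def wait_for_job(job_id, fetch, interval_s=3):
    """Poll GET /api/v1/jobs/:id every few seconds (the docs suggest
    2-5s) until the job reaches a terminal state, then return it."""
    while True:
        job = fetch(f"/api/v1/jobs/{job_id}")
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval_s)
```

For anything long-running or chatty, prefer the SSE stream endpoints over tight polling; this loop is the fallback when a streaming client is not available.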
Supported fan-out shapes
- `{"fan_out": {"by": "models"}}` — split one payload array field into one task per item
- `{"fan_out": {"items": [{...}, {...}]}}` — provide the exact payload object for each task
- `{"fan_out": {"chunks": 20, "range_field": "total_seeds", "total": 10000}}` — split a numeric range into chunk tasks
These shapes are mutually exclusive: mixing `by`, `items`, or `chunks` + `total` in one request returns 422, so choose exactly one fan-out shape per job. Add `batch_size` or `batch_per_task` when each fan-out item is tiny and you want one dispatched task to process multiple items locally. Batched tasks receive `payload._batch` metadata.
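A client-side check for the mutual-exclusivity rule can catch the 422 before submission. This is a sketch mirroring the documented rule; the error wording is illustrative, not the server's actual response body.

```python
def validate_fan_out(fan_out):
    """Require exactly one fan-out shape (by / items / chunks),
    matching the server's 422 on mixed or missing shapes."""
    shapes = [k for k in ("by", "items", "chunks") if k in fan_out]
    if len(shapes) != 1:
        raise ValueError(f"422: choose exactly one fan-out shape, got {shapes or 'none'}")
    if shapes[0] == "chunks" and "total" not in fan_out:
        raise ValueError("422: a chunks fan-out also needs total")
    return shapes[0]
```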
Public payload contract
Public job, task, watch, and result payloads keep your submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests. Treat those public surfaces as the user contract; infrastructure placement stays internal to Computalot.
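The redaction boundary can be pictured as a filter over task records. Of the placement-only keys below, only `current_node` is named verbatim above; the others are assumed spellings for the provider-ID, runtime-path, and image-ref fields the contract describes.

```python
# Placement-only fields that public payloads redact. current_node is from
# the contract above; the remaining key names are illustrative assumptions.
PLACEMENT_ONLY = {"current_node", "provider_id", "runtime_path", "image_ref", "image_digest"}

def to_public(task):
    """User-facing view of a task record: submitted payload, meta,
    variant, aggregates, and artifact IDs survive; placement detail
    stays internal to Computalot."""
    return {k: v for k, v in task.items() if k not in PLACEMENT_ONLY}
```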
Tags, batch, webhooks, and dependencies
- Tags: `"tags": ["experiment_42"]`, then filter with `GET /api/v1/jobs?tag=experiment_42`
- Batch: `POST /api/v1/jobs/batch` for up to 200 jobs in one request; successful entries preserve `index`, `payload`, `meta`, and `variant`
- Webhooks: `"callback_url": "https://..."` for completion notifications
- DAG: `"depends_on": ["job_id"]` to chain jobs
- Dependency artifact handoff: downstream jobs can use `_artifacts.download: {"dataset": {"job_id": "job_id", "artifact": "dataset"}}` to fetch a named artifact from a completed dependency
- Shared coordination: `PUT /api/v1/projects/:name/kv/:key` stores small project-scoped JSON values, and `payload._shared.resolve` injects them or dependency job result paths into `payload._shared.values` before dispatch
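Chaining and artifact handoff combine naturally in one request body. This sketch builds a downstream job that waits on an upstream job and pulls one of its named artifacts; the job ID, runner command, and project name are placeholders.

```python
import json

def downstream_job(upstream_job_id, artifact_name):
    """Build a job request that depends on an upstream job and uses
    _artifacts.download to fetch one of its named artifacts before
    the task runs."""
    return {
        "type": "structured_runner",
        "runner_command": ["python3", "train.py"],  # placeholder script
        "project": "my-project",                    # placeholder project
        "depends_on": [upstream_job_id],
        "payload": {
            "_artifacts": {
                "download": {
                    # key: local name; value: which dependency artifact to fetch
                    artifact_name: {"job_id": upstream_job_id, "artifact": artifact_name}
                }
            }
        },
    }

body = json.dumps(downstream_job("job_123", "dataset"))
```

The resulting JSON is what you would POST to /api/v1/jobs, exactly as in the submission example earlier.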