Jobs

A job is a unit of work you submit. Computalot breaks it into tasks scheduled across workers.

For sealed recipes, submit jobs via POST /api/v1/recipes/:name/jobs with a typed payload — no project or runner_command needed.

For projects, submit jobs via POST /api/v1/jobs with a project name, runner command, and payload.
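The project path is shown in full under "Submitting a job" below. For the sealed-recipe path, a minimal Python sketch using the requests library; the recipe name, the payload fields, and whether the typed payload nests under a "payload" key are assumptions, not documented values:

# Hedged sketch: submit a job to a sealed recipe. No project or
# runner_command is needed; the recipe name and fields are placeholders.
import os
import requests

resp = requests.post(
    "https://computalot.com/api/v1/recipes/image-classifier/jobs",  # placeholder recipe name
    headers={"Authorization": f"Bearer {os.environ['TOKEN']}"},
    json={"payload": {"dataset": "test_v3"}},  # envelope shape is an assumption
)
print(resp.json())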

Choosing a job type

I want to…                                      Use
Run one script with JSON input/output           structured_runner
Run code across a list of inputs                structured_runner + fan_out.by
Evaluate many tiny inputs per worker task       structured_runner + fan_out.by / fan_out.items + batch_size
Evaluate CMA/evolutionary candidates            structured_runner + fan_out.items
Train a model on a GPU                          structured_runner + profile: "gpu"
Search a parameter grid                         sweep
Run simulations and reduce results              map_reduce
Compare named strategies                        benchmark
Submit many jobs at once                        POST /api/v1/jobs/batch

When in doubt, use structured_runner.

Submitting a job

curl -sS https://computalot.com/api/v1/jobs \
  -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python3", "evaluate.py"],
    "payload": {"model": "gpt-4", "dataset": "test_v3"},
    "project": "my-project",
    "timeout_s": 600
  }'

Sizing requirements

The requirements you specify are minimums, not exact machine picks. storage_gb should cover more than the dataset size: include your runtime footprint, writable caches, temp files, checkpoints, and sandbox/runtime overhead on the worker. Heavy ML runtimes often need tens of GB of free disk even before model weights are downloaded.
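As a rough worked example of that sizing advice, the sketch below adds up the contributors named above before setting storage_gb. Every number is illustrative, and placing storage_gb inside a requirements object is an assumption based on this section, not a documented request shape.

# Illustrative sizing arithmetic for storage_gb; all numbers are made up.
dataset_gb  = 12   # dataset you download or mount
runtime_gb  = 18   # heavy ML runtime and libraries on disk
scratch_gb  = 8    # writable caches, temp files, checkpoints
overhead_gb = 5    # sandbox / runtime overhead on the worker

# 43 GB total, well above the 12 GB dataset alone.
requirements = {"storage_gb": dataset_gb + runtime_gb + scratch_gb + overhead_gb}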

If one project revision runs on both lightweight CPU jobs and heavyweight GPU training jobs, split those runtimes when possible instead of sending one oversized environment everywhere. Smaller runtime footprints improve placement and reduce disk pressure on workers.

Runner protocol

  1. Computalot writes the task payload to a temp file
  2. Your script reads $COMPUTALOT_TASK_PAYLOAD (JSON input)
  3. Your script writes JSON to $COMPUTALOT_TASK_RESULT (output)
  4. Exit 0 = success, non-zero = failure

Report progress by printing to stdout:

print(f"COMPUTALOT_PROGRESS:{json.dumps({'step': 42, 'loss': 0.05})}")

Normal stdout/stderr now feeds the task's live_feedback.output_tail and the SSE job stream promptly while the task is still running, instead of waiting for the 30s heartbeat window. If your runner wraps another process, keep that child unbuffered or flush explicitly so Computalot can forward logs as they are produced.
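Putting the protocol together, here is a minimal runner sketch. It assumes $COMPUTALOT_TASK_PAYLOAD and $COMPUTALOT_TASK_RESULT hold file paths (per step 1, the payload is written to a temp file), and the payload and result field names are purely illustrative.

#!/usr/bin/env python3
# Hedged structured_runner sketch: env vars are assumed to hold file paths,
# and "inputs"/"results" are illustrative payload fields.
import json
import os
import sys

def main() -> int:
    # Step 2: read the JSON payload Computalot wrote for this task.
    with open(os.environ["COMPUTALOT_TASK_PAYLOAD"]) as f:
        payload = json.load(f)

    results = []
    for step, item in enumerate(payload.get("inputs", []), start=1):
        results.append({"input": item, "score": len(str(item))})  # placeholder work
        # Report progress; flush so logs are forwarded while the task runs.
        print(f"COMPUTALOT_PROGRESS:{json.dumps({'step': step})}", flush=True)

    # Step 3: write the JSON result where Computalot expects it.
    with open(os.environ["COMPUTALOT_TASK_RESULT"], "w") as f:
        json.dump({"results": results}, f)

    return 0  # Step 4: exit 0 signals success

if __name__ == "__main__":
    sys.exit(main())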

Job lifecycle

planning → queued → running → completed | partial | failed | cancelled

  • Poll: GET /api/v1/jobs/:id every 2-5s
  • Stream: GET /api/v1/jobs/:id/stream for SSE updates, including running-task output tails via task.live_feedback.output_tail
  • Batch watch: GET /api/v1/jobs/watch?ids=... for one SSE stream carrying client_ref, tags, meta, variant, aggregate summary fields, and persistence flags
  • Project stream: GET /api/v1/projects/:name/stream for one project-wide feed instead of per-job polling
  • Per-task detail: GET /api/v1/jobs/:id/tasks for live_feedback, latest_progress, checkpoint/resume state, and preserved last-failed-attempt diagnostics while a retry is queued or running
  • Canonical terminal results: GET /api/v1/results/:job_id for per-task results, aggregate fields, completeness, and artifact IDs
  • Output continuity: GET /api/v1/jobs/:id/output for aggregated stdout/stderr; during retries it preserves the most recent failed attempt until the current attempt emits new diagnostics
  • Artifacts: GET /api/v1/artifacts to list files from your jobs, then GET /api/v1/artifacts/:id to download them
  • Cancel: PUT /api/v1/jobs/:id/cancel
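A minimal sketch of the polling option above, using Python's requests library; the token handling mirrors the curl example earlier, and the "status" field name on the job resource is an assumption:

# Hedged sketch: poll GET /api/v1/jobs/:id every few seconds until the job
# reaches one of the terminal statuses from the lifecycle line above.
import os
import time
import requests

BASE = "https://computalot.com/api/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOKEN']}"}
TERMINAL = {"completed", "partial", "failed", "cancelled"}

def wait_for_job(job_id: str, interval_s: float = 3.0) -> dict:
    while True:
        job = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS).json()
        if job.get("status") in TERMINAL:  # "status" field name is assumed
            return job
        time.sleep(interval_s)             # stays within the 2-5s guidance

# Once terminal, fetch canonical results:
# results = requests.get(f"{BASE}/results/{job_id}", headers=HEADERS).json()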

Supported fan-out shapes

  • {"fan_out": {"by": "models"}} — split one payload array field into one task per item
  • {"fan_out": {"items": [{...}, {...}]}} — provide the exact payload object for each task
  • {"fan_out": {"chunks": 20, "range_field": "total_seeds", "total": 10000}} — split a numeric range into chunk tasks

These shapes are mutually exclusive: mixing by, items, or chunks + total in one request returns 422, so choose exactly one fan-out shape per job. Add batch_size or batch_per_task when each fan-out item is tiny and you want one dispatched task to process multiple items locally. Batched tasks receive payload._batch metadata.
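For instance, a hedged sketch of a fan_out.by submission with batching, reusing the request shape from "Submitting a job"; the model list, batch_size value, and requests-based client are illustrative choices:

# Hedged sketch: fan out one payload array field (fan_out.by) into tasks,
# letting each dispatched task process several items locally (batch_size).
import os
import requests

job = {
    "type": "structured_runner",
    "runner_command": ["python3", "evaluate.py"],
    "project": "my-project",
    "payload": {"models": ["gpt-4", "llama-3", "mistral"], "dataset": "test_v3"},
    "fan_out": {"by": "models"},  # one task per entry in payload["models"]
    "batch_size": 2,              # batched tasks receive payload._batch metadata
}

resp = requests.post(
    "https://computalot.com/api/v1/jobs",
    headers={"Authorization": f"Bearer {os.environ['TOKEN']}"},
    json=job,
)
print(resp.json())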

Public payload contract

Public job, task, watch, and result payloads keep your submitted payload, meta, variant, aggregate fields, and artifact IDs, but they redact placement-only fields such as current_node, provider IDs, runtime paths, and image refs/digests. Treat those public surfaces as the user contract; infrastructure placement stays internal to Computalot.

Tags, batch, webhooks, and dependencies

  • Tags: "tags": ["experiment_42"] then filter with GET /api/v1/jobs?tag=experiment_42
  • Batch: POST /api/v1/jobs/batch for up to 200 jobs in one request; successful entries preserve index, payload, meta, and variant
  • Webhooks: "callback_url": "https://..." for completion notifications
  • DAG: "depends_on": ["job_id"] to chain jobs
  • Dependency artifact handoff: downstream jobs can use _artifacts.download: {"dataset": {"job_id": "job_id", "artifact": "dataset"}} to fetch a named artifact from a completed dependency
  • Shared coordination: PUT /api/v1/projects/:name/kv/:key stores small project-scoped JSON values, and payload._shared.resolve injects them or dependency job result paths into payload._shared.values before dispatch
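To illustrate the dependency bullets, a hedged sketch that chains a downstream job to an upstream one and pulls its named artifact via _artifacts.download. The script names, artifact name, webhook URL, and the upstream response's "id" field are assumptions; _artifacts is assumed to live under payload like the other underscore-prefixed fields on this page.

# Hedged sketch: submit an upstream job, then a downstream job that depends
# on it and downloads its "dataset" artifact before running.
import os
import requests

BASE = "https://computalot.com/api/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TOKEN']}"}

upstream = requests.post(f"{BASE}/jobs", headers=HEADERS, json={
    "type": "structured_runner",
    "runner_command": ["python3", "build_dataset.py"],  # illustrative script
    "project": "my-project",
    "payload": {"source": "raw_v3"},
    "tags": ["experiment_42"],
}).json()

downstream = requests.post(f"{BASE}/jobs", headers=HEADERS, json={
    "type": "structured_runner",
    "runner_command": ["python3", "evaluate.py"],
    "project": "my-project",
    "depends_on": [upstream["id"]],  # "id" response field is assumed
    "payload": {
        "_artifacts": {
            "download": {"dataset": {"job_id": upstream["id"], "artifact": "dataset"}}
        }
    },
    "callback_url": "https://example.com/hooks/computalot",  # completion webhook
}).json()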