Motivation#
Some API work must run asynchronously as long-running operations (LROs), but on-demand methods built on LROs leave three problems:
- Configuration management: callers must supply full configuration on every invocation; this becomes hard to track as parameters grow.
- Separation of duties / permissions: you may want different people to be able to configure a task vs execute it.
- Server-side scheduling: you may want the service itself to invoke work on a schedule rather than rely on fragile external schedulers.
Goal: provide a standard for configurable, rerunnable units of work that persist their configuration and can be re-run (manually or scheduled).
Overview / Core idea#
Job = a resource that stores configuration for a unit of work.
- Step A: create/configure a Job resource (store parameters once).
- Step B: execute the job later by calling a custom run method on the Job (no runtime parameters).
Benefits:
- Configuration is authored and versioned once on the Job.
- Permissions can be given separately for creating/updating Jobs vs running them.
- Scheduling becomes trivial: call the job’s run method on a schedule, with no per-invocation parameters.
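The scheduling benefit can be illustrated with a minimal sketch. It assumes a hypothetical ScheduleEntry shape and a runJob callback standing in for the job's custom run method; neither name comes from the source. The point is that a schedule entry needs only a job id and timing data, never job configuration.

```typescript
// Hypothetical schedule entry: a job id plus timing, and no job configuration.
interface ScheduleEntry {
  jobId: string;
  intervalMs: number;
  nextRunAt: number; // epoch milliseconds
}

// Stand-in for the job's custom run method; it takes nothing but the job id.
type RunJob = (jobId: string) => void;

// Runs every entry that is due at `now`, advances its next run time,
// and returns the ids of the jobs that were started.
function runDueJobs(entries: ScheduleEntry[], now: number, runJob: RunJob): string[] {
  const started: string[] = [];
  for (const entry of entries) {
    if (entry.nextRunAt <= now) {
      runJob(entry.jobId);
      entry.nextRunAt = now + entry.intervalMs;
      started.push(entry.jobId);
    }
  }
  return started;
}
```

A real scheduler would also have to handle failures and overlapping runs, which this sketch leaves out.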
Job resources (design)#
Jobs look like any other resource: they have a unique id (preferably assigned by the service) and fields that encode configuration.
Example: turning an on-demand backup method into a Job:
On-demand request would carry fields like destination, compressionFormat, encryptionKey.
Move those fields onto a BackupChatRoomJob resource:
interface BackupChatRoomJob {
  id: string;
  chatRoom: string;
  destination: string;
  compressionFormat: string;
  // encryptionKey etc.
}
Standard (synchronous) resource methods should be implemented for Jobs: CreateBackupChatRoomJob, GetBackupChatRoomJob, ListBackupChatRoomJobs, DeleteBackupChatRoomJob.
UpdateBackupChatRoomJob may be omitted if jobs are treated as immutable (delete+create instead of update to avoid concurrency issues).
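As a rough illustration of the standard methods for an immutable Job, here is a minimal in-memory sketch. The BackupChatRoomJobStore class and its id scheme are assumptions made for this example; a real service would persist jobs durably and expose these as API methods.

```typescript
interface BackupChatRoomJob {
  id: string;
  chatRoom: string;
  destination: string;
  compressionFormat: string;
}

// Hypothetical in-memory store, used here only to show the method set.
class BackupChatRoomJobStore {
  private jobs = new Map<string, BackupChatRoomJob>();
  private nextId = 1;

  // The service assigns the id; callers supply configuration only.
  create(config: Omit<BackupChatRoomJob, "id">): BackupChatRoomJob {
    const id = `backupChatRoomJobs/${this.nextId++}`;
    const job: BackupChatRoomJob = Object.freeze({ ...config, id });
    this.jobs.set(id, job);
    return job;
  }

  get(id: string): BackupChatRoomJob | undefined {
    return this.jobs.get(id);
  }

  list(): BackupChatRoomJob[] {
    return Array.from(this.jobs.values());
  }

  // Returns true if a job was removed.
  delete(id: string): boolean {
    return this.jobs.delete(id);
  }

  // Deliberately no update method: changing configuration means delete + create.
}
```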
The custom run method#
Each Job exposes a custom run method (e.g. POST /{job}:run).
Input: only the Job identifier (no execution-time configuration).
Return: an Operation (LRO) that tracks the async work.
When the Operation completes it should resolve to a meaningful result — either:
- a standard resource created by the job, or
- an Execution resource representing the job’s output (see below), or
- an ephemeral response (but ephemeral responses risk being lost if LRO retention is limited).
Example signature:
@post("/{id=backupChatRoomJobs/*}:run")
RunBackupChatRoomJob(req: RunBackupChatRoomJobRequest):
  Operation<RunBackupChatRoomJobResponse, RunBackupChatRoomJobMetadata>;
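The signature above could be backed by something like the following sketch. The Operation shape is simplified, and the backupLocation, jobId, and startedAt fields are hypothetical; the source does not specify the contents of the response or metadata types. The request carries only the job identifier, and the method returns a pending Operation immediately.

```typescript
// Simplified LRO shape; real services usually carry more fields.
interface Operation<ResultT, MetadataT> {
  id: string;
  done: boolean;
  result?: ResultT;
  error?: string;
  metadata?: MetadataT;
}

interface RunBackupChatRoomJobRequest { id: string; }               // job identifier only
interface RunBackupChatRoomJobResponse { backupLocation: string; }  // hypothetical field
interface RunBackupChatRoomJobMetadata { jobId: string; startedAt: string; } // hypothetical fields

type BackupOperation = Operation<RunBackupChatRoomJobResponse, RunBackupChatRoomJobMetadata>;

let operationCounter = 0;

// Validates that the job exists, then returns a pending Operation.
// The actual backup work would proceed asynchronously elsewhere.
function runBackupChatRoomJob(
  knownJobIds: Set<string>,
  req: RunBackupChatRoomJobRequest,
  now: Date,
): BackupOperation {
  if (!knownJobIds.has(req.id)) {
    throw new Error(`job not found: ${req.id}`);
  }
  operationCounter += 1;
  return {
    id: `operations/${operationCounter}`,
    done: false,
    metadata: { jobId: req.id, startedAt: now.toISOString() },
  };
}
```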
Why not pass config on run?#
All relevant config must be persisted on the Job resource to:
- avoid repeated large request messages,
- enforce separate permissions for config vs run,
- allow scheduled runs without client-supplied config.
Job execution results — options & trade-offs#
When a job runs, possible types of results:
Job run creates or updates a standard business resource (e.g., import created ChatRoom objects, backup created a file resource).
- Best: return / expose the actual created resource (Operation resolves to that resource).
- Store job id / snapshot in that newly created resource (or in its metadata) so the provenance is traceable.
Job run produces analysis/metrics/ephemeral outputs that are not standard resources.
- Problem: if you only rely on the Operation (LRO) to carry results, the results’ durability depends on the API’s Operation retention policy (could be short).
- Solution recommended by the book: create Execution resources to store run output permanently.
Job run both creates standard resources and produces analysis data.
- Create the business resources as normal and also create an Execution resource for the analysis outputs.
Trade-offs for Execution vs keeping LROs forever
Keep LROs forever:
- Simpler (no extra resource type).
- But filtering LROs for a specific Job is awkward and retention policies may make results disappear.
Execution resources:
- Explicit, queryable, durable child resources of a Job.
- Clear semantics and discoverability.
- Slightly more API surface (extra resource type + endpoints).
Execution resources (detailed)#
Execution = a child resource under a specific Job that represents a single run’s output.
- Immutable, system-created (not user-created).
- Has its own id.
- Contains a snapshot of the Job config as used for that run (for reproducibility and provenance).
- Stores the run’s result data (analysis metrics, report links, etc.).
API implications:
- RunJob still returns an Operation. When the Operation completes, the system creates an Execution resource and the Operation resolves to (or references) that Execution.
- Standard Execution endpoints: ListExecutions(parent=job), GetExecution(id), (DeleteExecution optional). Do not implement CreateExecution (internal only) or UpdateExecution (immutable).
Example:
interface AnalyzeChatRoomJobExecution {
  id: string;
  job: AnalyzeChatRoomJob; // snapshot of job config
  sentenceComplexity: number;
  sentiment: number;
  abuseScore: number;
}
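To make the snapshot idea concrete, here is a minimal sketch of how a service might build an Execution internally. The AnalyzeChatRoomJob fields (chatRoom, language), the AnalysisResult type, and the createExecution helper are assumptions for illustration. The job configuration is copied and frozen, so later changes to the job cannot alter the record of what actually ran.

```typescript
// Hypothetical job fields; the source does not list them.
interface AnalyzeChatRoomJob { id: string; chatRoom: string; language: string; }

interface AnalyzeChatRoomJobExecution {
  id: string;
  job: AnalyzeChatRoomJob; // snapshot of job config
  sentenceComplexity: number;
  sentiment: number;
  abuseScore: number;
}

interface AnalysisResult {
  sentenceComplexity: number;
  sentiment: number;
  abuseScore: number;
}

// Internal helper (never exposed as CreateExecution in the API).
// A shallow copy is enough here because every job field is a primitive;
// nested configuration would need a deep copy.
function createExecution(
  job: AnalyzeChatRoomJob,
  executionNumber: number,
  result: AnalysisResult,
): Readonly<AnalyzeChatRoomJobExecution> {
  const snapshot = Object.freeze({ ...job });
  return Object.freeze({
    id: `${job.id}/executions/${executionNumber}`,
    job: snapshot,
    ...result,
  });
}
```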
Relationship between LRO and Execution#
LRO (Operation):
- Tracks the asynchronous process: start → progress → done/error.
- Useful for status and intermediate progress.
- May be subject to retention/expiry rules.
Execution:
- Stores the durable result and the job configuration snapshot.
- Intended for long-term retention and query.
Typical flow:
- Client calls RunJob(jobId) → server returns an Operation.
- Server performs work; when done, server creates an Execution (or a business resource).
- Server updates the Operation to done and sets the result to the created Execution (or resource).
- Client may then Get the Execution or list executions for the job.
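The flow above can be sketched end to end with an in-memory stand-in. The JobRunner class, its finish method, and the output field are hypothetical names for this example; in a real service the work and the call to finish would happen asynchronously on the server.

```typescript
interface Execution { id: string; jobId: string; output: string; } // output is a hypothetical field
interface Operation { id: string; jobId: string; done: boolean; result?: Execution; }

class JobRunner {
  private operations = new Map<string, Operation>();
  private executions = new Map<string, Execution[]>(); // keyed by job id
  private counter = 0;

  // Step 1: RunJob(jobId) returns a pending Operation right away.
  run(jobId: string): Operation {
    this.counter += 1;
    const op: Operation = { id: `operations/${this.counter}`, jobId, done: false };
    this.operations.set(op.id, op);
    return op;
  }

  // Steps 2 and 3: once the work finishes, persist the Execution first,
  // then mark the Operation done with the Execution as its result.
  finish(operationId: string, output: string): Operation {
    const op = this.operations.get(operationId);
    if (!op) throw new Error(`operation not found: ${operationId}`);
    if (op.done) throw new Error(`operation already finished: ${operationId}`);
    const forJob = this.executions.get(op.jobId) || [];
    const execution: Execution = {
      id: `${op.jobId}/executions/${forJob.length + 1}`,
      jobId: op.jobId,
      output,
    };
    forJob.push(execution);
    this.executions.set(op.jobId, forJob);
    op.done = true;
    op.result = execution;
    return op;
  }

  // Step 4: executions remain listable per job, independent of the Operation.
  listExecutions(jobId: string): Execution[] {
    return (this.executions.get(jobId) || []).slice();
  }
}
```

Creating the Execution before marking the Operation done means a client that sees done can always fetch the result.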
Resource layout / scoping#
Executions are scoped to a single Job type and are best placed as child resources of the Job:
- e.g. GET /analyzeChatRoomJobs/{jobId}/executions and GET /analyzeChatRoomJobs/{jobId}/executions/{executionId}
This layout answers the common question: “What executions have happened for this specific job?”
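A small sketch of how the parent/child layout plays out in resource names follows. The helper names (executionName, parseExecutionName, executionsForJob) are made up for this example. Because the job id is part of every execution name, listing the executions of one job reduces to a prefix match.

```typescript
// Builds the child resource name for one execution of one job.
function executionName(jobId: string, executionId: string): string {
  return `analyzeChatRoomJobs/${jobId}/executions/${executionId}`;
}

interface ParsedExecutionName { jobId: string; executionId: string; }

// Splits an execution name back into its parts; returns null if it is malformed.
function parseExecutionName(name: string): ParsedExecutionName | null {
  const parts = name.split("/");
  const [collection = "", jobId = "", child = "", executionId = ""] = parts;
  const valid =
    parts.length === 4 &&
    collection === "analyzeChatRoomJobs" &&
    child === "executions" &&
    jobId !== "" &&
    executionId !== "";
  return valid ? { jobId, executionId } : null;
}

// Answers "what executions have happened for this specific job?"
function executionsForJob(names: string[], jobId: string): string[] {
  const prefix = `analyzeChatRoomJobs/${jobId}/executions/`;
  return names.filter((name) => name.indexOf(prefix) === 0);
}
```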
Permissions model#
Jobs allow natural separation:
- Give some users permission to Create/Update Jobs (configure).
- Give other users permission to Run Jobs (execute), but not to change configuration.
An alternative is a fine-grained permission system that inspects runtime parameters; it is more powerful but harder to design and maintain.
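The configure/run split can be sketched as a simple role table. The permission strings and role names below are hypothetical, not part of any particular access-control system. Since run takes no parameters, the check needs only the caller's role and the method being called.

```typescript
// Hypothetical permission names, one per Job method.
type JobPermission = "jobs.create" | "jobs.update" | "jobs.delete" | "jobs.run";

// Hypothetical roles: one may configure jobs, the other may only run them.
const roles: Record<string, JobPermission[]> = {
  jobConfigurator: ["jobs.create", "jobs.update", "jobs.delete"],
  jobRunner: ["jobs.run"],
};

// Returns true when the role grants the permission; unknown roles grant nothing.
function isAllowed(role: string, permission: JobPermission): boolean {
  const granted = roles[role] || [];
  return granted.indexOf(permission) !== -1;
}
```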
Decision guide (practical)#
When designing a rerunnable job, choose between Execution vs direct resource creation:
- If the run produces a standard business resource (resource with lifecycle beyond the run), create that resource and record Job provenance in it. No separate Execution needed.
- If the run produces analysis / ephemeral data / results that are not an existing resource type and these results must be durable/queryable, create an Execution resource and store the job snapshot there.
- If both happen, create both the business resources and an Execution for the analytical outputs.
- Avoid relying solely on LROs to store durable results because LRO retention policies may vary or be limited.
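The guide above can be condensed into a small decision function. This is only a sketch: the RunOutput flags and the strategy labels are invented for illustration, and the "operation-only" case corresponds to the ephemeral-response option, which is acceptable only when losing the result to LRO expiry is tolerable.

```typescript
// Hypothetical description of what a job run produces.
interface RunOutput {
  createsBusinessResource: boolean;  // output has a lifecycle beyond the run
  producesDurableAnalysis: boolean;  // non-resource results that must stay queryable
}

type ResultStrategy = "business-resource" | "execution" | "both" | "operation-only";

// Maps the two questions from the decision guide to a result strategy.
function chooseResultStrategy(output: RunOutput): ResultStrategy {
  if (output.createsBusinessResource && output.producesDurableAnalysis) return "both";
  if (output.createsBusinessResource) return "business-resource";
  if (output.producesDurableAnalysis) return "execution";
  return "operation-only";
}
```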
Example signatures (conceptual)#
// Create job
@post("/analyzeChatRoomJobs")
CreateAnalyzeChatRoomJob(req: CreateAnalyze...): AnalyzeChatRoomJob
// Run job returns an Operation that resolves to an Execution
@post("/{id=analyzeChatRoomJobs/*}:run")
RunAnalyzeChatRoomJob(req: RunAnalyze...):
Operation<AnalyzeChatRoomJobExecution, RunAnalyzeChatRoomJobMetadata>
// List executions
@get("/{parent=analyzeChatRoomJobs/*}/executions")
ListAnalyzeChatRoomJobExecutions(...): ListAnalyzeChatRoomJobExecutionsResponse
Key takeaways (brief)#
Rerunnable jobs separate configuration (Job resource) from execution (run method).
run returns an LRO to track asynchronous work; the LRO should resolve to either:
- the newly created business resource, or
- an Execution resource when the output is an analysis/report that must be persisted.
Execution resources are immutable, system-created children of Jobs and store both results and the job config snapshot, ensuring durability beyond LRO retention policies.
Design choice (Execution vs no-Execution) depends on whether the run output is a business resource or a non-resource analysis that needs long-term retention.