
Rerunnable Jobs - Summary


Motivation

  • Some API work must run asynchronously as long-running operations (LROs), but on-demand methods that return LROs leave three problems unsolved:

    1. Configuration management: callers must supply full configuration on every invocation; this becomes hard to track as parameters grow.
    2. Separation of duties / permissions: you may want different people to be able to configure a task than to execute it.
    3. Server-side scheduling: you may want the service itself to invoke work on a schedule rather than rely on fragile external schedulers.
  • Goal: provide a standard for configurable, rerunnable units of work that persist their configuration and can be run repeatedly, either manually or on a schedule.


Overview / Core idea

  • Job = a resource that stores configuration for a unit of work.

    • Step A: create/configure a Job resource (store parameters once).
    • Step B: execute the job later by calling a custom run method on the Job (no runtime parameters).
  • Benefits:

    • Configuration is authored and versioned once on the Job.
    • Permissions can be given separately for creating/updating Jobs vs running them.
    • Scheduling becomes simple: a scheduler just calls the job’s run method, with no per-invocation parameters.

Job resources (design)

  • Jobs look like any other resource: they have a unique id (preferably assigned by the service) and fields that encode configuration.

  • Example: turning an on-demand backup method into a Job:

    • The on-demand request carries fields such as destination, compressionFormat, and encryptionKey.

    • Move those fields onto a BackupChatRoomJob resource:

      interface BackupChatRoomJob {
        id: string;
        chatRoom: string;
        destination: string;
        compressionFormat: string;
        encryptionKey: string;
        // other configuration fields as needed
      }
      
  • Standard (synchronous) resource methods should be implemented for Jobs:

    • CreateBackupChatRoomJob, GetBackupChatRoomJob, ListBackupChatRoomJobs, DeleteBackupChatRoomJob
    • UpdateBackupChatRoomJob may be omitted if jobs are treated as immutable: callers delete and re-create a job instead of updating it, which avoids concurrency issues.
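A minimal in-memory sketch of these standard methods, in TypeScript to match the interfaces above. The class name BackupChatRoomJobService, the backupChatRoomJobs/N id format, and the sample values are assumptions for illustration; pagination, validation, and persistence are omitted. Leaving out an update method is what makes the jobs immutable here.

```typescript
// Hypothetical in-memory Job service; names and id format are illustrative only.
interface BackupChatRoomJob {
  id: string; // assigned by the service, not by the caller
  chatRoom: string;
  destination: string;
  compressionFormat: string;
  encryptionKey: string;
}

// Everything except the id is caller-supplied configuration.
type BackupChatRoomJobConfig = Omit<BackupChatRoomJob, "id">;

class BackupChatRoomJobService {
  private jobs = new Map<string, BackupChatRoomJob>();
  private nextId = 1;

  // CreateBackupChatRoomJob: configuration is stored once; the service picks the id.
  create(config: BackupChatRoomJobConfig): BackupChatRoomJob {
    const job: BackupChatRoomJob = { ...config, id: `backupChatRoomJobs/${this.nextId++}` };
    this.jobs.set(job.id, job);
    return { ...job }; // return a copy so callers cannot mutate stored state
  }

  // GetBackupChatRoomJob
  get(id: string): BackupChatRoomJob {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`NOT_FOUND: ${id}`);
    return { ...job };
  }

  // ListBackupChatRoomJobs (pagination omitted for brevity)
  list(): BackupChatRoomJob[] {
    return Array.from(this.jobs.values()).map((job) => ({ ...job }));
  }

  // DeleteBackupChatRoomJob
  delete(id: string): void {
    if (!this.jobs.delete(id)) throw new Error(`NOT_FOUND: ${id}`);
  }

  // Deliberately no update(): to change configuration, delete the job and create a new one.
}
```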

The custom run method

  • Each Job exposes a custom run method (e.g. POST /{job}:run).

    • Input: only the Job identifier (no execution-time configuration).

    • Return: an Operation (LRO) that tracks the async work.

    • When the Operation completes, it should resolve to a meaningful result, either:

      • a standard resource created by the job, or
      • an Execution resource representing the job’s output (see below), or
      • an ephemeral response (which risks being lost if the API retains Operations only for a limited time).
  • Example signature:

    @post("/{id=backupChatRoomJobs/*}:run")
    RunBackupChatRoomJob(req: RunBackupChatRoomJobRequest):
      Operation<RunBackupChatRoomJobResponse, RunBackupChatRoomJobMetadata>;
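To make the "only the identifier" rule concrete, here is a sketch of a run handler. The Operation shape (id, done, result, error, metadata) is a simplified assumption loosely modeled on common LRO designs, and the response and metadata fields (backupLocation, messagesProcessed) are hypothetical. The background work that eventually completes the Operation is left out.

```typescript
// Simplified, hypothetical LRO shape; not any specific library's type.
interface Operation<ResultT, MetadataT> {
  id: string;
  done: boolean;
  result?: ResultT;
  error?: { code: string; message: string };
  metadata?: MetadataT;
}

interface BackupChatRoomJob {
  id: string;
  chatRoom: string;
  destination: string;
  compressionFormat: string;
}

// The run request carries no configuration: only the job identifier.
interface RunBackupChatRoomJobRequest {
  id: string;
}

// Hypothetical result and progress types.
interface RunBackupChatRoomJobResponse {
  backupLocation: string;
}
interface RunBackupChatRoomJobMetadata {
  job: string; // which job this Operation is running
  messagesProcessed: number;
}

type RunOperation = Operation<RunBackupChatRoomJobResponse, RunBackupChatRoomJobMetadata>;

const jobs = new Map<string, BackupChatRoomJob>();
const operations = new Map<string, RunOperation>();
let nextOperationId = 1;

function runBackupChatRoomJob(req: RunBackupChatRoomJobRequest): RunOperation {
  const job = jobs.get(req.id);
  if (!job) throw new Error(`NOT_FOUND: ${req.id}`);
  // All configuration (destination, compressionFormat, ...) is read from the stored job.
  // A real service would now start the backup in the background and later set
  // done = true plus either result or error.
  const op: RunOperation = {
    id: `operations/${nextOperationId++}`,
    done: false,
    metadata: { job: job.id, messagesProcessed: 0 },
  };
  operations.set(op.id, op);
  return op;
}
```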
    

Why not pass config on run?

  • All relevant config must be persisted on the Job resource to:

    • avoid repeated large request messages,
    • enforce separate permissions for config vs run,
    • allow scheduled runs without client-supplied config.
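Because run needs nothing but the job id, a server-side scheduler only has to remember which job to call and when. The sketch below assumes a hypothetical JobScheduler whose tick(now) method the service calls periodically; time is an abstract minute counter, and missed slots are collapsed into a single run rather than replayed (one possible policy among several).

```typescript
// Hypothetical server-side scheduler: it stores job ids, never job configuration.
interface Schedule {
  jobId: string;
  intervalMinutes: number;
  nextRunAt: number; // abstract minute counter for the sketch
}

class JobScheduler {
  private schedules: Schedule[] = [];
  private runJob: (jobId: string) => void;

  constructor(runJob: (jobId: string) => void) {
    this.runJob = runJob;
  }

  add(jobId: string, intervalMinutes: number, startAt: number): void {
    if (intervalMinutes <= 0) throw new Error("intervalMinutes must be positive");
    this.schedules.push({ jobId, intervalMinutes, nextRunAt: startAt });
  }

  // Called periodically by the service; runs each job that is due once.
  tick(now: number): void {
    for (const s of this.schedules) {
      if (s.nextRunAt <= now) {
        this.runJob(s.jobId); // only the id is passed: the job holds its own config
        // Skip slots missed while idle instead of running the job once per slot.
        while (s.nextRunAt <= now) s.nextRunAt += s.intervalMinutes;
      }
    }
  }
}
```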

Job execution results — options & trade-offs

A job run can produce three kinds of results:

  1. The run creates or updates a standard business resource (e.g., an import job creates ChatRoom resources; a backup job creates a file resource).

    • Best option: the Operation resolves to the created resource itself.
    • Store the job id (or a snapshot of the job configuration) in the new resource or its metadata so its provenance is traceable.
  2. The run produces analysis results, metrics, or other outputs that are not standard resources.

    • Problem: if only the Operation (LRO) carries the results, their durability depends on the API’s Operation retention policy, which may be short.
    • Solution recommended by the book: create Execution resources that store each run’s output durably.
  3. The run both creates standard resources and produces analysis data.

    • Create the business resources as usual, and also create an Execution resource for the analysis outputs.
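For the first case, provenance can be as simple as copying the job id and its configuration onto the created resource. This sketch assumes a hypothetical Backup resource with sourceJob and sourceJobSnapshot fields; the location format is made up for illustration.

```typescript
interface BackupChatRoomJob {
  id: string;
  chatRoom: string;
  destination: string;
  compressionFormat: string;
}

// Hypothetical business resource created by a backup run.
interface Backup {
  id: string;
  location: string;
  // Provenance: which job produced this backup, and its configuration at run time.
  sourceJob: string;
  sourceJobSnapshot: Readonly<BackupChatRoomJob>;
}

let nextBackupId = 1;

function createBackupFromJob(job: BackupChatRoomJob): Backup {
  // Copy the config so later changes to the job do not alter this record.
  const snapshot = Object.freeze({ ...job });
  return {
    id: `backups/${nextBackupId++}`,
    location: `${job.destination}/${job.chatRoom}.${job.compressionFormat}`,
    sourceJob: job.id,
    sourceJobSnapshot: snapshot,
  };
}
```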

Trade-offs for Execution vs keeping LROs forever

  • Keep LROs forever:

    • Simpler (no extra resource type).
    • But filtering LROs for a specific Job is awkward, and unless Operations really are retained forever, retention policies can make results disappear.
  • Execution resources:

    • Explicit, queryable, durable child resources of a Job.
    • Clear semantics and discoverability.
    • Slightly more API surface (extra resource type + endpoints).
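The durability and discoverability difference can be sketched with two stores: operations that are purged after a retention window and must be scanned to find one job’s runs, versus executions grouped under their job. The 30-day window, the day-number clock, and all names are assumptions.

```typescript
interface StoredOperation {
  id: string;
  job: string;         // the job this Operation ran, kept in its metadata
  completedAt: number; // abstract day number
  result: string;
}

interface Execution {
  id: string;
  result: string;
}

class OperationStore {
  private ops = new Map<string, StoredOperation>();
  private retentionDays: number;

  constructor(retentionDays: number) {
    this.retentionDays = retentionDays;
  }

  put(op: StoredOperation): void {
    this.ops.set(op.id, op);
  }

  // Operations older than the retention window are purged, and their results with them.
  purge(today: number): void {
    for (const [id, op] of Array.from(this.ops.entries())) {
      if (today - op.completedAt > this.retentionDays) this.ops.delete(id);
    }
  }

  // Finding the runs of one job means scanning every operation in the API.
  listForJob(job: string): StoredOperation[] {
    return Array.from(this.ops.values()).filter((op) => op.job === job);
  }
}

// Executions are stored per job, so listing them is a direct lookup,
// and the operation retention window does not apply to them.
const executionsByJob = new Map<string, Execution[]>();
```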

Execution resources (detailed)

  • Execution = a child resource under a specific Job that represents a single run’s output.

    • Immutable, system-created (not user-created).
    • Has its own id.
    • Contains a snapshot of the Job config as used for that run (for reproducibility and provenance).
    • Stores the run’s result data (analysis metrics, report links, etc.).
  • API implications:

    • RunJob still returns an Operation. When the Operation completes, the system creates an Execution resource and the Operation resolves to (or references) that Execution.
    • Standard Execution endpoints: ListExecutions(parent=job), GetExecution(id), and optionally DeleteExecution. Do not implement CreateExecution (Executions are created internally) or UpdateExecution (Executions are immutable).
  • Example:

    interface AnalyzeChatRoomJobExecution {
      id: string;
      job: AnalyzeChatRoomJob; // snapshot of job config
      sentenceComplexity: number;
      sentiment: number;
      abuseScore: number;
    }
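A sketch of how a service might enforce these rules: Executions are created only through an internal recordRun call, carry a frozen copy of the job configuration, and the public surface is limited to list and get. ExecutionStore, recordRun, and the chatRoom field on the job are hypothetical; Object.freeze is a shallow, runtime-only guard.

```typescript
interface AnalyzeChatRoomJob {
  id: string;
  chatRoom: string; // hypothetical config field
}

interface AnalyzeChatRoomJobExecution {
  readonly id: string;
  readonly job: Readonly<AnalyzeChatRoomJob>; // snapshot of job config
  readonly sentenceComplexity: number;
  readonly sentiment: number;
  readonly abuseScore: number;
}

type AnalysisResults = Pick<AnalyzeChatRoomJobExecution, "sentenceComplexity" | "sentiment" | "abuseScore">;

class ExecutionStore {
  private executions = new Map<string, AnalyzeChatRoomJobExecution>();
  private nextId = 1;

  // Internal only: called by the service when a run finishes; never exposed as an API method.
  recordRun(job: AnalyzeChatRoomJob, results: AnalysisResults): AnalyzeChatRoomJobExecution {
    const execution = Object.freeze({
      id: `${job.id}/executions/${this.nextId++}`,
      job: Object.freeze({ ...job }), // copy, so later job edits do not leak in
      ...results,
    });
    this.executions.set(execution.id, execution);
    return execution;
  }

  // Public: ListExecutions(parent=job)
  list(parent: string): AnalyzeChatRoomJobExecution[] {
    return Array.from(this.executions.values()).filter((e) => e.job.id === parent);
  }

  // Public: GetExecution(id). There is intentionally no create or update method.
  get(id: string): AnalyzeChatRoomJobExecution | undefined {
    return this.executions.get(id);
  }
}
```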
    

Relationship between LRO and Execution

  • LRO (Operation):

    • Tracks the asynchronous process: start → progress → done/error.
    • Useful for status and intermediate progress.
    • May be subject to retention/expiry rules.
  • Execution:

    • Stores the durable result and the job configuration snapshot.
    • Intended for long-term retention and query.
  • Typical flow:

    1. Client calls RunJob(jobId) → server returns an Operation.
    2. Server performs work; when done, server creates an Execution (or a business resource).
    3. Server updates the Operation to done and sets the result to the created Execution (or resource).
    4. Client may then Get the Execution or list executions for the job.
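The four steps can be traced in a small in-memory sketch. The analysis is a stub that always reports a sentiment of 0.5, completeRun stands in for whatever background worker the service uses, and every name here is hypothetical.

```typescript
interface JobConfig {
  id: string;
  chatRoom: string;
}

interface Execution {
  id: string;
  job: JobConfig; // snapshot of the job config used for this run
  sentiment: number;
}

interface Operation<ResultT> {
  id: string;
  done: boolean;
  metadata: { job: string };
  result?: ResultT;
}

const jobs = new Map<string, JobConfig>();
const operations = new Map<string, Operation<Execution>>();
const executions = new Map<string, Execution>();
let nextId = 1;

// Step 1: the client calls run with only the job id; the server returns an Operation.
function runJob(jobId: string): Operation<Execution> {
  if (!jobs.has(jobId)) throw new Error(`NOT_FOUND: ${jobId}`);
  const op: Operation<Execution> = { id: `operations/${nextId++}`, done: false, metadata: { job: jobId } };
  operations.set(op.id, op);
  return op;
}

// Steps 2-3: the server does the work, creates an Execution, then marks the Operation done.
function completeRun(operationId: string): void {
  const op = operations.get(operationId);
  if (!op) throw new Error(`NOT_FOUND: ${operationId}`);
  const job = jobs.get(op.metadata.job);
  if (!job) throw new Error(`NOT_FOUND: ${op.metadata.job}`);
  const execution: Execution = {
    id: `${job.id}/executions/${nextId++}`,
    job: { ...job },
    sentiment: 0.5, // stub value standing in for the real analysis
  };
  executions.set(execution.id, execution);
  op.result = execution;
  op.done = true;
}

// Step 4: the client fetches the Execution that the finished Operation points to.
function getExecution(id: string): Execution | undefined {
  return executions.get(id);
}
```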

Resource layout / scoping

  • Executions are scoped to a single Job type and are best placed as child resources of the Job:

    • e.g. GET /analyzeChatRoomJobs/{jobId}/executions and GET /analyzeChatRoomJobs/{jobId}/executions/{executionId}
  • This layout answers the common question: “What executions have happened for this specific job?”
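Helpers for building and parsing names in this layout might look like the following. The function names are hypothetical, and the sketch assumes ids never contain a slash.

```typescript
// Hypothetical helpers for names of the form
// analyzeChatRoomJobs/{jobId}/executions/{executionId}.
interface ExecutionNameParts {
  jobId: string;
  executionId: string;
}

function executionName(jobId: string, executionId: string): string {
  return `analyzeChatRoomJobs/${jobId}/executions/${executionId}`;
}

function parseExecutionName(name: string): ExecutionNameParts | undefined {
  const match = /^analyzeChatRoomJobs\/([^\/]+)\/executions\/([^\/]+)$/.exec(name);
  return match ? { jobId: match[1], executionId: match[2] } : undefined;
}

// The parent of an execution is its job: the value passed to
// ListExecutions(parent=...) to answer "what ran for this job?".
function parentOf(name: string): string | undefined {
  const parts = parseExecutionName(name);
  return parts ? `analyzeChatRoomJobs/${parts.jobId}` : undefined;
}
```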


Permissions model

  • Jobs allow natural separation:

    • Give some users permission to Create/Update Jobs (configure).
    • Give other users permission to Run Jobs (execute), but not to change configuration.
  • An alternative is a fine-grained permission system that inspects the parameters of each run request; it is more powerful but more complex to design and maintain.
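A sketch of the configure/run split as role-based checks. The permission strings and the jobConfigurer and jobRunner roles are hypothetical; a real API would plug into its own access-control system.

```typescript
// Hypothetical permissions: configuring a job and running it are separate grants.
type JobPermission =
  | "analyzeChatRoomJobs.create"
  | "analyzeChatRoomJobs.update"
  | "analyzeChatRoomJobs.run";

const roles: Record<string, JobPermission[]> = {
  jobConfigurer: ["analyzeChatRoomJobs.create", "analyzeChatRoomJobs.update"],
  jobRunner: ["analyzeChatRoomJobs.run"], // can execute, cannot change configuration
};

function isAllowed(userRoles: string[], needed: JobPermission): boolean {
  return userRoles.some((role) => (roles[role] ?? []).indexOf(needed) !== -1);
}
```

Because run takes no parameters, the run check never needs to look inside the request; that is what keeps this model simpler than the fine-grained alternative.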


Decision guide (practical)

When designing a rerunnable job, choose between creating an Execution and creating resources directly:

  • If the run produces a standard business resource (a resource whose lifecycle extends beyond the run), create that resource and record Job provenance in it. No separate Execution is needed.
  • If the run produces analysis data or other results that do not correspond to an existing resource type, and those results must be durable and queryable, create an Execution resource and store the job snapshot there.
  • If both happen, create both the business resources and an Execution for the analytical outputs.
  • Avoid relying solely on LROs to store durable results because LRO retention policies may vary or be limited.
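The guide can be restated as a small function. The RunOutput flags are hypothetical questions a designer answers once per job type, and the function only reports what to create; it does not decide what the Operation resolves to when both kinds of output exist.

```typescript
// Hypothetical inputs: answers a designer gives for one job type.
interface RunOutput {
  createsBusinessResources: boolean; // outputs whose lifecycle extends beyond the run
  producesAnalysis: boolean;         // metrics/reports that are not an existing resource type
  analysisMustBeDurable: boolean;    // must those results stay queryable long-term?
}

interface RunDesign {
  createBusinessResources: boolean; // with job provenance recorded on them
  createExecution: boolean;         // holds analysis output plus a job snapshot
  resultsOnlyInOperation: boolean;  // risky: lost when the Operation expires
}

function chooseDesign(output: RunOutput): RunDesign {
  const createExecution = output.producesAnalysis && output.analysisMustBeDurable;
  return {
    createBusinessResources: output.createsBusinessResources,
    createExecution,
    // Analysis stored in neither a resource nor an Execution survives only
    // as long as the API retains the Operation.
    resultsOnlyInOperation: output.producesAnalysis && !createExecution,
  };
}
```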

Example signatures (conceptual)

// Create job
@post("/analyzeChatRoomJobs")
CreateAnalyzeChatRoomJob(req: CreateAnalyze...): AnalyzeChatRoomJob

// Run job returns an Operation that resolves to an Execution
@post("/{id=analyzeChatRoomJobs/*}:run")
RunAnalyzeChatRoomJob(req: RunAnalyze...):
  Operation<AnalyzeChatRoomJobExecution, RunAnalyzeChatRoomJobMetadata>

// List executions
@get("/{parent=analyzeChatRoomJobs/*}/executions")
ListAnalyzeChatRoomJobExecutions(...): ListAnalyzeChatRoomJobExecutionsResponse

Key takeaways (brief)

  • Rerunnable jobs separate configuration (Job resource) from execution (run method).

  • run returns an LRO to track asynchronous work; the LRO should resolve to either:

    • the newly created business resource, or
    • an Execution resource when the output is an analysis/report that must be persisted.
  • Execution resources are immutable, system-created children of Jobs and store both results and the job config snapshot, ensuring durability beyond LRO retention policies.

  • Design choice (Execution vs no-Execution) depends on whether the run output is a business resource or a non-resource analysis that needs long-term retention.