Cross References - Summary

Motivation
#

Resources often need to point to other resources (local, cross-API, or remote URIs). Although the idea of referencing by an identifier is simple, the behavioral details (naming, deletion rules, caching, consistency) must be defined to avoid inconsistent implementations and surprising client behavior.

Key ideas
#

  • Use a simple string identifier as the reference value so a field can point to:

    • resources in the same API,
    • resources in other APIs from the same provider,
    • or arbitrary external resources (URI).
  • Keep references decoupled from the target resource to avoid circular locks and to scale when many resources point to one target.

  • Consumers must expect that references can become stale or invalid (for example, when the target is moved or deleted).

Naming fields
#

  • Prefer the field name to indicate type and purpose. Example: authorId in a Book makes it clear the field is the unique id of an Author.
  • For dynamic target types (the referenced resource type can vary), add a companion type field such as targetType alongside targetId.
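As a minimal sketch of the companion-type idea, the lookup below dispatches on targetType to decide which collection targetId belongs to. The in-memory maps, the sample records, and the resolveTarget helper are hypothetical and exist only for illustration; a real API would perform a request here.

```typescript
interface Author { id: string; name: string; }
interface Book { id: string; authorId: string; title: string; }

// A polymorphic reference: targetType says which kind of resource targetId names.
interface ChangeLogEntry {
  id: string;
  targetId: string;
  targetType: string;
  description: string;
}

// Hypothetical in-memory stand-ins for the two resource collections.
const authorsById = new Map<string, Author>([["a1", { id: "a1", name: "Ada" }]]);
const booksById = new Map<string, Book>([
  ["b1", { id: "b1", authorId: "a1", title: "Notes" }],
]);

// Resolve the reference by dispatching on the companion type field.
// Returns undefined for an unknown type or a missing target.
function resolveTarget(entry: ChangeLogEntry): Author | Book | undefined {
  switch (entry.targetType) {
    case "Author":
      return authorsById.get(entry.targetId);
    case "Book":
      return booksById.get(entry.targetId);
    default:
      return undefined;
  }
}
```

Without targetType, the identifier "b1" alone would not say whether to look in the authors or the books collection.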

Data integrity (reference behavior)
#

Three common strategies when a referenced resource is deleted:

  1. Prohibit deletion while references exist. This prevents dangling pointers but is often impractical, because every referencing resource must first be deleted or updated.
  2. Allow deletion and reset references (e.g., set them to null or a zero value). This avoids locks but requires updating a potentially huge number of referencing records, and that update may not be atomic.
  3. Allow deletion and let pointers go stale, requiring clients to check references at runtime. This is the most practical and scalable option in distributed systems because it keeps deletes atomic and avoids massive, non-atomic updates.

Recommendation: expect and design for invalid references rather than trying to enforce global referential integrity in a large/distributed system.
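The third strategy can be sketched as follows. The in-memory store, the deleteAuthor and describeBook helpers, and the "unknown author" fallback are assumptions made for illustration, not part of any real API.

```typescript
interface Author { id: string; name: string; }
interface Book { id: string; authorId: string; title: string; }

// Hypothetical in-memory author store.
const authorStore = new Map<string, Author>([["a1", { id: "a1", name: "Ada" }]]);

// Deleting touches only the author record; no referencing books are updated.
function deleteAuthor(id: string): boolean {
  return authorStore.delete(id);
}

// The consumer checks the reference at read time and tolerates a missing target.
function describeBook(book: Book): string {
  const author = authorStore.get(book.authorId);
  const byline = author ? author.name : "unknown author";
  return `${book.title} by ${byline}`;
}

const sampleBook: Book = { id: "b1", authorId: "a1", title: "Notes" };
const beforeDelete = describeBook(sampleBook); // reference is still valid
deleteAuthor("a1");
const afterDelete = describeBook(sampleBook); // reference is now stale
```

The delete stays a single-record operation, and the cost moves to the reader, who has to decide what a missing target means for its use case.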

Value vs. reference
#

  • Reference (store authorId)

    • Ensures you always have the authoritative data if you fetch the target.
    • Requires additional fetch(es) to get related data (multiple API calls).
  • Value / inline copy (store author object inside Book)

    • Immediate access to data without another request.
    • Introduces cache / consistency problems: who updates the cached copy when the source changes?
    • Response sizes grow as nested resources grow, and update semantics become ambiguous (can you update the author by updating the book?).
  • Typical compromise: store references and use server-side tooling (e.g., GraphQL) to join or compose related resources for clients that need combined data.
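The consistency problem with inline copies can be shown in a few lines. This is a sketch under stated assumptions: the BookByReference and BookByValue shapes, the in-memory table, and the sample names are all hypothetical.

```typescript
interface Author { id: string; name: string; }
interface BookByReference { id: string; title: string; authorId: string; }
interface BookByValue { id: string; title: string; author: Author; }

// Hypothetical source of truth for authors.
const authorTable = new Map<string, Author>([["a1", { id: "a1", name: "Old Name" }]]);

// One book stores a reference, the other stores a snapshot of the author.
const refBook: BookByReference = { id: "b1", title: "Notes", authorId: "a1" };
const valueBook: BookByValue = {
  id: "b2",
  title: "Essays",
  author: { ...authorTable.get("a1")! }, // inline copy taken at write time
};

// The author is renamed at the source.
authorTable.set("a1", { id: "a1", name: "New Name" });

// The reference needs an extra lookup but sees the current data.
const viaReference = authorTable.get(refBook.authorId)?.name;
// The inline copy needs no lookup but still holds the old snapshot.
const viaInlineCopy = valueBook.author.name;
```

After the rename, viaReference holds the new name while viaInlineCopy still holds the old one, and nothing in the data model says who is responsible for refreshing it.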

Why GraphQL
#

  • GraphQL allows clients to request exactly the nested shape they need in one query (e.g., book + author name), avoiding multiple round trips while still keeping the underlying storage model reference-based and consistent.
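To illustrate the shape of such a request, the sketch below pairs a GraphQL query string with a hand-written function that composes the same result from reference-based storage. The query's book(id:) and author fields are a hypothetical schema, and resolveBook is a stand-in for what a GraphQL server's resolvers would do; no GraphQL library is used here.

```typescript
interface Author { id: string; name: string; }
interface Book { id: string; authorId: string; title: string; }

// What a client might send, assuming a schema with book(id:) and Book.author.
const bookWithAuthorQuery = `
  query {
    book(id: "b1") {
      title
      author { name }
    }
  }
`;

// Hypothetical reference-based storage behind the API.
const storedAuthors = new Map<string, Author>([["a1", { id: "a1", name: "Ada" }]]);
const storedBooks = new Map<string, Book>([
  ["b1", { id: "b1", authorId: "a1", title: "Notes" }],
]);

// The nested shape the query above asks for.
interface BookView { title: string; author: { name: string } | null; }

// Server-side composition: follow authorId and return only the requested fields.
// A stale reference yields author: null instead of an error.
function resolveBook(id: string): BookView | null {
  const book = storedBooks.get(id);
  if (!book) return null;
  const author = storedAuthors.get(book.authorId);
  return { title: book.title, author: author ? { name: author.name } : null };
}
```

The client makes one request and receives the nested shape, while storage still holds only authorId on the book.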

Final API example
#

interface Book {
  id: string;
  authorId: string;
  title: string;
}

interface Author {
  id: string;
  name: string;
}

interface ChangeLogEntry {
  id: string;
  targetId: string;
  targetType: string;
  description: string;
}

Trade-offs (short)
#

  • References keep schemas small and consistent but need extra fetches or a composition layer (GraphQL) to assemble related data.
  • Inline values improve read convenience but cause consistency, size, and update-ambiguity issues.

Practical guidance
#

  • Name reference fields clearly (<resource>Id) and use companion type fields for polymorphic references.
  • Do not rely on the API to maintain global referential integrity in large/distributed systems; design clients to handle missing/invalid targets.
  • Use composition layers (GraphQL or backend joins) when clients need aggregated or nested data in a single call.