Overview#
The pipeline (pipes-and-filters) architecture splits functionality into a chain of independent processing stages (filters) connected by unidirectional channels (pipes). It appears everywhere from Unix shell pipelines and functional programming constructs to MapReduce-style systems and higher-level integration/ETL/orchestration tools.
Topology#
- Pipes: point-to-point, typically unidirectional channels carrying payloads (smaller payloads preferred for performance).
- Filters: self-contained, usually stateless components, each handling a single concern and forwarding results along a pipe.
Filter types#
- Producer (Source) — start point, outbound only.
- Transformer — accepts input, transforms some/all data, forwards it (map-like).
- Tester — evaluates criteria and optionally emits output (filtering / reduce-like behavior).
- Consumer — pipeline termination; persists or displays the final result.
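These four roles compose into a one-way chain. Below is a minimal sketch in Python using lazy generators as the pipes; all names (`produce`, `keep_even`, `square`, `consume`) are illustrative, not drawn from any particular framework.

```python
from typing import Iterable, Iterator

def produce(n: int) -> Iterator[int]:
    # Producer (source): start point, outbound only.
    yield from range(n)

def keep_even(items: Iterable[int]) -> Iterator[int]:
    # Tester: evaluates a criterion and forwards only matching payloads.
    return (i for i in items if i % 2 == 0)

def square(items: Iterable[int]) -> Iterator[int]:
    # Transformer: map-like, transforms each payload and forwards it.
    return (i * i for i in items)

def consume(items: Iterable[int]) -> None:
    # Consumer: pipeline termination; displays the final result.
    for i in items:
        print(i)

# Wire the filters together; each iterator acts as a unidirectional pipe.
consume(square(keep_even(produce(10))))  # prints 0, 4, 16, 36, 64
```

Each filter sees only its input pipe, so any stage can be swapped independently, which is the modularity noted under the strengths below.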
Example (summary)#
A telemetry pipeline: a Service Info Capture filter subscribes to a Kafka topic and forwards captured messages to tester filters (e.g., a Duration Filter). If a message matches duration metrics, it goes to a Duration Calculator (transformer); otherwise it may be routed to an Uptime Filter and on to an Uptime Calculator. Transformed results are written by a Database Output consumer (e.g., to MongoDB). The design cleanly separates concerns and makes it easy to add new tester/transformer filters (extensibility).
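A rough Python sketch of that routing, with an in-memory list standing in for both the Kafka topic and the MongoDB collection (the message shape and every name here are assumptions made for illustration):

```python
database = []  # stand-in for a MongoDB collection

def duration_calculator(msg):
    # Transformer: derive seconds from milliseconds (assumed rule).
    return {**msg, "value_s": msg["value_ms"] / 1000}

def uptime_calculator(msg):
    # Transformer: derive an uptime percentage (assumed rule).
    return {**msg, "uptime_pct": 100 * msg["up"] / msg["total"]}

def database_output(msg):
    # Consumer: persist the final result.
    database.append(msg)

def service_info_capture(topic):
    # Producer: stands in for the Kafka-topic subscription.
    for msg in topic:
        # Tester filters route each message by metric type.
        if msg["metric"] == "duration":    # Duration Filter
            database_output(duration_calculator(msg))
        elif msg["metric"] == "uptime":    # Uptime Filter
            database_output(uptime_calculator(msg))

service_info_capture([
    {"metric": "duration", "value_ms": 1500},
    {"metric": "uptime", "up": 98, "total": 100},
])
print(database)
```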
Key properties & ratings (summary)#
Partitioning: Technically partitioned — logic is divided by filter role (producer/tester/transformer/consumer).
Architectural quantum: Typically a single monolithic deployment (quantum = 1).
Strengths#
- Simplicity & low cost: Easier to understand and cheaper to build/maintain than many distributed styles.
- Modularity / separation of concerns: Filters can be modified or replaced independently (within the monolith).
- Composability: Filters compose naturally (Unix pipes, shell scripts, ETL flows).
Weaknesses#
- Scalability & elasticity: Very limited (rated low) because monolithic deployments make fine-grained scaling hard without complex internal parallelism.
- Fault tolerance: Poor — a fault in one part can crash the whole monolith.
- Deployability & testability: Medium. Modular at the code level, but many changes still require testing and deploying the whole monolith.
- Reliability & availability: Medium reliability, but higher MTTR because monolith startup/recovery is slow (small apps ≈ 2 min; large apps ≈ 15+ min).
Extensibility#
- Easy to extend at the filter level: adding a new tester or transformer (for a new metric or transformation) is straightforward within the pipeline design.
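One way to make that concrete, continuing the hypothetical telemetry sketch: if the tester/transformer dispatch is kept as data, supporting a new metric is a single new entry, and no existing filter changes.

```python
# Registry of transformers keyed by metric type (all names illustrative).
CALCULATORS = {
    "duration": lambda msg: {**msg, "value_s": msg["value_ms"] / 1000},
    "uptime": lambda msg: {**msg, "uptime_pct": 100 * msg["up"] / msg["total"]},
}

# Extending the pipeline: one new transformer, one registry entry.
CALCULATORS["error_rate"] = lambda msg: {
    **msg, "error_pct": 100 * msg["errors"] / msg["requests"]
}

def route(msg):
    # Tester step: pick the transformer for this metric, drop unknowns.
    calc = CALCULATORS.get(msg["metric"])
    return calc(msg) if calc else None

print(route({"metric": "error_rate", "errors": 3, "requests": 200}))
```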
Typical use cases / examples#
- Shell command pipelines (Unix `|`)
- ETL/data pipelines (extract → transform → load)
- Stream processing / telemetry ingestion (Kafka → filters → DB)
- Compiler stages (lexer → parser → optimizer → codegen)
- Machine-learning preprocessing pipelines (load → clean → scale → feature selection → train)
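The last item is a familiar concrete instance: scikit-learn's `Pipeline` chains preprocessing filters in exactly this one-way shape. A minimal sketch, assuming scikit-learn is installed (the cleaning stage is omitted because the iris data has no missing values):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # "load" stage (producer)

# Each named step is a filter; the output of one stage feeds the next.
pipe = Pipeline([
    ("scale", StandardScaler()),              # transformer
    ("select", SelectKBest(f_classif, k=2)),  # tester-like: drops features
    ("train", LogisticRegression()),          # final stage consumes features
])

pipe.fit(X, y)
print(pipe.score(X, y))
```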
When to choose this style#
Choose pipeline architecture when you need a simple, composable, one-way processing flow with clear separation of concerns and low operational complexity. Avoid it when you require high elasticity, independent fault domains, or fine-grained horizontal scaling—those needs favor distributed or service-oriented styles.