Overview#
The pipeline (pipes-and-filters) architecture splits functionality into a chain of independent processing stages (filters) connected by unidirectional channels (pipes). It appears everywhere from Unix shell pipelines and functional programming constructs to MapReduce-style systems and higher-level integration/ETL/orchestration tools.
Topology#
- Pipes: point-to-point, typically unidirectional channels carrying payloads (smaller payloads preferred for performance).
- Filters: self-contained, usually stateless components, each handling a single concern and forwarding results along a pipe.
Filter types#
- Producer (Source) — start point, outbound only.
- Transformer — accepts input, transforms some/all data, forwards it (map-like).
- Tester — evaluates criteria and optionally emits output (filtering / reduce-like behavior).
- Consumer — pipeline termination; persists or displays the final result.
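These four roles compose into a one-way chain. Below is a minimal sketch in Python using lazy generators as the pipes; all names (`produce`, `keep_even`, `square`, `consume`) are illustrative, not drawn from any particular framework.

```python
from typing import Iterable, Iterator

def produce(n: int) -> Iterator[int]:
    # Producer (source): start point, outbound only.
    yield from range(n)

def keep_even(items: Iterable[int]) -> Iterator[int]:
    # Tester: evaluates a criterion and forwards only matching payloads.
    return (i for i in items if i % 2 == 0)

def square(items: Iterable[int]) -> Iterator[int]:
    # Transformer: map-like, transforms each payload and forwards it.
    return (i * i for i in items)

def consume(items: Iterable[int]) -> None:
    # Consumer: pipeline termination; displays the final result.
    for i in items:
        print(i)

# Wire the filters together; each iterator acts as a unidirectional pipe.
consume(square(keep_even(produce(10))))  # prints 0, 4, 16, 36, 64
```

Each filter sees only its input pipe, so any stage can be swapped independently, which is the modularity noted under the strengths below.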
Example (summary)#
A telemetry pipeline: a Service Info Capture filter subscribes to a Kafka topic and forwards captured messages to tester filters (e.g., a Duration Filter). If a message matches duration metrics, it goes to a Duration Calculator (transformer); otherwise it may be routed to an Uptime Filter and on to an Uptime Calculator. Transformed results are written by a Database Output consumer (e.g., to MongoDB). The design cleanly separates concerns and makes it easy to add new tester/transformer filters (extensibility).
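A rough Python sketch of that routing, with an in-memory list standing in for both the Kafka topic and the MongoDB collection (the message shape and every name here are assumptions made for illustration):

```python
database = []  # stand-in for a MongoDB collection

def duration_calculator(msg):
    # Transformer: derive seconds from milliseconds (assumed rule).
    return {**msg, "value_s": msg["value_ms"] / 1000}

def uptime_calculator(msg):
    # Transformer: derive an uptime percentage (assumed rule).
    return {**msg, "uptime_pct": 100 * msg["up"] / msg["total"]}

def database_output(msg):
    # Consumer: persist the final result.
    database.append(msg)

def service_info_capture(topic):
    # Producer: stands in for the Kafka-topic subscription.
    for msg in topic:
        # Tester filters route each message by metric type.
        if msg["metric"] == "duration":    # Duration Filter
            database_output(duration_calculator(msg))
        elif msg["metric"] == "uptime":    # Uptime Filter
            database_output(uptime_calculator(msg))

service_info_capture([
    {"metric": "duration", "value_ms": 1500},
    {"metric": "uptime", "up": 98, "total": 100},
])
print(database)
```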
Key properties & ratings (summary)#
Partitioning: Technically partitioned — logic is divided by filter role (producer/tester/transformer/consumer).
Architectural quantum: Typically a single monolithic deployment (quantum = 1).
Strengths#
- Simplicity & low cost: Easier to understand and cheaper to build/maintain than many distributed styles.
- Modularity / separation of concerns: Filters can be modified or replaced independently (within the monolith).
- Composability: Filters compose naturally (Unix pipes, shell scripts, ETL flows).
Weaknesses#
- Scalability & elasticity: Very limited (rated low) because monolithic deployments make fine-grained scaling hard without complex internal parallelism.
- Fault tolerance: Poor — a fault in one part can crash the whole monolith.
- Deployability & testability: Medium. Modular at the code level, but many changes still require testing and deploying the whole monolith.
- Reliability & availability: Medium reliability, but higher MTTR because monolith startup/recovery is slow (small apps ≈ 2 min; large apps ≈ 15+ min).
Extensibility#
- Easy to extend at the filter level: adding a new tester or transformer (for a new metric or transformation) is straightforward within the pipeline design.
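One way to make that concrete, continuing the hypothetical telemetry sketch: if the tester/transformer dispatch is kept as data, supporting a new metric is a single new entry, and no existing filter changes.

```python
# Registry of transformers keyed by metric type (all names illustrative).
CALCULATORS = {
    "duration": lambda msg: {**msg, "value_s": msg["value_ms"] / 1000},
    "uptime": lambda msg: {**msg, "uptime_pct": 100 * msg["up"] / msg["total"]},
}

# Extending the pipeline: one new transformer, one registry entry.
CALCULATORS["error_rate"] = lambda msg: {
    **msg, "error_pct": 100 * msg["errors"] / msg["requests"]
}

def route(msg):
    # Tester step: pick the transformer for this metric, drop unknowns.
    calc = CALCULATORS.get(msg["metric"])
    return calc(msg) if calc else None

print(route({"metric": "error_rate", "errors": 3, "requests": 200}))
```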
Typical use cases / examples#
- Shell command pipelines (Unix `|`)
- ETL/data pipelines (extract → transform → load)
- Stream processing / telemetry ingestion (Kafka → filters → DB)
- Compiler stages (lexer → parser → optimizer → codegen)
- Machine-learning preprocessing pipelines (load → clean → scale → feature selection → train)
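The last item is a familiar concrete instance: scikit-learn's `Pipeline` chains preprocessing filters in exactly this one-way shape. A minimal sketch, assuming scikit-learn is installed (the cleaning stage is omitted because the iris data has no missing values):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # "load" stage (producer)

# Each named step is a filter; the output of one stage feeds the next.
pipe = Pipeline([
    ("scale", StandardScaler()),              # transformer
    ("select", SelectKBest(f_classif, k=2)),  # tester-like: drops features
    ("train", LogisticRegression()),          # final stage consumes features
])

pipe.fit(X, y)
print(pipe.score(X, y))
```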
When to choose this style#
Choose pipeline architecture when you need a simple, composable, one-way processing flow with clear separation of concerns and low operational complexity. Avoid it when you require high elasticity, independent fault domains, or fine-grained horizontal scaling—those needs favor distributed or service-oriented styles.