Pipeline
The Pipeline struct groups related steps into a named container with declared input/output dataset contracts.
Definition
pub struct Pipeline<S: StepInfo, Input: NodeInput, Output: NodeOutput> {
    pub name: &'static str,
    pub steps: S,
    pub input: Input,
    pub output: Output,
}
- name: label for logging, hooks, and visualization
- steps: a tuple of nodes (and/or nested pipelines)
- input: datasets this pipeline expects to be available when it runs
- output: datasets this pipeline guarantees to produce
Pipelines are containers: runners never call them directly. Instead, they recurse into the pipeline's steps. The PipelineInfo::is_leaf() method returns false for pipelines, signaling the runner to descend into children.
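The descent described above can be sketched with a simplified model. This is not the library's implementation: the Step enum, its name() helper, and collect_runs() are hypothetical stand-ins (the real steps are tuples behind traits), and only the is_leaf() behavior is taken from the text.

```rust
// Hypothetical, simplified model: an enum replaces the library's trait-based tuples.
enum Step {
    Node { name: &'static str },
    Pipeline { name: &'static str, steps: Vec<Step> },
}

impl Step {
    // Mirrors the documented behavior: false for pipelines, true for nodes.
    fn is_leaf(&self) -> bool {
        matches!(self, Step::Node { .. })
    }

    fn name(&self) -> &'static str {
        match self {
            Step::Node { name } | Step::Pipeline { name, .. } => *name,
        }
    }
}

// Records the names of the nodes a runner would execute, in order.
// Containers are never "run"; the traversal only descends into them.
fn collect_runs(step: &Step, out: &mut Vec<&'static str>) {
    if step.is_leaf() {
        out.push(step.name());
        return;
    }
    if let Step::Pipeline { steps, .. } = step {
        for child in steps {
            collect_runs(child, out);
        }
    }
}

// Usage: a "load" node followed by a nested "processing" pipeline.
fn demo_order() -> Vec<&'static str> {
    let root = Step::Pipeline {
        name: "root",
        steps: vec![
            Step::Node { name: "load" },
            Step::Pipeline {
                name: "processing",
                steps: vec![
                    Step::Node { name: "clean" },
                    Step::Node { name: "transform" },
                ],
            },
        ],
    };
    let mut out = Vec::new();
    collect_runs(&root, &mut out);
    out
}
```

In this sketch the container names ("root", "processing") never appear in the result; only the leaf nodes do, which is the point of the is_leaf() check.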
Example
fn pipeline<'a>(cat: &'a Catalog, params: &'a Params) -> impl Steps<PondError> + 'a {
    (
        Node { name: "load", func: |v| (v,), input: (&params.input,), output: (&cat.raw,) },
        Pipeline {
            name: "processing",
            steps: (
                Node { name: "clean", func: |v| (v,), input: (&cat.raw,), output: (&cat.clean,) },
                Node { name: "transform", func: |v| (v * 2,), input: (&cat.clean,), output: (&cat.result,) },
            ),
            input: (&cat.raw,),
            output: (&cat.result,),
        },
    )
}
Input/output contracts
The input and output declarations are validated by check():
- Every declared input must be consumed by at least one child node
- Every declared output must be produced by at least one child node
If these contracts are violated, check() returns CheckError::UnusedPipelineInput or CheckError::UnproducedPipelineOutput.
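To make the two rules concrete, here is a minimal sketch of how such a check could work. It is not the library's check(): datasets are reduced to string names, NodeSpec, PipelineSpec, and check_contracts() are hypothetical, and the payload fields on the error variants are assumptions (only the variant names come from the text).

```rust
use std::collections::HashSet;

// Stand-in for the library's error type; the payload fields are assumptions.
#[derive(Debug, PartialEq)]
enum CheckError {
    UnusedPipelineInput { pipeline: &'static str, dataset: &'static str },
    UnproducedPipelineOutput { pipeline: &'static str, dataset: &'static str },
}

// Hypothetical simplification: datasets are identified by name.
struct NodeSpec {
    inputs: Vec<&'static str>,
    outputs: Vec<&'static str>,
}

struct PipelineSpec {
    name: &'static str,
    nodes: Vec<NodeSpec>,
    input: Vec<&'static str>,
    output: Vec<&'static str>,
}

fn check_contracts(p: &PipelineSpec) -> Result<(), CheckError> {
    // Everything any child node reads or writes.
    let consumed: HashSet<&str> =
        p.nodes.iter().flat_map(|n| n.inputs.iter().copied()).collect();
    let produced: HashSet<&str> =
        p.nodes.iter().flat_map(|n| n.outputs.iter().copied()).collect();

    // Rule 1: every declared input is consumed by at least one child node.
    for &dataset in &p.input {
        if !consumed.contains(dataset) {
            return Err(CheckError::UnusedPipelineInput { pipeline: p.name, dataset });
        }
    }
    // Rule 2: every declared output is produced by at least one child node.
    for &dataset in &p.output {
        if !produced.contains(dataset) {
            return Err(CheckError::UnproducedPipelineOutput { pipeline: p.name, dataset });
        }
    }
    Ok(())
}

// Usage: the "processing" pipeline from the example, with a configurable output contract.
fn processing(output: Vec<&'static str>) -> PipelineSpec {
    PipelineSpec {
        name: "processing",
        nodes: vec![
            NodeSpec { inputs: vec!["raw"], outputs: vec!["clean"] },
            NodeSpec { inputs: vec!["clean"], outputs: vec!["result"] },
        ],
        input: vec!["raw"],
        output,
    }
}
```

With this model, processing(vec!["result"]) passes, while processing(vec!["report"]) fails with UnproducedPipelineOutput because no child node writes "report".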
Nesting
Pipelines can be nested arbitrarily. The validator and runners recurse through the tree:
Pipeline {
    name: "outer",
    steps: (
        Pipeline {
            name: "inner",
            steps: (/* nodes */),
            input: (/* ... */),
            output: (/* ... */),
        },
        Node { /* ... */ },
    ),
    input: (/* ... */),
    output: (/* ... */),
}
Hooks and visualization
Pipeline boundaries fire their own hook events (before_pipeline_run, after_pipeline_run, on_pipeline_error) distinct from node events. In the visualization, pipelines appear as expandable containers that group their child nodes.
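As an illustration of how pipeline-level events might bracket a run, here is a sketch. Only the three event names come from the text. The PipelineHook trait, its method signatures, the Recorder type, and run_pipeline() are hypothetical, and it is an assumption of this sketch that after_pipeline_run is skipped when the pipeline fails.

```rust
// Hypothetical hook trait; the real signatures may differ.
trait PipelineHook {
    fn before_pipeline_run(&mut self, _pipeline: &str) {}
    fn after_pipeline_run(&mut self, _pipeline: &str) {}
    fn on_pipeline_error(&mut self, _pipeline: &str, _error: &str) {}
}

// A hook that records each event as a line of text.
#[derive(Default)]
struct Recorder {
    events: Vec<String>,
}

impl PipelineHook for Recorder {
    fn before_pipeline_run(&mut self, pipeline: &str) {
        self.events.push(format!("before {}", pipeline));
    }
    fn after_pipeline_run(&mut self, pipeline: &str) {
        self.events.push(format!("after {}", pipeline));
    }
    fn on_pipeline_error(&mut self, pipeline: &str, error: &str) {
        self.events.push(format!("error {}: {}", pipeline, error));
    }
}

// Wraps the execution of a pipeline's children with boundary events.
// Assumption: on failure, only on_pipeline_error fires after the "before" event.
fn run_pipeline<H, F>(hook: &mut H, name: &str, body: F) -> Result<(), String>
where
    H: PipelineHook,
    F: FnOnce() -> Result<(), String>,
{
    hook.before_pipeline_run(name);
    match body() {
        Ok(()) => {
            hook.after_pipeline_run(name);
            Ok(())
        }
        Err(e) => {
            hook.on_pipeline_error(name, &e);
            Err(e)
        }
    }
}

// Usage: one successful pipeline and one failing pipeline.
fn demo_events() -> Vec<String> {
    let mut rec = Recorder::default();
    let _ = run_pipeline(&mut rec, "processing", || Ok(()));
    let _ = run_pipeline(&mut rec, "export", || Err("disk full".to_string()));
    rec.events
}
```

Default method bodies let a hook override only the events it cares about, which is one common way to design such a trait.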
When to use Pipeline vs flat tuples
Use a flat tuple of nodes when the pipeline is simple and linear. Use Pipeline when you want to:
- Name a group of related steps for logging and visualization
- Declare input/output contracts that are validated by check()
- Organize large pipelines into logical sections