Parallel Runner
The ParallelRunner executes independent pipeline nodes concurrently using scoped threads.
Requires the std feature.
Overview
#[derive(Default)]
pub struct ParallelRunner;
impl Runner for ParallelRunner {
fn name(&self) -> &'static str { "parallel" }
// ...
}
How it works
- Builds a dependency graph from the pipeline using
build_pipeline_graph() - Identifies source datasets — params and external inputs that are available immediately
- Schedules nodes as soon as all their input datasets have been produced
- Tracks produced datasets — when a node completes, its output datasets become available, potentially unblocking other nodes
- Uses
std::thread::scopefor safe scoped threads — no'staticbounds needed
Usage
$ my_app run --runner parallel
Or programmatically:
use pondrs::runners::ParallelRunner;
App::new(catalog, params)
.with_runners((SequentialRunner, ParallelRunner))
.execute(pipeline)?;
Dependency analysis
The parallel runner determines which nodes can run concurrently by analyzing dataset dependencies:
param
/ \
[a] [b] ← a and b can run in parallel (both read param)
\ /
[c] ← c waits for both a and b to complete
Only data dependencies matter — the tuple ordering in your pipeline function is irrelevant to the parallel runner.
Error handling
On the first node failure:
on_node_errorfires for the failed nodeon_pipeline_errorfires for all ancestor pipelines- No new nodes are scheduled
- Already-running nodes are allowed to complete (drain)
- The first error is returned
Pipeline hooks
Pipeline hooks behave differently than in the sequential runner:
before_pipeline_runfires when all of the pipeline’s declared inputs are availableafter_pipeline_runfires when all of the pipeline’s declared outputs have been produced
This means pipeline hooks may fire at different points in wall-clock time compared to sequential execution.
Thread safety requirements
Because nodes run on different threads:
- Hooks must be
Sync(useMutexor atomics for mutable state) - Datasets used concurrently must support concurrent access.
MemoryDatasetusesArc<Mutex<_>>and is safe.CellDatasetis not safe for parallel use.
When to use
Use the parallel runner when:
- Your pipeline has independent branches that can run concurrently
- Nodes involve I/O (file reads, network calls) where parallelism helps
- You’re in a
stdenvironment with thread support