Datasets

Datasets are the data abstraction in pondrs. Every piece of data flowing through a pipeline, whether it’s a CSV file, an in-memory value, or a hardware register, is a dataset.

The Dataset trait

pub trait Dataset: serde::Serialize {
    type LoadItem;
    type SaveItem;
    type Error;

    fn load(&self) -> Result<Self::LoadItem, Self::Error>;
    fn save(&self, output: Self::SaveItem) -> Result<(), Self::Error>;
    fn is_param(&self) -> bool { false }
}
  • LoadItem: the type produced when loading (e.g. DataFrame, String, f64).
  • SaveItem: the type accepted when saving (often the same as LoadItem).
  • Error: the error type for I/O operations. Use core::convert::Infallible for datasets that never fail (like Param).
  • is_param(): returns true for read-only parameter datasets. The pipeline validator uses this to prevent writing to params.
  • Serialize supertrait: enables automatic YAML serialization of dataset configuration for the viz and catalog indexer.
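
To make the associated types concrete, here is a minimal sketch of two custom datasets. It rests on several assumptions: the trait is re-declared locally without the serde::Serialize supertrait so the example compiles with the standard library alone, and TextFileDataset and Threshold are hypothetical names, not pondrs types. How pondrs's own Param handles save is not shown in this section, so the no-op save below is only one possible choice.

```rust
use core::convert::Infallible;
use std::fs;
use std::io;
use std::path::PathBuf;

// Local copy of the trait shape shown above, minus the `serde::Serialize`
// supertrait, so this sketch needs no external crates.
pub trait Dataset {
    type LoadItem;
    type SaveItem;
    type Error;

    fn load(&self) -> Result<Self::LoadItem, Self::Error>;
    fn save(&self, output: Self::SaveItem) -> Result<(), Self::Error>;
    fn is_param(&self) -> bool {
        false
    }
}

// Hypothetical dataset: a UTF-8 text file loaded and saved as a String.
pub struct TextFileDataset {
    pub path: PathBuf,
}

impl Dataset for TextFileDataset {
    type LoadItem = String;
    type SaveItem = String;
    type Error = io::Error; // file I/O can fail

    fn load(&self) -> Result<String, io::Error> {
        fs::read_to_string(&self.path)
    }

    fn save(&self, output: String) -> Result<(), io::Error> {
        fs::write(&self.path, output)
    }
}

// Hypothetical read-only parameter holding a single f64.
pub struct Threshold(pub f64);

impl Dataset for Threshold {
    type LoadItem = f64;
    type SaveItem = (); // assumption: nothing meaningful can be saved
    type Error = Infallible; // loading a constant never fails

    fn load(&self) -> Result<f64, Infallible> {
        Ok(self.0)
    }

    // Assumption: a no-op. A validator is expected to reject writes to
    // params before this would ever be called.
    fn save(&self, _output: ()) -> Result<(), Infallible> {
        Ok(())
    }

    fn is_param(&self) -> bool {
        true
    }
}
```

TextFileDataset keeps the default is_param() of false, while Threshold overrides it, which is the signal a validator could use to reject pipelines that write to it.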

Datasets in the minimal example

The catalog uses three dataset types:

#[derive(Serialize, Deserialize)]
struct Catalog {
    readings: PolarsCsvDataset,
    summary: MemoryDataset<f64>,
    report: JsonDataset,
}

PolarsCsvDataset

Reads and writes CSV files as Polars DataFrames. Requires the polars feature. Configured with a file path and optional CSV options like separator:

readings:
  path: data/readings.csv
  separator: ","

MemoryDataset<T>

Thread-safe in-memory storage for intermediate values. Starts empty: loading before any save returns DatasetNotLoaded. Requires the std feature. Uses Arc<Mutex<Option<T>>> internally, so it works safely with the ParallelRunner.

summary: {}
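
The Arc<Mutex<Option<T>>> pattern described above can be sketched with the standard library alone. This is not the pondrs implementation: SketchMemoryDataset, SketchError, and save_on_worker_then_load are hypothetical names, and the real DatasetNotLoaded error type may be defined differently.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical error for this sketch; the real definition of the
// DatasetNotLoaded condition in pondrs is not shown in this section.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    DatasetNotLoaded,
}

// Hypothetical stand-in for MemoryDataset<T>: a shared, lockable slot
// that is empty (None) until the first save.
#[derive(Clone)]
pub struct SketchMemoryDataset<T> {
    slot: Arc<Mutex<Option<T>>>,
}

impl<T: Clone> SketchMemoryDataset<T> {
    pub fn new() -> Self {
        Self { slot: Arc::new(Mutex::new(None)) }
    }

    // Returns a copy of the stored value, or an error if nothing was saved.
    pub fn load(&self) -> Result<T, SketchError> {
        self.slot
            .lock()
            .unwrap()
            .clone()
            .ok_or(SketchError::DatasetNotLoaded)
    }

    // Replaces whatever value was stored before.
    pub fn save(&self, value: T) -> Result<(), SketchError> {
        *self.slot.lock().unwrap() = Some(value);
        Ok(())
    }
}

// Saves from a worker thread, then loads from the calling thread.
// Cloning the dataset clones the Arc, so both handles share one slot.
pub fn save_on_worker_then_load(value: f64) -> Result<f64, SketchError> {
    let dataset: SketchMemoryDataset<f64> = SketchMemoryDataset::new();
    let writer = dataset.clone();
    thread::spawn(move || writer.save(value))
        .join()
        .expect("worker thread panicked")?;
    dataset.load()
}
```

Because every clone shares the same slot behind the Mutex, a value saved by one node on one thread is visible to a later node on another, which is the property a parallel runner relies on.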

JsonDataset

Reads and writes JSON files as serde_json::Value. Requires the json feature.

report:
  path: data/report.json

Further reading