Datasets
Datasets are the data abstraction in pondrs. Every piece of data flowing through a pipeline, whether it's a CSV file, an in-memory value, or a hardware register, is a dataset.
The Dataset trait
pub trait Dataset: serde::Serialize {
    type LoadItem;
    type SaveItem;
    type Error;
    fn load(&self) -> Result<Self::LoadItem, Self::Error>;
    fn save(&self, output: Self::SaveItem) -> Result<(), Self::Error>;
    fn is_param(&self) -> bool { false }
}
- LoadItem: the type produced when loading (e.g. DataFrame, String, f64)
- SaveItem: the type accepted when saving (often the same as LoadItem)
- Error: the error type for I/O operations. Use core::convert::Infallible for datasets that never fail (like Param)
- is_param(): returns true for read-only parameter datasets. The pipeline validator uses this to prevent writing to params.
- Serialize supertrait: enables automatic YAML serialization of dataset configuration for the viz and catalog indexer.
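As a minimal sketch of how these pieces fit together, the block below implements the trait for a read-only value. It makes several assumptions: the trait is a simplified local copy with the serde::Serialize supertrait left out so that the example compiles with the standard library alone, and ThresholdParam and can_write are hypothetical names invented for illustration, not part of pondrs.

```rust
use core::convert::Infallible;

// Simplified local copy of the trait shown above. The serde::Serialize
// supertrait is omitted (assumption) so this compiles without extra crates.
pub trait Dataset {
    type LoadItem;
    type SaveItem;
    type Error;
    fn load(&self) -> Result<Self::LoadItem, Self::Error>;
    fn save(&self, output: Self::SaveItem) -> Result<(), Self::Error>;
    fn is_param(&self) -> bool { false }
}

// Hypothetical read-only dataset holding a fixed threshold value.
pub struct ThresholdParam(pub f64);

impl Dataset for ThresholdParam {
    type LoadItem = f64;
    type SaveItem = f64;
    // Loading a constant cannot fail, so the error type is Infallible.
    type Error = Infallible;

    fn load(&self) -> Result<f64, Infallible> {
        Ok(self.0)
    }

    // Saving is a no-op in this sketch; the stored value is left unchanged.
    fn save(&self, _output: f64) -> Result<(), Infallible> {
        Ok(())
    }

    // Mark this dataset as a parameter, overriding the default of false.
    fn is_param(&self) -> bool {
        true
    }
}

// Hypothetical validator-style check: only non-param datasets may be written.
pub fn can_write<D: Dataset>(dataset: &D) -> bool {
    !dataset.is_param()
}
```

With these definitions, ThresholdParam(0.5).load() yields Ok(0.5) and can_write returns false for it. How the real pipeline validator performs this check is not shown in this chapter; the function above only illustrates the idea of consulting is_param() before a write.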
Datasets in the minimal example
The catalog uses three dataset types:
#[derive(Serialize, Deserialize)]
struct Catalog {
    readings: PolarsCsvDataset,
    summary: MemoryDataset<f64>,
    report: JsonDataset,
}
PolarsCsvDataset
Reads and writes CSV files as Polars DataFrames. Requires the polars feature. Configured with a file path and optional CSV options like separator:
readings:
  path: data/readings.csv
  separator: ","
MemoryDataset<T>
Thread-safe in-memory storage for intermediate values. It starts empty, so loading before any save returns DatasetNotLoaded. Requires the std feature. Uses Arc<Mutex<Option<T>>> internally, so it works safely with the ParallelRunner.
summary: {}
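The storage layout described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pondrs implementation: MemorySketch, NotLoaded, and save_from_worker are hypothetical names, and the real type reports the empty case as DatasetNotLoaded.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical error for this sketch, standing in for DatasetNotLoaded.
#[derive(Debug, PartialEq)]
pub struct NotLoaded;

// Hypothetical stand-in for MemoryDataset<T>, using the same
// Arc<Mutex<Option<T>>> layout the text describes.
#[derive(Clone)]
pub struct MemorySketch<T> {
    slot: Arc<Mutex<Option<T>>>,
}

impl<T: Clone> MemorySketch<T> {
    // A new dataset starts empty.
    pub fn new() -> Self {
        Self { slot: Arc::new(Mutex::new(None)) }
    }

    // Return a copy of the stored value, or NotLoaded if nothing was saved yet.
    // Panics if the mutex was poisoned by a panicking writer.
    pub fn load(&self) -> Result<T, NotLoaded> {
        self.slot.lock().unwrap().clone().ok_or(NotLoaded)
    }

    // Store a value, replacing any previous one.
    pub fn save(&self, value: T) {
        *self.slot.lock().unwrap() = Some(value);
    }
}

// Save from a worker thread, then load from the calling thread.
// Both handles share the same slot through the Arc.
pub fn save_from_worker(value: f64) -> Result<f64, NotLoaded> {
    let dataset = MemorySketch::new();
    let writer = dataset.clone();
    thread::spawn(move || writer.save(value)).join().unwrap();
    dataset.load()
}
```

Cloning the handle copies only the Arc, so every clone sees the same value. The Mutex serializes access, which is what makes a layout like this safe to share between nodes running on different threads.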
JsonDataset
Reads and writes JSON files as serde_json::Value. Requires the json feature.
report:
  path: data/report.json
Further reading
- Custom Datasets: how to implement your own dataset type
- List of Datasets: all built-in dataset types and their feature flags
- Error handling: how dataset errors are handled
- no_std Datasets: datasets available without the standard library