Catalog
The catalog is a plain Rust struct that groups all datasets used by a pipeline. It is not a special type — any struct that derives Serialize and Deserialize and contains dataset fields works as a catalog.
In the minimal example
#[derive(Serialize, Deserialize)]
struct Catalog {
    readings: PolarsCsvDataset,
    summary: MemoryDataset<f64>,
    report: JsonDataset,
}
Each field is a dataset. The field names become the dataset names used in logging, visualization, and error messages — the framework discovers them automatically via serde serialization.
YAML configuration
The catalog struct is deserialized from a YAML file. Each field maps to a YAML key, and the dataset type determines what configuration is needed:
# catalog.yml
readings:
  path: data/readings.csv
  separator: ","
summary: {}
report:
  path: data/report.json
- File-backed datasets (like PolarsCsvDataset, JsonDataset) need at least a path.
- In-memory datasets (like MemoryDataset) use an empty mapping {} because they have no persistent configuration.
- Parameters live in a separate params struct and file, not in the catalog.
Loading the catalog
When using App::from_yaml or App::from_args, the catalog is loaded and deserialized automatically:
pondrs::app::App::from_yaml(
    dir.join("catalog.yml").to_str().unwrap(),
    dir.join("params.yml").to_str().unwrap(),
)?
.with_args(std::env::args_os())?
.dispatch(pipeline)
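The call above converts each joined path with .to_str().unwrap(), which panics if the path is not valid UTF-8. The following is a minimal sketch of one way to turn that case into an error instead. The config_path helper is hypothetical and not part of pondrs; it uses only the standard library.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of pondrs): joins `file` onto `dir` and
/// returns the result as an owned UTF-8 string, or an error if the joined
/// path is not valid UTF-8.
fn config_path(dir: &Path, file: &str) -> io::Result<String> {
    let full: PathBuf = dir.join(file);
    match full.to_str() {
        // `Path::to_str` yields `Some` only for valid UTF-8 paths.
        Some(s) => Ok(s.to_owned()),
        None => Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("path is not valid UTF-8: {}", full.display()),
        )),
    }
}
```

Assuming the surrounding function returns an error type that io::Error converts into, the resulting strings could be borrowed and passed in place of the two .to_str().unwrap() arguments.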
You can also load it manually:
let contents = std::fs::read_to_string("catalog.yml")?;
let catalog: Catalog = serde_yaml::from_str(&contents)?;
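When loading manually, the error from std::fs::read_to_string does not say which file failed, which can be confusing when a catalog file and a params file are both read. Below is a small sketch of a wrapper that adds the path to the message. The read_config name is hypothetical and not part of pondrs; only the standard library is used.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of pondrs): reads a config file into a
/// string. On failure, the returned error keeps the original error kind
/// and names the file in its message.
fn read_config(path: &Path) -> io::Result<String> {
    fs::read_to_string(path).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("failed to read {}: {}", path.display(), e),
        )
    })
}
```

The returned string can then be handed to serde_yaml::from_str exactly as in the manual example.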
For nested catalogs, naming conventions, and catalog overrides, see the Params & Catalog chapter.