This mini-guide explains how to use the programmatic entry points that turn your split YAML metadata (dataset + template + resources) into a single OEMetadata JSON document.
If you’re looking for how to author the YAML files and how templating works, see the main Assembly Guide in the creation module directory. This page only shows how to call the entry points.
The functions in omi.create wrap the full assembly pipeline:
- Discover / load your YAML parts (dataset, optional template, resources).
- Apply the template to each resource (deep merge; resource wins; keywords/topics/languages concatenate).
- Generate & validate the final OEMetadata JSON using the official schema (via OEMetadataCreator).
- Write the result to disk (build_from_yaml) or many results to a directory (build_many_from_yaml).
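
The template step above can be pictured with a small sketch. This is not the code used by omi.create. It is a hypothetical helper (merge_template, CONCAT_KEYS) that illustrates the stated rules: deep merge, resource values win, and the keywords/topics/languages lists are concatenated. The concatenation order (template entries first) is an assumption.

```python
from copy import deepcopy

# Keys whose list values are concatenated instead of overridden (per the rule above).
CONCAT_KEYS = {"keywords", "topics", "languages"}


def merge_template(template: dict, resource: dict) -> dict:
    """Deep-merge a template into a resource; resource values win on conflict."""
    merged = deepcopy(template)
    for key, value in resource.items():
        current = merged.get(key)
        if key in CONCAT_KEYS and isinstance(value, list) and isinstance(current, list):
            # Assumed order: template entries first, then resource entries.
            merged[key] = current + value
        elif isinstance(value, dict) and isinstance(current, dict):
            merged[key] = merge_template(current, value)
        else:
            merged[key] = deepcopy(value)
    return merged


template = {
    "keywords": ["energy"],
    "context": {"title": "Shared title", "homepage": "https://example.org"},
}
resource = {
    "name": "plants",
    "keywords": ["power"],
    "context": {"title": "Plants"},
}
merged = merge_template(template, resource)
```

Here the resource keeps its own context title, inherits the homepage from the template, and ends up with both keywords.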
from omi.create import build_from_yaml, build_many_from_yaml

build_from_yaml: Assemble one dataset and write <output_file> (JSON).
- base_dir (str | Path): Root that contains:
  - datasets/<dataset_id>.dataset.yaml
  - datasets/<dataset_id>.template.yaml (optional)
  - resources/<dataset_id>/*.resource.yaml
- dataset_id (str): Logical dataset name (e.g. "powerplants").
- output_file (str | Path): Path to write the generated OEMetadata JSON.
- index_file (str | Path | None): Optional explicit mapping file (metadata_index.yaml). If provided, paths are taken from the index instead of convention.
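
To make the path convention concrete, here is a minimal sketch of how the three kinds of files could be located under base_dir. The helper name conventional_paths is hypothetical and not part of omi.create. It only mirrors the layout listed above.

```python
from pathlib import Path


def conventional_paths(base_dir, dataset_id):
    """Return (dataset_yaml, template_yaml_or_None, resource_yamls) by convention."""
    base = Path(base_dir)
    dataset_yaml = base / "datasets" / f"{dataset_id}.dataset.yaml"
    template_yaml = base / "datasets" / f"{dataset_id}.template.yaml"
    resource_dir = base / "resources" / dataset_id
    # The template is optional, so report None when the file is absent.
    template = template_yaml if template_yaml.exists() else None
    resources = sorted(resource_dir.glob("*.resource.yaml"))
    return dataset_yaml, template, resources
```

Calling conventional_paths("./metadata", "powerplants") would point at ./metadata/datasets/powerplants.dataset.yaml and every *.resource.yaml under ./metadata/resources/powerplants/.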
build_many_from_yaml: Assemble multiple datasets and write each as <output_dir>/<dataset_id>.json.
- base_dir (str | Path): Same as above.
- output_dir (str | Path): Destination directory for one JSON file per dataset.
- dataset_ids (list[str] | None): Limit to specific datasets. If None, we:
  - Use keys from index_file when provided, else
  - Discover all datasets/*.dataset.yaml in base_dir.
- index_file (str | Path | None): Optional metadata_index.yaml.
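
The selection order for dataset_ids can be sketched as follows. Both helper names are hypothetical, and index_keys stands in for the keys already read from an index file, whose format is not covered on this page. The sketch only illustrates the precedence described above: explicit list, then index keys, then discovery.

```python
from pathlib import Path

SUFFIX = ".dataset.yaml"


def discover_dataset_ids(base_dir):
    """List dataset ids from datasets/*.dataset.yaml under base_dir."""
    datasets_dir = Path(base_dir) / "datasets"
    return sorted(p.name[: -len(SUFFIX)] for p in datasets_dir.glob(f"*{SUFFIX}"))


def select_dataset_ids(base_dir, dataset_ids=None, index_keys=None):
    """Explicit ids win, then index keys, then discovery by convention."""
    if dataset_ids is not None:
        return list(dataset_ids)
    if index_keys is not None:
        return list(index_keys)
    return discover_dataset_ids(base_dir)
```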
from omi.create import build_from_yaml
build_from_yaml(
    base_dir="./metadata",
    dataset_id="powerplants",
    output_file="./out/powerplants.json",
)

Directory layout:
metadata/
  datasets/
    powerplants.dataset.yaml
    powerplants.template.yaml   # optional
  resources/
    powerplants/
      *.resource.yaml

# One dataset, with paths taken from an explicit index file
from omi.create import build_from_yaml
build_from_yaml(
    base_dir="./metadata",
    dataset_id="powerplants",
    output_file="./out/powerplants.json",
    index_file="./metadata/metadata_index.yaml",
)

# All datasets found under base_dir
from omi.create import build_many_from_yaml
build_many_from_yaml(
    base_dir="./metadata",
    output_dir="./out",
)
# writes ./out/<dataset_id>.json for each dataset found

# Selected datasets only, with paths taken from an explicit index file
from omi.create import build_many_from_yaml
build_many_from_yaml(
    base_dir="./metadata",
    output_dir="./out",
    dataset_ids=["powerplants", "households"],
    index_file="./metadata/metadata_index.yaml",
)

- Output JSON is written with indent=2 and ensure_ascii=False to preserve characters like ©.
- Validation happens via OEMetadataCreator using the official schema provided by oemetadata (imported through omi.base.get_metadata_specification).
- If a dataset YAML is missing, FileNotFoundError is raised.
- If schema validation fails, you’ll get an exception from omi.validation. Catch it where you call the entry point if you want to handle/report errors.
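
One way to handle these errors at the call site is a thin wrapper around the entry point. The sketch below is an illustration, not part of omi: build_or_report is a hypothetical name, and because the exact exception class raised by omi.validation is not specified here, validation failures are caught with a broad except clause that you may want to narrow.

```python
import logging

logger = logging.getLogger(__name__)


def build_or_report(build_fn, **kwargs):
    """Call a build entry point; log failures and return False instead of raising."""
    try:
        build_fn(**kwargs)
    except FileNotFoundError as exc:
        # Raised when a dataset YAML is missing.
        logger.error("Missing metadata input: %s", exc)
        return False
    except Exception as exc:  # assumption: covers schema validation errors
        logger.error("Build failed: %s", exc)
        return False
    return True
```

It would be called as build_or_report(build_from_yaml, base_dir="./metadata", dataset_id="powerplants", output_file="./out/powerplants.json").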
from pathlib import Path
from omi.create import build_from_yaml

def build_oemetadata_callable(**context):
    base = Path("/project/metadata")
    out = Path("/project/metadata/out/powerplants.json")
    build_from_yaml(base, "powerplants", out)
    # optionally push to Airflow XCom, publish, upload, etc.

- For unit tests of omi.create, patch omi.create.assemble_metadata_dict / assemble_many_metadata and verify files are written.
- For integration tests, put real example YAMLs under tests/test_data/create/metadata/ and call build_from_yaml end-to-end.
- “Dataset YAML not found”: Check that base_dir/datasets/<dataset_id>.dataset.yaml exists, or supply the correct index_file.
- Unicode characters appear escaped (\u00a9): Ensure you’re not re-writing the JSON elsewhere with ensure_ascii=True.
- Template not applied: Confirm your template file name matches <dataset_id>.template.yaml (or is correctly referenced from the index), and the keys you expect to inherit aren’t already set in the resource (resource values win).
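
The escaping issue can be reproduced with the standard json module alone. The sample document below is made up for illustration. Only the ensure_ascii behaviour is the point.

```python
import json

# Hypothetical fragment containing a non-ASCII character.
doc = {"licenses": [{"attribution": "© Example Institute"}]}

# ensure_ascii=False keeps the character as written.
preserved = json.dumps(doc, indent=2, ensure_ascii=False)

# ensure_ascii=True (the json default) escapes it as \u00a9.
escaped = json.dumps(doc, indent=2, ensure_ascii=True)
```

Both strings parse back to the same data, so an escaped file is still valid JSON. It is just harder to read, which usually means a later step re-serialized the output with the default settings.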