A reusable framework for migrating legacy data into SAP-style target structures, with built-in data profiling, rule-driven validation, and reconciliation reporting. Demonstrated on an SAP Material Master object.
This is the open-tools version of enterprise data-migration work: take a messy legacy extract, cleanse and map it to the target schema, validate every record against business rules, hold back the bad ones, and produce a reconciliation report that proves the load was complete and correct.
- Profiling: measures source completeness, counting blanks and duplicates per field.
- Cleansing: trims whitespace, standardizes case, and removes exact duplicates.
- Validation: applies config-driven rules for required fields, allowed value sets, uniqueness, length limits, and numeric/range checks.
- Reconciliation: accounts for every source record (loaded vs. rejected), confirms that the counts balance, and writes a sign-off report.
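The four stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the framework's actual code: the function names (`profile`, `cleanse`, `validate`, `reconcile`), the `RULES` keys, the allowed material types, and the sample records are all hypothetical, and every value is assumed to be a string as it would be when read from a CSV extract.

```python
from collections import Counter

# Hypothetical rule set; in the framework the rules live in the YAML config.
RULES = {
    "required": ["material_number", "material_type"],
    "allowed": {"material_type": {"FERT", "ROH", "HALB"}},
    "unique": ["material_number"],
    "non_negative": ["weight"],
}


def profile(records, fields):
    """Count blank values and repeated non-blank values per field."""
    report = {}
    for field in fields:
        values = [r.get(field, "").strip() for r in records]
        filled = [v for v in values if v]
        counts = Counter(filled)
        report[field] = {
            "blank": len(values) - len(filled),
            "duplicated": sum(n - 1 for n in counts.values() if n > 1),
        }
    return report


def cleanse(records, upper_fields=("material_number", "material_type")):
    """Trim whitespace, upper-case code fields, drop exact duplicates."""
    seen, cleaned = set(), []
    for r in records:
        row = {k: v.strip() for k, v in r.items()}
        for f in upper_fields:
            if f in row:
                row[f] = row[f].upper()
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def validate(records, rules):
    """Split records into (loaded, rejected); rejected rows carry reasons."""
    key_counts = {f: Counter(r.get(f, "") for r in records) for f in rules["unique"]}
    loaded, rejected = [], []
    for r in records:
        reasons = []
        for f in rules["required"]:
            if not r.get(f):
                reasons.append(f"missing {f}")
        for f, allowed in rules["allowed"].items():
            if r.get(f) and r[f] not in allowed:
                reasons.append(f"invalid {f}")
        for f in rules["unique"]:
            # Assumption: every record sharing a key is held back for review.
            if r.get(f) and key_counts[f][r[f]] > 1:
                reasons.append(f"duplicate {f}")
        for f in rules["non_negative"]:
            try:
                if float(r.get(f, "")) < 0:
                    reasons.append(f"negative {f}")
            except ValueError:
                # Blank or unparsable values are treated as non-numeric here.
                reasons.append(f"non-numeric {f}")
        if reasons:
            rejected.append({**r, "reasons": reasons})
        else:
            loaded.append(r)
    return loaded, rejected


def reconcile(source_count, cleaned_count, loaded, rejected):
    """Account for every source record and check that the counts balance."""
    dropped = source_count - cleaned_count
    return {
        "source": source_count,
        "duplicates_removed": dropped,
        "loaded": len(loaded),
        "rejected": len(rejected),
        "balanced": source_count == dropped + len(loaded) + len(rejected),
    }


legacy = [
    {"material_number": " m-001 ", "material_type": "fert", "weight": "1.5"},
    {"material_number": "M-001", "material_type": "FERT", "weight": "1.5"},  # exact duplicate once cleansed
    {"material_number": "M-002", "material_type": "XXXX", "weight": "2.0"},  # invalid type
    {"material_number": "M-003", "material_type": "ROH", "weight": "-4"},  # negative weight
    {"material_number": "M-004", "material_type": "HALB", "weight": "0.7"},
]
cleaned = cleanse(legacy)
loaded, rejected = validate(cleaned, RULES)
summary = reconcile(len(legacy), len(cleaned), loaded, rejected)
print(summary)
```

On this five-row sample the sketch drops one duplicate, loads two records, rejects two, and reports `balanced: True`. Rejecting every record that shares a key (rather than keeping the first) is one possible policy for the uniqueness rule; the framework may resolve such conflicts differently.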
All field mappings and validation rules live in config/mapping_material_master.yaml — no code changes needed to adjust the schema, add a rule, or migrate a different object. Legacy fields map to SAP targets (e.g. material_number -> MATNR, material_type -> MTART).
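To illustrate how a mapping config can drive the field renaming without code changes, here is a small sketch. The layout of `config/mapping_material_master.yaml` is not shown here, so the `CONFIG` dictionary below is only a guess at what the parsed file might look like; the `max_length` and `allowed` values and the helper names `map_record` and `check_lengths` are hypothetical. Only the two field mappings (`material_number` to `MATNR`, `material_type` to `MTART`) come from the description above.

```python
# Hypothetical shape of a parsed mapping config (the real YAML layout may differ).
CONFIG = {
    "object": "material_master",
    "fields": {
        "material_number": {"target": "MATNR", "required": True, "max_length": 18},
        "material_type": {
            "target": "MTART",
            "required": True,
            "allowed": ["FERT", "ROH", "HALB"],  # illustrative values only
        },
    },
}


def map_record(legacy_row, config):
    """Rename legacy fields to their target names; unmapped fields are dropped."""
    return {
        spec["target"]: legacy_row.get(source, "")
        for source, spec in config["fields"].items()
    }


def check_lengths(legacy_row, config):
    """Return the source fields whose values exceed the configured max_length."""
    return [
        source
        for source, spec in config["fields"].items()
        if "max_length" in spec and len(legacy_row.get(source, "")) > spec["max_length"]
    ]


row = {"material_number": "M-001", "material_type": "FERT", "legacy_note": "ignored"}
mapped = map_record(row, CONFIG)
print(mapped)  # {'MATNR': 'M-001', 'MTART': 'FERT'}
```

Because the functions read only from the config dictionary, pointing them at a different mapping changes the target schema without touching the code.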
pip install -r requirements.txt
python generate_legacy_data.py # creates a messy legacy extract
python migrate.py # runs the full migration

Outputs land in output/: the loaded records, the rejected records (held for review), and reconciliation_report.md.
On an 80-record sample seeded with realistic quality issues, the framework removed 1 duplicate, flagged 9 of the remaining 79 records as problems (invalid material types, missing required fields, duplicate material numbers, a negative weight, a non-numeric weight), loaded 70 clean records, and confirmed that the counts balanced: an 88.6% load rate (70 of 79) with a full audit trail of every rejection.
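The balance check behind those figures is simple arithmetic, reproduced below. The variable names are illustrative, and the denominator is inferred from the reported numbers: 70 out of the 79 records left after de-duplication gives 88.6%, whereas 70 out of the original 80 would give 87.5%.

```python
source_records = 80
duplicates_removed = 1
rejected = 9
loaded = 70

# Records that reached validation after exact duplicates were dropped.
validated = source_records - duplicates_removed

# Every source record must be accounted for: removed, rejected, or loaded.
balanced = source_records == duplicates_removed + rejected + loaded

load_rate = loaded / validated
print(validated, balanced, f"{load_rate:.1%}")  # 79 True 88.6%
```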
Built on four years of SAP data-migration experience at Accenture, delivering master and transactional objects (GL, Material Master, BOM, Customer Master) with SQL validation frameworks that reduced post-migration defects by roughly 30%. This project recreates that validation-and-reconciliation discipline in open tools.
The framework is object-agnostic — point it at a new mapping config to migrate Customer Master, GL, or any other object. Each new object is a YAML file, not new code.