feat: Add automated data validation script + GitHub Actions workflow #58

Open
sxryadipta wants to merge 3 commits into statsbomb:master from sxryadipta:master

@sxryadipta

Closes #57

What this PR adds

  • scripts/validate_data.py: validates all JSON data files against the StatsBomb Data Specification v1.1
  • .github/workflows/validate.yml: runs the script automatically on every push and on a weekly schedule
  • Note: tested on a shallow clone, where the workflow runs successfully. Validation against the complete dataset will first run on merge.

Here are the checks covered:

Corruption & structure

Match files

Event files

Lineup files

  • Duplicate player IDs within same team
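
For reviewers, the duplicate-player check can be sketched roughly as below. This is an illustration, not the code in scripts/validate_data.py: the function name `find_duplicate_player_ids` is hypothetical, and the sketch assumes each lineup file parses to a list of team objects carrying `team_id` and a `lineup` list of players with `player_id`.

```python
from collections import Counter


def find_duplicate_player_ids(lineups):
    """Return {team_id: [player_ids listed more than once]} for one parsed lineup file.

    Assumes (hypothetically) that `lineups` is a list of team dicts, each with
    a "team_id" and a "lineup" list of player dicts carrying "player_id".
    """
    duplicates = {}
    for team in lineups:
        # Count how often each player_id appears within this team only.
        counts = Counter(p["player_id"] for p in team.get("lineup", []))
        repeated = sorted(pid for pid, n in counts.items() if n > 1)
        if repeated:
            duplicates[team["team_id"]] = repeated
    return duplicates


# Made-up sample data: player 10 is listed twice for team 1.
sample = [
    {"team_id": 1, "lineup": [{"player_id": 10}, {"player_id": 11}, {"player_id": 10}]},
    {"team_id": 2, "lineup": [{"player_id": 10}, {"player_id": 20}]},
]
print(find_duplicate_player_ids(sample))
```

The same player ID appearing on both teams is not flagged here, since the check is scoped to a single team.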

Cross-file

  • Match IDs with no corresponding events or lineups file
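
The cross-file check could look something like the following sketch. It assumes the open-data layout of `matches/<competition_id>/<season_id>.json` for match lists and `events/<match_id>.json` and `lineups/<match_id>.json` for per-match files. The function name `find_orphan_match_ids` is hypothetical, and the actual script may organise this differently.

```python
import json
from pathlib import Path


def find_orphan_match_ids(data_dir):
    """Return {"events": [...], "lineups": [...]} listing match IDs with no file.

    Assumed layout (not confirmed by this PR description):
      <data_dir>/matches/<competition_id>/<season_id>.json  -> list of matches
      <data_dir>/events/<match_id>.json
      <data_dir>/lineups/<match_id>.json
    """
    data_dir = Path(data_dir)

    # Collect every match_id declared in the match files.
    match_ids = set()
    for match_file in data_dir.glob("matches/*/*.json"):
        with match_file.open(encoding="utf-8") as fh:
            for match in json.load(fh):
                match_ids.add(match["match_id"])

    # Compare against the file stems present in each per-match directory.
    missing = {}
    for kind in ("events", "lineups"):
        present = {p.stem for p in (data_dir / kind).glob("*.json")}
        missing[kind] = sorted(m for m in match_ids if str(m) not in present)
    return missing
```

Calling `find_orphan_match_ids("data")` from the repository root would return the match IDs lacking an events or lineups file, under the layout assumed above.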

No dependencies
Pure Python standard library only, no pip installs needed.

This script validates JSON files for common data quality issues in StatsBomb open data, covering events, matches, lineups, and competitions data.
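
To give a sense of what a standard-library-only corruption check involves, here is a minimal sketch. The function name `find_corrupt_json` is hypothetical and the real script's checks may be stricter; this one only reports files that fail to decode or parse as JSON.

```python
import json
from pathlib import Path


def find_corrupt_json(data_dir):
    """Return a list of (path, error message) for JSON files that fail to parse."""
    problems = []
    for path in sorted(Path(data_dir).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # Record the failure and keep scanning the remaining files.
            problems.append((str(path), str(exc)))
    return problems
```

Only `json` and `pathlib` are used, so nothing needs to be installed before running it.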