A hands-on reference architecture for fully automated SQL code quality pipelines using SonarQube, GitHub Actions, and Snowflake.
In large, federated organizations, scaling analytics isn't (just) a tech challenge — it's an operational one.
From a technological and operational perspective, automation, governance and consistency are vital for scaling analytics in large, federated organisations. With agile methodology and modularisation, deployment volume can rise to hundreds or thousands per day, so manual QA simply cannot keep pace. For example, if 15 analytics teams were to deploy changes daily, the number of full-time reviewers required for manual reviews would be prohibitively high, resulting in bottlenecks, missed checks and an increased risk of inconsistent standards and data incidents. DataOpsBackbone addresses these issues by automating every critical step:
Quality Gates leverage SonarQube code scans to enforce SQL and customisable coding rules based on regular expressions, which are applied automatically with every git push request. For example, forgetting to prefix a schema name or hardcoding a database name triggers an automated block and feedback, thereby enforcing standards before code can be shipped.
- Releases are versioned and deployed via DCM (Database Change Management) with plan/deploy semantics, then validated through SQL tests. This keeps production safe from changes that have not been properly tested.
- Governance is built in: rules such as 'no grants to PUBLIC' or 'only UTC TIMESTAMP allowed' are continuously enforced, and all compliance-relevant data (such as SQL code scans and test results) is logged for auditing purposes.
- Teams have full transparency and can be agile and reduce technical debt themselves. Automation enforces rules and monitors testing over time, so centralised approvals no longer become a bottleneck.
This setup offers repeatability, auditability and peace of mind, enabling new teams to get up and running quickly and allowing developers to focus on creating value rather than policing standards. The showcased projects are practical blueprints for achieving reliable, scalable analytics operations with Snowflake and GitHub Actions, not just demos.
A DataOps pipeline that automates:
- Syncing changes from GitHub
- SQL linting & validation (SonarQube + regex rules)
- Declarative schema deployment via Snowflake DCM (Database Change Management)
- SQL validation testing against deployed objects (CTRF JSON reports)
- Test trend reporting via UnitTestHistory v3.0
- Packaging deployable artifacts
It combines:
- GitHub Actions — reusable workflow called by all consumer repos
- Self-hosted runners (2 org-level runners on
zbrainiac-labs) - SonarQube extended with SQL & Text plugins
- Docker Compose for local stack orchestration
- Snowflake CLI with DCM for declarative deployment (
DEFINEsyntax + Jinja templating) - SQL Validation with CTRF JSON test reports
- UnitTestHistory for HTML trend dashboards
DataOpsBackbone provides a single reusable GitHub Actions workflow that all consumer repos call:
# In each consumer repo (.github/workflows/pipeline.yml):
permissions:
id-token: write
contents: read
jobs:
pipeline:
uses: zbrainiac-labs/DataOpsBackbone/.github/workflows/dataops-pipeline.yml@main
with:
SOURCE_DATABASE: <DB_NAME>
SOURCE_SCHEMA: <SCHEMA_NAME>
DCM_PROJECT_IDENTIFIER: <DB.SCHEMA.PROJECT>
DCM_TARGET: DEV
secrets: inherit| Job | Timeout | Purpose |
|---|---|---|
| prepare | 10 min | Validate inputs, OIDC auth, pre-deploy SQL |
| scan | 30 min | Extract deps, SQLFluff, SonarQube, Quality Gate |
| deploy | 20 min | Clone (optional), DCM deploy, post-deploy, custom scripts |
| validate | 15 min | SQL validation tests, CTRF-to-SonarQube conversion |
| cleanup | 10 min | Drop clone schema (always runs if clone enabled) |
| release | 15 min | Deploy to original, zip, GitHub release, summary |
The pipeline uses OIDC Workload Identity Federation by default (secretless). GitHub issues a short-lived token per job that Snowflake validates directly. No PAT or stored secrets needed.
Fallback to SNOW_CONFIG_B64 (PAT-based) is available via USE_OIDC: false.
- OIDC auth — secretless, short-lived tokens per job (no stored PAT)
- Multi-job architecture — parallelization, resumability, clear failure boundaries
- Environment protection —
environment:on deploy/release for approval gates - Concurrency control — prevents parallel deploys to the same schema
- Input validation — regex-validated database/schema/project identifiers
- Shell strict mode —
set -Eeuo pipefailcatches silent failures - Retry logic — 3 attempts with exponential backoff on Snowflake operations
- Timeouts — per-job timeouts prevent runaway execution
- Pinned actions — all GitHub Actions pinned to commit SHAs
- Identifier quoting — Snowflake
IDENTIFIER()for defense-in-depth - DRY_RUN mode — scan without deploying (for PR validation)
To block deployments on SonarQube quality gate failure, set QUALITY_GATE_ENFORCED: true in your consumer repo's workflow caller:
jobs:
pipeline:
uses: zbrainiac-labs/DataOpsBackbone/.github/workflows/dataops-pipeline.yml@main
with:
SOURCE_DATABASE: MY_DB
SOURCE_SCHEMA: MY_SCHEMA
DCM_PROJECT_IDENTIFIER: MY_DB.MY_SCHEMA.MY_PROJECT
QUALITY_GATE_ENFORCED: true
secrets: inheritWhen disabled (default), quality gate failures are reported but do not block the pipeline.
| Repo | Database | DCM Schema | Data Schemas | Clone per Build |
|---|---|---|---|---|
| mother-of-all-Projects | OPS_DEV | OPS_DCM | OPS_RAW_v001 | ✅ |
| project-one | ONE_DEV | ONE_DCM | ONE_RAW_v001 | ✅ |
| MasterDataManagement | MDM_DEV | MDM_DCM | MDM_RAW_v001, MDM_AGG_v001, MDM_SRV_v001 | |
| crm_dcm_project | CRM_DEV | CRM_DCM | CRM_RAW_v001, CRM_CUR_v001 | |
| SyntheticRetailBank | AAA_DEV_SYNTHETIC_BANK | AAA_DCM | CRM_RAW_v001, PAY_RAW_v001, ... | ✅ |
| sharing_any_objects | ECO_DEV | ECO_DCM | ECO_RAW_v001 | ✅ |
| crew-asset-management | SAM_DEMO | SAM_DCM | SAM_RAW_v001 |
DataOpsBackbone/
├── .github/workflows/
│ ├── dataops-pipeline.yml # Reusable pipeline (called by all repos)
│ └── docker-publish.yml # Docker image CI
├── github-runner/
│ ├── Dockerfile # Self-hosted runner image (incl. SQLFluff)
│ ├── entrypoint.sh # Runner registration (org/repo scope)
│ ├── sonar-rules-setup.sh # Auto-create SonarQube quality profile (40 txt: rules)
│ ├── sonar-token-init.sh # Auto-generate SONAR_TOKEN per runner
│ ├── sonar-scanner_v2.sh # Run sonar-scanner + import SQLFluff issues
│ ├── sqlfluff-to-sonar.sh # Run SQLFluff → SonarQube Generic Issue format
│ ├── sqlfluff_to_sonar.py # JSON converter (SQLFluff → SonarQube)
│ ├── sqlfluff_sonar.cfg # SQLFluff config (non-overlapping rules only)
│ ├── ddl_uppercase_keywords.py # Normalize GET_DDL() output
│ ├── sql_validation_v4.sh # SQL tests → CTRF JSON
│ ├── convert_junit_to_ctrf.py # Legacy XML→JSON migration
│ ├── snowflake-deploy-dcm_v1.sh
│ ├── snowflake-extract-dependencies_v1.sh
│ ├── render-sql_v1.sh # Jinja-style template rendering
│ ├── unitth.jar # UnitTestHistory v3.0
│ └── tests.sqltest # Sample test file
├── sqlfluff/ # Standalone SQLFluff linter (alternative to SonarQube)
│ ├── .sqlfluff # SQLFluff config (all rules)
│ ├── lint.py # Combined runner: SQLFluff + custom regex rules
│ ├── plugins/dataops_rules/ # 28 custom regex rules (DO01–DO28)
│ └── test_sql/ # Good/bad SQL examples for testing
├── sonarqube/Dockerfile # Custom SonarQube image
├── nginx/default.conf # Nginx for test report serving
├── backup/ # SonarQube quality profile backups
├── images/ # Documentation images
├── docker-compose.yml # Full stack (SonarQube + 2 runners + nginx)
└── start.sh # One-command startup
The showcase is built around two distinct data domains, each represented as an individual database within the same Snowflake account. This approach allows for logical isolation and independent management of domain-specific data assets.
Within each domain (database), schemas are strategically utilized to achieve two key objectives:
- Maturity Levels: Schemas separate data objects based on their maturity level (e.g., raw, curated, conformed). This provides a clear path for data as it progresses through various transformation stages.
- Versioning: Schemas also incorporate versioning for underlying database objects like tables, views, stages and procedures. This ensures traceability, facilitates rollbacks, and supports agile development by allowing iterative changes without disrupting existing consumers.
- Improved Organization: Data assets are logically grouped by business domain, making them easier to discover and manage.
- Enhanced Data Governance: Clear maturity levels and versioning promote better control over data quality and evolution.
- Scalability & Maintainability: The modular design reduces interdependencies, simplifying development, testing, and maintenance.
- Demonstrates Best Practices: Provides a practical example of implementing a domain-driven data architecture in Snowflake.
We not only use static source code analysis to review new code coming into the environment, but also check the existing setup and enforce isolation more effectively. By isolating domains and versions, the impact of changes or failures in one area on others is minimised, thereby enhancing overall system stability and aiding regression testing.
All Snowflake object names use UPPERCASE with underscore separators.
| Position | Field | Values |
|---|---|---|
| 1-3 | Domain | 3-char business domain (IOT, CLR, PAY, CRM, REF) |
| 4-7 | Environment | _DEV, _TE1, _PER, _PRD |
Examples: CLR_DEV, PAY_PRD, IOT_TE1
| Position | Field | Values |
|---|---|---|
| 1-3 | Domain | Same 3-char domain code |
| 4-8 | Maturity | _RAW_, _CUR_, _AGG_, _GOL_ |
| 9-12 | Version | v001 -- v999 |
DCM schemas use {DOMAIN}_DCM (unversioned) — one per database, holds the DCM project definition.
Examples: CRM_RAW_v001, IOT_AGG_v012, REF_CUR_v003, OPS_DCM, CRM_DCM
| Position | Field | Description | Values |
|---|---|---|---|
| 1-3 | Domain | 3-char business domain | IOT, CLR, PAY, CRM, REF |
| 4 | Component | Sub-component (GitHub repo) | Single char: I, A, T, P, etc. |
| 5-8 | Maturity | Data maturity level | _RAW, _CUR, _AGG, _GOL |
| 9-12 | Object type | Snowflake object type | _TB_, _VW_, _DT_, _ST_, _FF_, _SP_, _TK_ |
| 13+ | Free text | Business-meaningful name | Uppercase, underscores allowed |
Examples:
ICGI_RAW_TB_SWIFT_MESSAGES-- ICG domain, Ingestion repo, raw tableICGA_AGG_DT_SWIFT_PACS008-- ICG domain, Aggregation repo, aggregated dynamic tableIOTI_RAW_VW_SENSOR_GEOLOC-- IOT domain, Ingestion repo, raw viewICGI_RAW_ST_SWIFT_INBOUND-- ICG domain, Ingestion repo, raw stageICGI_RAW_FF_XML-- ICG domain, Ingestion repo, raw file format
This list provides a few examples of SQL validation rules, each of which is paired with a regular expression (regex) that can be used to identify non-compliant code using the Community Text plugin of SonarQube.
Backups of these rules, which can be restored as a Quality Profile, are available in the repository (link). Rules are also auto-created at runner startup via sonar-rules-setup.sh.
(?i)^\s*CREATE\s+(?!OR\s+REPLACE\b)(?!.*\bIF\s+NOT\s+EXISTS\b).*?\bSCHEMA\b(?is)^(?!\s*--).*CREATE\s+(?!OR\s+REPLACE\b|.*IF\s+NOT\s+EXISTS\b).*TABLE\b(?i)^(?!\s*--)\s*create\s+(or\s+replace\s+)?(table|view|schema)\s+(if\s+not\s+exists\s+)?[a-z0-9_]+\.[a-z0-9_]+(\.[a-z0-9_]+)?(?i)^(?!\s*--).*grant\s+.*\s+to\s+public\b(?i)^\s*DROP\s+(SCHEMA|TABLE|VIEW|DYNAMIC\s+TABLE|STAGE|FILE\s+FORMAT|PROCEDURE|FUNCTION|TASK)\s+(?!IF\s+EXISTS\b)(?i)^(?!\s*--)\s*USE\s+(DATABASE|SCHEMA|ROLE)\b^(?!\s*--).*\bTIMESTAMP_(NTZ|LTZ)(\s*\(\s*\d+\s*\))?\b(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?SCHEMA\s+(IF\s+NOT\s+EXISTS\s+)?(?:[a-z0-9_]+\.)?(?!RAW_|CUR_|AGG_|GOL_|REF_)[a-z0-9_]+;(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?SCHEMA\s+(IF\s+NOT\s+EXISTS\s+)?(?:[a-z0-9_]+\.)?[a-z0-9_]+(?<!_v\d{3});(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?(?!DYNAMIC\s)TABLE\s+(IF\s+NOT\s+EXISTS\s+)?(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_TB_)[A-Z_][A-Z0-9_]*(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?VIEW\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_VW_)[A-Z_][A-Z0-9_]*(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?DYNAMIC\s+TABLE\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_DT_)[A-Z_][A-Z0-9_]*(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?STAGE\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_ST_)[A-Z_][A-Z0-9_]*^.*cross_db_true.*$^.*cross_schema_true.*$(?i)^(?!\s*--)\s*GRANT\s+ALL\s+(PRIVILEGES\s+)?ON\b(?i)^(?!\s*--)\s*(USE\s+ROLE|SET\s+ROLE|GRANT\s+.*TO\s+ROLE|GRANT\s+ROLE)\s+.*\bACCOUNTADMIN\b(?i)^(?!\s*--)\s*.*PASSWORD\s*=\s*'[^']+'(?i)^(?!\s*--)\s*SELECT\s+\*\s+FROM\bDisabled: Redundant — already covered by SQLFluff AM04 (
SELECT *unknown columns) and SQLCC C002 (SELECT *used).
^(?!\s*--).*\b(FLOAT|DOUBLE|REAL)\b^(?!\s*--).*\bVARCHAR\s*[^(](?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?TABLE\s+(?!.*\bCOMMENT\b).*;\s*$(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?VIEW\b[\s\S]*?\bORDER\s+BY\b(?i)^(?!\s*--)\s*COPY\s+INTO\s+(?!.*\bON_ERROR\b).*;\s*$(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?DYNAMIC\s+TABLE\s+(?!.*\bTARGET_LAG\b)(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?FILE\s+FORMAT\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_FF_)[A-Z_][A-Z0-9_]*(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?PROCEDURE\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_SP_)[A-Z_][A-Z0-9_]*(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?TASK\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_TK_)[A-Z_][A-Z0-9_]*Both tools run in the CI/CD pipeline. SQLFluff issues are imported into SonarQube as external issues via sonar.externalIssuesReportPaths, giving one dashboard for everything.
Pipeline: ... → SQLFluff lint → sonar-scanner (imports sqlfluff_issues.json) → Quality Gate → ...
The scanner uses sonar.language=sql to ensure the SQL Code Checker plugin claims .sql files. The text plugin (txt:) runs alongside via its own sensor. Only .git/** is excluded — all other files are indexed.
Each rule runs in exactly one tool to avoid double-counting:
| Tool | Responsibility | Rules |
|---|---|---|
| SonarQube txt: | Safety, security, naming, data types, deps, style | 40 rules |
| SonarQube SQLCC: | Structural SQL (views, joins, nulls, ORDER BY) | 19 rules |
| SQLFluff (external) | Formatting, AST-based style, implicit aliases | 23 rules |
| Total | 82 rules |
| Category | Rules | Examples |
|---|---|---|
| Safety | 6 | CREATE without IF NOT EXISTS, DROP without IF EXISTS, USE statements, ALTER TABLE DROP COLUMN, TRUNCATE |
| Security | 4 | GRANT PUBLIC, ACCOUNTADMIN, GRANT ALL, plaintext passwords |
| Naming conventions | 15 | Table/View/DT/Stage/Schema/FF/SP/Task/Stream/Semantic View naming, maturity-level code enforcement |
| Data types | 3 | TIMESTAMP_NTZ/LTZ, FLOAT/DOUBLE/REAL, VARCHAR without length |
| Quality | 6 | SELECT *, TABLE COMMENT, DEFINE COMMENT, ORDER BY in views, COPY ON_ERROR, DT TARGET_LAG |
| Dependencies | 2 | Cross-database, cross-schema |
| Style | 4 | Keywords UPPER, implicit alias, JOIN without ON, ELSE NULL |
Regex fix applied: Rules 7 (TIMESTAMP), 20 (FLOAT), 21 (VARCHAR) use ^(?!\s*--).* lookahead instead of the broken (?<!--.*) lookbehind which silently failed in Java's regex engine.
| Rule | Description |
|---|---|
| C002 | SELECT * used |
| C003 | INSERT without column list |
| C009 | Non-sargable statement |
| C012 | NULL comparison with = |
| C017 | ORDER BY without ASC/DESC |
| C022 | Non-materialised view |
| C023 | Cartesian join |
| Rule | Description | Severity |
|---|---|---|
| LT01 | Unnecessary whitespace | INFO |
| LT02 | Indentation | INFO |
| LT06 | Function name spacing | INFO |
| LT08 | CTE bracket newline | INFO |
| LT09 | Select targets formatting | INFO |
| LT10 | SELECT modifiers placement | INFO |
| LT12 | EOF newline | INFO |
| LT14 | Inconsistent line endings | INFO |
| CP02 | Identifier casing | MINOR |
| CP04 | Boolean casing | MINOR |
| AL01 | Missing AS keyword (implicit alias) | MINOR |
| AL02 | Implicit column alias | MINOR |
| AL08 | Column alias in GROUP BY | MINOR |
| AM03 | Ambiguous ORDER BY | MINOR |
| AM04 | SELECT * unknown columns | MINOR |
| AM05 | JOIN without ON clause | MAJOR |
| AM09 | LIMIT without ORDER BY | MINOR |
| RF02 | Unnecessary qualified references | MINOR |
| RF03 | Single CASE to IF | MINOR |
| RF04 | Keywords as identifiers | MAJOR |
| ST06 | Unnecessary ELSE NULL | MINOR |
| ST07 | USING vs ON in joins | MINOR |
| ST09 | Nested CASE | MINOR |
- PRS — parse errors on DCM
DEFINEsyntax (SonarQube text plugin handles these files) - CP01 — keywords UPPER (handled by
txt:Keywords_must_be_UPPER)
The dependencies/ddl.sql file is auto-generated by Snowflake's GET_DDL() which outputs lowercase keywords and tab indentation. The ddl_uppercase_keywords.py filter normalizes the output:
- Uppercases all unquoted identifiers and SQL keywords
- Converts tabs to 4-space indentation
- Adds space before
(in object definitions - Expands inline
SELECT ... FROMonto multiple lines - Preserves string literals and comments
Each consumer repo has its own pre_deploy.sql that creates the database, schema, and DCM project. The CICD role and service user must be created once manually:
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS CICD;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE CICD;
GRANT CREATE ROLE ON ACCOUNT TO ROLE CICD;
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE CICD;
GRANT MANAGE WAREHOUSES ON ACCOUNT TO ROLE CICD;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE CICD;
GRANT EXECUTE MANAGED TASK ON ACCOUNT TO ROLE CICD;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE CICD;
GRANT ROLE CICD TO USER SVC_CICD;ALTER USER IF EXISTS SVC_CICD ADD PROGRAMMATIC ACCESS TOKEN CICD_PAT
ROLE_RESTRICTION = CICD
DAYS_TO_EXPIRY = 365
COMMENT = 'CI/CD pipeline PAT';
-- copy <your token>All configuration lives in one file. start.sh auto-generates SNOW_CONFIG_B64 and the runner auto-generates SONAR_TOKEN at startup.
# GitHub
GH_RUNNER_TOKEN=<...>
GITHUB_OWNER=<your GitHub org/user>
GITHUB_ORG=<your GitHub org for org-level runners>
GH_ORG_TOKEN=<classic PAT with admin:org scope>
# SonarQube (SONAR_TOKEN is auto-generated at runner startup)
POSTGRES_USER=sonar
POSTGRES_PASSWORD=sonar
POSTGRES_DB=sonarqube
SONAR_JDBC_USERNAME=sonar
SONAR_JDBC_PASSWORD=sonar
SONAR_ADMIN_PASS=ThisIsNotSecure1234!
# Snowflake (SNOW_CONFIG_B64 is auto-generated by start.sh)
CONNECTION_NAME=<your-connection-name>
SNOW_ACCOUNT=<your-account>
SNOW_USER=SVC_CICD
SNOW_ROLE=CICD
SNOW_DATABASE=DATAOPS
SNOW_SCHEMA=IOT_RAW_V001
SNOW_WAREHOUSE=MD_TEST_WH
SNOW_PAT=<your PAT from Step 2>Only one secret is needed per org:
./start.sh # generates SNOW_CONFIG_B64 automatically
gh secret set SNOW_CONFIG_B64 --org zbrainiac-labs --visibility allSONAR_TOKEN and SNOW_CONNECTION_NAME secrets are no longer needed -- they are auto-generated at runtime.
- Start your local stack via
./start.sh - Access SonarQube at: http://localhost:9000
Login:admin/ThisIsNotSecure1234!(default 'admin') - Push to any consumer repo — the reusable workflow triggers automatically
- Check results in SonarQube
- Monitor SQL test results (incl. history) at: http://localhost:8080
| Service | Purpose | Port |
|---|---|---|
sonarqube |
Code quality + custom SQL rules | 9000 |
db |
PostgreSQL backend for SonarQube | - |
runner1 |
Org-level self-hosted GitHub runner | - |
runner2 |
Org-level self-hosted GitHub runner | - |
nginx-server |
Serves UnitTestHistory HTML reports | 8080 |
This is not just a demo. It's a reusable framework to scale DataOps -- combining validation, governance, and automation into one consistent, testable workflow.






