zbrainiac-labs/DataOpsBackbone

DataOps Unchained: Infrastructure that Scales

A hands-on reference architecture for fully automated SQL code quality pipelines using SonarQube, GitHub Actions, and Snowflake.


Why / What / How

Why?

In large, federated organizations, scaling analytics isn't (just) a tech challenge — it's an operational one.

Automation, governance and consistency are what make analytics scale across such organisations. With agile delivery and modularisation, deployment volume can reach hundreds or thousands per day, and manual QA cannot keep pace. If 15 analytics teams each deploy changes daily, manual review would need a prohibitive number of full-time reviewers, which leads to bottlenecks, missed checks, inconsistent standards and data incidents. DataOpsBackbone addresses these issues by automating every critical step:

  • Quality Gates use SonarQube code scans to enforce SQL rules and customisable regex-based coding rules, applied automatically on every git push. For example, forgetting to prefix a schema name or hardcoding a database name triggers an automated block with feedback, enforcing standards before code can ship.

  • Releases are versioned and deployed via DCM (Database Change Management) with plan/deploy semantics, then validated through SQL tests. This keeps production safe from changes that have not been properly tested.
  • Governance is built in: rules such as 'no grants to PUBLIC' or 'only TIMESTAMP_TZ allowed' are continuously enforced, and all compliance-relevant data (such as SQL code scans and test results) is logged for auditing purposes.
  • Teams have full transparency and can be agile and reduce technical debt themselves. Automation enforces rules and monitors testing over time, so centralised approvals no longer become a bottleneck.

This setup offers repeatability, auditability and peace of mind, enabling new teams to get up and running quickly and allowing developers to focus on creating value rather than policing standards. The showcased projects are practical blueprints for achieving reliable, scalable analytics operations with Snowflake and GitHub Actions, not just demos.


What?

A DataOps pipeline that automates:

  • Syncing changes from GitHub
  • SQL linting & validation (SonarQube + regex rules)
  • Declarative schema deployment via Snowflake DCM (Database Change Management)
  • SQL validation testing against deployed objects (CTRF JSON reports)
  • Test trend reporting via UnitTestHistory v3.0
  • Packaging deployable artifacts

Overview of the infrastructure:

[Figure: infrastructure overview]


How?

It combines:

  • GitHub Actions — reusable workflow called by all consumer repos
  • Self-hosted runners (2 org-level runners on zbrainiac-labs)
  • SonarQube extended with SQL & Text plugins
  • Docker Compose for local stack orchestration
  • Snowflake CLI with DCM for declarative deployment (DEFINE syntax + Jinja templating)
  • SQL Validation with CTRF JSON test reports
  • UnitTestHistory for HTML trend dashboards
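
For readers who have not met CTRF (Common Test Report Format) before, here is a minimal sketch of a report in that format. The tool name, test names and timings are hypothetical, and the helper is illustrative rather than the code used by the validation script; only the `results` / `tool` / `summary` / `tests` layout follows the CTRF schema.

```python
import json
import time


def build_ctrf_report(tool_name, test_results, start_ms, stop_ms):
    """Build a minimal CTRF document from (name, status, duration_ms) tuples."""
    statuses = ("passed", "failed", "skipped", "pending", "other")
    summary = {status: 0 for status in statuses}
    tests = []
    for name, status, duration_ms in test_results:
        if status not in summary:
            raise ValueError(f"unknown CTRF status: {status}")
        summary[status] += 1
        tests.append({"name": name, "status": status, "duration": duration_ms})
    summary.update({"tests": len(tests), "start": start_ms, "stop": stop_ms})
    return {"results": {"tool": {"name": tool_name}, "summary": summary, "tests": tests}}


# Hypothetical test names and timings, for illustration only.
now_ms = int(time.time() * 1000)
report = build_ctrf_report(
    "sql-validation",
    [("row_count_not_zero", "passed", 120), ("no_null_keys", "failed", 85)],
    start_ms=now_ms - 205,
    stop_ms=now_ms,
)
print(json.dumps(report, indent=2))
```

Because every test carries a status and the summary carries totals, a run-over-run trend can be derived from these files without parsing logs.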

Reusable Workflow Architecture

DataOpsBackbone provides a single reusable GitHub Actions workflow that all consumer repos call:

# In each consumer repo (.github/workflows/pipeline.yml):
permissions:
  id-token: write
  contents: read

jobs:
  pipeline:
    uses: zbrainiac-labs/DataOpsBackbone/.github/workflows/dataops-pipeline.yml@main
    with:
      SOURCE_DATABASE: <DB_NAME>
      SOURCE_SCHEMA: <SCHEMA_NAME>
      DCM_PROJECT_IDENTIFIER: <DB.SCHEMA.PROJECT>
      DCM_TARGET: DEV
    secrets: inherit
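
The pipeline regex-validates these inputs before anything touches Snowflake. The exact patterns live in the workflow and are not reproduced here, so the sketch below only illustrates the idea under an assumed identifier shape (unquoted Snowflake identifiers); the function name and the sample project name are hypothetical.

```python
import re

# Assumed identifier shape: unquoted Snowflake identifiers (a letter or
# underscore first, then letters, digits, underscores or dollar signs).
# The pipeline's actual regexes may be stricter.
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_$]*"
SINGLE_PART = re.compile(IDENTIFIER)
THREE_PART = re.compile(rf"{IDENTIFIER}\.{IDENTIFIER}\.{IDENTIFIER}")


def validate_inputs(source_database, source_schema, dcm_project_identifier):
    """Return a list of problems; an empty list means all inputs look valid."""
    problems = []
    if not SINGLE_PART.fullmatch(source_database):
        problems.append(f"SOURCE_DATABASE is not a plain identifier: {source_database!r}")
    if not SINGLE_PART.fullmatch(source_schema):
        problems.append(f"SOURCE_SCHEMA is not a plain identifier: {source_schema!r}")
    if not THREE_PART.fullmatch(dcm_project_identifier):
        problems.append(f"DCM_PROJECT_IDENTIFIER is not DB.SCHEMA.PROJECT: {dcm_project_identifier!r}")
    return problems


print(validate_inputs("OPS_DEV", "OPS_RAW_v001", "OPS_DEV.OPS_DCM.OPS_PROJECT"))
print(validate_inputs("OPS_DEV; DROP DATABASE X", "OPS_RAW_v001", "OPS_DEV.OPS_DCM"))
```

Using `fullmatch` rather than `search` matters here: a value that merely contains a valid identifier somewhere must still be rejected.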

Pipeline Jobs (6 composable jobs):

| Job | Timeout | Purpose |
| --- | --- | --- |
| prepare | 10 min | Validate inputs, OIDC auth, pre-deploy SQL |
| scan | 30 min | Extract deps, SQLFluff, SonarQube, Quality Gate |
| deploy | 20 min | Clone (optional), DCM deploy, post-deploy, custom scripts |
| validate | 15 min | SQL validation tests, CTRF-to-SonarQube conversion |
| cleanup | 10 min | Drop clone schema (always runs if clone enabled) |
| release | 15 min | Deploy to original, zip, GitHub release, summary |

Authentication

The pipeline uses OIDC Workload Identity Federation by default (secretless). GitHub issues a short-lived token per job that Snowflake validates directly. No PAT or stored secrets needed.

Fallback to SNOW_CONFIG_B64 (PAT-based) is available via USE_OIDC: false.

Pipeline Hardening (built-in):

  • OIDC auth — secretless, short-lived tokens per job (no stored PAT)
  • Multi-job architecture — parallelization, resumability, clear failure boundaries
  • Environment protection: the environment: key on deploy/release enables approval gates
  • Concurrency control — prevents parallel deploys to the same schema
  • Input validation — regex-validated database/schema/project identifiers
  • Shell strict mode: set -Eeuo pipefail catches silent failures
  • Retry logic — 3 attempts with exponential backoff on Snowflake operations
  • Timeouts — per-job timeouts prevent runaway execution
  • Pinned actions — all GitHub Actions pinned to commit SHAs
  • Identifier quoting — Snowflake IDENTIFIER() for defense-in-depth
  • DRY_RUN mode — scan without deploying (for PR validation)
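
To make the retry item concrete, here is a minimal sketch of "3 attempts with exponential backoff". The pipeline itself implements this in shell; the Python helper, its name and the one-second base delay are assumptions chosen for illustration.

```python
import time


def retry_with_backoff(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); after a failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Simulated Snowflake call that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky_deploy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient Snowflake error (simulated)")
    return "deployed"


# The recorded delays stand in for real sleeps so the example runs instantly.
result = retry_with_backoff(flaky_deploy, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the helper testable: the example records the waits instead of actually pausing.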

Enabling Quality Gate Enforcement

To block deployments on SonarQube quality gate failure, set QUALITY_GATE_ENFORCED: true in your consumer repo's workflow caller:

jobs:
  pipeline:
    uses: zbrainiac-labs/DataOpsBackbone/.github/workflows/dataops-pipeline.yml@main
    with:
      SOURCE_DATABASE: MY_DB
      SOURCE_SCHEMA: MY_SCHEMA
      DCM_PROJECT_IDENTIFIER: MY_DB.MY_SCHEMA.MY_PROJECT
      QUALITY_GATE_ENFORCED: true
    secrets: inherit

When disabled (default), quality gate failures are reported but do not block the pipeline.
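
The behaviour described here reduces to a small decision table. The sketch below is illustrative only: the function name and return values are hypothetical, and it assumes the gate status arrives as SonarQube's "OK" or "ERROR".

```python
def gate_outcome(gate_status, enforced=False):
    """Map a quality gate status to a pipeline decision.

    gate_status is assumed to be "OK" or "ERROR"; anything other than "OK"
    is treated as a failure.
    """
    if gate_status == "OK":
        return "continue"
    return "block" if enforced else "warn"


for status in ("OK", "ERROR"):
    for enforced in (False, True):
        print(status, enforced, gate_outcome(status, enforced))
```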

Consumer Repos:

| Repo | Database | DCM Schema | Data Schemas | Clone per Build |
| --- | --- | --- | --- | --- |
| mother-of-all-Projects | OPS_DEV | OPS_DCM | OPS_RAW_v001 | |
| project-one | ONE_DEV | ONE_DCM | ONE_RAW_v001 | |
| MasterDataManagement | MDM_DEV | MDM_DCM | MDM_RAW_v001, MDM_AGG_v001, MDM_SRV_v001 | |
| crm_dcm_project | CRM_DEV | CRM_DCM | CRM_RAW_v001, CRM_CUR_v001 | |
| SyntheticRetailBank | AAA_DEV_SYNTHETIC_BANK | AAA_DCM | CRM_RAW_v001, PAY_RAW_v001, ... | |
| sharing_any_objects | ECO_DEV | ECO_DCM | ECO_RAW_v001 | |
| crew-asset-management | SAM_DEMO | SAM_DCM | SAM_RAW_v001 | |

Project Structure

DataOpsBackbone/
├── .github/workflows/
│   ├── dataops-pipeline.yml    # Reusable pipeline (called by all repos)
│   └── docker-publish.yml      # Docker image CI
├── github-runner/
│   ├── Dockerfile              # Self-hosted runner image (incl. SQLFluff)
│   ├── entrypoint.sh           # Runner registration (org/repo scope)
│   ├── sonar-rules-setup.sh    # Auto-create SonarQube quality profile (40 txt: rules)
│   ├── sonar-token-init.sh     # Auto-generate SONAR_TOKEN per runner
│   ├── sonar-scanner_v2.sh     # Run sonar-scanner + import SQLFluff issues
│   ├── sqlfluff-to-sonar.sh    # Run SQLFluff → SonarQube Generic Issue format
│   ├── sqlfluff_to_sonar.py    # JSON converter (SQLFluff → SonarQube)
│   ├── sqlfluff_sonar.cfg      # SQLFluff config (non-overlapping rules only)
│   ├── ddl_uppercase_keywords.py # Normalize GET_DDL() output
│   ├── sql_validation_v4.sh    # SQL tests → CTRF JSON
│   ├── convert_junit_to_ctrf.py # Legacy XML→JSON migration
│   ├── snowflake-deploy-dcm_v1.sh
│   ├── snowflake-extract-dependencies_v1.sh
│   ├── render-sql_v1.sh        # Jinja-style template rendering
│   ├── unitth.jar              # UnitTestHistory v3.0
│   └── tests.sqltest           # Sample test file
├── sqlfluff/                   # Standalone SQLFluff linter (alternative to SonarQube)
│   ├── .sqlfluff               # SQLFluff config (all rules)
│   ├── lint.py                 # Combined runner: SQLFluff + custom regex rules
│   ├── plugins/dataops_rules/  # 28 custom regex rules (DO01–DO28)
│   └── test_sql/               # Good/bad SQL examples for testing
├── sonarqube/Dockerfile        # Custom SonarQube image
├── nginx/default.conf          # Nginx for test report serving
├── backup/                     # SonarQube quality profile backups
├── images/                     # Documentation images
├── docker-compose.yml          # Full stack (SonarQube + 2 runners + nginx)
└── start.sh                    # One-command startup

Architecture Overview - Data objects

The showcase is built around two distinct data domains, each represented as an individual database within the same Snowflake account. This approach allows for logical isolation and independent management of domain-specific data assets.

Within each domain (database), schemas are strategically utilized to achieve two key objectives:

  • Maturity Levels: Schemas separate data objects based on their maturity level (e.g., raw, curated, conformed). This provides a clear path for data as it progresses through various transformation stages.
  • Versioning: Schemas also incorporate versioning for underlying database objects like tables, views, stages and procedures. This ensures traceability, facilitates rollbacks, and supports agile development by allowing iterative changes without disrupting existing consumers.

Why This Approach?

  1. Improved Organization: Data assets are logically grouped by business domain, making them easier to discover and manage.
  2. Enhanced Data Governance: Clear maturity levels and versioning promote better control over data quality and evolution.
  3. Scalability & Maintainability: The modular design reduces interdependencies, simplifying development, testing, and maintenance.
  4. Demonstrates Best Practices: Provides a practical example of implementing a domain-driven data architecture in Snowflake.

[Figure: DataOps_SF_object_structure.png]

Drive modularization for better Resilience

Static source code analysis reviews new code entering the environment and also checks the existing setup, so isolation can be enforced rather than assumed. Isolating domains and versions minimises the impact that a change or failure in one area has on the others, which improves overall system stability and simplifies regression testing.

[Figure: DataOps_SF_dep_rules.png]

Naming Convention

All Snowflake object names use UPPERCASE with underscore separators; the only exception is the lowercase v in schema version suffixes.

Database: {DOMAIN}_{ENV}

| Position | Field | Values |
| --- | --- | --- |
| 1-3 | Domain | 3-char business domain (IOT, CLR, PAY, CRM, REF) |
| 4-7 | Environment | _DEV, _TE1, _PER, _PRD |

Examples: CLR_DEV, PAY_PRD, IOT_TE1

Schema: {DOMAIN}_{MATURITY}_v{NNN} or {DOMAIN}_DCM

| Position | Field | Values |
| --- | --- | --- |
| 1-3 | Domain | Same 3-char domain code |
| 4-8 | Maturity | _RAW_, _CUR_, _AGG_, _GOL_ |
| 9-12 | Version | v001 -- v999 |

DCM schemas use {DOMAIN}_DCM (unversioned) — one per database, holds the DCM project definition.

Examples: CRM_RAW_v001, IOT_AGG_v012, REF_CUR_v003, OPS_DCM, CRM_DCM
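
The database and schema conventions can be checked mechanically. The following is a minimal sketch, not the repository's linting rule: the helper names are hypothetical, and `[A-Z0-9]{3}` for the domain code is an assumption borrowed from the character class used in the linting rules.

```python
import re

# Patterns derived from the two tables above. The check is case-sensitive,
# so the version suffix must use a lowercase v as in the examples.
DATABASE_NAME = re.compile(r"[A-Z0-9]{3}_(?:DEV|TE1|PER|PRD)")
SCHEMA_NAME = re.compile(r"[A-Z0-9]{3}_(?:(?:RAW|CUR|AGG|GOL)_v(?!000)\d{3}|DCM)")


def is_valid_database(name):
    return DATABASE_NAME.fullmatch(name) is not None


def is_valid_schema(name):
    return SCHEMA_NAME.fullmatch(name) is not None


for candidate in ("CLR_DEV", "PAY_PRD", "CLR_QA"):
    print(candidate, is_valid_database(candidate))
for candidate in ("CRM_RAW_v001", "OPS_DCM", "CRM_RAW_v000", "CRM_TMP_v001"):
    print(candidate, is_valid_schema(candidate))
```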

Database Objects (tables, views, stages, tasks, etc.): {DOMAIN}{COMP}_{MATURITY}_{TYPE}_{TEXT}

| Position | Field | Description | Values |
| --- | --- | --- | --- |
| 1-3 | Domain | 3-char business domain | IOT, CLR, PAY, CRM, REF |
| 4 | Component | Sub-component (GitHub repo) | Single char: I, A, T, P, etc. |
| 5-8 | Maturity | Data maturity level | _RAW, _CUR, _AGG, _GOL |
| 9-12 | Object type | Snowflake object type | _TB_, _VW_, _DT_, _ST_, _FF_, _SP_, _TK_ |
| 13+ | Free text | Business-meaningful name | Uppercase, underscores allowed |

Examples:

  • ICGI_RAW_TB_SWIFT_MESSAGES -- ICG domain, Ingestion repo, raw table
  • ICGA_AGG_DT_SWIFT_PACS008 -- ICG domain, Aggregation repo, aggregated dynamic table
  • IOTI_RAW_VW_SENSOR_GEOLOC -- IOT domain, Ingestion repo, raw view
  • ICGI_RAW_ST_SWIFT_INBOUND -- ICG domain, Ingestion repo, raw stage
  • ICGI_RAW_FF_XML -- ICG domain, Ingestion repo, raw file format
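
Because the layout is positional, an object name can be split back into its fields. This is a small sketch under the convention above; the function and group names are hypothetical and not part of the repository.

```python
import re

OBJECT_NAME = re.compile(
    r"(?P<domain>[A-Z0-9]{3})"                   # positions 1-3
    r"(?P<component>[A-Z])"                      # position 4
    r"_(?P<maturity>RAW|CUR|AGG|GOL)"            # positions 5-8
    r"_(?P<object_type>TB|VW|DT|ST|FF|SP|TK)"    # positions 9-12 (with trailing _)
    r"_(?P<text>[A-Z0-9_]+)"                     # position 13 onwards
)


def parse_object_name(name):
    """Return the name's fields as a dict, or None if it breaks the convention."""
    match = OBJECT_NAME.fullmatch(name)
    return match.groupdict() if match else None


print(parse_object_name("ICGI_RAW_TB_SWIFT_MESSAGES"))
print(parse_object_name("swift_messages"))
```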

SQL Linting Rules and Regex Patterns

This list provides a few examples of SQL validation rules, each of which is paired with a regular expression (regex) that can be used to identify non-compliant code using the Community Text plugin of SonarQube.

Backups of these rules, which can be restored as a Quality Profile, are available in the repository's backup/ directory. Rules are also auto-created at runner startup via sonar-rules-setup.sh.

Safety Rules

1. Disallow CREATE SCHEMA without IF NOT EXISTS or REPLACE

(?i)^\s*CREATE\s+(?!OR\s+REPLACE\b)(?!.*\bIF\s+NOT\s+EXISTS\b).*?\bSCHEMA\b

2. Disallow CREATE TABLE without IF NOT EXISTS or REPLACE

(?is)^(?!\s*--).*CREATE\s+(?!OR\s+REPLACE\b|.*IF\s+NOT\s+EXISTS\b).*TABLE\b

3. Disallow CREATE statements with hardcoded database and/or schema prefix

(?i)^(?!\s*--)\s*create\s+(or\s+replace\s+)?(table|view|schema)\s+(if\s+not\s+exists\s+)?[a-z0-9_]+\.[a-z0-9_]+(\.[a-z0-9_]+)?

4. Disallow GRANT statements to PUBLIC

(?i)^(?!\s*--).*grant\s+.*\s+to\s+public\b

5. Disallow dropping objects without IF EXISTS

(?i)^\s*DROP\s+(SCHEMA|TABLE|VIEW|DYNAMIC\s+TABLE|STAGE|FILE\s+FORMAT|PROCEDURE|FUNCTION|TASK)\s+(?!IF\s+EXISTS\b)

6. Disallow hardcoded USE DATABASE, USE SCHEMA, or USE ROLE statements

(?i)^(?!\s*--)\s*USE\s+(DATABASE|SCHEMA|ROLE)\b
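
To try the safety patterns outside SonarQube, they can be run line by line with Python's `re` module. This is a sketch for experimentation: the rule keys, the `scan` helper and the sample SQL are made up here, and matching one line at a time is an assumption about how the text plugin applies the patterns.

```python
import re

# Rules 4, 5 and 6 exactly as listed above.
SAFETY_RULES = {
    "grant_to_public": r"(?i)^(?!\s*--).*grant\s+.*\s+to\s+public\b",
    "drop_without_if_exists": r"(?i)^\s*DROP\s+(SCHEMA|TABLE|VIEW|DYNAMIC\s+TABLE|STAGE|FILE\s+FORMAT|PROCEDURE|FUNCTION|TASK)\s+(?!IF\s+EXISTS\b)",
    "hardcoded_use": r"(?i)^(?!\s*--)\s*USE\s+(DATABASE|SCHEMA|ROLE)\b",
}


def scan(sql_text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for line_no, line in enumerate(sql_text.splitlines(), start=1):
        for rule_name, pattern in SAFETY_RULES.items():
            if re.search(pattern, line):
                findings.append((line_no, rule_name))
    return findings


SAMPLE = """\
USE DATABASE CRM_DEV;
DROP TABLE CRMI_RAW_TB_ORDERS;
DROP TABLE IF EXISTS CRMI_RAW_TB_ORDERS;
-- GRANT SELECT ON TABLE T TO PUBLIC;
GRANT SELECT ON TABLE CRMI_RAW_TB_ORDERS TO PUBLIC;
"""

findings = scan(SAMPLE)
print(findings)
```

Lines 3 and 4 of the sample stay clean: the guarded DROP satisfies rule 5, and the commented-out GRANT is skipped by the `(?!\s*--)` lookahead.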

Data Type Rules

7. Disallow TIMESTAMP_NTZ and TIMESTAMP_LTZ (only TIMESTAMP_TZ allowed)

^(?!\s*--).*\bTIMESTAMP_(NTZ|LTZ)(\s*\(\s*\d+\s*\))?\b
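
A quick way to see what rule 7 catches is to run it over a few column definitions. The column names below are invented for illustration, and the results reflect Python's `re` engine; the pattern carries no `(?i)` flag, so as written it only matches uppercase type names.

```python
import re

# Rule 7 as listed above (no (?i) flag, so it is case-sensitive).
RULE_7 = re.compile(r"^(?!\s*--).*\bTIMESTAMP_(NTZ|LTZ)(\s*\(\s*\d+\s*\))?\b")

LINES = [
    "    CREATED_AT TIMESTAMP_NTZ,",     # flagged
    "    UPDATED_AT TIMESTAMP_LTZ(9),",  # flagged, with precision
    "    LOADED_AT  TIMESTAMP_TZ,",      # allowed type
    "    -- OLD_COL TIMESTAMP_NTZ,",     # comment line, skipped
    "    created_at timestamp_ntz,",     # lowercase, not matched as written
]

flagged = [line.strip() for line in LINES if RULE_7.search(line)]
print(flagged)
```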

Naming Convention Rules

8. Schema names must follow {DOMAIN}_{MATURITY}_ prefix pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?SCHEMA\s+(IF\s+NOT\s+EXISTS\s+)?(?:[a-z0-9_]+\.)?(?!RAW_|CUR_|AGG_|GOL_|REF_)[a-z0-9_]+;

9. Schema names must end with version pattern _vNNN

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?SCHEMA\s+(IF\s+NOT\s+EXISTS\s+)?(?:[a-z0-9_]+\.)?[a-z0-9_]+(?<!_v\d{3});
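
Rule 9 relies on a fixed-width negative lookbehind just before the semicolon. The sketch below runs the pattern in Python on a few invented statements to show which ones it flags.

```python
import re

# Rule 9 as listed above, split across two string literals for readability.
RULE_9 = re.compile(
    r"(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?SCHEMA\s+(IF\s+NOT\s+EXISTS\s+)?"
    r"(?:[a-z0-9_]+\.)?[a-z0-9_]+(?<!_v\d{3});"
)

STATEMENTS = [
    "CREATE SCHEMA IF NOT EXISTS CRM_RAW_v001;",  # versioned, passes
    "CREATE SCHEMA CRM_RAW;",                     # no version, flagged
    "CREATE OR REPLACE SCHEMA CRM_RAW_v1;",       # version too short, flagged
    "CREATE SCHEMA IF NOT EXISTS CRM_DCM;",       # unversioned DCM schema, flagged
]

flagged = [statement for statement in STATEMENTS if RULE_9.search(statement)]
print(flagged)
```

As written, the pattern also flags an unversioned `{DOMAIN}_DCM` schema, which may or may not be intended given that DCM schemas are unversioned by convention.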

10. Table names must follow {DOM}{COMP}_{MAT}_{TB}_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?(?!DYNAMIC\s)TABLE\s+(IF\s+NOT\s+EXISTS\s+)?(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_TB_)[A-Z_][A-Z0-9_]*
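
Running rule 10 in Python shows both what it catches and one backtracking caveat. `RULE_10_GUARDED` is a hypothetical variant written for this example, not a rule from the repository, and the table names are invented.

```python
import re

# Rule 10 exactly as listed above.
RULE_10 = re.compile(
    r"(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?(?!DYNAMIC\s)TABLE\s+(IF\s+NOT\s+EXISTS\s+)?"
    r"(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_TB_)[A-Z_][A-Z0-9_]*"
)

# Hypothetical variant: the extra lookahead alternative stops IF from being
# read as a table name when the optional IF NOT EXISTS group backtracks.
RULE_10_GUARDED = re.compile(
    r"(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?(?!DYNAMIC\s)TABLE\s+(IF\s+NOT\s+EXISTS\s+)?"
    r"(?:[A-Z0-9_]+\.){0,2}"
    r"(?!IF\s+NOT\s+EXISTS\b|[A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_TB_)[A-Z_][A-Z0-9_]*"
)

GOOD = "CREATE OR REPLACE TABLE CRMI_RAW_TB_ORDERS ("
BAD = "CREATE TABLE ORDERS ("
GOOD_IF_NOT_EXISTS = "CREATE TABLE IF NOT EXISTS CRMI_RAW_TB_ORDERS ("
BAD_IF_NOT_EXISTS = "CREATE TABLE IF NOT EXISTS ORDERS ("

for statement in (GOOD, BAD, GOOD_IF_NOT_EXISTS, BAD_IF_NOT_EXISTS):
    print(
        statement,
        bool(RULE_10.search(statement)),
        bool(RULE_10_GUARDED.search(statement)),
    )
```

In Python, the listed pattern still matches the compliant `IF NOT EXISTS` statement: once the name lookahead rejects the valid name, the engine retries without the optional group and accepts `IF` as the table name. The guarded variant is one possible way to avoid that; whether SonarQube's engine needs it should be verified against the running rule.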

11. View names must follow {DOM}{COMP}_{MAT}_{VW}_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?VIEW\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_VW_)[A-Z_][A-Z0-9_]*

12. Dynamic Table names must follow {DOM}{COMP}_{MAT}_{DT}_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?DYNAMIC\s+TABLE\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_DT_)[A-Z_][A-Z0-9_]*

13. Stage names must follow {DOM}{COMP}_{MAT}_{ST}_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?STAGE\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_ST_)[A-Z_][A-Z0-9_]*

Dependency Rules

14. Disallow Cross-Database Dependencies

^.*cross_db_true.*$

15. Disallow Cross-Schema Dependencies

^.*cross_schema_true.*$

Security & Access Control Rules

16. Disallow GRANT ALL PRIVILEGES

(?i)^(?!\s*--)\s*GRANT\s+ALL\s+(PRIVILEGES\s+)?ON\b

17. Disallow ACCOUNTADMIN usage in SQL scripts

(?i)^(?!\s*--)\s*(USE\s+ROLE|SET\s+ROLE|GRANT\s+.*TO\s+ROLE|GRANT\s+ROLE)\s+.*\bACCOUNTADMIN\b

18. Disallow plaintext passwords in DDL

(?i)^(?!\s*--)\s*.*PASSWORD\s*=\s*'[^']+'

Data Quality & Consistency Rules

19. Disallow SELECT * (force explicit column lists) — DISABLED

(?i)^(?!\s*--)\s*SELECT\s+\*\s+FROM\b

Disabled: Redundant — already covered by SQLFluff AM04 (SELECT * unknown columns) and SQLCC C002 (SELECT * used).

20. Disallow FLOAT/DOUBLE/REAL -- prefer NUMBER(p,s)

^(?!\s*--).*\b(FLOAT|DOUBLE|REAL)\b

21. Disallow VARCHAR without explicit length

^(?!\s*--).*\bVARCHAR\s*[^(]

22. CREATE TABLE must include COMMENT

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?TABLE\s+(?!.*\bCOMMENT\b).*;\s*$

Performance & Best Practice Rules

23. Disallow ORDER BY in view definitions

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?VIEW\b[\s\S]*?\bORDER\s+BY\b

24. Disallow COPY INTO without ON_ERROR clause

(?i)^(?!\s*--)\s*COPY\s+INTO\s+(?!.*\bON_ERROR\b).*;\s*$

25. Dynamic Tables must specify TARGET_LAG

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?DYNAMIC\s+TABLE\s+(?!.*\bTARGET_LAG\b)

Naming Convention Rules (additional object types)

26. File Format names must follow {DOM}{COMP}_{MAT}_FF_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?FILE\s+FORMAT\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_FF_)[A-Z_][A-Z0-9_]*

27. Stored Procedure names must follow {DOM}{COMP}_{MAT}_SP_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?PROCEDURE\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_SP_)[A-Z_][A-Z0-9_]*

28. Task names must follow {DOM}{COMP}_{MAT}_TK_ pattern

(?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?TASK\s+(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}[A-Z]_(RAW|CUR|AGG|GOL)_TK_)[A-Z_][A-Z0-9_]*

Monitoring

Issue overview

[Figure: issue overview]

Issue within code

[Figure: issue within code]

Technical debt

[Figure: technical debt]

Monitor the history of test case execution

[Figure: unitth_overview.png]

SonarQube + SQLFluff — Integrated Scanning

Both tools run in the CI/CD pipeline. SQLFluff issues are imported into SonarQube as external issues via sonar.externalIssuesReportPaths, giving one dashboard for everything.

Pipeline: ... → SQLFluff lint → sonar-scanner (imports sqlfluff_issues.json) → Quality Gate → ...

Scanner Configuration

The scanner uses sonar.language=sql to ensure the SQL Code Checker plugin claims .sql files. The text plugin (txt:) runs alongside via its own sensor. Only .git/** is excluded — all other files are indexed.

Rule Ownership (no duplicates)

Each rule runs in exactly one tool to avoid double-counting:

| Tool | Responsibility | Rules |
| --- | --- | --- |
| SonarQube txt: | Safety, security, naming, data types, deps, style | 40 rules |
| SonarQube SQLCC: | Structural SQL (views, joins, nulls, ORDER BY) | 19 rules |
| SQLFluff (external) | Formatting, AST-based style, implicit aliases | 23 rules |
| Total | | 82 rules |

SonarQube Text Plugin (txt:) — 40 regex-based rules

| Category | Rules | Examples |
| --- | --- | --- |
| Safety | 6 | CREATE without IF NOT EXISTS, DROP without IF EXISTS, USE statements, ALTER TABLE DROP COLUMN, TRUNCATE |
| Security | 4 | GRANT PUBLIC, ACCOUNTADMIN, GRANT ALL, plaintext passwords |
| Naming conventions | 15 | Table/View/DT/Stage/Schema/FF/SP/Task/Stream/Semantic View naming, maturity-level code enforcement |
| Data types | 3 | TIMESTAMP_NTZ/LTZ, FLOAT/DOUBLE/REAL, VARCHAR without length |
| Quality | 6 | SELECT *, TABLE COMMENT, DEFINE COMMENT, ORDER BY in views, COPY ON_ERROR, DT TARGET_LAG |
| Dependencies | 2 | Cross-database, cross-schema |
| Style | 4 | Keywords UPPER, implicit alias, JOIN without ON, ELSE NULL |

Regex fix applied: Rules 7 (TIMESTAMP), 20 (FLOAT), 21 (VARCHAR) use ^(?!\s*--).* lookahead instead of the broken (?<!--.*) lookbehind which silently failed in Java's regex engine.
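
The difference between the two forms is easy to reproduce. Python's `re` module is stricter than the behaviour described for Java: it rejects the variable-width lookbehind at compile time instead of failing silently. The sketch below shows that, plus what the lookahead form does and does not skip; the column names are invented.

```python
import re

# Variable-width lookbehind: Python refuses to compile it.
try:
    re.compile(r"(?<!--.*)\bFLOAT\b")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# Lookahead form used by rule 20 (rules 7 and 21 use the same prefix).
FLOAT_RULE = re.compile(r"^(?!\s*--).*\b(FLOAT|DOUBLE|REAL)\b")

checks = {
    "    PRICE FLOAT,": bool(FLOAT_RULE.search("    PRICE FLOAT,")),
    "-- PRICE FLOAT,": bool(FLOAT_RULE.search("-- PRICE FLOAT,")),
    "    PRICE NUMBER(10,2), -- was FLOAT": bool(
        FLOAT_RULE.search("    PRICE NUMBER(10,2), -- was FLOAT")
    ),
}
print(lookbehind_error)
print(checks)
```

The lookahead only skips lines that start with a comment. A type name mentioned in a trailing comment on a code line still matches, as the third check shows.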

SonarQube SQL Code Checker (SQLCC:) — AST-based

| Rule | Description |
| --- | --- |
| C002 | SELECT * used |
| C003 | INSERT without column list |
| C009 | Non-sargable statement |
| C012 | NULL comparison with = |
| C017 | ORDER BY without ASC/DESC |
| C022 | Non-materialised view |
| C023 | Cartesian join |

SQLFluff (external issues in SonarQube) — 23 AST-based rules, excludes sources/definitions/

| Rule | Description | Severity |
| --- | --- | --- |
| LT01 | Unnecessary whitespace | INFO |
| LT02 | Indentation | INFO |
| LT06 | Function name spacing | INFO |
| LT08 | CTE bracket newline | INFO |
| LT09 | Select targets formatting | INFO |
| LT10 | SELECT modifiers placement | INFO |
| LT12 | EOF newline | INFO |
| LT14 | Inconsistent line endings | INFO |
| CP02 | Identifier casing | MINOR |
| CP04 | Boolean casing | MINOR |
| AL01 | Missing AS keyword (implicit alias) | MINOR |
| AL02 | Implicit column alias | MINOR |
| AL08 | Column alias in GROUP BY | MINOR |
| AM03 | Ambiguous ORDER BY | MINOR |
| AM04 | SELECT * unknown columns | MINOR |
| AM05 | JOIN without ON clause | MAJOR |
| AM09 | LIMIT without ORDER BY | MINOR |
| RF02 | Unnecessary qualified references | MINOR |
| RF03 | Single CASE to IF | MINOR |
| RF04 | Keywords as identifiers | MAJOR |
| ST06 | Unnecessary ELSE NULL | MINOR |
| ST07 | USING vs ON in joins | MINOR |
| ST09 | Nested CASE | MINOR |
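
As an illustration of the import step, the sketch below converts SQLFluff JSON output into SonarQube's generic issue format using the severities from the table above. It is not the repository's sqlfluff_to_sonar.py. It assumes the older `issues`-only layout of the generic format and SQLFluff violation records keyed by `start_line_no` (or `line_no` in older releases); the sample file path and messages are invented.

```python
import json

# Severities follow the table above; codes not listed default to MINOR.
MAJOR_RULES = {"AM05", "RF04"}


def severity_for(code):
    if code.startswith("LT"):
        return "INFO"
    return "MAJOR" if code in MAJOR_RULES else "MINOR"


def to_generic_issues(sqlfluff_records):
    """Convert a list of SQLFluff file records into a generic issue report."""
    issues = []
    for record in sqlfluff_records:
        for violation in record.get("violations", []):
            # Assumption: the line key differs between SQLFluff versions.
            line = violation.get("start_line_no", violation.get("line_no", 1))
            issues.append({
                "engineId": "sqlfluff",
                "ruleId": violation["code"],
                "severity": severity_for(violation["code"]),
                "type": "CODE_SMELL",
                "primaryLocation": {
                    "message": violation["description"],
                    "filePath": record["filepath"],
                    "textRange": {"startLine": line},
                },
            })
    return {"issues": issues}


# Hypothetical SQLFluff output for one file.
SAMPLE = [
    {
        "filepath": "sql/example.sql",
        "violations": [
            {"start_line_no": 3, "code": "LT01", "description": "Unnecessary whitespace."},
            {"start_line_no": 7, "code": "AM05", "description": "JOIN without ON clause."},
        ],
    }
]

converted = to_generic_issues(SAMPLE)
print(json.dumps(converted, indent=2))
```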

Excluded from SQLFluff (handled by SonarQube or not applicable)

  • PRS — parse errors on DCM DEFINE syntax (SonarQube text plugin handles these files)
  • CP01 — keywords UPPER (handled by txt:Keywords_must_be_UPPER)

DDL Post-Processing

The dependencies/ddl.sql file is auto-generated by Snowflake's GET_DDL() which outputs lowercase keywords and tab indentation. The ddl_uppercase_keywords.py filter normalizes the output:

  • Uppercases all unquoted identifiers and SQL keywords
  • Converts tabs to 4-space indentation
  • Adds space before ( in object definitions
  • Expands inline SELECT ... FROM onto multiple lines
  • Preserves string literals and comments
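
Two of these steps, uppercasing outside literals and comments and expanding tabs, can be sketched in a few lines. This is not the repository's ddl_uppercase_keywords.py; the function name is hypothetical, and the other steps (spacing before parentheses, expanding inline SELECT) are left out.

```python
import re

# Tokens that must be left untouched.
PROTECTED = re.compile(
    r"'(?:[^']|'')*'"      # single-quoted string literal
    r'|"(?:[^"]|"")*"'     # double-quoted identifier
    r"|--[^\n]*"           # line comment
    r"|/\*.*?\*/",         # block comment
    re.DOTALL,
)


def normalize_ddl(ddl):
    """Uppercase and expand tabs everywhere except in protected tokens."""
    parts = []
    position = 0
    for match in PROTECTED.finditer(ddl):
        unprotected = ddl[position:match.start()]
        parts.append(unprotected.upper().replace("\t", "    "))
        parts.append(match.group(0))  # literal or comment, kept as-is
        position = match.end()
    parts.append(ddl[position:].upper().replace("\t", "    "))
    return "".join(parts)


raw = "create or replace table t (\n\tname varchar(10) default 'abc' -- keep me\n);"
normalized = normalize_ddl(raw)
print(normalized)
```

Scanning for protected tokens first, and transforming only the gaps between them, is what keeps a `--` inside a string literal from being mistaken for a comment.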

Quick Setup Guide

Step 1: Create CICD role and service user

Each consumer repo has its own pre_deploy.sql that creates the database, schema, and DCM project. The CICD role and service user must be created once manually:

USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS CICD;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE CICD;
GRANT CREATE ROLE ON ACCOUNT TO ROLE CICD;
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE CICD;
GRANT MANAGE WAREHOUSES ON ACCOUNT TO ROLE CICD;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE CICD;
GRANT EXECUTE MANAGED TASK ON ACCOUNT TO ROLE CICD;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE CICD;
-- Assumption: the service user does not exist yet; adjust its properties to your account's policies
CREATE USER IF NOT EXISTS SVC_CICD TYPE = SERVICE DEFAULT_ROLE = CICD;
GRANT ROLE CICD TO USER SVC_CICD;

Step 2: Generate a PAT for the service user

ALTER USER IF EXISTS SVC_CICD ADD PROGRAMMATIC ACCESS TOKEN CICD_PAT
  ROLE_RESTRICTION = CICD
  DAYS_TO_EXPIRY = 365
  COMMENT = 'CI/CD pipeline PAT';
-- copy <your token>

Step 3: Configure .env (single source of truth)

All configuration lives in one file. start.sh auto-generates SNOW_CONFIG_B64 and the runner auto-generates SONAR_TOKEN at startup.

# GitHub
GH_RUNNER_TOKEN=<...>
GITHUB_OWNER=<your GitHub org/user>
GITHUB_ORG=<your GitHub org for org-level runners>
GH_ORG_TOKEN=<classic PAT with admin:org scope>

# SonarQube (SONAR_TOKEN is auto-generated at runner startup)
POSTGRES_USER=sonar
POSTGRES_PASSWORD=sonar
POSTGRES_DB=sonarqube
SONAR_JDBC_USERNAME=sonar
SONAR_JDBC_PASSWORD=sonar
SONAR_ADMIN_PASS=ThisIsNotSecure1234!

# Snowflake (SNOW_CONFIG_B64 is auto-generated by start.sh)
CONNECTION_NAME=<your-connection-name>
SNOW_ACCOUNT=<your-account>
SNOW_USER=SVC_CICD
SNOW_ROLE=CICD
SNOW_DATABASE=DATAOPS
SNOW_SCHEMA=IOT_RAW_V001
SNOW_WAREHOUSE=MD_TEST_WH
SNOW_PAT=<your PAT from Step 2>
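
What start.sh writes into SNOW_CONFIG_B64 is not shown in this README. The sketch below therefore rests on an assumption: that the value is a base64-encoded Snowflake CLI config.toml holding a single connection, with the PAT supplied in the password field. The function name and sample values are hypothetical, and the real script may produce a different layout.

```python
import base64


def build_snow_config_b64(env):
    """Render a one-connection TOML document and return it base64-encoded."""
    toml_lines = [
        f'[connections.{env["CONNECTION_NAME"]}]',
        f'account = "{env["SNOW_ACCOUNT"]}"',
        f'user = "{env["SNOW_USER"]}"',
        f'role = "{env["SNOW_ROLE"]}"',
        f'database = "{env["SNOW_DATABASE"]}"',
        f'schema = "{env["SNOW_SCHEMA"]}"',
        f'warehouse = "{env["SNOW_WAREHOUSE"]}"',
        f'password = "{env["SNOW_PAT"]}"',  # assumption: PAT passed as the password
    ]
    toml_text = "\n".join(toml_lines) + "\n"
    return base64.b64encode(toml_text.encode("utf-8")).decode("ascii")


# Placeholder values only; never commit a real token.
example_env = {
    "CONNECTION_NAME": "cicd",
    "SNOW_ACCOUNT": "myorg-myaccount",
    "SNOW_USER": "SVC_CICD",
    "SNOW_ROLE": "CICD",
    "SNOW_DATABASE": "DATAOPS",
    "SNOW_SCHEMA": "IOT_RAW_V001",
    "SNOW_WAREHOUSE": "MD_TEST_WH",
    "SNOW_PAT": "placeholder-token",
}

encoded = build_snow_config_b64(example_env)
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```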

Step 4: Upload GitHub Secret

Only one secret is needed per org:

./start.sh  # generates SNOW_CONFIG_B64 automatically
gh secret set SNOW_CONFIG_B64 --org zbrainiac-labs --visibility all

SONAR_TOKEN and SNOW_CONNECTION_NAME secrets are no longer needed -- they are auto-generated at runtime.


Step 5: Run It

  1. Start your local stack via ./start.sh
  2. Access SonarQube at: http://localhost:9000
    Login: admin / ThisIsNotSecure1234! (the SONAR_ADMIN_PASS value from .env; a fresh SonarQube install defaults to admin / admin)
  3. Push to any consumer repo — the reusable workflow triggers automatically
  4. Check results in SonarQube
  5. Monitor SQL test results (incl. history) at: http://localhost:8080

Docker Compose Services

| Service | Purpose | Port |
| --- | --- | --- |
| sonarqube | Code quality + custom SQL rules | 9000 |
| db | PostgreSQL backend for SonarQube | - |
| runner1 | Org-level self-hosted GitHub runner | - |
| runner2 | Org-level self-hosted GitHub runner | - |
| nginx-server | Serves UnitTestHistory HTML reports | 8080 |

Final Thoughts

This is not just a demo. It's a reusable framework to scale DataOps -- combining validation, governance, and automation into one consistent, testable workflow.