Git integration for incremental validation #36

@Mearman

Problem

Running full link validation on large documentation projects is slow and resource-intensive. For projects with 100+ Markdown files and hundreds of external links, a single run can take several minutes.

Real-world Context

In a typical development workflow:

  • Most commits only change a few documentation files
  • Full validation runs check unchanged files repeatedly
  • CI pipelines waste time validating previously verified links
  • Pre-commit hooks become too slow to be practical

Proposed Solution

Add Git integration to enable incremental validation:

  1. Changed files detection

    # Only validate files changed since last commit
    markmv validate --git-diff HEAD~1
    
    # Only validate files in current branch vs main
    # (in git diff, main..HEAD compares the two tips; main...HEAD compares against the merge base)
    markmv validate --git-diff main..HEAD
    
    # Validate staged changes only
    markmv validate --git-staged
  2. Smart dependency tracking

    • Detect when shared files (like common link references) change
    • Validate dependent files that reference changed content
    • Handle moved/renamed files intelligently
  3. Validation caching

    • Store validation results alongside the Git commit hash they were computed at
    • Skip re-validation of unchanged files whose external links are the same
    • Invalidate the cache when external validation rules change
  4. Pre-commit hook integration

    # Fast pre-commit validation
    markmv validate --git-staged --cache --fail-fast
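
The changed-files step above could be built on the output of `git diff --name-status <range>`, which prints one tab-separated line per file (for example `M<TAB>README.md`, or `R100<TAB>old<TAB>new` for a rename). The following is a minimal TypeScript sketch under that assumption, not markmv's actual implementation; `ChangedFile`, `parseNameStatus`, and `markdownFilesToValidate` are hypothetical names.

```typescript
// Hypothetical types and helpers; not part of markmv's actual API.
type ChangeKind = "added" | "modified" | "deleted" | "renamed" | "copied";

interface ChangedFile {
  kind: ChangeKind;
  path: string;          // current path (the new path for renames and copies)
  previousPath?: string; // only set for renames and copies
}

const KINDS: Record<string, ChangeKind> = {
  A: "added",
  M: "modified",
  D: "deleted",
  R: "renamed",
  C: "copied",
};

// Parses the tab-separated output of `git diff --name-status <range>`.
function parseNameStatus(output: string): ChangedFile[] {
  const changes: ChangedFile[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const parts = line.split("\t");
    const status = parts[0] ?? "";
    const first = parts[1];
    const second = parts[2];
    // Rename and copy statuses carry a similarity score (e.g. "R100"),
    // so only the first letter identifies the kind of change.
    const kind: ChangeKind | undefined = KINDS[status.charAt(0)];
    if (kind === undefined || first === undefined) continue; // status not handled by this sketch
    if ((kind === "renamed" || kind === "copied") && second !== undefined) {
      changes.push({ kind, path: second, previousPath: first });
    } else {
      changes.push({ kind, path: first });
    }
  }
  return changes;
}

// Markdown files that still exist after the change and therefore need validation.
function markdownFilesToValidate(changes: ChangedFile[]): string[] {
  return changes
    .filter((c) => c.kind !== "deleted" && c.path.toLowerCase().endsWith(".md"))
    .map((c) => c.path);
}
```

For the input `"M\tREADME.md\nR100\tdocs/old.md\tdocs/new.md\nD\tdocs/gone.md\n"`, `markdownFilesToValidate(parseNameStatus(...))` returns `["README.md", "docs/new.md"]`. Keeping `previousPath` for renames leaves room for carrying cached results over to the new path.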

Expected Output

$ markmv validate --git-diff HEAD~1

🔍 Git Integration
Changed since HEAD~1: 3 files
- docs/firebase-setup.md (modified)
- README.md (modified)
- docs/new-feature.md (added)

📊 Validation Summary
Files processed: 3 (97 cached, 2 unchanged)
Total links found: 23 (140 from cache)
New/changed links: 8
Broken links: 0
Processing time: 847ms (was 29s for full validation)

💾 Cache Status
- Cached results: 97 files, 140 links
- Cache hit rate: 97.1%
- Last full validation: 2 hours ago

Configuration Options

# .markmv.yml
git:
  enabled: true
  cache:
    enabled: true
    location: ".markmv-cache"
    ttl: "24h"  # Force re-check external links after 24h
  hooks:
    pre-commit: true
    fail-fast: true  # Exit on first error for faster feedback
  diff:
    base-branch: "main"
    include-dependencies: true  # Check files that reference changed files
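
The cache settings above could be applied roughly as follows. This is a minimal TypeScript sketch, not markmv's implementation: `LinkCacheEntry`, `parseTtl`, and `canReuse` are hypothetical names, the ttl grammar (an integer followed by `s`, `m`, `h`, or `d`) is an assumption, and a per-file content hash stands in for the commit hash mentioned in the proposal.

```typescript
// Hypothetical cache record; field names are assumptions for illustration.
interface LinkCacheEntry {
  contentHash: string;       // hash of the file content when it was validated
  configFingerprint: string; // hash of the validation-relevant config at that time
  validatedAt: number;       // epoch milliseconds
}

// Converts a duration such as "24h" or "30m" into milliseconds.
function parseTtl(ttl: string): number {
  const match = /^(\d+)(s|m|h|d)$/.exec(ttl.trim());
  if (match === null) {
    throw new Error("Unsupported ttl value: " + ttl);
  }
  const amount = Number(match[1]);
  switch (match[2]) {
    case "s":
      return amount * 1000;
    case "m":
      return amount * 60 * 1000;
    case "h":
      return amount * 60 * 60 * 1000;
    default: // "d", the only remaining unit the pattern accepts
      return amount * 24 * 60 * 60 * 1000;
  }
}

// A cached result is reusable only if the file and the config are unchanged
// and the entry is younger than the ttl.
function canReuse(
  entry: LinkCacheEntry | undefined,
  current: { contentHash: string; configFingerprint: string },
  ttlMs: number,
  now: number
): boolean {
  if (entry === undefined) return false;
  if (entry.contentHash !== current.contentHash) return false;
  if (entry.configFingerprint !== current.configFingerprint) return false;
  return now - entry.validatedAt < ttlMs;
}
```

Here `parseTtl("24h")` returns `86400000`. Storing a config fingerprint in each entry is one way to get the "invalidate when rules change" behaviour without a separate cache-clearing step: a changed config simply stops matching old entries.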

Pre-commit Hook Setup

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: markmv-validate
        name: Validate markdown links
        entry: markmv validate --git-staged --fail-fast --cache
        language: node  # with repo: local, markmv must be listed under additional_dependencies or already be on PATH
        files: '\.md$'
        pass_filenames: false

CI/CD Integration Benefits

# GitHub Actions example
- name: Validate documentation links
  run: |
    if [[ "${{ github.event_name }}" == "pull_request" ]]; then
      # Only validate changed files in PR
      # (the base ref must be fetched first, e.g. actions/checkout with fetch-depth: 0)
      markmv validate --git-diff origin/${{ github.base_ref }}..HEAD
    else
      # Full validation on main branch (with caching)
      markmv validate --cache
    fi

Advanced Features

  1. Dependency tracking

    • When shared-links.md changes, validate all files that reference it
    • Handle link includes and shared reference files
  2. Smart invalidation

    • Invalidate the cache when the markmv config changes
    • Detect when external link patterns change
  3. Parallel processing

    • Validate changed files in parallel while reusing cached results for unchanged ones
    • Maintain performance even with complex dependency graphs
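
The dependency-tracking idea above amounts to walking a reference graph backwards from the changed files. A minimal TypeScript sketch, assuming a map from each file to the files it links to or includes is already available (`ReferenceMap` and `expandWithDependents` are hypothetical names, and how markmv would build that map is not specified here):

```typescript
// Hypothetical input: each file mapped to the files it links to or includes.
type ReferenceMap = Map<string, string[]>;

// Returns the changed files plus every file that directly or transitively
// references one of them.
function expandWithDependents(changed: string[], references: ReferenceMap): Set<string> {
  // Invert the graph: target -> files that reference it.
  const dependents = new Map<string, string[]>();
  references.forEach((targets, source) => {
    for (const target of targets) {
      const list = dependents.get(target) ?? [];
      list.push(source);
      dependents.set(target, list);
    }
  });

  // Breadth-first walk; the result set doubles as the visited set,
  // which also guards against reference cycles.
  const toValidate = new Set<string>(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const dependent of dependents.get(current) ?? []) {
      if (!toValidate.has(dependent)) {
        toValidate.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return toValidate;
}
```

With `a.md` referencing `shared-links.md` and `b.md` referencing `a.md`, a change to `shared-links.md` yields the set `shared-links.md`, `a.md`, `b.md`. Whether transitive dependents should be included, or only direct ones, is a design choice the proposal leaves open.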

Benefits

  • Faster development workflow: Pre-commit hooks run in under 1 second instead of roughly 30 seconds
  • Efficient CI/CD: PR checks only validate relevant changes
  • Better developer experience: Quick feedback without sacrificing thoroughness
  • Resource optimization: Reduce unnecessary external requests
  • Scalability: Handle large documentation projects efficiently

This would make markmv practical for large-scale projects and enable seamless integration into modern development workflows.
