Add RAT for header checking #149

Closed
epugh wants to merge 1 commit into apache:main from epugh:add_rat_validation

Conversation

@epugh epugh commented Jun 13, 2026

Contributor

Fixes #141

Copilot AI left a comment

Pull request overview

Adds Apache RAT-based license header enforcement to address Issue #141 and updates key resource/config files to include the ASF license header so they pass header checks.

Changes:

  • Added ASF license headers to Spring Boot .properties resources and gradle/libs.versions.toml.
  • Added the Apache RAT Gradle plugin (via version catalog) and configured the rat task with excludes (including a .gitignore-derived exclude list).
  • Documented RAT report location and intended integration with the verification lifecycle.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/main/resources/application.properties | Adds ASF license header to application properties. |
| src/main/resources/application-stdio.properties | Adds ASF license header to stdio profile properties. |
| src/main/resources/application-http.properties | Adds ASF license header to http profile properties. |
| gradle/libs.versions.toml | Adds ASF license header and introduces the RAT plugin version + alias. |
| build.gradle.kts | Applies RAT plugin and configures tasks.rat exclusions (explicit + .gitignore-derived). |
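
For orientation, an inline configuration of the kind summarized above might look roughly like the following. This is a sketch rather than the PR's actual diff: it assumes the community `org.nosphere.apache.rat` Gradle plugin (the summary does not name the plugin), the `libs.plugins.rat` catalog alias is hypothetical, and the exclude patterns are purely illustrative.

```kotlin
// build.gradle.kts (sketch; plugin choice, alias, and patterns are assumptions)
plugins {
    alias(libs.plugins.rat) // hypothetical version-catalog alias for the RAT plugin
}

tasks.rat {
    // Explicit excludes for paths that are generated or cannot carry a header.
    excludes.add("**/build/**")
    excludes.add("gradle/wrapper/**")
}

// Make the header audit part of the normal verification lifecycle.
tasks.named("check") {
    dependsOn(tasks.rat)
}
```

With wiring like this, `./gradlew check` would be expected to fail when a scanned file lacks an approved header, which is the behaviour the PR is after.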

@adityamparikh

Contributor

Alternative approach for comparison: #150 implements this same RAT header enforcement as a buildSrc convention plugin (stacked on #138's buildSrc infra) instead of inline tasks.rat { } config. It also pulls the .gitignore→glob translation into a unit-tested RatExcludes helper, which caught an interior-slash anchoring bug in the inline version (src/generated was matched at any depth rather than root-anchored).

Whichever direction is preferred — inline (here) or buildSrc (#150) — happy to converge on one.
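
To make the anchoring difference concrete, here is a minimal Kotlin sketch of a .gitignore-to-glob translation. It is not the `RatExcludes` helper from #150; the function name `gitignoreLineToGlobs` is hypothetical, and the sketch covers only the basic rules (comments, negation, trailing slash, and slash-based anchoring).

```kotlin
// Hypothetical helper; the real RatExcludes in #150 may differ in name and behaviour.
fun gitignoreLineToGlobs(line: String): List<String> {
    val trimmed = line.trim()
    // Blank lines and comments carry no pattern. Negations ("!foo") cannot be
    // expressed as an exclude, so this sketch simply skips them.
    if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("!")) {
        return emptyList()
    }

    // A trailing slash restricts the pattern to directories.
    val dirOnly = trimmed.endsWith("/")
    val body = trimmed.removeSuffix("/")

    // A slash at the start or in the middle anchors the pattern to the root;
    // the trailing slash removed above does not count.
    val anchored = body.contains("/")
    val path = body.removePrefix("/")
    if (path.isEmpty()) return emptyList()

    // Unanchored patterns may match at any depth.
    val base = if (anchored) path else "**/$path"

    // Directory-only patterns exclude the contents; other patterns may name
    // either a file or a directory, so both forms are emitted.
    return if (dirOnly) listOf("$base/**") else listOf(base, "$base/**")
}

fun main() {
    println(gitignoreLineToGlobs("build/"))        // [**/build/**]
    println(gitignoreLineToGlobs("src/generated")) // [src/generated, src/generated/**]
}
```

The second call shows the root-anchored result for an interior-slash pattern; a translation that prefixed every pattern with `**/` would instead match `src/generated` at any depth.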

epugh added a commit that referenced this pull request Jun 15, 2026
…plugin (stacked on #138) (#150)

* docs: add Apache LICENSE and NOTICE files

Add the top-level Apache License 2.0 text and NOTICE file required by
ASF release policy, and bundle them into the META-INF directory of every
JAR produced by the build (main, bootJar, sources, javadoc).

See https://www.apache.org/legal/release-policy.html#licensing-documentation

* docs(spec): add SBOM generation design

Captures decisions made during brainstorming: CycloneDX over SPDX,
embed-in-bootJar via Spring Boot's native CycloneDX integration, full
build + Docker + Release coverage, no cosign attestation in this PR.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(plan): add SBOM generation implementation plan

Step-by-step bite-sized tasks covering: version catalog, Gradle plugin
wiring, actuator endpoint enablement, focused HTTP integration test,
CI workflow uploads, README + CLAUDE.md docs, final verification.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* chore(deps): add CycloneDX Gradle plugin 1.10.0 to version catalog

Plugin will be applied in the next commit. Adding the catalog entry
first keeps build.gradle.kts changes reviewable in isolation.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* feat(build): generate and embed CycloneDX SBOM

Apply org.cyclonedx.bom Gradle plugin 2.4.1. Spring Boot 3.5's
CycloneDxPluginAction auto-wires bootJar to embed the generated SBOM at
META-INF/sbom/application.cdx.json, so every distribution (JAR, Jib JVM
image, both Paketo native images) ships the embedded SBOM via bootJar
packaging — no per-image wiring.

Plugin version note: 1.10.0 breaks against Gradle 9.4 with
UnsupportedOperationException (ImmutableCollection.removeAll). 2.4.1 is
the latest v1.x-compatible class layout (CycloneDxPlugin /
CycloneDxTask) that Spring Boot's auto-integration recognizes; v3.x
renamed the classes (CyclonedxPlugin) and is incompatible until Spring
Boot adopts the new shape.

projectType is set explicitly to Component.Type.APPLICATION because
v2.4.1 changed the property from Property<String> to
Property<Component.Type>; Spring Boot's `.convention("application")`
would store a raw String and break the task at execution time.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* feat(actuator): enable /actuator/sbom endpoint explicitly

`sbom` was already in management.endpoints.web.exposure.include; this
makes the endpoint enablement explicit so the file conveys intent
without relying on Spring Boot defaults.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(spec): drop integration-test scope, document plugin-version decisions

- Drop the planned SbomEndpointIntegrationTest: /actuator/sbom is stock
  Spring Boot functionality; our only project-specific addition is two
  property lines. The build itself fails if cyclonedxBom breaks
  (Spring Boot's bootJar auto-depends on it).
- Update plugin version note to 2.4.1 and explain why both 1.10.0 (Gradle
  9.4 bug) and 3.x (Spring Boot class-name change) are unsuitable.
- CycloneDX schema 1.6 (plugin default) replaces the originally-noted 1.5.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(spec): drop stale 1.10.0 version reference

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(spec): inline the plugin-version constraints explanation

Earlier edit lost the detail by accident. Restored as part of the Tool
choice section so the spec stands on its own.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* ci: upload CycloneDX SBOM as workflow artifact

Mirrors the existing JAR/test-results/coverage upload pattern. Retains
the SBOM for 30 days (vs the standard 7) since supply-chain
investigations often happen well after a build.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* ci(release): strict SBOM generation + upload + release attachment

The existing Generate SBOM step swallowed errors with `|| echo "..."`,
masking failures now that the plugin is wired. Removes the fallback,
uploads the SBOM as a 90-day workflow artifact, and attaches it to the
v<version> GitHub Release when one exists (graceful fallback otherwise
since the source release of record lives at dist.apache.org, not GitHub).

RELEASE_VERSION is already validated by validate-release; routing it
through an env var instead of inline ${{ }} interpolation is
defence-in-depth against actions-injection.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(readme): document SBOM location, retrieval, and scanning

New 'Supply chain & SBOM' section covers all four distribution
channels (embedded in JAR/image, /actuator/sbom endpoint, GitHub
Release asset, CI workflow artifact) and shows trivy/grype usage.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* refactor(build): drop unnecessary cyclonedxBom configuration

Spring Boot 3.5.14's CycloneDxPluginAction already sets outputName,
outputFormat, projectType, and wires bootJar embedding — matching what
Spring Initializr generates for the same dependency set. Verified that
applying the plugin alone produces a valid CycloneDX 1.6 SBOM at
META-INF/sbom/application.cdx.json inside the bootJar with
component type=application.

The earlier projectType override + includeConfigs/skipConfigs were
defensive but unnecessary; let the framework defaults work.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(agents): note SBOM generation in commands + architecture

CLAUDE.md symlinks to AGENTS.md; edit lands on the real file.

Records the cyclonedxBom command and how the SBOM flows through
bootJar → actuator → Docker images, so future agents have the
mental model when working on related code.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* style: apply spotless

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* feat(build): derive binary-release LICENSE/NOTICE from the SBOM

The base LICENSE/NOTICE are correct for the source release, but the binary
release (the Spring Boot fat bootJar) bundles third-party bytecode and so per
https://infra.apache.org/licensing-howto.html must additionally enumerate each
bundled dependency's license and lift bundled ASF dependencies' NOTICE snippets.

Stacks on the CycloneDX SBOM work and reuses it as the source of dependency
license data:

- generateBinaryLicense: base Apache-2.0 + an appendix listing every
  productionRuntimeClasspath dependency with a link to its license, read from the
  bundled SBOM (META-INF/sbom/application.cdx.json). The SBOM resolves a license
  for every component, including Gradle-module-metadata-only ASF artifacts
  (solr-solrj/solr-api) that POM-only scanners miss, so no per-dependency list is
  hand-maintained. It also gates the build: a bundled module missing from the SBOM,
  or carrying a license not in config/license-policy.json, fails the build.
- generateBinaryNotice: base NOTICE + the META-INF/NOTICE files lifted verbatim and
  de-duplicated from the bundled jars (the Shade ApacheNoticeResourceTransformer
  approach), so ASF dependency notices stay current automatically.

config/license-policy.json holds the allowedLicenses set plus overrides
(group:name -> SPDX id) correcting the few components CycloneDX mislabels
(mcp-server-security -> Apache-2.0; ANTLR ST4/antlr-runtime -> BSD-3-Clause).
Source-form jars keep the base LICENSE/NOTICE.

Verified: ./gradlew build green; fat jar META-INF/LICENSE lists 158 deps
(incl. SolrJ) and META-INF/NOTICE aggregates 21 upstream notices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* refactor(build): extract LICENSE/NOTICE generation to a buildSrc plugin

Move the inline LICENSE/NOTICE logic out of the root build.gradle.kts into a
buildSrc convention plugin (org.apache.solr.mcp.license-notice) backed by two
typed tasks:

- GenerateBinaryLicense / GenerateBinaryNotice are proper DefaultTask types with
  @InputFile/@InputFiles/@OutputFile, so they're incremental and (being real .kt
  files) free of the kts-script-compiler limitations that forced the previous
  Pair-based workarounds — the logic now reads as plain Kotlin with data classes.
- The root build.gradle.kts drops ~250 lines and three imports, and just applies
  `id("org.apache.solr.mcp.license-notice")`.

Behaviour is unchanged: the bootJar still bundles a LICENSE with the SBOM-derived
158-dependency appendix (incl. SolrJ) and a NOTICE aggregating 21 upstream
notices; source-form jars keep the base files; `check` still runs the gate.
The tasks now live in buildSrc, so they can be unit-tested with Gradle TestKit.

Verified: ./gradlew build green; fat-jar META-INF/LICENSE and NOTICE identical
to the pre-refactor output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* test(build): unit-test the LICENSE/NOTICE buildSrc tasks

Add ProjectBuilder-based tests for the two convention-plugin tasks (now possible
since they live in buildSrc as typed tasks). Covers the correctness-critical
behaviour without needing the full spring-boot + cyclonedx stack:

- generateBinaryLicense: appendix lists bundled deps with SPDX links, applies a
  policy override to correct a mislabelled SBOM license, and preserves the base
  LICENSE text; the gate fails on a disallowed license and on a bundled coordinate
  absent from the SBOM.
- generateBinaryNotice: aggregates bundled META-INF/NOTICE files verbatim,
  de-duplicates identical notices, attributes each to its module, and emits just
  the project NOTICE when no dependency notices exist.

buildSrc's test task runs as part of `./gradlew build`, so these are enforced on
every build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(build): explain the LICENSE/NOTICE tasks in comments and AGENTS.md

Add step-by-step comments to GenerateBinaryLicense/GenerateBinaryNotice walking
through what each phase does (load policy, index the SBOM, resolve+gate each
shipped dependency, write the file; and notice matching/de-dup/attribution).

Expand the AGENTS.md "Release LICENSE / NOTICE" section with where the tasks are
unit-tested and a short runbook for what to do when the license gate fails
(add an override for an SBOM mislabel, or allow a genuinely new license) instead
of silencing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* refactor(build): drop license-policy.json; disclose SBOM licenses verbatim

apache/solr has no license allow-list (it uses a per-dependency licenses/ folder,
which JanHoy said not to replicate), and the binary LICENSE is a disclosure, not a
license policy. Remove config/license-policy.json and the allow-list gate +
override corrections it powered.

generateBinaryLicense now lists each shipped dependency with the license the
CycloneDX SBOM reports, verbatim — so a few imprecise-but-permissive upstream
labels appear as-is (mcp-server-security: Apache-1.0; ANTLR: BSD-4-Clause / BSD
licence). The appendix preamble says licenses are as-reported and links each one.

The remaining gate is completeness only: fail if a bundled dependency is absent
from the SBOM, so nothing is silently omitted from the LICENSE. Tests updated to
assert verbatim SBOM labels and SBOM name/URL handling.

Verified: ./gradlew build green; fat-jar LICENSE still lists 158 deps and NOTICE
aggregates 21 upstream notices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(build): point the LICENSE appendix to the bundled SBOM

Add a line to the appendix preamble noting the machine-readable bill of
materials (component versions, hashes, licenses) is bundled at
META-INF/sbom/application.cdx.json — the inline appendix stays the
human-readable disclosure, with the SBOM offered for tooling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs: document where/when the binary LICENSE & NOTICE are available

Add a 'where / when they appear' note to the Release LICENSE / NOTICE section:
both binary files are regenerated on every build (tasks run ahead of bootJar and
in check), land at META-INF/LICENSE and META-INF/NOTICE in the fat jar and thus
in every published Docker image, and are also written to build/generated/license/
for local viewing; source-form jars carry the repo-root base files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(build): explain the buildSrc LICENSE/NOTICE plugin for non-Gradle readers

Reviewers who don't work with Gradle had no easy way into buildSrc. Add:

- buildSrc/README.md: what buildSrc is, a short glossary of the Gradle concepts
  the code uses (Task, @TaskAction, the input/output annotations, Property/Provider
  types, convention plugin, productionRuntimeClasspath), and the end-to-end flow.
- KDoc on GenerateBinaryLicense / GenerateBinaryNotice: a "for readers new to
  Gradle" orientation on each class plus a note on every annotated property
  explaining what the input/output annotation does (up-to-date checking, ordering).
- A note on the convention plugin header explaining precompiled script plugins,
  and a comment on buildSrc/build.gradle.kts explaining what it builds.

Documentation only; no behaviour change. ./gradlew :buildSrc:test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* docs(build): comment the convention plugin body for non-Gradle readers

Add plain-language inline comments through the plugin body explaining the parts
that are opaque without Gradle background: what a 'configuration' is and why
productionRuntimeClasspath equals 'what ships', how the lazy provider chains
(flatMap/map over resolvedArtifacts) derive the coordinate list and the
jar-name->coordinate map, what tasks.register/.set wiring does, and how metaInf
from(...) plus dependsOn bundle the generated files into the bootJar while the
source-form jars keep the base files. Comments only; code unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

* Don't double-load the root files

* feat(build): enforce Apache license headers via RAT convention plugin

Add Apache RAT (Release Audit Tool) header enforcement as an `org.apache.solr.mcp.rat` buildSrc convention plugin, stacked on the license-notice plugin from #138. RAT is wired into `check`, so `./gradlew build` audits that every scanned file carries an ASF header (report at build/reports/rat/index.html).

The .gitignore-to-RAT-glob translation lives in a pure, unit-tested `RatExcludes` helper rather than inline in build.gradle.kts. Moving it to buildSrc fixes two gitignore-semantics gaps from the inline approach: interior-slash patterns (e.g. src/generated) are now root-anchored instead of matched at any depth, and the negation/anchoring rules are documented and tested.

Local developer-tooling dirs (.claude worktrees, .kotlin caches) are excluded so contributors don't hit spurious audit failures. ASF headers are added to the three application*.properties and libs.versions.toml so they pass the audit.

Supersedes the inline approach in #149. Stacked on #138. Fixes #141.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

---------

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Eric Pugh <epugh@opensourceconnections.com>

epugh commented Jun 15, 2026

Contributor Author

Closing in favour of #150.

@epugh epugh closed this Jun 15, 2026

Development

Successfully merging this pull request may close these issues.

Add Apache RAT for license scanning

3 participants