Package codeanalyzer-java as a PyPI distribution (with pypi-release.yml) #146

@rahlk


Is your feature request related to a problem? Please describe.

codeanalyzer cannot currently be consumed as a self-contained Python package. Users have to clone the repo, install GraalVM/JDK, and build the native binary themselves. The goal of this issue is to publish codeanalyzer-java as a PyPI distribution that bundles a prebuilt, JVM-free native binary, installable with a plain pip install.

The main blocker is that the GraalVM native-image reflection config (the prerequisite for producing a working native binary) is stale and incomplete:

  • reflect-config.json still targets the old com.ibm.northstar.* package: 7 stale entries, 0 for the current com.ibm.cldk.* package, and 0 for JavaParser (com.github.javaparser:javaparser-symbol-solver-core:3.26.3 / javaparser-core:3.26.3, build.gradle:124-125).
  • proxy-config.json is empty ([]), jni-config.json has a single entry, and resource-config.json lists only one ICU resource — none cover JavaParser.
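The stale/missing counts above can be re-checked mechanically. The following is a minimal sketch, assuming reflect-config.json uses the usual GraalVM layout of a JSON array of objects with a "name" field; the helper names and the prefix list are illustrative, not part of the repo.

```python
import json

# Hypothetical prefixes to audit; adjust to the packages the binary must cover.
PREFIXES = ("com.ibm.northstar.", "com.ibm.cldk.", "com.github.javaparser.")


def count_by_prefix(entries, prefixes=PREFIXES):
    """Count config entries whose "name" starts with each prefix."""
    counts = {prefix: 0 for prefix in prefixes}
    for entry in entries:
        name = entry.get("name", "")
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
    return counts


def audit(path):
    """Load a reflect-config.json file and return per-prefix entry counts."""
    with open(path, encoding="utf-8") as fh:
        return count_by_prefix(json.load(fh))
```

Calling audit("src/main/resources/META-INF/native-image-config/reflect-config.json") before and after regeneration would show whether the com.ibm.northstar.* count dropped to zero and the other two became non-zero.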

JavaParser leans heavily on reflection (symbol solver, AST node instantiation, generated metamodel, JSON (de)serialization), so without proper registration the native binary throws reflection exceptions at runtime even though the same code runs fine under java -jar.

Describe the solution you'd like

Ship codeanalyzer-java on PyPI as a wheel that carries a prebuilt native binary, with an automated release pipeline. Work breaks into three parts:

1. (Prerequisite) Restore/regenerate a JavaParser-aware GraalVM native-image config

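Regenerate the config by running the fat JAR under the GraalVM tracing agent across multiple sample apps and analysis levels; config-merge-dir merges each run's findings into the same config directory. For example: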

./gradlew fatJar
java -agentlib:native-image-agent=config-merge-dir=src/main/resources/META-INF/native-image-config \
  -jar build/libs/codeanalyzer-<version>.jar \
  -i src/test/resources/sample.applications/daytrader8/source -a 2 -v

Then drop stale com.ibm.northstar.* entries and register the current com.ibm.cldk.* entity classes plus JavaParser's symbol-solver/metamodel classes and any required resources/proxies.

Acceptance criteria:

  • No com.ibm.northstar.* references remain; com.ibm.cldk.* and JavaParser are covered.
  • ./gradlew nativeCompile produces a binary that runs the full sample-app suite with no reflection/JNI/proxy/resource errors.
  • Native binary output matches java -jar for the integration test inputs.
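The output-parity check in the last item above could be automated along these lines. This is a sketch only: it assumes both runs emit JSON and that list order is meaningful. The function name is hypothetical.

```python
def first_difference(a, b, path="$"):
    """Return the JSON path of the first mismatch between a and b, or None."""
    if type(a) is not type(b):
        return path
    if isinstance(a, dict):
        # Dict key order is irrelevant; a missing key on either side is a mismatch.
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                return f"{path}.{key}"
            diff = first_difference(a[key], b[key], f"{path}.{key}")
            if diff:
                return diff
        return None
    if isinstance(a, list):
        if len(a) != len(b):
            return f"{path}.length"
        for index, (x, y) in enumerate(zip(a, b)):
            diff = first_difference(x, y, f"{path}[{index}]")
            if diff:
                return diff
        return None
    return None if a == b else path
```

An integration test would json.load the output of the java -jar run and of the native run for the same input and assert that first_difference returns None; a non-None result points at the first field the native binary got wrong.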

2. Python packaging

  • Add a Python package layout (e.g. pyproject.toml with a chosen build backend) that wraps the native binary.
  • Resolve and invoke the bundled binary at runtime (locate it inside the installed package; expose a console entry point).
  • Build per-platform wheels: manylinux (x86_64 + aarch64), macOS (arm64 + x86_64), and Windows (x86_64).
  • Keep the wheel version in lockstep with the codeanalyzer binary version (gradle.properties).
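As an illustration of the "resolve and invoke" item, a launcher module might look like the sketch below. The bin/ subdirectory, the binary file name, and the module layout are assumptions made for the example, not decisions taken in this issue.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical layout: <package>/bin/codeanalyzer (plus ".exe" on Windows).
BINARY_NAME = "codeanalyzer"


def binary_path(package_dir, platform=sys.platform):
    """Return where the bundled binary is expected inside the installed package."""
    suffix = ".exe" if platform.startswith("win") else ""
    return Path(package_dir) / "bin" / (BINARY_NAME + suffix)


def main(argv=None):
    """Console entry point: forward all arguments to the bundled native binary."""
    args = sys.argv[1:] if argv is None else list(argv)
    exe = binary_path(Path(__file__).resolve().parent)
    if not exe.is_file():
        sys.exit(f"no prebuilt codeanalyzer binary found at {exe}")
    return subprocess.call([str(exe), *args])
```

A console script entry in pyproject.toml would then point at main, so that running codeanalyzer on the command line executes the native binary with the user's arguments and propagates its exit code.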

2a. Default platform-native install (pip install codeanalyzer just works)

The goal is that a plain pip install on a supported platform pulls the correct native binary automatically, with no flags. Guaranteeing this, rather than merely making it possible, requires:

  • Impure, platform-tagged wheels. Each wheel must be marked non-purelib so its platform tag is concrete (…manylinux…, …macosx_…_arm64, win_amd64) — never a universal py3-none-any wheel that would ship one platform's binary everywhere.
  • py3-none-<platform> tags. The binary doesn't link CPython, so wheels should target py3 / abi none (installs on any Python 3.x), not a CPython-pinned ABI — avoids needing a wheel per Python minor.
  • manylinux compliance via auditwheel. Linux wheels are run through auditwheel repair against a manylinux policy so they install across distros, not just on the build image.
  • musllinux coverage (Alpine): build musllinux_1_2 wheels for x86_64 (+ aarch64 if feasible), or explicitly document Alpine as unsupported.
  • sdist behavior is unambiguous. Either ship no sdist, or ship an sdist whose build deliberately fails with a clear "no prebuilt codeanalyzer binary for this platform/arch" message — so pip never silently falls back to attempting a from-source GraalVM build on unsupported platforms.
  • Verify auto-selection. On each supported platform, a clean pip install codeanalyzer (no --platform/--only-binary flags) resolves to the matching wheel and codeanalyzer --help runs the bundled native binary.
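The tagging rules above can be enforced in CI before anything is uploaded. The sketch below parses the standard wheel filename layout (name-version[-build]-python-abi-platform.whl) by hand; the function names and the example filenames are hypothetical.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, optional build tag, python tag, abi tag, platform tag
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return tuple(parts[-3:])


def is_platform_native_wheel(filename):
    """True only for py3-none-<platform> wheels with a concrete platform tag."""
    python_tag, abi_tag, platform_tag = parse_wheel_tags(filename)
    return python_tag == "py3" and abi_tag == "none" and platform_tag != "any"
```

A release job could run is_platform_native_wheel over every file in dist/ and fail the build if any wheel is universal (py3-none-any) or pinned to a specific CPython ABI.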

3. Automated release pipeline — pypi-release.yml (cargo-dist-style matrix)

Add a GitHub Actions workflow at .github/workflows/pypi-release.yml that builds native binaries for all supported architectures via a build matrix, modeled on how cargo-dist ships Rust binaries (full cross-platform/cross-arch matrix, one artifact per target, checksums, and a single coordinated release). It should:

  • Trigger on a tagged release (and support manual workflow_dispatch).
  • Run a matrix build across native runners so each target compiles on its own OS/arch (no fragile cross-compiling where avoidable), covering at minimum:
    • x86_64-unknown-linux-gnu (manylinux) and aarch64-unknown-linux-gnu
    • x86_64-apple-darwin and aarch64-apple-darwin
    • x86_64-pc-windows-msvc
  • On each matrix leg: set up GraalVM, run ./gradlew nativeCompile, and produce the platform-specific codeanalyzer binary.
  • Package each binary into the correctly tagged wheel (platform/arch wheel tags); build the sdist only if one is shipped, following the sdist decision in 2a.
  • Emit per-artifact SHA256 checksums (cargo-dist-style) and upload all binaries + wheels + checksums to the GitHub Release for the tag.
  • Run a smoke test on each wheel (install it, run codeanalyzer against a sample app) so a broken native config can't ship.
  • Fan-in: a final job that gathers all matrix artifacts and publishes the full set to PyPI in one step (prefer Trusted Publishing / OIDC over a long-lived PYPI_API_TOKEN); optionally publish to TestPyPI first.
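For the per-artifact checksum step, one possible helper is sketched below. It writes a sidecar file per artifact in the same "digest, two spaces, filename" layout that sha256sum produces; the function names and the sidecar naming are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(dist_dir):
    """Write <artifact>.sha256 next to each artifact; return the files written."""
    written = []
    for artifact in sorted(Path(dist_dir).iterdir()):
        # Skip directories and checksum files left over from a previous run.
        if not artifact.is_file() or artifact.suffix == ".sha256":
            continue
        out = artifact.with_name(artifact.name + ".sha256")
        out.write_text(f"{sha256_of(artifact)}  {artifact.name}\n", encoding="utf-8")
        written.append(out)
    return written
```

The fan-in job could call write_checksums("dist") after downloading all matrix artifacts, then attach both the artifacts and the .sha256 files to the GitHub Release.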

Describe alternatives you've considered

  • Ship the fat JAR and require a JVM — defeats the purpose of a self-contained PyPI install; end users would need GraalVM/JDK.
  • Hand-maintain reflect-config.json — brittle and already the source of this drift; the tracing-agent regeneration is far more robust.
  • Switch off JavaParser's reflective paths — not feasible without forking the dependency.
  • Manual/local publishing instead of pypi-release.yml — error-prone and unreproducible; an automated, tested workflow is required so releases are repeatable and can't ship a broken binary.

Additional context

  • Relevant files: build.gradle:188-208 (graalvmNative), build.gradle:124-125 (JavaParser deps), src/main/resources/META-INF/native-image-config/*, gradle.properties (version), README §3 + FAQ, existing .github/workflows/release.yml (reference for the new pypi-release.yml).
  • Historical reference config: commit f09e014 ("Update codeanalyzer to use graalvm") has the original reflect-config.json.
