Skip to content

fix(base): make ChimeraX download fail loudly and retry#6

Open
Abdelsalam-Abbas wants to merge 1 commit into
asterafrom
fix/chimerax-download-robust
Open

fix(base): make ChimeraX download fail loudly and retry#6
Abdelsalam-Abbas wants to merge 1 commit into
asterafrom
fix/chimerax-download-robust

Conversation

@Abdelsalam-Abbas

Copy link
Copy Markdown

Problem

CI run 28394997476 failed in Build base at Dockerfile.base:111 (the ChimeraX install) with a bare exit code 1 and no diagnostics:

#14 2.934 Reading package lists...     <- apt-get update finished
#14 ERROR: ... exit code: 1            <- immediately, nothing in between

Root cause

The step was a silent pipeline:

curl -s ... | grep -oP 'url=\K[^"]*' > /tmp/cx_redirect

A pipeline's exit status is its last command's — grep, which exits 1 when it matches nothing. So an empty/unexpected response from the first curl (most likely blocked or filtered egress from the self-hosted Harbor builder to www.cgl.ucsf.edu) makes grep match nothing → the && chain aborts with exit 1. Because curl -s is silent and grep's output is redirected to a file, the log shows zero clues, and curl's own exit code is masked by the pipe.

The download logic itself is fine — running the same flow from a normal host returns a valid 449 MB .deb.

Fix

Run the token dance under bash -euo pipefail with curl -fsS:

  • network/HTTP errors now surface instead of being swallowed;
  • transient failures retry (--retry 5 --retry-all-errors);
  • if no download URL is extracted, the raw response is printed before exiting (distinguishes a network block from a page-format change on the next run);
  • apt-get install is gated behind dpkg-deb --info, so a non-.deb payload is reported rather than silently breaking the install.

Note

If the underlying cause is the runner's egress to www.cgl.ucsf.edu, this build will still fail — but the next run's log will now say exactly why (e.g. Could not resolve host / Connection timed out), which can then be taken to whoever manages the runner's network policy.

Validation

  • bash -n syntax check of the embedded script passes under set -euo pipefail.
  • Token request + redirect extraction verified against the live endpoint with the new quoting.
  • Empty-response path confirmed to trigger the loud-failure guard (exits non-zero with diagnostics).

The ChimeraX step used a silent `curl -s | grep` pipeline whose exit
status is grep's: an empty or unexpected response (e.g. blocked egress
from the Harbor builder) makes grep match nothing and exit 1, aborting
the build with a bare "exit code 1" and zero diagnostics -- which is
exactly how it failed in CI run 28394997476.

Run the token dance under bash with set -euo pipefail and curl -fsS so
HTTP/network errors surface, retry transient failures, print the raw
response when no download URL is found, and gate apt-get install behind
dpkg-deb --info so a non-.deb payload is reported instead of silently
breaking the install.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant