- Project: Apache Impala — distributed massively-parallel C++ SQL query engine for
data in HDFS, Apache Iceberg, Apache Kudu, Apache HBase, Amazon S3, Azure Data Lake
Storage, Apache Ozone, and other Hadoop-compatible storage (documented:
README.md).
- Version / commit: this model is drafted against the default branch (master), most recently HEAD b8be513 ("IMPALA-13033: Parse WebUI thrift profile downloads"). A report against project release N should be triaged against the model as it stood at N, not at HEAD.
- Date: 2026-06-02.
- Authors: ASF Security team draft, Impala PMC.
- Status: v1.
- Reporting: vulnerabilities that fall under §8 (claimed properties) should be reported per the Apache Security Team disclosure channel (security@impala.apache.org); reports that fall under §3 (out of scope), §9 (properties not provided), or §11a (known non-findings) will be closed by Impala triagers citing this document.
- Provenance legend — (documented) = drawn from in-repo docs or website docs, with citation; (maintainer) = stated by an Impala maintainer in response to this draft.
Impala is an MPP SQL engine: clients submit SQL over the HiveServer2 (HS2) Thrift
protocol or HS2-over-HTTP; the coordinator impalad parses, plans, and distributes
query fragments to worker impalad instances; metadata is served by a central
catalogd and propagated to workers via statestored; data is read and written
directly from/to the underlying storage (HDFS, S3, ADLS, Ozone, Kudu, HBase) using
the impersonated impala-process credentials. Authentication is via Kerberos, LDAP,
SAML, JWT, or OAuth bearer token; authorization is delegated to Apache Ranger.
- Production analytic SQL queries against tabular data residing in distributed
storage, served to authenticated end users via JDBC/ODBC clients,
impala-shell, BI tools, or Apache Hue (documented: README.md, docs/topics/impala_security.xml).
- Multi-tenant analytic clusters where authorization is enforced by Apache Ranger
and authentication by Kerberos and/or LDAP (documented:
docs/topics/impala_security.xml lines 38–62).
Impala is not an in-process library and is not a single-binary daemon. It is a cluster of cooperating processes, deployed by an operator inside a network perimeter the operator controls. The threat model is therefore that of a distributed service, not a library (maintainer).
Following §2 of the output-structure rubric (network service split):
| Role | Trust level | Notes |
|---|---|---|
| End-user client | untrusted but authenticated | Connects via HS2 / HS2-HTTP / Beeswax; identity verified by Kerberos, LDAP, SAML, JWT, or OAuth (documented: docs/topics/impala_security.xml, docs/topics/impala_ldap.xml, be/src/rpc/authentication.cc). |
| Operator / cluster admin | trusted | Sets startup flags, manages keytabs, configures Ranger, owns the Web UI .htpasswd (documented: docs/topics/impala_security_guidelines.xml). |
| Internal Impala peer | trusted (mutually-authenticated) | impalad↔statestored↔catalogd RPC; KRPC + Thrift RPC; auth is Kerberos-only between internal components (documented: docs/topics/impala_ldap.xml, "Consideration for Connections Between Impala Components"). |
| Hive Metastore / Ranger | trusted control plane | Source of metadata + policy decisions; assumed honest (maintainer). |
| Underlying storage | trusted by virtue of operator-granted credentials | HDFS / S3 / ADLS / Ozone / Kudu / HBase; Impala holds delegation tokens or static credentials and reads/writes as the impala Unix user (or impersonated user when Ranger is enabled) (documented: docs/topics/impala_security_files.xml). |
| Delegated proxy user | conditionally trusted | When --authorized_proxy_user_config is set, an authenticated front-end (Hue, BI tool) may forward queries as a different end user (documented: docs/topics/impala_delegation.xml). |
| Family | Representative entry point | Touches outside the process? | In-model? |
|---|---|---|---|
| HS2 / Beeswax / HS2-HTTP server | :21000 (Beeswax), :21050 (HS2 binary), :28000 (HS2-HTTP) (documented: docs/topics/impala_ports.xml) | network (TCP, optionally TLS) | yes |
| Internal KRPC + Thrift RPC | :27000 (KRPC), :23000/:24000/:26000 (statestore/catalog) | network within the cluster | yes (peer trust depends on Kerberos / mTLS) |
| Web UI / metrics / /admin/ endpoints | :25000/:25010/:25020 | network (TCP, optionally TLS + SPNEGO + .htpasswd) | yes |
| Query frontend (Java, fe/) — parser, analyzer, planner, Ranger checker | invoked from coordinator impalad via JNI | none directly | yes |
| Query backend (C++, be/) — exec engine, codegen (LLVM), expression eval, scanners | invoked from coordinator impalad and worker impalad | reads/writes storage with process credentials | yes |
| Catalog server (catalogd) | invoked by coordinators; reads HMS + storage | reads HMS, lists HDFS, reads object stores | yes |
| Storage scanners (Parquet, ORC, Avro, text, Iceberg, Kudu, HBase, JDBC external table) | reads operator-configured locations | reads object stores / HDFS / external JDBC | yes (data trust = §6) |
| User-defined functions (UDFs) | CREATE FUNCTION … LOCATION … (C++ native or Java) | runs operator-/user-permitted binaries in-process | out of model for UDF code itself (§3); in-model for the privilege check that admits the UDF |
| External Data Sources / JDBC external tables | CREATE DATA SOURCE / Iceberg REST catalog | outbound JDBC / HTTPS (documented: docs/topics/impala_jdbc_external_table.xml, docs/topics/impala_iceberg_rest_catalog.xml) | in-model for credential handling; out-of-model for the remote endpoint |
| ai_generate_text LLM connector | SQL function calling external LLM endpoint (documented: docs/topics/impala_ai_functions.xml) | outbound HTTPS | in-model for credential / prompt handling; out-of-model for the LLM provider |
| shell/ Python impala-shell | client-side, not server | n/a | out of model for server claims (§3); in-model for credential handling of the shell binary itself |
| docker/, testdata/, infra/, tests/ | tooling | n/a | out of model (§3) |
| Vendored Kudu security code under be/src/kudu/security/ | TLS/SASL primitives shared with the Apache Kudu codebase | n/a | in-model only insofar as Impala calls into it (maintainer) |
Impala is not, and does not aim to be, the following — reports requiring any of these will be closed with the cited disposition:
- The root authority for storage-level authorization. HDFS POSIX permissions, S3 IAM, ADLS RBAC, Ozone ACLs, etc. are enforced by the storage provider and the credentials the operator hands to Impala. Reports that depend primarily on over-broad bucket / IAM permissions are deployment-sensitive, not Impala-side (documented: docs/topics/impala_security_files.xml). → OUT-OF-MODEL: adversary-not-in-scope.
- A defender against a malicious Hive Metastore, Ranger Admin, or other trusted control-plane component. If the report requires the HMS or Ranger to be hostile to Impala, it is out of model (maintainer). → OUT-OF-MODEL: trusted-input.
- A defender against the operator. Anyone with root, sudo, the impala Unix account, the keytab file, the cookie-secret file, or the Web UI .htpasswd already has unbounded power; "the operator misconfigured X" is not a vulnerability (documented: docs/topics/impala_security_guidelines.xml). → OUT-OF-MODEL: adversary-not-in-scope.
- An isolation boundary between an authorized user's SQL and the impalad process. SQL is interpreted by a trusted engine running as the impala Unix user; an authenticated user with appropriate SQL privileges can already cause arbitrary reads, writes, and resource consumption within the scope Ranger grants. A new way for an authorized user to do something they are already authorized to do is not a vulnerability (maintainer). → OUT-OF-MODEL: equivalent-harm.
- A sandbox for user-defined functions. Native (C++) UDFs and Hive Java UDFs run in-process with the privileges of the impalad daemon. UDF sandboxing is not provided; admission of a CREATE FUNCTION is gated by Ranger and that is the entire enforcement (maintainer). → BY-DESIGN: property-disclaimed (§9).
- A defender against malformed-but-parseable user data in scanned files. Decoders (Parquet, ORC, Avro, text, Iceberg manifests) must not corrupt process memory, but raw runtime exceptions, slow paths on adversarial inputs, and OOM on pathological files are robustness work, not security issues, unless they cross a trust boundary (maintainer). → OUT-OF-MODEL: equivalent-harm for writer-controlled files, VALID-HARDENING for reader-controlled files.
- Code that ships but is not part of the supported product: tests/, testdata/, infra/, docker/, package/, ssh_keys/, cmake_modules/, experiments/, udf_samples/. State the policy explicitly so integrators do not extend core guarantees to them (maintainer). → OUT-OF-MODEL: unsupported-component.
- Apache Kudu, Apache Iceberg, Apache Ranger, Apache Hive client libraries, Hadoop libraries, OpenSSL, Apache Thrift, and other upstream dependencies. Where Impala vendors source (e.g. be/src/kudu/), the vendored code is modeled at the wrapper boundary; vulnerabilities intrinsic to the upstream project should be reported upstream (maintainer). → OUT-OF-MODEL: unsupported-component (with an upstream pointer).
- The Impala documentation site, asf-site branch, downloads page, gem/npm packages with similar names, and other non-product surfaces. Out of scope.
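The dispositions in this list can be read as a lookup from what a report requires to how it is closed. The following is a minimal sketch of that reading, assuming a hypothetical triage helper; the requirement keys are illustrative names invented for this example and are not part of any Impala tooling. Only the disposition strings come from the list itself.

```python
# Hypothetical triage helper. The keys are illustrative labels for
# "what the report requires"; the values are the dispositions cited
# in the out-of-scope list.
DISPOSITIONS = {
    "over_broad_storage_permissions": "OUT-OF-MODEL: adversary-not-in-scope",
    "hostile_hms_or_ranger": "OUT-OF-MODEL: trusted-input",
    "hostile_operator": "OUT-OF-MODEL: adversary-not-in-scope",
    "authorized_user_already_permitted": "OUT-OF-MODEL: equivalent-harm",
    "udf_sandbox_escape": "BY-DESIGN: property-disclaimed",
    "unsupported_directory": "OUT-OF-MODEL: unsupported-component",
    "upstream_dependency_bug": "OUT-OF-MODEL: unsupported-component",
}


def route_report(requirements):
    """Return the first matching out-of-scope disposition, or None when the
    report depends on none of the listed preconditions."""
    for requirement in requirements:
        if requirement in DISPOSITIONS:
            return DISPOSITIONS[requirement]
    return None
```

A report that returns `None` here is not automatically valid; it only means none of these disclaimers closes it.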
Impala has at least eight distinct trust transitions; a finding is in-model only when it cleanly maps to one of them.
| # | Transition | Authentication | Authorization |
|---|---|---|---|
| B1 | End-user client → HS2 / Beeswax / HS2-HTTP | Kerberos / LDAP / SAML / JWT / OAuth / trusted-domain header (documented: be/src/rpc/authentication.cc) | Ranger on submitted SQL (documented: docs/topics/impala_authorization.xml) |
| B2 | End-user client → Web UI (:25000 and siblings) | .htpasswd + SPNEGO, optional TLS (documented: docs/topics/impala_security_webui.xml) | none beyond authentication; the Web UI exposes operator-grade endpoints (maintainer) |
| B3 | impalad ↔ statestored ↔ catalogd internal RPC (KRPC + Thrift) | Kerberos (mandatory for prod) + optional TLS (documented: docs/topics/impala_ssl.xml, docs/topics/impala_ldap.xml) | "internal_principals_whitelist" of allowed principals (documented: be/src/rpc/authentication.cc line 121) |
| B4 | Coordinator impalad → worker impalad (query fragments over KRPC) | same as B3 | same as B3 |
| B5 | Coordinator / catalogd → Hive Metastore | Kerberos / delegation token | HMS-side; Impala assumes truthful responses |
| B6 | Coordinator → Ranger Admin (policy fetch) | service principal | Ranger-side |
| B7 | Worker impalad → underlying storage (HDFS, S3, ADLS, Ozone, Kudu, HBase) | Kerberos / IAM / service-account keys / delegation tokens | storage-side ACLs |
| B8 | Operator → impalad startup flags + configuration files | filesystem permissions on the host | OS-level |
For each family in §2, a finding is in-model only if it is reachable as follows:
- HS2 / Beeswax / HS2-HTTP server: reachable from an unauthenticated network peer who can reach the listening port. Findings that require an already-authenticated peer collapse to "authenticated user with SQL privileges", and must additionally clear B7 (storage ACL) or B6 (Ranger policy) to be security-relevant.
- Internal KRPC: reachable from a network peer who has compromised the Kerberos trust (B3) — i.e., has stolen a service keytab or impersonated a principal on internal_principals_whitelist. A flat "internal RPC has no auth" finding is OUT-OF-MODEL: adversary-not-in-scope because the model requires Kerberos between components in production (documented: docs/topics/impala_ldap.xml).
- Web UI: reachable from a network peer with an .htpasswd credential (per §10) or who can reach an unprotected port. A finding that needs .htpasswd to be absent is OUT-OF-MODEL: trusted-input against a guideline-violating operator (§3 item 3) (maintainer).
- Query frontend / backend: reachable from SQL submitted by an authenticated user with sufficient Ranger privileges. Findings here matter only if they break out of the user's Ranger-granted privilege set.
- Scanners: reachable from bytes in operator-configured storage locations. Bytes are partially attacker-controlled when an authorized writer has INSERT privilege on a table that other users read (B7). Compromise of the storage layer itself is out of model (§3 item 1).
- UDFs: reachable only via CREATE FUNCTION, which is Ranger-gated. Anything past the privilege check is out of model (§3 item 5).
- Operating system: Linux (Ubuntu 20.04/22.04/24.04, Rocky/RHEL 8/9 are the declared supported set; others "may also be supported but are not tested by the community") (documented: README.md — Supported Platforms). x86_64 and arm64 supported.
- Process model: at least three long-lived daemons (impalad, statestored, catalogd); operator runs them as the impala Unix user (documented: docs/topics/impala_security_files.xml).
- Network: operator-controlled L2/L3; no NAT or middlebox assumed to inspect KRPC payloads; ports per docs/topics/impala_ports.xml. Mutually-reachable cluster members assumed.
- Time: Kerberos requires loosely-synchronized clocks across the realm (KDC tolerance, default 5 min) — operator's responsibility, not Impala's (maintainer).
- Filesystem: keytab and .htpasswd files have OS-level permissions restricted to the impala user and admins (documented: docs/topics/impala_security_webui.xml).
- Cryptography: the OpenSSL library shipped with the OS provides TLS, symmetric/asymmetric primitives, and RNG (documented: EXPORT_CONTROL.md).
- Kerberos: assumes a working MIT KDC with renewable-ticket support configured per docs/topics/impala_kerberos.xml.
- What Impala does to its host:
  - does open listening sockets on the documented ports;
  - does spawn child processes for --ldap_bind_password_cmd, --s3a_access_key_cmd, --s3a_secret_key_cmd, --saml2_keystore_password_cmd, --saml2_private_key_password_cmd, --ssl_private_key_password_cmd, --webserver_private_key_password_cmd; and java -version (maintainer);
  - does install signal handlers for crash reporting (SIGUSR1 → breakpad) and generating stack traces (SIGRTMIN+10), ignores SIGPIPE, and handles graceful shutdown with SIGTERM (maintainer);
  - does read a documented set of environment variables (e.g. IMPALA_HOME, JAVA_HOME, JAVA_TOOL_OPTIONS) but does not consume arbitrary LD_* for security-sensitive behavior (maintainer);
  - does write logs to operator-configured locations; redacted query text if log redaction is enabled (documented: docs/topics/impala_logging.xml).
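The `*_cmd` flags above mean the daemon runs an operator-supplied command and uses its output as a secret. The following sketch illustrates that general pattern only; it is not Impala's implementation. The function name, the timeout, and the choice to strip surrounding whitespace from stdout are all assumptions made for the example.

```python
import shlex
import subprocess
import sys


def read_secret_from_cmd(cmd, timeout_s=10):
    """Run an operator-supplied command without a shell and return its
    stdout as the secret. Stripping whitespace is an assumption of this
    sketch, not a documented Impala behavior."""
    result = subprocess.run(
        shlex.split(cmd),        # argv list, so no shell interpolation
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,              # a failing command raises instead of yielding ""
    )
    return result.stdout.strip()


# Stand-in for a real password command; it just prints a fixed string.
demo_cmd = f"{shlex.quote(sys.executable)} -c \"print('s3cret')\""
```

Because the command runs with the daemon's privileges, whoever can edit the flag value is already the operator in the sense of §3 item 3.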
Impala ships as a single product but a sizable number of runtime flags
materially change the security envelope. The maintainer-confirmed list is at
be/src/rpc/authentication.cc and equivalent files; the security-relevant
subset:
| Flag | Default | Maintainer stance | Effect |
|---|---|---|---|
| --enable_ldap_auth | false (documented) | dev/test or operating with Kerberos, operator must enable per §10 (maintainer) | enables LDAP auth on HS2 client port |
| --ssl_server_certificate / --ssl_private_key | unset (documented) | dev/test, operator must enable per §10 (maintainer) | enables TLS on all listening sockets |
| --ssl_minimum_version | tlsv1.2 (documented: docs/topics/impala_ssl.xml) | hardened in Impala 4.0 from tlsv1 (documented) | rejects pre-1.2 handshakes |
| --webserver_password_file | unset (documented: docs/topics/impala_security_webui.xml) | an unprotected Web UI is OUT-OF-MODEL: non-default-build (maintainer) | Web UI authenticates against this .htpasswd |
| --webserver_certificate_file | unset (documented) | dev/test, operator must enable per §10 (maintainer) | enables HTTPS on Web UI |
| --principal, --keytab-file | unset | dev/test or operating with LDAP, operator must enable per §10 (maintainer) | enables Kerberos auth |
| --authorization_provider=ranger | unset (documented: docs/topics/impala_authorization.xml) | dev/test, operator must enable per §10 (maintainer) | enables Ranger authz; absent → all queries run as impala user (no enforcement) |
| --jwt_token_auth / --oauth_token_auth | false | optional alternative auth (documented: be/src/rpc/authentication.cc) | enables bearer-token auth |
| --jwt_validate_signature, --oauth_jwt_validate_signature | true | hardened default; flipping to false voids §8 P3 (maintainer) | turns off JWT/OAuth signature check |
| --jwt_allow_without_tls, --oauth_allow_without_tls, --saml2_allow_without_tls_debug_only | false, marked _hidden | "debug only" per name (→) | permits bearer / SAML auth over unencrypted transport |
| --trusted_domain, --trusted_auth_header | unset (documented) | when set, Impala accepts identity assertions from named peer without re-auth | reachability for OUT-OF-MODEL: trusted-input reports |
| --trusted_domain_use_xff_header | false | when true, parses X-Forwarded-For to identify the originating client (documented: be/src/rpc/authentication.cc line 132) | exposes a path where a misconfigured proxy can let a client claim any source address (maintainer) |
| --internal_principals_whitelist | hdfs (documented: be/src/rpc/authentication.cc line 121) | governs which Kerberos principals are accepted on internal RPC ports | misconfiguration permits external service to speak as a peer |
| --authorized_proxy_user_config / --authorized_proxy_group_config | unset (documented: docs/topics/impala_delegation.xml) | required for Hue-style impersonation; whitelists which authenticated principals may doas to which users | breaks B1 if mis-scoped |
| --cookie_secret_file | empty (documented: be/src/rpc/authentication.cc line 98) | when unset, HS2-HTTP cookies fall back to per-process random — sessions do not survive cluster restarts but are not forgeable (maintainer) | shared cluster-wide secret for cookie HMAC |
| --abort_on_config_error | true (maintainer) | when off, security misconfigurations may not prevent startup | |
| impala-shell --ssl | false | when true, must also configure ca_cert or verify_cert to validate server certificate | required for TLS configured endpoints |
The insecure-default case. A number of these flags ship in the "off, must be turned on for production" posture. The maintainer ruling on whether the default is a supported production posture is captured in "Maintainer stance".
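The `--trusted_domain_use_xff_header` row turns on a proxy detail that is easy to get wrong, so a small illustration may help. The sketch below models two hypothetical proxy behaviors and a backend that trusts the first `X-Forwarded-For` entry. How Impala actually selects the entry is not stated in this document, so the first-entry rule is an assumption; the function names and addresses are invented for the example.

```python
def forward_appending(headers, peer_ip):
    """Misconfigured proxy: keeps any client-supplied X-Forwarded-For value
    and appends the TCP peer address to it."""
    out = dict(headers)
    existing = headers.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{existing}, {peer_ip}" if existing else peer_ip
    return out


def forward_resetting(headers, peer_ip):
    """Hardened proxy: discards any client-supplied value and sets the
    header to the TCP peer address it actually observed."""
    out = dict(headers)
    out["X-Forwarded-For"] = peer_ip
    return out


def claimed_origin(headers):
    """Backend that treats the first XFF entry as the originating client
    (an assumption of this sketch)."""
    return headers["X-Forwarded-For"].split(",")[0].strip()


# The client sends a forged header claiming an address in a trusted range.
spoofed_request = {"X-Forwarded-For": "10.0.0.5"}
real_peer = "203.0.113.9"
```

With the appending proxy the backend sees `10.0.0.5`; with the resetting proxy it sees `203.0.113.9`, which is why §10 asks for a load balancer that strips and resets the header.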
| Surface / route | Parameter | Attacker-controllable? | Caller must enforce |
|---|---|---|---|
| HS2 binary :21050, HS2-HTTP :28000, Beeswax :21000 | SQL text | yes | nothing — Impala parses, plans, and applies Ranger |
| HS2-HTTP :28000 | X-Forwarded-For header | yes if --trusted_domain_use_xff_header is on; never trust otherwise (maintainer) | per §10, only enable behind a load balancer that strips and resets XFF |
| HS2-HTTP :28000 | session cookie | signed with --cookie_secret_file HMAC; not attacker-forgeable when secret is unguessable (maintainer) | per §10, rotate the cookie-secret file if compromised |
| HS2-HTTP :28000 | JWT / OAuth bearer | yes; signature checked when --jwt_validate_signature=true (default) (documented: be/src/rpc/authentication.cc) | per §10, leave signature checking on, set --jwt_allow_without_tls=false |
| HS2-HTTP :28000 | --trusted_auth_header value | yes; treated as the authenticated identity | never expose the port directly to untrusted peers when this flag is set (maintainer) |
| Web UI :25000/:25010/:25020 | .htpasswd credential | yes if --webserver_password_file is set | per §10, set the flag; per §10, set --webserver_certificate_file for HTTPS |
| Web UI :25000/:25010/:25020 | session cookie | signed with --cookie_secret_file | per §10, rotate the cookie-secret file if compromised |
| Web UI :25000 — query-profile and admin endpoints | profile ID / GET parameters | yes | Web UI auth is the only gate; sensitive query bytes appear unless log redaction is enabled |
| KRPC :27000 and statestore/catalog ports :23000/:24000/:26000 | Thrift / KRPC payload | only by a peer that has cleared B3 | Kerberos + internal_principals_whitelist are the gate |
| Scanned table files (Parquet, ORC, Avro, text, Iceberg manifests) | file bytes | yes if an authorized writer can land bytes the reader will scan | Ranger separates writers from readers; B7 enforces who can land bytes |
| ai_generate_text LLM endpoint | LLM response | trusted only as far as the LLM is trusted | per §10, treat LLM output as untrusted text (do not pipe to executable contexts) (maintainer) |
| JDBC external table endpoint | rows returned by remote JDBC | trusted only as far as the remote endpoint is trusted | per §10, model JDBC external tables as data crossing a trust boundary |
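The session-cookie rows rest on a standard idea: a cookie carries a keyed MAC, so a client without the secret cannot alter or mint one. The sketch below shows that idea with Python's standard library. The cookie layout (`payload&s=<hex>`), the use of HMAC-SHA256, and the function names are assumptions for illustration and do not describe Impala's wire format.

```python
import hashlib
import hmac
import secrets


def sign_cookie(secret: bytes, payload: str) -> str:
    """Append an HMAC-SHA256 tag to the payload (hypothetical layout)."""
    tag = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}&s={tag}"


def verify_cookie(secret: bytes, cookie: str):
    """Return the payload if the tag verifies under `secret`, else None."""
    payload, sep, tag = cookie.rpartition("&s=")
    if not sep:
        return None  # no tag present at all
    expected = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids leaking how many characters match.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return payload
    return None


cluster_secret = secrets.token_bytes(32)   # stands in for the cookie-secret file
cookie = sign_cookie(cluster_secret, "u=alice&t=1")
```

The guarantee holds only while the secret stays unguessable and private, which is why the table pairs it with rotation after compromise.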
- Impala accepts arbitrary-length SQL but the analyzer rejects queries above implementation limits (controlled by query option max_statement_length_bytes with max value of 2^31 - 1) (maintainer).
- Scanned files may be terabytes; row groups are streamed. Pathological encodings (e.g. enormous string lengths in Parquet headers) are robustness concerns (maintainer).
- The HS2 / Beeswax surfaces have limited built-in rate limiting; admission control via the --default_pool_max_requests family of flags bounds in-flight queries; --fe_service_threads provides a limited bound on connection and auth-attempt rate (maintainer).
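For scale, the cap of 2^31 - 1 on max_statement_length_bytes works out to 2,147,483,647 bytes, just under 2 GiB. The sketch below shows a length check of that shape. Measuring the statement as UTF-8 bytes and rejecting non-positive limits are assumptions of this example, not statements about how the analyzer counts.

```python
# Largest value the query option can take, per the text above.
MAX_STATEMENT_LENGTH_CAP = 2**31 - 1


def statement_within_limit(sql: str, max_statement_length_bytes: int) -> bool:
    """Return True if the statement fits the configured byte limit.
    Counting UTF-8 bytes is an assumption of this sketch."""
    if not 1 <= max_statement_length_bytes <= MAX_STATEMENT_LENGTH_CAP:
        raise ValueError("limit must be between 1 and 2**31 - 1")
    return len(sql.encode("utf-8")) <= max_statement_length_bytes
```

Note that a character count and a byte count differ for non-ASCII text: `SELECT 'é'` is 10 characters but 11 bytes in UTF-8.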
| Actor | In scope? | Capabilities granted |
|---|---|---|
| Unauthenticated network peer reaching HS2 / Beeswax / HS2-HTTP | yes | TCP to the listening ports; may attempt authentication; may attempt to violate the protocol pre-auth |
| Unauthenticated peer reaching Web UI | yes, if the deployment exposes the Web UI publicly | as above for Web UI |
| Authenticated end user with limited Ranger privileges | yes | execute SQL, read tables the user has SELECT on, write tables the user has INSERT on |
| Authenticated end user with broad Ranger privileges | partial | only escapes from their Ranger envelope are in scope |
| Co-tenant on the same cluster | yes | same as authenticated end user; cross-tenant leakage is in scope |
| Authorized table writer producing data read by another user | yes for scanner robustness across the B7 boundary, but bounded — VALID-HARDENING, not VALID, unless memory corruption is reachable (maintainer) | |
| Authenticated proxy front-end (Hue) using doas | yes only when --authorized_proxy_user_config is mis-scoped | |
| Hostile peer impalad / statestored / catalogd | out of scope — see §3 item 2 | |
| Hostile HMS / Ranger | out of scope — see §3 item 2 | |
| Operator | out of scope — see §3 item 3 | |
| Local process on the same host as impalad running as a different user | partial (maintainer): same-host attackers with a non-impala UID can reach the Web UI / HS2 ports unless host firewalling forbids it or authentication is secured against non-impala local users; Impala does not defend against same-host UID-0 attackers | |
| Side-channel observer (cache timing, network timing) | out of scope (maintainer) | |
| Quantum adversary | out of scope | |
Impala is not a Byzantine-fault-tolerant system. A compromised
impalad/catalogd/statestored peer with a valid Kerberos identity can
cause unbounded damage (read any data the cluster can read, produce wrong
results, leak intermediate state). The cluster trusts its own membership
(maintainer). → reports requiring a Byzantine internal peer are
OUT-OF-MODEL: adversary-not-in-scope.
For each property: condition, violation symptom, severity tier, provenance.
- Condition: an authentication mode is enabled (--enable_ldap_auth, --principal + --keytab-file, --jwt_token_auth, --oauth_token_auth, or a SAML configuration). With none of these set, Impala accepts unauthenticated SQL (documented: be/src/rpc/authentication.cc).
- Violation symptom: a network peer holding no valid credential successfully executes SQL.
- Severity: security-critical, VALID per §13.
- (documented)
- Condition: --authorization_provider=ranger is set and Ranger is reachable (documented: docs/topics/impala_authorization.xml). With this flag unset, no authorization is enforced and all queries run as the impala user (documented: docs/topics/impala_security.xml, docs/topics/impala_authorization.xml).
- Violation symptom: a query reads or modifies data not licensed by the authenticated principal's Ranger policy. Failure mode includes both the authorization-bypass case (Impala fails to apply a policy) and the authorization-confusion case (Impala applies the wrong policy).
- Severity: security-critical, VALID per §13.
- (documented)
- Condition: --ssl_server_certificate + --ssl_private_key set on the relevant daemon; minimum version per --ssl_minimum_version (default tlsv1.2 since Impala 4.0) (documented: docs/topics/impala_ssl.xml).
- Violation symptom: cleartext on the wire after TLS is configured, or a TLS handshake completing with a deprecated cipher despite --ssl_minimum_version=tlsv1.2.
- Severity: security-critical, VALID per §13.
- (documented)
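From the client side, the property above can be approximated by a connection context that refuses anything below TLS 1.2 and validates the server certificate. This is a minimal sketch using Python's standard `ssl` module, not impala-shell's implementation; the function name and the optional CA path parameter are assumptions for the example.

```python
import ssl


def make_strict_client_context(ca_cert_path=None):
    """Build a client TLS context that requires TLS 1.2 or newer and
    verifies the server certificate and hostname."""
    # create_default_context enables CERT_REQUIRED and hostname checking.
    ctx = ssl.create_default_context(cafile=ca_cert_path)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


strict_ctx = make_strict_client_context()
```

A handshake attempted with this context against an endpoint that only offers an older protocol version fails on the client, which gives an independent check that does not rely on the server's own flag handling.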
- Condition: --principal and --keytab-file are set on all three daemons (documented: docs/topics/impala_kerberos.xml, docs/topics/impala_ldap.xml). Without this, internal RPCs are unauthenticated and a cleartext-internal-RPC report is not a §8 break — the operator violated §10.
- Violation symptom: internal RPC completing successfully from a principal not on --internal_principals_whitelist.
- Severity: security-critical, VALID per §13.
- (documented)
- Condition: --authorized_proxy_user_config / --authorized_proxy_group_config set; the authenticated front-end principal appears as a key (documented: docs/topics/impala_delegation.xml).
- Violation symptom: an authenticated principal successfully runs a query as a delegated user not in their allow-list.
- Severity: security-critical, VALID per §13.
- (documented)
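The delegation property is an allow-list check: a front-end principal may act only as the users listed under its key. The sketch below assumes the semicolon-separated `proxy=user1,user2` form with `*` as a wildcard, which is my understanding of the documented flag syntax; the parser and function names are hypothetical and this is not Impala's code.

```python
def parse_proxy_config(config: str) -> dict:
    """Parse 'hue=alice,bob;admin=*' into {proxy: {allowed users}}."""
    allow = {}
    for entry in config.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        proxy, _, users = entry.partition("=")
        allow[proxy.strip()] = {u.strip() for u in users.split(",") if u.strip()}
    return allow


def may_delegate(allow: dict, proxy: str, target_user: str) -> bool:
    """True only if `proxy` is a key and `target_user` is listed or wildcarded."""
    users = allow.get(proxy, set())
    return "*" in users or target_user in users


allow_list = parse_proxy_config("hue=alice,bob;admin=*")
```

A violation of the property is any case where the server runs the query although the equivalent of `may_delegate` would return false; a wildcard entry that is broader than intended is an operator scoping issue instead.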
- Condition: redaction rules configured per docs/topics/impala_logging.xml#redaction.
- Violation symptom: literal values matching configured redaction patterns appearing un-redacted in logs or Web UI query profiles.
- Severity: security-critical for data-protection-regulated deployments; VALID per §13.
- (documented)
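The scope of this property is "literals that match a configured pattern", which is narrower than "sensitive literals". A minimal sketch of pattern-based redaction makes the distinction concrete. The rule, the replacement text, and the function name are invented for illustration and do not reflect Impala's rules-file format.

```python
import re

# One hypothetical rule: card numbers written as four dash-separated groups.
RULES = [
    (re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),
]


def redact(line: str, rules=RULES) -> str:
    """Apply each (pattern, replacement) rule to a log line in order."""
    for pattern, replacement in rules:
        line = pattern.sub(replacement, line)
    return line


matched = redact("WHERE card = '1234-5678-9012-3456'")
missed = redact("WHERE card = '1234567890123456'")
```

The first literal is masked; the second is the same number without dashes, so the pattern misses it and it is logged as-is. Only the first case falls under the property when it fails.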
- Condition: --webserver_password_file set; optionally Kerberos SPNEGO.
- Violation symptom: an unauthenticated peer accesses an authenticated Web UI endpoint, or an authenticated peer accesses an endpoint above their Web UI auth tier.
- Severity: security-critical, VALID per §13.
- (documented: docs/topics/impala_security_webui.xml)
- Condition: input matches the documented protocol (HS2 / Beeswax / Thrift / KRPC / Parquet / ORC / Avro / Iceberg manifest / etc.); the host is conformant to §5; no _hidden debug flag is in use (maintainer).
- Violation symptom: heap or stack corruption, out-of-bounds read/write, use-after-free, double-free reachable from a §6 input.
- Severity: security-critical when reachable from network input or from table data crossing B7; VALID-HARDENING when reachable only by a writer who already controls the bytes (§3 item 6).
- (maintainer)
- Condition: applies only to flows where Impala emits SQL to a remote system on behalf of an Impala client (JDBC external tables; ai_generate_text with prompt-as-SQL patterns).
- Violation symptom: end-user SQL text appearing un-escaped in a remote-query string.
- Severity: case-dependent; VALID-HARDENING if the remote system is also Impala-trusted, VALID if the remote system is a tenant boundary.
- (maintainer)
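The violation symptom here is the classic difference between splicing user text into a remote query string and passing it as a bound parameter. The sketch below demonstrates that difference using an in-memory SQLite database as a stand-in for the remote system. The table, the function names, and the use of SQLite are assumptions for illustration and say nothing about how Impala builds JDBC queries.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory table standing in for the remote system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return conn


def lookup_spliced(conn, name):
    """Unsafe: user text is pasted un-escaped into the query string."""
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def lookup_bound(conn, name):
    """Safe: user text travels as a bound parameter, never as SQL."""
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


conn = make_demo_db()
hostile_name = "x' OR '1'='1"
```

With the spliced form the hostile value rewrites the predicate and returns every row; with the bound form it is compared as an ordinary string and matches nothing.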
State each plainly so a triager can route an inbound report to the matching disclaimer.
- No isolation between authenticated user SQL and the impalad process. A user with Ranger privilege to CREATE FUNCTION and to SELECT from the resulting function can run arbitrary native or JVM code inside impalad. UDFs are not sandboxed. See §3 item 5 (maintainer).
- No defense against decompression / decoding bombs in scanned files. A malicious or buggy table writer can land Parquet / ORC / Avro / text files designed to maximize CPU and memory; the reader has no built-in cap on per-file resource use (maintainer).
- No quotas on per-query or per-user resource consumption beyond what admission control provides. A user with SELECT on a large table can cause arbitrary wall-clock and memory burn. Operator must configure --default_pool_* admission-control flags (maintainer).
- No defense against intra-cluster Byzantine failure. A compromised peer with a valid Kerberos identity can read any data the cluster can read; see §7 (maintainer).
- No protection against the operator. Anyone with the keytab, the cookie-secret file, the .htpasswd, the impala Unix account, or root on any impala host wins. See §3 item 3.
- No protection against a malicious HMS / Ranger. See §3 item 2.
- No data-at-rest encryption. Impala writes file bytes through the storage layer's existing protections (HDFS Transparent Data Encryption, S3 SSE, etc.). Impala does not encrypt at the table format level (maintainer).
- No defense against side-channel observation (cache, timing, branch prediction) of query plans or data (maintainer).
- No constant-time comparison of authentication secrets beyond what the underlying SASL/Kerberos libraries provide (maintainer).
- No defender stance against an attacker on the same Linux host running as a non-impala UID — Impala defends only across the network surface; same-host attackers with shell access on the impala host already have many paths to win (maintainer).
- SHOW TABLES / SHOW DATABASES filtering is an authorization view, not an information-flow channel. Object names a user is not authorized to see are hidden, but error messages, query-profile timing, and Web UI traces may reveal existence indirectly (maintainer).
- Log redaction is a display feature, not a confidentiality boundary. It obfuscates literals in new log entries when patterns match; it cannot retroactively cleanse leaked log files, and a regex miss leaks the literal.
- Kerberos authenticates the principal, not the host the principal connects from. A stolen keytab is a stolen identity.
- TLS encrypts but does not authenticate the application-layer identity. Authentication is layered on (LDAP, JWT, etc.); TLS by itself does not authorize.
- .htpasswd Web-UI authentication does not provide per-user authorization on the Web UI. Any authenticated .htpasswd user sees all Web UI contents, including query bytes and profiles (maintainer).
- --trusted_domain / --trusted_auth_header is an explicit bypass of client authentication. Setting it without controlling the load balancer hands an attacker the keys.
- Ranger column-masking and row-filter policies operate at the planner level, not the storage level. Anyone bypassing the planner (a hostile peer reading the file directly via HDFS, a UDF reading raw bytes) is not constrained by them.
- SQL-engine-amplified DoS ("malicious analytic query"): a user with SELECT privilege issuing a Cartesian product across petabyte tables. The fix surface is admission control, not the engine.
- Decompression / decoding bombs in supported file formats (see above).
- Adversarial table-writer collusion: a writer landing files that crash a downstream reader is VALID-HARDENING at most, because the writer could simply have written wrong data.
- Confused-deputy via doas when the proxy list is mis-scoped.
- Time-of-check-to-time-of-use between Ranger policy fetch and query execution: policy changes mid-query are not retroactively enforced (maintainer).
The operator deploying Impala in production must:
- Set `--principal` + `--keytab_file` on `impalad`, `statestored`, `catalogd`. Without these, internal RPC has no authentication (documented: `docs/topics/impala_kerberos.xml`, `docs/topics/impala_ldap.xml`).
- Set `--authorization_provider=ranger` and configure Ranger. Without this, no authorization is enforced (documented: `docs/topics/impala_authorization.xml`).
- Enable TLS: `--ssl_server_certificate` and `--ssl_private_key` on all daemons, and `--ssl_client_ca_certificate` to authenticate the peer of internal RPC (documented: `docs/topics/impala_ssl.xml`).
- Set `--webserver_password_file` and `--webserver_certificate_file` so the Web UI is authenticated and TLS-served (documented: `docs/topics/impala_security_webui.xml`, `docs/topics/impala_security_guidelines.xml`).
- Restrict Web UI ports (`:25000` / `:25010` / `:25020`) at the network layer to a trusted operator subnet (documented: `docs/topics/impala_security_guidelines.xml`).
- Restrict membership in `--internal_principals_whitelist` to the actual Kerberos principals of cluster members (documented: `be/src/rpc/authentication.cc`).
- Never set `--jwt_allow_without_tls=true`, `--oauth_allow_without_tls=true`, or `--saml2_allow_without_tls_debug_only=true` in production (maintainer).
- Never set `--trusted_domain` / `--trusted_auth_header` / `--trusted_domain_use_xff_header` unless the listening port is exposed only to a load balancer that strips and resets the relevant header (maintainer).
- Set `--cookie_secret_file` to a long, random, cluster-wide secret with filesystem permissions restricted to the `impala` user (documented: `be/src/rpc/authentication.cc`).
- Set `--authorized_proxy_user_config` / `--authorized_proxy_group_config` to the smallest set of front-end principals that need `doas`, with the smallest set of impersonated users (documented: `docs/topics/impala_delegation.xml`).
- Configure log redaction patterns for any sensitive literal that may appear in WHERE-clause queries (documented: `docs/topics/impala_logging.xml#redaction`).
- Secure the OS-level `impala` Unix user and the `root` / `sudoers` set on every Impala host (documented: `docs/topics/impala_security_guidelines.xml`).
- Configure admission control (`--default_pool_max_requests`, `--default_pool_max_queued`, `--default_pool_mem_limit`) to bound per-query and per-pool resource use; Impala does not enforce DoS protection by itself (maintainer).
- Treat `ai_generate_text` results and JDBC external-table rows as crossing a trust boundary; do not assume the remote system is honest (maintainer).
- Secure the underlying storage (HDFS, S3, ADLS, Ozone) with native ACLs; Impala enforces only what it can see (documented: `docs/topics/impala_security_files.xml`).
- Provide `.impalarc` for `impala-shell` users configuring `ssl` and either `ca_cert` or `verify_cert` (maintainer).
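
Several of the checklist items above can be checked mechanically. The following is a minimal sketch, not an Impala tool: it assumes the operator has already collected a daemon's effective startup flags into a plain name-to-value mapping (how that mapping is obtained is left open), and the function name `audit_flags` is hypothetical. It covers only a subset of the list and uses only flag names that appear above.

```python
from collections.abc import Mapping

# Flags the checklist says must be set to a non-empty value (subset only).
REQUIRED_NONEMPTY = (
    "principal",
    "ssl_server_certificate",
    "ssl_private_key",
    "ssl_client_ca_certificate",
    "webserver_password_file",
    "webserver_certificate_file",
    "cookie_secret_file",
)

# Flags the checklist says must never be enabled in production.
FORBIDDEN_TRUE = (
    "jwt_allow_without_tls",
    "oauth_allow_without_tls",
    "saml2_allow_without_tls_debug_only",
)

# Flags that are legitimate only behind a header-resetting load balancer.
NEEDS_REVIEW = ("trusted_domain", "trusted_auth_header")


def audit_flags(flags: Mapping[str, str]) -> list[str]:
    """Return human-readable findings for a hypothetical flag mapping."""
    findings = []
    for name in REQUIRED_NONEMPTY:
        if not flags.get(name, "").strip():
            findings.append(f"--{name} is unset")
    if flags.get("authorization_provider", "").strip().lower() != "ranger":
        findings.append("--authorization_provider is not 'ranger'")
    for name in FORBIDDEN_TRUE:
        if flags.get(name, "false").strip().lower() in ("true", "1"):
            findings.append(f"--{name} must not be enabled in production")
    for name in NEEDS_REVIEW:
        if flags.get(name, "").strip():
            findings.append(f"--{name} is set: confirm the port is proxy-only")
    return findings
```

An empty result means only that these particular checks passed; items such as network-layer port restrictions, storage ACLs, and proxy-list scoping still need manual review.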
- Exposing port 25000 (Web UI) directly to the public Internet without `--webserver_password_file`. Anyone reaching the port reads in-flight query bytes, server flags, table names. → operator hardening per §10.
- Running Impala with `--authorization_provider` unset in a multi-tenant cluster. All queries succeed as the `impala` Unix user; there is no authorization at all (documented: `docs/topics/impala_authorization.xml`).
- Setting `--trusted_domain` without ensuring the listening port is only reachable from the trusted reverse proxy. The flag is a deliberate client-auth bypass for proxy deployments; the operator owns the network fence.
- Using `CREATE FUNCTION` to load a UDF binary supplied by an end user. UDFs run in-process. → Ranger-gate `CREATE FUNCTION` to administrators.
- Treating Impala's `SHOW TABLES` view as a confidentiality boundary. Existence of a hidden object may leak through error messages or query profiles (maintainer).
- Re-using `--cookie_secret_file` across clusters of different trust levels. A leak in cluster A becomes a forgery primitive in cluster B (maintainer).
- Disabling TLS internally between `impalad` / `statestored` / `catalogd` in production. Cleartext internal RPC + Kerberos `auth-int` is the documented minimum; many deployments leave it at `auth` (maintainer).
- Mixing authenticated and unauthenticated coordinator daemons in the same cluster. Impala 2.0+ accepts both Kerberos and LDAP on the same port; an operator who also leaves a single coordinator unauthenticated produces a bypass (documented: `docs/topics/impala_mixed_security.xml`).
- `impala-shell` failure to verify server certificate. Invoking `impala-shell --ssl` without specifying `--ca_cert` or `--verify_cert` is a known insecure default that will be addressed in a future release.
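
The cookie-secret item in the list above is easier to see with a toy example. The sketch below uses a generic HMAC-signed cookie; it is an assumption made for illustration and is not Impala's actual cookie format. The helper names `sign` and `verify` and the payload string are hypothetical. It shows only the general property: whoever holds a secret can mint values that every verifier sharing that secret will accept.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: str) -> str:
    """Append an HMAC-SHA256 tag to the payload (illustrative format)."""
    tag = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def verify(secret: bytes, cookie: str) -> bool:
    """Recompute the tag and compare in constant time."""
    payload, _, tag = cookie.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)


# Secret reused by clusters A and B; assume it leaked from cluster A.
shared_secret = b"reused-across-A-and-B"
forged = sign(shared_secret, "user=admin")

# Cluster B verifies with the same secret, so the forgery is accepted.
accepted_by_b = verify(shared_secret, forged)

# With a secret unique to cluster B, the same forgery is rejected.
rejected_by_b = not verify(b"unique-to-B", forged)
```

Giving each cluster its own secret confines a leak to the cluster it came from.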
This section is the highest-leverage input for automated agentic security scans. Each entry gives the tool symptom, why it is safe under the model, and the § that licenses the call.
- "Internal RPC accepts plaintext / no auth" report against `:23000`, `:24000`, `:26000`, `:27000`. In a model-conforming deployment the operator has set `--principal` + Kerberos per §10; cleartext is a §10 violation by the operator, not an Impala bug. → `OUT-OF-MODEL: non-default-build` per §5a.
- "Web UI on `:25000` reachable without authentication" against an un-`.htpasswd`-protected cluster. Same shape as above; operator responsibility per §10. → `OUT-OF-MODEL: non-default-build`.
- "`--jwt_allow_without_tls=true` permits credentials over plaintext" in a config file. The flag is `_hidden` and named "debug only"; setting it voids §8 P3. → `OUT-OF-MODEL: non-default-build` (maintainer).
- "Path traversal in `gzopen`-style filename" against scanners. All scanner paths are Ranger-checked URIs, not OS paths; the URI namespace is rooted at the operator-configured warehouse. → `OUT-OF-MODEL: trusted-input` (maintainer).
- "Hardcoded test password / keytab in `tests/`, `testdata/`, `ssh_keys/`." `tests/`, `testdata/`, `ssh_keys/` are unsupported components. → `OUT-OF-MODEL: unsupported-component`.
- "SQL injection via end-user SQL text." End-user SQL is the input; the engine is designed to interpret it. The Ranger envelope is the authorization boundary, not SQL-text sanitization. → `BY-DESIGN: property-disclaimed`.
- "User-defined function executes arbitrary code in the impalad process." Documented and intentional; admission is Ranger-gated. → `BY-DESIGN: property-disclaimed` per §9.
- "DoS via expensive analytic query on a large table." Admission control is the fix surface, not the engine. → `BY-DESIGN: property-disclaimed` per §9.
- "Decompression bomb in a Parquet/ORC file landed in a warehouse table." Writers must have `INSERT`; the harm is reachable from an already-privileged actor. → `VALID-HARDENING` at most, unless it reaches §8 P8 memory safety.
- "Unchecked return from `malloc` in `udf_samples/`." `udf_samples/` is unsupported sample code. → `OUT-OF-MODEL: unsupported-component`.
- "Vendored Apache Kudu code under `be/src/kudu/security/` has CVE-X." Report upstream to Apache Kudu; Impala will pick up the fix on the next vendored sync. → `OUT-OF-MODEL: unsupported-component` (upstream pointer) (maintainer).
Revise this document when any of the following lands:
- A new authentication mechanism on a client-facing surface (e.g. mTLS-as-auth on HS2-HTTP, OIDC, U2F).
- A new authorization provider beyond Ranger (e.g. native Impala policy store, OPA integration).
- A new data-at-rest encryption story at the Impala layer (currently delegated; see §9).
- A new external-data surface (a new JDBC external-table connector, a new REST catalog beyond Iceberg, a new LLM connector beyond `ai_generate_text`).
- A UDF sandboxing story (changes §9 and §3 item 5).
- A change in the default value of any §5a flag, especially flags controlling auth (`--ssl_minimum_version`, `--jwt_validate_signature`).
- A vulnerability report that cannot be cleanly routed to one of the §13 dispositions: that is evidence the model is incomplete.
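
The default-value trigger in the list above lends itself to a mechanical check. The following is a rough sketch, assuming someone has captured the default flag values of two releases as plain dictionaries; the capture step, the function name `changed_defaults`, and the sample values are all illustrative rather than taken from Impala tooling.

```python
# Auth-related flags named in the revision triggers; extend as §5a grows.
WATCHED = ("ssl_minimum_version", "jwt_validate_signature")


def changed_defaults(old: dict, new: dict, watched=WATCHED) -> dict:
    """Map each watched flag whose default differs to its (old, new) pair.

    A flag missing from a snapshot is reported with None on that side.
    """
    return {
        name: (old.get(name), new.get(name))
        for name in watched
        if old.get(name) != new.get(name)
    }


# Illustrative snapshots only; real values must come from the releases.
release_n = {"ssl_minimum_version": "tlsv1", "jwt_validate_signature": "true"}
release_n1 = {"ssl_minimum_version": "tlsv1.2", "jwt_validate_signature": "true"}

drift = changed_defaults(release_n, release_n1)
```

A non-empty `drift` is the signal to revisit this document before the release ships.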
A report against Impala receives exactly one of the following:
| Disposition | Meaning | Licensed by |
|---|---|---|
| `VALID` | Violates a §8 property via an in-scope §7 adversary using an in-scope §6 input. | §8, §6, §7 |
| `VALID-HARDENING` | No §8 property violated, but a §11 misuse pattern can be made harder to fall into by code change. Fixed at maintainer discretion, typically no CVE. | §11 |
| `OUT-OF-MODEL: trusted-input` | Requires attacker control of a §6 parameter the model marks trusted (e.g. HMS-supplied metadata, Ranger-supplied policy, operator-supplied config flag). | §6 |
| `OUT-OF-MODEL: adversary-not-in-scope` | Requires a §7 actor the model excludes (operator, malicious HMS/Ranger, Byzantine peer, side-channel observer, same-host non-impala UID-0). | §7 |
| `OUT-OF-MODEL: unsupported-component` | Lands in `tests/`, `testdata/`, `infra/`, `ssh_keys/`, `udf_samples/`, vendored upstream code under `be/src/kudu/`, etc. | §3 item 7, §3 item 8 |
| `OUT-OF-MODEL: non-default-build` | Only manifests under a §5a flag the maintainer has ruled is dev/test (e.g. `--jwt_allow_without_tls=true`, no `--principal`). | §5a |
| `OUT-OF-MODEL: equivalent-harm` | An actor already-authorized under the model can cause the same harm via a documented path (writer landing arbitrary file bytes, SQL-privileged user submitting expensive queries). | §3 item 4, §3 item 6 |
| `BY-DESIGN: property-disclaimed` | Concerns a §9 property the project explicitly does not provide (UDF sandboxing, DoS protection, side channels, etc.). | §9 |
| `KNOWN-NON-FINDING` | Matches a §11a recurring false positive. | §11a |
| `MODEL-GAP` | Cannot be cleanly routed to any of the above; triggers §12 model revision. | §12 |
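
To make the "exactly one disposition" rule concrete, here is a minimal routing sketch. The `Report` fields, the `triage` function, and above all the order in which the rules are tried are assumptions made for this example; the table does not define a precedence. The disposition strings are the ones in the table.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical triage answers, one yes/no question per table row."""
    matches_known_non_finding: bool = False     # §11a
    in_unsupported_component: bool = False      # §3 item 7, §3 item 8
    needs_dev_test_flag: bool = False           # §5a
    needs_trusted_input: bool = False           # §6
    needs_excluded_adversary: bool = False      # §7
    concerns_disclaimed_property: bool = False  # §9
    equivalent_harm_available: bool = False     # §3 item 4, §3 item 6
    violates_claimed_property: bool = False     # §8
    hardens_misuse_pattern: bool = False        # §11


def triage(r: Report) -> str:
    """Return exactly one disposition; the first matching rule wins."""
    rules = [
        (r.matches_known_non_finding, "KNOWN-NON-FINDING"),
        (r.in_unsupported_component, "OUT-OF-MODEL: unsupported-component"),
        (r.needs_dev_test_flag, "OUT-OF-MODEL: non-default-build"),
        (r.needs_trusted_input, "OUT-OF-MODEL: trusted-input"),
        (r.needs_excluded_adversary, "OUT-OF-MODEL: adversary-not-in-scope"),
        (r.concerns_disclaimed_property, "BY-DESIGN: property-disclaimed"),
        (r.equivalent_harm_available, "OUT-OF-MODEL: equivalent-harm"),
        (r.violates_claimed_property, "VALID"),
        (r.hardens_misuse_pattern, "VALID-HARDENING"),
    ]
    for matched, disposition in rules:
        if matched:
            return disposition
    # Nothing matched: the model itself needs revision.
    return "MODEL-GAP"
```

Trying the exclusions before `VALID` mirrors the table's definition of `VALID` as requiring an in-scope adversary and input; a project could equally choose another order, so treat this one as a starting point.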
Impala does not currently ship an in-repo SECURITY.md, and the website
does not publish a project-level security policy page
(https://impala.apache.org/security.html returns 404 at draft time; the
landing page only links the generic ASF security URL). The de facto
SECURITY-policy artifacts are the in-repo DITA docs under
docs/topics/impala_security*.xml, which are the source for the published
documentation at https://impala.apache.org/docs/build/html/topics/impala_security.html.
The back-map below covers the in-repo source.
| Source | Claim | Lands in |
|---|---|---|
| `docs/topics/impala_security.xml` | "Impala includes a fine-grained authorization framework … based on Apache Ranger" | §8 P2 |
| `docs/topics/impala_security.xml` | "Impala relies on the Kerberos subsystem for authentication" | §8 P1, §8 P4 |
| `docs/topics/impala_security.xml` | "auditing capability … Impala generates the audit data which can be consumed … by cluster-management components focused on governance" | §10 item 11 (operator picks the audit sink) |
| `docs/topics/impala_security_guidelines.xml` | "Secure the root account", restrict `sudoers` | §3 item 3, §10 item 12 |
| `docs/topics/impala_security_guidelines.xml` | "Ensure that the Impala web UI … is password-protected" | §10 item 4, §11 first bullet |
| `docs/topics/impala_security_files.xml` | "All Impala read and write operations are performed under the filesystem privileges of the impala user" | §4 B7, §10 item 15 |
| `docs/topics/impala_authentication.xml` | "Impala supports authentication using either Kerberos or LDAP. You can also make proxy connections through Apache Knox." | §8 P1, §11 (Knox proxy is `--trusted_domain`-shaped) |
| `docs/topics/impala_authorization.xml` | "By default … Impala does all read and write operations with the privileges of the impala user" | §5a default-row "`authorization_provider` unset", §8 P2 violation symptom |
| `docs/topics/impala_ldap.xml` | "You must use the Kerberos authentication mechanism for connections between internal Impala components" | §4 B3, §8 P4 |
| `docs/topics/impala_ssl.xml` | "Impala supports TLS/SSL network encryption … default version was changed from 'tlsv1' to 'tlsv1.2' starting in Impala 4.0" | §8 P3, §5a row |
| `docs/topics/impala_security_webui.xml` | "This file should only be readable by the Impala process and machine administrators" | §10 item 9 |
| `docs/topics/impala_mixed_security.xml` | "Impala 2.0 and later automatically handles both Kerberos and LDAP authentication" | §11 (mixed-mode misconfig) |
| `docs/topics/impala_delegation.xml` | "Impala supports delegation where users whose names you specify can delegate the execution of a query to another user" | §8 P5, §10 item 10 |
| `docs/topics/impala_logging.xml#redaction` | "log redaction is a security feature that prevents sensitive information from being displayed in locations used by administrators for monitoring and troubleshooting" | §8 P6, §9 false-friend |
| `docs/topics/impala_ports.xml` | port inventory | §2 component table, §4 boundary table |
| `EXPORT_CONTROL.md` | "This software uses OpenSSL to enable TLS-encrypted connections, generate keys for asymmetric cryptography, and generate and verify signatures" | §5 cryptography assumption |
| `be/src/rpc/authentication.cc` lines 98–283 | flag inventory (`cookie_secret_file`, `jwt_*`, `oauth_*`, `trusted_*`, `internal_principals_whitelist`) | §5a, §6, §10 |