Apache Impala Security Threat Model

§1 Header

Project: Apache Impala — distributed massively-parallel C++ SQL query engine for data in HDFS, Apache Iceberg, Apache Kudu, Apache HBase, Amazon S3, Azure Data Lake Storage, Apache Ozone, and other Hadoop-compatible storage (documented: README.md).
Version / commit: this model is drafted against the default branch (master), most recently HEAD b8be513 ("IMPALA-13033: Parse WebUI thrift profile downloads"). A report against project release N should be triaged against the model as it stood at N, not at HEAD.
Date: 2026-06-02.
Authors: ASF Security team draft, Impala PMC.
Status: v1.
Reporting: vulnerabilities that fall under §8 (claimed properties) should be reported per the Apache Security Team disclosure channel (security@impala.apache.org); reports that fall under §3 (out of scope), §9 (properties not provided), or §11a (known non-findings) will be closed by Impala triagers citing this document.
Provenance legend — (documented) = drawn from in-repo docs or website docs, with citation; (maintainer) = stated by an Impala maintainer in response to this draft.

Impala is a MPP SQL engine: clients submit SQL over the HiveServer2 (HS2) Thrift protocol or HS2-over-HTTP; the coordinator impalad parses, plans, and distributes query fragments to worker impalad instances; metadata is served by a central catalogd and propagated to workers via statestored; data is read and written directly from/to the underlying storage (HDFS, S3, ADLS, Ozone, Kudu, HBase) using the impersonated impala-process credentials. Authentication is via Kerberos, LDAP, SAML, JWT, or OAuth bearer token; authorization is delegated to Apache Ranger.

§2 Scope and intended use

Intended use

Production analytic SQL queries against tabular data residing in distributed storage, served to authenticated end users via JDBC/ODBC clients, impala-shell, BI tools, or Apache Hue (documented: README.md, docs/topics/impala_security.xml).
Multi-tenant analytic clusters where authorization is enforced by Apache Ranger and authentication by Kerberos and/or LDAP (documented: docs/topics/impala_security.xml lines 38–62).

Deployment shape

Impala is not an in-process library and is not a single-binary daemon. It is a cluster of cooperating processes, deployed by an operator inside a network perimeter the operator controls. The threat model is therefore that of a distributed service, not a library (maintainer).

Caller roles

Following §2 of the output-structure rubric (network service split):

Role	Trust level	Notes
End-user client	untrusted but authenticated	Connects via HS2 / HS2-HTTP / Beeswax; identity verified by Kerberos, LDAP, SAML, JWT, or OAuth (documented: `docs/topics/impala_security.xml`, `docs/topics/impala_ldap.xml`, `be/src/rpc/authentication.cc`).
Operator / cluster admin	trusted	Sets startup flags, manages keytabs, configures Ranger, owns the Web UI `.htpasswd` (documented: `docs/topics/impala_security_guidelines.xml`).
Internal Impala peer	trusted (mutually-authenticated)	`impalad`↔`statestored`↔`catalogd` RPC; KRPC + Thrift RPC; auth is Kerberos-only between internal components (documented: `docs/topics/impala_ldap.xml`, "Consideration for Connections Between Impala Components").
Hive Metastore / Ranger	trusted control plane	Source of metadata + policy decisions; assumed honest (maintainer).
Underlying storage	trusted by virtue of operator-granted credentials	HDFS / S3 / ADLS / Ozone / Kudu / HBase; Impala holds delegation tokens or static credentials and reads/writes as the `impala` Unix user (or impersonated user when Ranger is enabled) (documented: `docs/topics/impala_security_files.xml`).
Delegated proxy user	conditionally trusted	When `--authorized_proxy_user_config` is set, an authenticated front-end (Hue, BI tool) may forward queries as a different end user (documented: `docs/topics/impala_delegation.xml`).

Component-family table

Family	Representative entry point	Touches outside the process?	In-model?
HS2 / Beeswax / HS2-HTTP server	`:21000` (Beeswax), `:21050` (HS2 binary), `:28000` (HS2-HTTP) (documented: `docs/topics/impala_ports.xml`)	network (TCP, optionally TLS)	yes
Internal KRPC + Thrift RPC	`:27000` (KRPC), `:23000`/`:24000`/`:26000` (statestore/catalog)	network within the cluster	yes (peer trust depends on Kerberos / mTLS)
Web UI / metrics / `/admin/` endpoints	`:25000`/`:25010`/`:25020`	network (TCP, optionally TLS + SPNEGO + `.htpasswd`)	yes
Query frontend (Java, `fe/`) — parser, analyzer, planner, Ranger checker	invoked from coordinator `impalad` via JNI	none directly	yes
Query backend (C++, `be/`) — exec engine, codegen (LLVM), expression eval, scanners	invoked from coordinator `impalad` and worker `impalad`	reads/writes storage with process credentials	yes
Catalog server (`catalogd`)	invoked by coordinators; reads HMS + storage	reads HMS, lists HDFS, reads object stores	yes
Storage scanners (Parquet, ORC, Avro, text, Iceberg, Kudu, HBase, JDBC external table)	reads operator-configured locations	reads object stores / HDFS / external JDBC	yes (data trust = §6)
User-defined functions (UDFs)	`CREATE FUNCTION … LOCATION …` (C++ native or Java)	runs operator-/user-permitted binaries in-process	out of model for UDF code itself (§3); in-model for the privilege check that admits the UDF
External Data Sources / JDBC external tables	`CREATE DATA SOURCE` / Iceberg REST catalog	outbound JDBC / HTTPS (documented: `docs/topics/impala_jdbc_external_table.xml`, `docs/topics/impala_iceberg_rest_catalog.xml`)	in-model for credential handling; out-of-model for the remote endpoint
`ai_generate_text` LLM connector	SQL function calling external LLM endpoint (documented: `docs/topics/impala_ai_functions.xml`)	outbound HTTPS	in-model for credential / prompt handling; out-of-model for the LLM provider
`shell/` Python impala-shell	client-side, not server	n/a	out of model for server claims (§3); in-model for credential handling of the shell binary itself
`docker/`, `testdata/`, `infra/`, `tests/`	tooling	n/a	out of model (§3)
Vendored Kudu security code under `be/src/kudu/security/`	TLS/SASL primitives shared with the Apache Kudu codebase	n/a	in-model only insofar as Impala calls into it (maintainer)

§3 Out of scope (explicit non-goals)

Impala is not, and does not aim to be, the following — reports requiring any of these will be closed with the cited disposition:

The root authority for storage-level authorization. HDFS POSIX permissions, S3 IAM, ADLS RBAC, Ozone ACLs, etc. are enforced by the storage provider and the credentials the operator hands to Impala. Reports that depend primarily on over-broad bucket / IAM permissions are deployment-sensitive, not Impala-side (documented: docs/topics/impala_security_files.xml). → OUT-OF-MODEL: adversary-not-in-scope.
A defender against a malicious Hive Metastore, Ranger Admin, or other trusted control-plane component. If the report requires the HMS or Ranger to be hostile to Impala, it is out of model (maintainer). → OUT-OF-MODEL: trusted-input.
A defender against the operator. Anyone with root, sudo, the impala Unix account, the keytab file, the cookie-secret file, or the Web UI .htpasswd already has unbounded power; "the operator misconfigured X" is not a vulnerability (documented: docs/topics/impala_security_guidelines.xml). → OUT-OF-MODEL: adversary-not-in-scope.
An isolation boundary between an authorized user's SQL and the impalad process. SQL is interpreted by a trusted engine running as the impala Unix user; an authenticated user with appropriate SQL privileges can already cause arbitrary reads, writes, and resource consumption within the scope Ranger grants. A new way for an authorized user to do something they are already authorized to do is not a vulnerability (maintainer). → OUT-OF-MODEL: equivalent-harm.
A sandbox for user-defined functions. Native (C++) UDFs and Hive Java UDFs run in-process with the privileges of the impalad daemon. UDF sandboxing is not provided; admission of a CREATE FUNCTION is gated by Ranger and that is the entire enforcement (maintainer). → BY-DESIGN: property-disclaimed (§9).
A defender against malformed-but-parseable user data in scanned files. Decoders (Parquet, ORC, Avro, text, Iceberg manifests) must not corrupt process memory, but raw runtime exceptions, slow paths on adversarial inputs, and OOM on pathological files are robustness work, not security issues, unless they cross a trust boundary (maintainer). → OUT-OF-MODEL: equivalent-harm for writer-controlled files, VALID-HARDENING for reader-controlled files.
Code that ships but is not part of the supported product: tests/, testdata/, infra/, docker/, package/, ssh_keys/, cmake_modules/, experiments/, udf_samples/. State the policy explicitly so integrators do not extend core guarantees to them (maintainer). → OUT-OF-MODEL: unsupported-component.
Apache Kudu, Apache Iceberg, Apache Ranger, Apache Hive client libraries, Hadoop libraries, OpenSSL, Apache Thrift, and other upstream dependencies. Where Impala vendors source (e.g. be/src/kudu/), the vendored code is modeled at the wrapper boundary; vulnerabilities intrinsic to the upstream project should be reported upstream (maintainer). → OUT-OF-MODEL: unsupported-component (with an upstream pointer).
The Impala documentation site, asf-site branch, downloads page, gem/npm packages with similar names, and other non-product surfaces. Out of scope.

§4 Trust boundaries and data flow

Impala has at least eight distinct trust transitions; a finding is in-model only when it cleanly maps to one of them.

#	Transition	Authentication	Authorization
B1	End-user client → HS2 / Beeswax / HS2-HTTP	Kerberos / LDAP / SAML / JWT / OAuth / trusted-domain header (documented: `be/src/rpc/authentication.cc`)	Ranger on submitted SQL (documented: `docs/topics/impala_authorization.xml`)
B2	End-user client → Web UI (`:25000` and siblings)	`.htpasswd` + SPNEGO, optional TLS (documented: `docs/topics/impala_security_webui.xml`)	none beyond authentication; the Web UI exposes operator-grade endpoints (maintainer)
B3	`impalad` ↔ `statestored` ↔ `catalogd` internal RPC (KRPC + Thrift)	Kerberos (mandatory for prod) + optional TLS (documented: `docs/topics/impala_ssl.xml`, `docs/topics/impala_ldap.xml`)	"internal_principals_whitelist" of allowed principals (documented: `be/src/rpc/authentication.cc` line 121)
B4	Coordinator `impalad` → worker `impalad` (query fragments over KRPC)	same as B3	same as B3
B5	Coordinator / catalogd → Hive Metastore	Kerberos / delegation token	HMS-side; Impala assumes truthful responses
B6	Coordinator → Ranger Admin (policy fetch)	service principal	Ranger-side
B7	Worker `impalad` → underlying storage (HDFS, S3, ADLS, Ozone, Kudu, HBase)	Kerberos / IAM / service-account keys / delegation tokens	storage-side ACLs
B8	Operator → impalad startup flags + configuration files	filesystem permissions on the host	OS-level

Reachability preconditions per family

For each family in §2, a finding is in-model only if it is reachable as follows:

HS2 / Beeswax / HS2-HTTP server: reachable from an unauthenticated network peer who can reach the listening port. Findings that require an already- authenticated peer collapse to "authenticated user with SQL privileges", and must additionally clear B7 (storage ACL) or B6 (Ranger policy) to be security-relevant.
Internal KRPC: reachable from a network peer who has compromised the Kerberos trust (B3) — i.e., has stolen a service keytab or impersonated a principal on internal_principals_whitelist. A flat "internal RPC has no auth" finding is OUT-OF-MODEL: adversary-not-in-scope because the model requires Kerberos between components in production (documented: docs/topics/impala_ldap.xml).
Web UI: reachable from a network peer with an .htpasswd credential (per §10) or who can reach an unprotected port. A finding that needs .htpasswd to be absent is OUT-OF-MODEL: trusted-input against a guideline-violating operator (§3 item 3) (maintainer).
Query frontend / backend: reachable from SQL submitted by an authenticated user with sufficient Ranger privileges. Findings here matter only if they break out of the user's Ranger-granted privilege set.
Scanners: reachable from bytes in operator-configured storage locations. Bytes are partially attacker-controlled when an authorized writer has INSERT privilege on a table that other users read (B7). Compromise of the storage layer itself is out of model (§3 item 1).
UDFs: reachable only via CREATE FUNCTION, which is Ranger-gated. Anything past the privilege check is out of model (§3 item 5).

§5 Assumptions about the environment

Operating system: Linux (Ubuntu 20.04/22.04/24.04, Rocky/RHEL 8/9 are the declared supported set; others "may also be supported but are not tested by the community") (documented: README.md — Supported Platforms). x86_64 and arm64 supported.
Process model: at least three long-lived daemons (impalad, statestored, catalogd); operator runs them as the impala Unix user (documented: docs/topics/impala_security_files.xml).
Network: operator-controlled L2/L3; no NAT or middlebox assumed to inspect KRPC payloads; ports per docs/topics/impala_ports.xml. Mutually- reachable cluster members assumed.
Time: Kerberos requires loosely-synchronized clocks across the realm (KDC tolerance, default 5 min) — operator's responsibility, not Impala's (maintainer).
Filesystem: keytab and .htpasswd files have OS-level permissions restricted to the impala user and admins (documented: docs/topics/impala_security_webui.xml).
Cryptography: the OpenSSL library shipped with the OS provides TLS, symmetric/asymmetric primitives, and RNG (documented: EXPORT_CONTROL.md).
Kerberos: assumes a working MIT KDC with renewable-ticket support configured per docs/topics/impala_kerberos.xml.
What Impala does to its host:
- does open listening sockets on the documented ports;
- does spawn child processes for --ldap_bind_password_cmd, --s3a_access_key_cmd, --s3a_secret_key_cmd, --saml2_keystore_password_cmd, --saml2_private_key_password_cmd, --ssl_private_key_password_cmd, --webserver_private_key_password_cmd; and java -version (maintainer);
- does install signal handlers for crash reporting (SIGUSR1 → breakpad) and generating stack traces (SIGRTMIN+10), ignores SIGPIPE, and handles graceful shutdown with SIGTERM (maintainer);
- does read a documented set of environment variables (e.g. IMPALA_HOME, JAVA_HOME, JAVA_TOOL_OPTIONS) but does not consume arbitrary LD_* for security-sensitive behavior (maintainer);
- does write logs to operator-configured locations; redacted query text if log redaction is enabled (documented: docs/topics/impala_logging.xml).

§5a Build-time and configuration variants

Impala ships as a single product but a sizable number of runtime flags materially change the security envelope. The maintainer-confirmed list is at be/src/rpc/authentication.cc and equivalent files; the security-relevant subset:

Flag	Default	Maintainer stance	Effect
`--enable_ldap_auth`	`false` (documented)	dev/test or operating with Kerberos, operator must enable per §10 (maintainer)	enables LDAP auth on HS2 client port
`--ssl_server_certificate` / `--ssl_private_key`	unset (documented)	dev/test, operator must enable per §10 (maintainer)	enables TLS on all listening sockets
`--ssl_minimum_version`	`tlsv1.2` (documented: `docs/topics/impala_ssl.xml`)	hardened in Impala 4.0 from `tlsv1` (documented)	rejects pre-1.2 handshakes
`--webserver_password_file`	unset (documented: `docs/topics/impala_security_webui.xml`)	an unprotected Web UI is `OUT-OF-MODEL: non-default-build` (maintainer)	Web UI authenticates against this `.htpasswd`
`--webserver_certificate_file`	unset (documented)	dev/test, operator must enable per §10 (maintainer)	enables HTTPS on Web UI
`--principal`, `--keytab-file`	unset	dev/test or operating with LDAP, operator must enable per §10 (maintainer)	enables Kerberos auth
`--authorization_provider=ranger`	unset (documented: `docs/topics/impala_authorization.xml`)	dev/test, operator must enable per §10 (maintainer)	enables Ranger authz; absent → all queries run as `impala` user (no enforcement)
`--jwt_token_auth` / `--oauth_token_auth`	`false`	optional alternative auth (documented: `be/src/rpc/authentication.cc`)	enables bearer-token auth
`--jwt_validate_signature`, `--oauth_jwt_validate_signature`	`true`	hardened default; flipping to `false` voids §8 P3 (maintainer)	turns off JWT/OAuth signature check
`--jwt_allow_without_tls`, `--oauth_allow_without_tls`, `--saml2_allow_without_tls_debug_only`	`false`, marked `_hidden`	"debug only" per name (→)	permits bearer / SAML auth over unencrypted transport
`--trusted_domain`, `--trusted_auth_header`	unset (documented)	when set, Impala accepts identity assertions from named peer without re-auth	reachability for `OUT-OF-MODEL: trusted-input` reports
`--trusted_domain_use_xff_header`	`false`	when `true`, parses `X-Forwarded-For` to identify the originating client (documented: `be/src/rpc/authentication.cc` line 132)	exposes a path where a misconfigured proxy can let a client claim any source address (maintainer)
`--internal_principals_whitelist`	`hdfs` (documented: `be/src/rpc/authentication.cc` line 121)	governs which Kerberos principals are accepted on internal RPC ports	misconfiguration permits external service to speak as a peer
`--authorized_proxy_user_config` / `--authorized_proxy_group_config`	unset (documented: `docs/topics/impala_delegation.xml`)	required for Hue-style impersonation; whitelists which authenticated principals may `doas` to which users	breaks B1 if mis-scoped
`--cookie_secret_file`	empty (documented: `be/src/rpc/authentication.cc` line 98)	when unset, HS2-HTTP cookies fall back to per-process random — sessions do not survive cluster restarts but are not forgeable (maintainer)	shared cluster-wide secret for cookie HMAC
`--abort_on_config_error`	`true` (maintainer)	when off, security misconfigurations may not prevent startup
`impala-shell --ssl`	`false`	when true, must also configure `ca_cert` or `verify_cert` to validate server certificate	required for TLS configured endpoints

The insecure-default case. A number of these flags ship in the "off, must be turned on for production" posture. The maintainer ruling on whether the default is a supported production posture is captured in "Maintainer stance".

§6 Assumptions about inputs

Per-endpoint trust table (network surfaces)

Surface / route	Parameter	Attacker-controllable?	Caller must enforce
HS2 binary `:21050`, HS2-HTTP `:28000`, Beeswax `:21000`	SQL text	yes	nothing — Impala parses, plans, and applies Ranger
HS2-HTTP `:28000`	`X-Forwarded-For` header	yes if `--trusted_domain_use_xff_header` is on; never trust otherwise (maintainer)	per §10, only enable behind a load balancer that strips and resets XFF
HS2-HTTP `:28000`	session cookie	signed with `--cookie_secret_file` HMAC; not attacker-forgeable when secret is unguessable (maintainer)	per §10, rotate the cookie-secret file if compromised
HS2-HTTP `:28000`	JWT / OAuth bearer	yes; signature checked when `--jwt_validate_signature=true` (default) (documented: `be/src/rpc/authentication.cc`)	per §10, leave signature checking on, set `--jwt_allow_without_tls=false`
HS2-HTTP `:28000`	`--trusted_auth_header` value	yes; treated as the authenticated identity	never expose the port directly to untrusted peers when this flag is set (maintainer)
Web UI `:25000`/`:25010`/`:25020`	`.htpasswd` credential	yes if `--webserver_password_file` is set	per §10, set the flag; per §10, set `--webserver_certificate_file` for HTTPS
Web UI `:25000`/`:25010`/`:25020`	session cookie	signed with `--cookie_secret_file`	per §10, rotate the cookie-secret file if compromised
Web UI `:25000` — query-profile and admin endpoints	profile ID / GET parameters	yes	Web UI auth is the only gate; sensitive query bytes appear unless log redaction is enabled
KRPC `:27000` and statestore/catalog ports `:23000`/`:24000`/`:26000`	Thrift / KRPC payload	only by a peer that has cleared B3	Kerberos + `internal_principals_whitelist` are the gate
Scanned table files (Parquet, ORC, Avro, text, Iceberg manifests)	file bytes	yes if an authorized writer can land bytes the reader will scan	Ranger separates writers from readers; B7 enforces who can land bytes
`ai_generate_text` LLM endpoint	LLM response	trusted only as far as the LLM is trusted	per §10, treat LLM output as untrusted text (do not pipe to executable contexts) (maintainer)
JDBC external table endpoint	rows returned by remote JDBC	trusted only as far as the remote endpoint is trusted	per §10, model JDBC external tables as data crossing a trust boundary

Size / shape / rate

Impala accepts arbitrary-length SQL but the analyzer rejects queries above implementation limits (controlled by query option max_statement_length_bytes with max value of 2^31 - 1) (maintainer).
Scanned files may be terabytes; row groups are streamed. Pathological encodings (e.g. enormous string lengths in Parquet headers) are robustness concerns (maintainer).
The HS2 / Beeswax surfaces have limited built-in rate limiting; admission control via the --default_pool_max_requests family of flags bounds in-flight queries; --fe_service_threads provides a limited bound on connection and auth-attempt rate (maintainer).

§7 Adversary model

Actors

Actor	In scope?	Capabilities granted
Unauthenticated network peer reaching HS2 / Beeswax / HS2-HTTP	yes	TCP to the listening ports; may attempt authentication; may attempt to violate the protocol pre-auth
Unauthenticated peer reaching Web UI	yes, if the deployment exposes the Web UI publicly	as above for Web UI
Authenticated end user with limited Ranger privileges	yes	execute SQL, read tables the user has `SELECT` on, write tables the user has `INSERT` on
Authenticated end user with broad Ranger privileges	partial	only escapes from their Ranger envelope are in scope
Co-tenant on the same cluster	yes	same as authenticated end user; cross-tenant leakage is in scope
Authorized table writer producing data read by another user	yes for scanner robustness across the B7 boundary, but bounded — `VALID-HARDENING`, not `VALID`, unless memory corruption is reachable (maintainer)
Authenticated proxy front-end (Hue) using `doas`	yes only when `--authorized_proxy_user_config` is mis-scoped
Hostile peer impalad / statestored / catalogd	out of scope — see §3 item 2
Hostile HMS / Ranger	out of scope — see §3 item 2
Operator	out of scope — see §3 item 3
Local process on the same host as `impalad` running as a different user	partial (maintainer): same-host attackers with non-`impala` UID can read the Web UI / HS2 ports unless host firewalling forbids or authentication secured from non-`impala` local users; Impala does not defend against same-host UID-0 attackers
Side-channel observer (cache timing, network timing)	out of scope (maintainer)
Quantum adversary	out of scope

Authenticated-but-Byzantine peer (distributed-systems threshold)

Impala is not a Byzantine-fault-tolerant system. A compromised impalad/catalogd/statestored peer with a valid Kerberos identity can cause unbounded damage (read any data the cluster can read, produce wrong results, leak intermediate state). The cluster trusts its own membership (maintainer). → reports requiring a Byzantine internal peer are OUT-OF-MODEL: adversary-not-in-scope.

§8 Security properties the project provides

For each property: condition, violation symptom, severity tier, provenance.

P1 — Authentication of HS2 / Beeswax / HS2-HTTP clients

Condition: an authentication mode is enabled (--enable_ldap_auth, --principal+--keytab-file, --jwt_token_auth, --oauth_token_auth, or a SAML configuration). With none of these set, Impala accepts unauthenticated SQL (documented: be/src/rpc/authentication.cc).
Violation symptom: a network peer holding no valid credential successfully executes SQL.
Severity: security-critical, VALID per §13.
(documented)

P2 — Authorization of SQL operations via Apache Ranger

Condition: --authorization_provider=ranger is set and Ranger is reachable (documented: docs/topics/impala_authorization.xml). With this flag unset, no authorization is enforced and all queries run as the impala user (documented: docs/topics/impala_security.xml, docs/topics/impala_authorization.xml).
Violation symptom: a query reads or modifies data not licensed by the authenticated principal's Ranger policy. Failure mode includes both the authorization-bypass case (Impala fails to apply a policy) and the authorization-confusion case (Impala applies the wrong policy).
Severity: security-critical, VALID per §13.
(documented)

P3 — TLS confidentiality and integrity on the wire, when configured

Condition: --ssl_server_certificate + --ssl_private_key set on the relevant daemon; minimum version per --ssl_minimum_version (default tlsv1.2 since Impala 4.0) (documented: docs/topics/impala_ssl.xml).
Violation symptom: cleartext on the wire after TLS is configured, or a TLS handshake completing with a deprecated cipher despite --ssl_minimum_version=tlsv1.2.
Severity: security-critical, VALID per §13.
(documented)

P4 — Kerberos authentication on internal RPCs in production

Condition: --principal and --keytab-file are set on all three daemons (documented: docs/topics/impala_kerberos.xml, docs/topics/impala_ldap.xml). Without this, internal RPCs are unauthenticated and a cleartext-internal-RPC report is not a §8 break — the operator violated §10.
Violation symptom: internal RPC completing successfully from a principal not on --internal_principals_whitelist.
Severity: security-critical, VALID per §13.
(documented)

P5 — Bounded scope of authenticated impersonation (`doas`)

Condition: --authorized_proxy_user_config / --authorized_proxy_group_config set; the authenticated front-end principal appears as a key (documented: docs/topics/impala_delegation.xml).
Violation symptom: an authenticated principal successfully runs a query as a delegated user not in their allow-list.
Severity: security-critical, VALID per §13.
(documented)

P6 — Log redaction, when configured

Condition: redaction rules configured per docs/topics/impala_logging.xml#redaction.
Violation symptom: literal values matching configured redaction patterns appearing un-redacted in logs or Web UI query profiles.
Severity: security-critical for data-protection-regulated deployments; VALID per §13.
(documented)

P7 — Web UI authentication, when configured

Condition: --webserver_password_file set; optionally Kerberos SPNEGO.
Violation symptom: an unauthenticated peer accesses an authenticated Web UI endpoint, or an authenticated peer accesses an endpoint above their Web UI auth tier.
Severity: security-critical, VALID per §13.
(documented: docs/topics/impala_security_webui.xml)

P8 — Memory safety on well-formed inputs across documented surfaces

Condition: input matches the documented protocol (HS2 / Beeswax / Thrift / KRPC / Parquet / ORC / Avro / Iceberg manifest / etc.); the host conformant to §5; no _hidden debug flag is in use (maintainer).
Violation symptom: heap or stack corruption, out-of-bounds read/write, use-after-free, double-free reachable from a §6 input.
Severity: security-critical when reachable from network input or from table data crossing B7; VALID-HARDENING when reachable only by a writer who already controls the bytes (§3 item 6).
(maintainer)

P9 — No SQL injection from end-user-supplied parameters into back-end queries against other systems

Condition: applies only to flows where Impala emits SQL to a remote system on behalf of an Impala client (JDBC external tables; ai_generate_text with prompt-as-SQL patterns).
Violation symptom: end-user SQL text appearing un-escaped in a remote- query string.
Severity: case-dependent; VALID-HARDENING if the remote system is also Impala-trusted, VALID if the remote system is a tenant boundary.
(maintainer)

§9 Security properties the project does not provide

State each plainly so a triager can route an inbound report to the matching disclaimer.

No isolation between authenticated user SQL and the impalad process. A user with Ranger privilege to CREATE FUNCTION and to SELECT from the resulting function can run arbitrary native or JVM code inside impalad. UDFs are not sandboxed. See §3 item 5 (maintainer).
No defense against decompression / decoding bombs in scanned files. A malicious or buggy table writer can land Parquet / ORC / Avro / text files designed to maximize CPU and memory; the reader has no built-in cap on per-file resource use (maintainer).
No quotas on per-query or per-user resource consumption beyond what admission control provides. A user with SELECT on a large table can cause arbitrary wall-clock and memory burn. Operator must configure --default_pool_* admission-control flags (maintainer).
No defense against intra-cluster Byzantine failure. A compromised peer with a valid Kerberos identity can read any data the cluster can read; see §7 (maintainer).
No protection against the operator. Anyone with the keytab, the cookie-secret file, the .htpasswd, the impala Unix account, or root on any impala host wins. See §3 item 3.
No protection against a malicious HMS / Ranger. See §3 item 2.
No data-at-rest encryption. Impala writes file bytes through the storage layer's existing protections (HDFS Transparent Data Encryption, S3 SSE, etc.). Impala does not encrypt at the table format level (maintainer).
No defense against side-channel observation (cache, timing, branch prediction) of query plans or data (maintainer).
No constant-time comparison of authentication secrets beyond what the underlying SASL/Kerberos libraries provide (maintainer).
No defender stance against an attacker on the same Linux host running as a non-impala UID — Impala defends only across the network surface; same-host attackers with shell access on the impala host already have many paths to win (maintainer).

False-friend properties (call out separately)

SHOW TABLES / SHOW DATABASES filtering is an authorization view, not an information-flow channel. Object names a user is not authorized to see are hidden, but error messages, query-profile timing, and Web UI traces may reveal existence indirectly (maintainer).
Log redaction is a display feature, not a confidentiality boundary. It obfuscates literals in new log entries when patterns match; it cannot retroactively cleanse leaked log files, and a regex miss leaks the literal.
Kerberos authenticates the principal, not the host the principal connects from. A stolen keytab is a stolen identity.
TLS encrypts but does not authenticate the application-layer identity. Authentication is layered on (LDAP, JWT, etc.); TLS by itself does not authorize.
.htpasswd Web-UI authentication does not provide per-user authorization on the Web UI. Any authenticated .htpasswd user sees all Web UI contents, including query bytes and profiles (maintainer).
--trusted_domain / --trusted_auth_header is an explicit bypass of client authentication. Setting it without controlling the load balancer hands an attacker the keys.
Ranger column-masking and row-filter policies operate at the planner level, not the storage level. Anyone bypassing the planner (a hostile peer reading the file directly via HDFS, a UDF reading raw bytes) is not constrained by them.

Well-known attack classes the project does not defend against

SQL-engine-amplified DoS ("malicious analytic query"): a user with SELECT privilege issuing a Cartesian product across petabyte tables. The fix surface is admission control, not the engine.
Decompression / decoding bombs in supported file formats (see above).
Adversarial table-writer collusion: a writer landing files that crash a downstream reader is VALID-HARDENING at most, because the writer could simply have written wrong data.
Confused-deputy via doas when the proxy list is mis-scoped.
Time-of-check-to-time-of-use between Ranger policy fetch and query execution: policy changes mid-query are not retroactively enforced (maintainer).

§10 Downstream responsibilities

The operator deploying Impala in production must:

Set --principal + --keytab-file on impalad, statestored, catalogd. Without these, internal RPC has no authentication (documented: docs/topics/impala_kerberos.xml, docs/topics/impala_ldap.xml).
Set --authorization_provider=ranger and configure Ranger. Without this, no authorization is enforced (documented: docs/topics/impala_authorization.xml).
Enable TLS — --ssl_server_certificate and --ssl_private_key on all daemons, and --ssl_client_ca_certificate to authenticate the peer of internal RPC (documented: docs/topics/impala_ssl.xml).
Set --webserver_password_file and --webserver_certificate_file so the Web UI is authenticated and TLS-served (documented: docs/topics/impala_security_webui.xml, docs/topics/impala_security_guidelines.xml).
Restrict Web UI ports (:25000/:25010/:25020) at the network layer to a trusted operator subnet (documented: docs/topics/impala_security_guidelines.xml).
Restrict membership in --internal_principals_whitelist to the actual Kerberos principals of cluster members (documented: be/src/rpc/authentication.cc).
Never set --jwt_allow_without_tls=true, --oauth_allow_without_tls=true, or --saml2_allow_without_tls_debug_only=true in production (maintainer).
Never set --trusted_domain / --trusted_auth_header / --trusted_domain_use_xff_header unless the listening port is exposed only to a load balancer that strips and resets the relevant header (maintainer).
Set --cookie_secret_file to a long, random, cluster-wide secret with filesystem permissions restricted to the impala user (documented: be/src/rpc/authentication.cc).
Set --authorized_proxy_user_config / --authorized_proxy_group_config to the smallest set of front-end principals that need doas, with the smallest set of impersonated users (documented: docs/topics/impala_delegation.xml).
Configure log redaction patterns for any sensitive literal that may appear in WHERE-clause queries (documented: docs/topics/impala_logging.xml#redaction).
Secure the OS-level impala Unix user and the root / sudoers set on every Impala host (documented: docs/topics/impala_security_guidelines.xml).
Configure admission control (--default_pool_max_requests, --default_pool_max_queued, --default_pool_mem_limit) to bound per-query and per-pool resource use; Impala does not enforce DoS protection by itself (maintainer).
Treat ai_generate_text results and JDBC external-table rows as crossing a trust boundary; do not assume the remote system is honest (maintainer).
Secure the underlying storage (HDFS, S3, ADLS, Ozone) with native ACLs; Impala enforces only what it can see (documented: docs/topics/impala_security_files.xml).
Provide .impalarc for impala-shell users configuring ssl and either ca_cert or verify_cert (maintainer).

§11 Known misuse patterns

Exposing port 25000 (Web UI) directly to the public Internet without --webserver_password_file. Anyone reaching the port reads in-flight query bytes, server flags, table names. → operator hardening per §10.
Running Impala with --authorization_provider unset in a multi-tenant cluster. All queries succeed as the impala Unix user — there is no authorization at all (documented: docs/topics/impala_authorization.xml).
Setting --trusted_domain without ensuring the listening port is only reachable from the trusted reverse proxy. The flag is a deliberate client-auth bypass for proxy deployments; the operator owns the network fence.
Using CREATE FUNCTION to load a UDF binary supplied by an end user. UDFs run in-process. → Ranger-gate CREATE FUNCTION to administrators.
Treating Impala's SHOW TABLES view as a confidentiality boundary. Existence of a hidden object may leak through error messages or query profiles (maintainer).
Re-using --cookie_secret_file across clusters of different trust levels. A leak in cluster A becomes a forgery primitive in cluster B (maintainer).
Disabling TLS internally between impalad/statestored/catalogd in production. Cleartext internal RPC + Kerberos auth-int is the documented minimum; many deployments leave it at auth (maintainer).
Mixing authenticated and unauthenticated coordinator daemons in the same cluster. Impala 2.0+ accepts both Kerberos and LDAP on the same port; an operator who also leaves a single coordinator unauthenticated produces a bypass (documented: docs/topics/impala_mixed_security.xml).
impala-shell failure to verify server certificate. Invoking impala-shell --ssl without specifying --ca_cert or --verify_cert is a known insecure default that will be addressed in a future release.

§11a Known non-findings (recurring false positives)

This section is the highest-leverage input for automated agentic security scans. Each entry: tool symptom, why it is safe under the model, the § that licenses the call.

"Internal RPC accepts plaintext / no auth" report against :23000, :24000, :26000, :27000. In a model-conforming deployment the operator has set --principal + Kerberos per §10; cleartext is a §10 violation by the operator, not an Impala bug. → OUT-OF-MODEL: non-default-build per §5a.
"Web UI on :25000 reachable without authentication" against an un-.htpasswd-protected cluster. Same shape as above; operator responsibility per §10. → OUT-OF-MODEL: non-default-build.
"--jwt_allow_without_tls=true permits credentials over plaintext" in a config file. The flag is _hidden and named "debug only"; setting it voids §8 P3. → OUT-OF-MODEL: non-default-build (maintainer).
"Path traversal in gzopen-style filename" against scanners. All scanner paths are Ranger-checked URIs, not OS paths; the URI namespace is rooted at the operator-configured warehouse. → OUT-OF-MODEL: trusted-input (maintainer).
"Hardcoded test password / keytab in tests/, testdata/, ssh_keys/." tests/, testdata/, ssh_keys/ are unsupported components. → OUT-OF-MODEL: unsupported-component.
"SQL injection via end-user SQL text." End-user SQL is the input; the engine is designed to interpret it. The Ranger envelope is the authorization boundary, not SQL-text sanitization. → BY-DESIGN: property-disclaimed.
"User-defined function executes arbitrary code in the impalad process." Documented and intentional; admission is Ranger-gated. → BY-DESIGN: property-disclaimed per §9.
"DoS via expensive analytic query on a large table." Admission control is the fix surface, not the engine. → BY-DESIGN: property-disclaimed per §9.
"Decompression bomb in a Parquet/ORC file landed in a warehouse table." Writers must have INSERT; the harm is reachable from an already- privileged actor. → VALID-HARDENING at most, unless it reaches §8 P8 memory safety.
"Unchecked return from malloc in udf_samples/." udf_samples/ is unsupported sample code. → OUT-OF-MODEL: unsupported-component.
"Vendored Apache Kudu code under be/src/kudu/security/ has CVE-X." Report upstream to Apache Kudu; Impala will pick up the fix on the next vendored sync. → OUT-OF-MODEL: unsupported-component (upstream pointer) (maintainer).

§12 Conditions that would change this model

Revise this document when any of the following lands:

A new authentication mechanism on a client-facing surface (e.g. mTLS-as- auth on HS2-HTTP, OIDC, U2F).
A new authorization provider beyond Ranger (e.g. native Impala policy store, OPA integration).
A new data-at-rest encryption story at the Impala layer (currently delegated; see §9).
A new external-data surface (a new JDBC external-table connector, a new REST catalog beyond Iceberg, a new LLM connector beyond ai_generate_text).
A UDF sandboxing story (changes §9 and §3 item 5).
A change in the default value of any §5a flag, especially flags controlling auth (--ssl_minimum_version, --jwt_validate_signature).
A vulnerability report that cannot be cleanly routed to one of the §13 dispositions: that is evidence the model is incomplete.

§13 Triage dispositions

A report against Impala receives exactly one of the following:

Disposition	Meaning	Licensed by
`VALID`	Violates a §8 property via an in-scope §7 adversary using an in-scope §6 input.	§8, §6, §7
`VALID-HARDENING`	No §8 property violated, but a §11 misuse pattern can be made harder to fall into by code change. Fixed at maintainer discretion, typically no CVE.	§11
`OUT-OF-MODEL: trusted-input`	Requires attacker control of a §6 parameter the model marks trusted (e.g. HMS-supplied metadata, Ranger-supplied policy, operator-supplied config flag).	§6
`OUT-OF-MODEL: adversary-not-in-scope`	Requires a §7 actor the model excludes (operator, malicious HMS/Ranger, Byzantine peer, side-channel observer, same-host non-`impala` UID-0).	§7
`OUT-OF-MODEL: unsupported-component`	Lands in `tests/`, `testdata/`, `infra/`, `ssh_keys/`, `udf_samples/`, vendored upstream code under `be/src/kudu/`, etc.	§3 item 7, §3 item 8
`OUT-OF-MODEL: non-default-build`	Only manifests under a §5a flag the maintainer has ruled is dev/test (e.g. `--jwt_allow_without_tls=true`, no `--principal`).	§5a
`OUT-OF-MODEL: equivalent-harm`	An actor already-authorized under the model can cause the same harm via a documented path (writer landing arbitrary file bytes, SQL-privileged user submitting expensive queries).	§3 item 4, §3 item 6
`BY-DESIGN: property-disclaimed`	Concerns a §9 property the project explicitly does not provide (UDF sandboxing, DoS protection, side channels, etc.).	§9
`KNOWN-NON-FINDING`	Matches a §11a recurring false positive.	§11a
`MODEL-GAP`	Cannot be cleanly routed to any of the above — triggers §12 model revision.	§12

Appendix: SECURITY.md → §x back-map

Impala does not currently ship an in-repo SECURITY.md, and the website does not publish a project-level security policy page (https://impala.apache.org/security.html returns 404 at draft time; the landing page only links the generic ASF security URL). The de facto SECURITY-policy artifacts are the in-repo DITA docs under docs/topics/impala_security*.xml, which are the source for the published documentation at https://impala.apache.org/docs/build/html/topics/impala_security.html. The back-map below covers the in-repo source.

Source	Claim	Lands in
`docs/topics/impala_security.xml`	"Impala includes a fine-grained authorization framework … based on Apache Ranger"	§8 P2
`docs/topics/impala_security.xml`	"Impala relies on the Kerberos subsystem for authentication"	§8 P1, §8 P4
`docs/topics/impala_security.xml`	"auditing capability … Impala generates the audit data which can be consumed … by cluster-management components focused on governance"	§10 item 11 (operator picks the audit sink)
`docs/topics/impala_security_guidelines.xml`	"Secure the root account", restrict `sudoers`	§3 item 3, §10 item 12
`docs/topics/impala_security_guidelines.xml`	"Ensure that the Impala web UI … is password-protected"	§10 item 4, §11 first bullet
`docs/topics/impala_security_files.xml`	"All Impala read and write operations are performed under the filesystem privileges of the impala user"	§4 B7, §10 item 15
`docs/topics/impala_authentication.xml`	"Impala supports authentication using either Kerberos or LDAP. You can also make proxy connections through Apache Knox."	§8 P1, §11 (Knox proxy is `--trusted_domain`-shaped)
`docs/topics/impala_authorization.xml`	"By default … Impala does all read and write operations with the privileges of the impala user"	§5a default-row "authorization_provider unset", §8 P2 violation symptom
`docs/topics/impala_ldap.xml`	"You must use the Kerberos authentication mechanism for connections between internal Impala components"	§4 B3, §8 P4
`docs/topics/impala_ssl.xml`	"Impala supports TLS/SSL network encryption … default version was changed from 'tlsv1' to 'tlsv1.2' starting in Impala 4.0"	§8 P3, §5a row
`docs/topics/impala_security_webui.xml`	"This file should only be readable by the Impala process and machine administrators"	§10 item 9
`docs/topics/impala_mixed_security.xml`	"Impala 2.0 and later automatically handles both Kerberos and LDAP authentication"	§11 (mixed-mode misconfig)
`docs/topics/impala_delegation.xml`	"Impala supports delegation where users whose names you specify can delegate the execution of a query to another user"	§8 P5, §10 item 10
`docs/topics/impala_logging.xml#redaction`	"log redaction is a security feature that prevents sensitive information from being displayed in locations used by administrators for monitoring and troubleshooting"	§8 P6, §9 false-friend
`docs/topics/impala_ports.xml`	port inventory	§2 component table, §4 boundary table
`EXPORT_CONTROL.md`	"This software uses OpenSSL to enable TLS-encrypted connections, generate keys for asymmetric cryptography, and generate and verify signatures"	§5 cryptography assumption
`be/src/rpc/authentication.cc` lines 98–283	flag inventory (cookie_secret_file, jwt_, oauth_, trusted_*, internal_principals_whitelist)	§5a, §6, §10

Security: apache/impala

Security

SECURITY.md