From 477d81ba12cc434056cd3c8cbad40809a75a9a61 Mon Sep 17 00:00:00 2001
From: Eric Pugh
Date: Wed, 17 Jun 2026 08:57:26 -0400
Subject: [PATCH] add OpenNLP CVE Vex files

---
 content/solr/vex/2026-05-04-cve-2026-40682.md | 57 ++++++++++++++++
 content/solr/vex/2026-05-04-cve-2026-42027.md | 66 +++++++++++++++++++
 content/solr/vex/2026-05-04-cve-2026-42440.md | 49 ++++++++++++++
 3 files changed, 172 insertions(+)
 create mode 100644 content/solr/vex/2026-05-04-cve-2026-40682.md
 create mode 100644 content/solr/vex/2026-05-04-cve-2026-42027.md
 create mode 100644 content/solr/vex/2026-05-04-cve-2026-42440.md

diff --git a/content/solr/vex/2026-05-04-cve-2026-40682.md b/content/solr/vex/2026-05-04-cve-2026-40682.md
new file mode 100644
index 000000000..3dc153d0f
--- /dev/null
+++ b/content/solr/vex/2026-05-04-cve-2026-40682.md
@@ -0,0 +1,57 @@
+---
+cve: CVE-2026-40682
+category:
+  - solr/vex
+versions: "< 10.1.0"
+jars:
+  - opennlp-tools-1.9.4.jar
+analysis:
+  state: exploitable
+  response:
+    - workaround_available
+    - update
+title: "Apache OpenNLP: XXE in dictionary parsing"
+---
+
+CVE-2026-40682 (CVSS 9.1) is an XML External Entity (XXE) vulnerability in Apache OpenNLP's
+dictionary parsing. The `DictionaryEntryPersistor` class and the public `Dictionary(InputStream)`
+constructor create a SAX parser without enabling `FEATURE_SECURE_PROCESSING` or disabling DTD
+processing, so external entity resolution and DOCTYPE declarations remain fully enabled. An attacker
+who can supply a crafted dictionary file — either directly or embedded in a model archive that
+OpenNLP deserializes — can therefore read local files from the server or trigger outbound requests
+(server-side request forgery). Other OpenNLP XML parsing paths route through the hardened
+`XmlUtil.createSaxParser()` helper, but this code path does not.
+
+The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively via
+`lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no patched 1.9
+release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which parse dictionaries with secure
+XML processing that rejects DOCTYPE declarations and external entities.
+
+#### When Solr is exposed
+
+OpenNLP is not part of a default Solr installation, but it is exploitable once the OpenNLP-backed
+modules are enabled. The vulnerable dictionary-parsing code path becomes reachable when:
+
+* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
+* OpenNLP analysis components or update processors are configured to load OpenNLP model or
+  dictionary files (the schema-configured OpenNLP analyzers — tokenizer, POS, chunker, lemmatizer,
+  name-finder — or the `langid` / `analysis-extras` update processors).
+
+Any deployment that enables these modules and parses an OpenNLP model or dictionary an attacker can
+influence is exploitable. A deployment that does not enable them never parses an OpenNLP dictionary
+and is not exposed.
+
+#### Mitigation
+
+The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
+modules** unless they are required. If you do use OpenNLP, load model and dictionary files only from
+locations you fully control, since the bundled OpenNLP 1.9.4 does not disable external entities when
+parsing dictionaries.
+
+Every OpenNLP model and dictionary is loaded as a resource through Solr's resource loaders, so the
+archive passes through a single chokepoint before OpenNLP is allowed to deserialize it. This applies
+to both standalone (filesystem configset) and SolrCloud (ZooKeeper) deployments, and whether the
+dictionary is loaded by the `langid` / `analysis-extras` update processors or by the
+schema-configured OpenNLP analyzers.
+
+Fully resolved in Solr 10.1 by upgrading the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
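
Note on CVE-2026-40682 (not part of the patch): the hardening this VEX entry describes, enabling `FEATURE_SECURE_PROCESSING` and rejecting DOCTYPE declarations, can be sketched with the standard JAXP API. This is a minimal illustration, not the OpenNLP or Solr code. The class and method names (`HardenedSaxDemo`, `newHardenedParser`, `parses`) are hypothetical, and the `disallow-doctype-decl` feature URI assumes the JDK's default Xerces-based parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not taken from OpenNLP or Solr.
class HardenedSaxDemo {

    // Builds a SAX parser with secure processing on and DOCTYPE declarations rejected.
    static SAXParser newHardenedParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: the default JDK (Xerces-based) parser, which recognizes this feature URI.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newSAXParser();
    }

    // Returns true if the hardened parser accepts the document, false if it rejects it.
    static boolean parses(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            newHardenedParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // document rejected, e.g. because it declares a DOCTYPE
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String benign = "<dictionary><entry><token>solr</token></entry></dictionary>";
        String xxe = "<!DOCTYPE d [<!ENTITY x SYSTEM \"file:///etc/hostname\">]>"
                + "<dictionary><entry><token>&x;</token></entry></dictionary>";
        System.out.println("benign accepted: " + parses(benign));
        System.out.println("xxe accepted: " + parses(xxe));
    }
}
```

Rejecting the DOCTYPE outright is stricter than only disabling external entities, and it matches the behaviour the entry attributes to the fixed OpenNLP releases.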
diff --git a/content/solr/vex/2026-05-04-cve-2026-42027.md b/content/solr/vex/2026-05-04-cve-2026-42027.md
new file mode 100644
index 000000000..d302414f3
--- /dev/null
+++ b/content/solr/vex/2026-05-04-cve-2026-42027.md
@@ -0,0 +1,66 @@
+---
+cve: CVE-2026-42027
+category:
+  - solr/vex
+versions: "< 10.1.0"
+jars:
+  - opennlp-tools-1.9.4.jar
+analysis:
+  state: exploitable
+  response:
+    - workaround_available
+    - update
+title: "Apache OpenNLP: Arbitrary class instantiation via model manifest"
+---
+
+CVE-2026-42027 (CVSS 9.8) is an arbitrary class instantiation issue in Apache OpenNLP's
+`ExtensionLoader`. The `instantiateExtension(Class, String)` method loads a class named in a
+model archive's `manifest.properties` via `Class.forName()` and only performs its
+`isAssignableFrom` type check *after* the class has been loaded. Because `Class.forName()`
+runs the target class's static initializer at load time, an attacker who can supply a crafted
+model archive can trigger the static initializer of any class on the classpath (e.g. one that
+performs a JNDI lookup, outbound network I/O, or filesystem access), regardless of the
+type check that follows.
+
+The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively
+via `lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no
+patched 1.9 release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which consults a
+package-prefix allowlist *before* calling `Class.forName()`.
+
+#### When Solr is exposed
+
+OpenNLP is not part of a default Solr installation, but it is exploitable once the OpenNLP-backed
+modules are enabled. The vulnerable code path becomes reachable when:
+
+* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
+* OpenNLP analysis components or update processors are configured to load model files.
+
+Any deployment that enables these modules and loads an OpenNLP model an attacker can influence is
+exploitable. A deployment that does not enable them never invokes `ExtensionLoader` and is not
+exposed.
+
+#### Mitigation
+
+The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
+modules** unless they are required. If you do use OpenNLP, load model files only from locations you
+fully control and never from untrusted or user-supplied sources.
+
+In Solr 9.11.0 the upstream allowlist defense is backfilled at the application layer, since the
+vulnerable jar cannot yet be upgraded (Solr must track the OpenNLP version used by
+`lucene-analysis-opennlp`). Every OpenNLP model is loaded as a resource through Solr's resource
+loaders, so the archive is validated at that single chokepoint before OpenNLP is allowed to
+deserialize it:
+
+* The archive is opened as a ZIP and every class name it declares — the `manifest.properties`
+  `factory` entry and any class referenced from embedded feature-generator XML descriptors — must
+  resolve under the `opennlp.` package prefix.
+* A model that names a class outside that prefix is rejected and never reaches `ExtensionLoader`,
+  so no attacker-controlled static initializer can run.
+* Validation covers both standalone (filesystem configset) and SolrCloud (ZooKeeper) deployments,
+  and applies whether the model is loaded by the `langid` / `analysis-extras` update processors or
+  by the schema-configured OpenNLP analyzers (tokenizer, POS, chunker, lemmatizer, name-finder).
+
+Solr's own usage only ever references built-in `opennlp.*` factories, so legitimate models load
+unchanged; only crafted models that try to instantiate arbitrary classes are blocked.
+
+Fully resolved in Solr 10.1 by upgrading the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
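
Note on CVE-2026-42027 (not part of the patch): the ordering problem described in this entry, where the type check happens only after `Class.forName()` has already run a static initializer, can be illustrated with a small allowlist-first loader. This is a sketch under stated assumptions, not the Solr or OpenNLP implementation. `AllowlistedLoader`, `Trap`, and `trapFired` are hypothetical names, and only the `opennlp.` prefix comes from the text above.

```java
import java.util.List;

// Hypothetical demo class; not taken from OpenNLP or Solr.
class AllowlistedLoader {

    // The "opennlp." prefix comes from the VEX text; keeping it in a list is an assumption.
    static final List<String> ALLOWED_PREFIXES = List.of("opennlp.");

    // Set by Trap's static initializer, standing in for an attacker-chosen side effect.
    static volatile boolean trapFired = false;

    // Stand-in for a classpath class whose static initializer has side effects.
    static class Trap {
        static {
            trapFired = true;
        }
    }

    static boolean isAllowed(String className) {
        return className != null
                && ALLOWED_PREFIXES.stream().anyMatch(className::startsWith);
    }

    // Checks the allowlist BEFORE loading, then loads without initializing,
    // so no static initializer runs until the type check has passed.
    static <T> T instantiate(Class<T> type, String className) {
        if (!isAllowed(className)) {
            throw new SecurityException("class not on allowlist: " + className);
        }
        try {
            ClassLoader loader = AllowlistedLoader.class.getClassLoader();
            Class<?> loaded = Class.forName(className, false, loader);
            if (!type.isAssignableFrom(loaded)) {
                throw new ClassCastException(className + " is not a " + type.getName());
            }
            return type.cast(loaded.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Trap.class.getName() does not initialize Trap; only the name is used here.
            instantiate(Object.class, Trap.class.getName());
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("static initializer ran: " + trapFired);
    }
}
```

The sketch uses the three-argument `Class.forName(name, false, loader)` as a second layer of defence. The entry itself only states that the fixed releases consult the allowlist before calling `Class.forName()`.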
diff --git a/content/solr/vex/2026-05-04-cve-2026-42440.md b/content/solr/vex/2026-05-04-cve-2026-42440.md
new file mode 100644
index 000000000..f3377acc1
--- /dev/null
+++ b/content/solr/vex/2026-05-04-cve-2026-42440.md
@@ -0,0 +1,49 @@
+---
+cve: CVE-2026-42440
+category:
+  - solr/vex
+versions: "< 10.1.0"
+jars:
+  - opennlp-tools-1.9.4.jar
+analysis:
+  state: exploitable
+  response:
+    - workaround_available
+    - update
+title: "Apache OpenNLP: Out-of-memory denial of service via crafted model file"
+---
+
+CVE-2026-42440 (CVSS 7.5) is an out-of-memory denial-of-service issue in Apache OpenNLP's binary
+model reader. The `AbstractModelReader` methods `getOutcomes()`, `getOutcomePatterns()` and
+`getPredicates()` read a 32-bit signed integer count field from a binary model stream and pass it
+directly to an array allocation without validating it. An attacker who can supply a crafted `.bin`
+model file with a count set to `Integer.MAX_VALUE` triggers an immediate `OutOfMemoryError` during
+model deserialization, before any substantial data is consumed.
+
+The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively via
+`lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no patched 1.9
+release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which validate the count against an
+upper bound (default 10,000,000, configurable via the `OPENNLP_MAX_ENTRIES` system property) before
+allocating.
+
+#### When Solr is exposed
+
+OpenNLP is not part of a default Solr installation, but it is exploitable once the OpenNLP-backed
+modules are enabled. The vulnerable model-loading code path becomes reachable when:
+
+* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
+* OpenNLP analysis components or update processors are configured to load OpenNLP model files (the
+  schema-configured OpenNLP analyzers — tokenizer, POS, chunker, lemmatizer, name-finder — or the
+  `langid` / `analysis-extras` update processors).
+
+Any deployment that enables these modules and loads an OpenNLP model an attacker can influence is
+exploitable. A deployment that does not enable them never deserializes an OpenNLP model and is not
+exposed.
+
+#### Mitigation
+
+The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
+modules** unless they are required. If you do use OpenNLP, load model files only from locations you
+fully control and never from untrusted or user-supplied sources.
+
+Fully resolved in Solr 10.1 by upgrading the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
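
Note on CVE-2026-42440 (not part of the patch): the fix this entry describes, validating a count field against an upper bound before allocating, can be sketched as below. This is a minimal illustration and not the `AbstractModelReader` code. `BoundedCountReader` and its methods are hypothetical, the stream layout (a big-endian count followed by UTF strings) is an assumption, and reading the limit with `Integer.getInteger` is one plausible way to honour the `OPENNLP_MAX_ENTRIES` property named in the text.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical demo class; not taken from OpenNLP or Solr.
class BoundedCountReader {

    // Default and property name come from the VEX text; how OpenNLP reads it is an assumption.
    static final int MAX_ENTRIES = Integer.getInteger("OPENNLP_MAX_ENTRIES", 10_000_000);

    // Reads the 32-bit signed count and rejects it before any array is allocated.
    static int readCount(DataInputStream in) throws IOException {
        int count = in.readInt();
        if (count < 0 || count > MAX_ENTRIES) {
            throw new IOException("entry count out of range: " + count);
        }
        return count;
    }

    // Assumed layout: a count followed by that many UTF-encoded outcome labels.
    static String[] readOutcomes(DataInputStream in) throws IOException {
        String[] outcomes = new String[readCount(in)];
        for (int i = 0; i < outcomes.length; i++) {
            outcomes[i] = in.readUTF();
        }
        return outcomes;
    }

    // Builds a 4-byte big-endian header holding only a count field.
    static byte[] header(int count) {
        return ByteBuffer.allocate(4).putInt(count).array();
    }

    // Returns true if the bytes deserialize cleanly, false if they are rejected or truncated.
    static boolean accepts(byte[] model) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(model))) {
            readOutcomes(in);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("empty model accepted: " + accepts(header(0)));
        System.out.println("crafted model accepted: " + accepts(header(Integer.MAX_VALUE)));
    }
}
```

Without the range check, the crafted header would reach `new String[Integer.MAX_VALUE]`, which is the allocation the entry identifies as the cause of the `OutOfMemoryError`.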