From 477d81ba12cc434056cd3c8cbad40809a75a9a61 Mon Sep 17 00:00:00 2001
From: Eric Pugh <epugh@opensourceconnections.com>
Date: Wed, 17 Jun 2026 08:57:26 -0400
Subject: [PATCH] add OpenNLP CVE Vex files

---
 content/solr/vex/2026-05-04-cve-2026-40682.md | 57 ++++++++++++++++
 content/solr/vex/2026-05-04-cve-2026-42027.md | 66 +++++++++++++++++++
 content/solr/vex/2026-05-04-cve-2026-42440.md | 49 ++++++++++++++
 3 files changed, 172 insertions(+)
 create mode 100644 content/solr/vex/2026-05-04-cve-2026-40682.md
 create mode 100644 content/solr/vex/2026-05-04-cve-2026-42027.md
 create mode 100644 content/solr/vex/2026-05-04-cve-2026-42440.md

diff --git a/content/solr/vex/2026-05-04-cve-2026-40682.md b/content/solr/vex/2026-05-04-cve-2026-40682.md
new file mode 100644
index 000000000..3dc153d0f
--- /dev/null
+++ b/content/solr/vex/2026-05-04-cve-2026-40682.md
@@ -0,0 +1,57 @@
+---
+cve: CVE-2026-40682
+category:
+  - solr/vex
+versions: "< 10.1.0"
+jars:
+  - opennlp-tools-1.9.4.jar
+analysis:
+  state: exploitable
+  response:
+    - workaround_available
+    - update
+title: "Apache OpenNLP: XXE in dictionary parsing"
+---
+
+CVE-2026-40682 (CVSS 9.1) is an XML External Entity (XXE) vulnerability in Apache OpenNLP's
+dictionary parsing. The `DictionaryEntryPersistor` class and the public `Dictionary(InputStream)`
+constructor create a SAX parser without enabling `FEATURE_SECURE_PROCESSING` or disabling DTD
+processing, so external entity resolution and DOCTYPE declarations remain fully enabled. An attacker
+who can supply a crafted dictionary file — either directly or embedded in a model archive that
+OpenNLP deserializes — can therefore read local files from the server or trigger outbound requests
+(server-side request forgery). Other OpenNLP XML parsing paths route through the hardened
+`XmlUtil.createSaxParser()` helper, but this code path does not.
+
+The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively via
+`lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no patched 1.9
+release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which parse dictionaries with secure
+XML processing that rejects DOCTYPE declarations and external entities.
+
+#### When Solr is exposed
+
+OpenNLP is not part of a default Solr installation, but the vulnerability is exploitable once the
+OpenNLP-backed modules are enabled. The vulnerable dictionary-parsing code path is reachable when both conditions hold:
+
+* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
+* OpenNLP analysis components or update processors are configured to load OpenNLP model or
+  dictionary files (the schema-configured OpenNLP analyzers — tokenizer, POS, chunker, lemmatizer,
+  name-finder — or the `langid` / `analysis-extras` update processors).
+
+Any deployment that enables these modules and parses an OpenNLP model or dictionary an attacker can
+influence is exploitable. A deployment that does not enable them never parses an OpenNLP dictionary
+and is not exposed.
+
+#### Mitigation
+
+The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
+modules** unless they are required. If you do use OpenNLP, load model and dictionary files only from
+locations you fully control, since the bundled OpenNLP 1.9.4 does not disable external entities when
+parsing dictionaries.
+
+Every OpenNLP model and dictionary is loaded as a resource through Solr's resource loaders, so the
+archive passes through a single chokepoint before OpenNLP is allowed to deserialize it. This applies
+to both standalone (filesystem configset) and SolrCloud (ZooKeeper) deployments, and whether the
+dictionary is loaded by the `langid` / `analysis-extras` update processors or by the
+schema-configured OpenNLP analyzers.
+
+Solr 10.1 fully resolves the issue by upgrading the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
diff --git a/content/solr/vex/2026-05-04-cve-2026-42027.md b/content/solr/vex/2026-05-04-cve-2026-42027.md
new file mode 100644
index 000000000..d302414f3
--- /dev/null
+++ b/content/solr/vex/2026-05-04-cve-2026-42027.md
@@ -0,0 +1,66 @@
+---
+cve: CVE-2026-42027
+category:
+  - solr/vex
+versions: "< 10.1.0"
+jars:
+  - opennlp-tools-1.9.4.jar
+analysis:
+  state: exploitable
+  response:
+    - workaround_available
+    - update
+title: "Apache OpenNLP: Arbitrary class instantiation via model manifest"
+---
+
+CVE-2026-42027 (CVSS 9.8) is an arbitrary class instantiation issue in Apache OpenNLP's
+`ExtensionLoader`. The `instantiateExtension(Class, String)` method loads a class named in a
+model archive's `manifest.properties` via `Class.forName()` and only performs its
+`isAssignableFrom` type check *after* the class has been loaded. Because `Class.forName()`
+initializes the class by default, running its static initializer, an attacker who supplies a crafted
+model archive can trigger the static initializer of any class on the classpath (e.g. one that
+performs a JNDI lookup, outbound network I/O, or filesystem access), regardless of the
+type check that follows.
+
+The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively
+via `lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no
+patched 1.9 release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which consult a
+package-prefix allowlist *before* calling `Class.forName()`.
+
+#### When Solr is exposed
+
+OpenNLP is not part of a default Solr installation, but the vulnerability is exploitable once the
+OpenNLP-backed modules are enabled. The vulnerable code path is reachable when both conditions hold:
+
+* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
+* OpenNLP analysis components or update processors are configured to load model files.
+
+Any deployment that enables these modules and loads an OpenNLP model an attacker can influence is
+exploitable. A deployment that does not enable them never invokes `ExtensionLoader` and is not
+exposed.
+
+#### Mitigation
+
+The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
+modules** unless they are required. If you do use OpenNLP, load model files only from locations you
+fully control and never from untrusted or user-supplied sources.
+
+Solr 9.11.0 reimplements the upstream allowlist defense at the application layer, because the
+vulnerable jar cannot yet be upgraded (Solr must track the OpenNLP version used by
+`lucene-analysis-opennlp`). Every OpenNLP model is loaded as a resource through Solr's resource
+loaders, so Solr validates the archive at that single chokepoint before OpenNLP is allowed to
+deserialize it:
+
+* The archive is opened as a ZIP and every class name it declares — the `manifest.properties`
+  `factory` entry and any class referenced from embedded feature-generator XML descriptors — must
+  resolve under the `opennlp.` package prefix.
+* A model that names a class outside that prefix is rejected and never reaches `ExtensionLoader`,
+  so no attacker-controlled static initializer can run.
+* Validation covers both standalone (filesystem configset) and SolrCloud (ZooKeeper) deployments,
+  and applies whether the model is loaded by the `langid` / `analysis-extras` update processors or
+  by the schema-configured OpenNLP analyzers (tokenizer, POS, chunker, lemmatizer, name-finder).
+
+Solr's own usage only ever references built-in `opennlp.*` factories, so legitimate models load
+unchanged; only crafted models that try to instantiate arbitrary classes are blocked.
+
+Solr 10.1 fully resolves the issue by upgrading the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
diff --git a/content/solr/vex/2026-05-04-cve-2026-42440.md b/content/solr/vex/2026-05-04-cve-2026-42440.md
new file mode 100644
index 000000000..f3377acc1
--- /dev/null
+++ b/content/solr/vex/2026-05-04-cve-2026-42440.md
@@ -0,0 +1,49 @@
+---
+cve: CVE-2026-42440
+category:
+  - solr/vex
+versions: "< 10.1.0"
+jars:
+  - opennlp-tools-1.9.4.jar
+analysis:
+  state: exploitable
+  response:
+    - workaround_available
+    - update
+title: "Apache OpenNLP: Out-of-memory denial of service via crafted model file"
+---
+
+CVE-2026-42440 (CVSS 7.5) is an out-of-memory denial-of-service issue in Apache OpenNLP's binary
+model reader. The `AbstractModelReader` methods `getOutcomes()`, `getOutcomePatterns()` and
+`getPredicates()` read a 32-bit signed integer count field from a binary model stream and pass it
+directly to an array allocation without validating it. An attacker who can supply a crafted `.bin`
+model file with a count set to `Integer.MAX_VALUE` can trigger an immediate `OutOfMemoryError` during
+model deserialization, before any substantial data is consumed.
+
+The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively via
+`lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no patched 1.9
+release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which validate the count against an
+upper bound (default 10,000,000, configurable via the `OPENNLP_MAX_ENTRIES` system property) before
+allocating.
+
+#### When Solr is exposed
+
+OpenNLP is not part of a default Solr installation, but the vulnerability is exploitable once the
+OpenNLP-backed modules are enabled. The vulnerable model-loading code path is reachable when both conditions hold:
+
+* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
+* OpenNLP analysis components or update processors are configured to load OpenNLP model files (the
+  schema-configured OpenNLP analyzers — tokenizer, POS, chunker, lemmatizer, name-finder — or the
+  `langid` / `analysis-extras` update processors).
+
+Any deployment that enables these modules and loads an OpenNLP model an attacker can influence is
+exploitable. A deployment that does not enable them never deserializes an OpenNLP model and is not
+exposed.
+
+#### Mitigation
+
+The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
+modules** unless they are required. If you do use OpenNLP, load model files only from locations you
+fully control and never from untrusted or user-supplied sources.
+
+Solr 10.1 fully resolves the issue by upgrading the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.