57 changes: 57 additions & 0 deletions content/solr/vex/2026-05-04-cve-2026-40682.md
@@ -0,0 +1,57 @@
---
cve: CVE-2026-40682
category:
- solr/vex
versions: "< 10.1.0"
jars:
- opennlp-tools-1.9.4.jar
analysis:
state: exploitable
response:
- workaround_available
- update
title: "Apache OpenNLP: XXE in dictionary parsing"
---

CVE-2026-40682 (CVSS 9.1) is an XML External Entity (XXE) vulnerability in Apache OpenNLP's
dictionary parsing. The `DictionaryEntryPersistor` class and the public `Dictionary(InputStream)`
constructor create a SAX parser without enabling `FEATURE_SECURE_PROCESSING` or disabling DTD
processing, so external entity resolution and DOCTYPE declarations remain fully enabled. An attacker
who can supply a crafted dictionary file — either directly or embedded in a model archive that
OpenNLP deserializes — can therefore read local files from the server or trigger outbound requests
(server-side request forgery). Other OpenNLP XML parsing paths route through the hardened
`XmlUtil.createSaxParser()` helper, but this code path does not.

The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively via
`lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no patched 1.9
release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which parse dictionaries with secure
XML processing that rejects DOCTYPE declarations and external entities.
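
As a rough illustration of what "secure XML processing" means here, the sketch below configures a JAXP SAX parser so that it refuses DOCTYPE declarations and external entities. It is not the OpenNLP patch itself: the class and method names (`HardenedSaxSketch`, `newHardenedParser`, `accepts`) are hypothetical, and the feature URIs are the ones recognized by the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not OpenNLP or Solr code.
final class HardenedSaxSketch {

    /** Builds a SAX parser that refuses DOCTYPE declarations and external entities. */
    static SAXParser newHardenedParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // Fail closed if the underlying parser cannot be hardened.
            throw new IllegalStateException("XML parser does not support hardening features", e);
        }
    }

    /** Returns true if the hardened parser accepts the document, false if it rejects it. */
    static boolean accepts(String xml) {
        SAXParser parser = newHardenedParser();
        try {
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // includes the fatal error raised for a DOCTYPE declaration
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String benign = "<dictionary><entry><token>a</token></entry></dictionary>";
        String crafted = "<!DOCTYPE d [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>"
            + "<dictionary><entry><token>&x;</token></entry></dictionary>";
        System.out.println("benign accepted:  " + accepts(benign));
        System.out.println("crafted accepted: " + accepts(crafted));
    }
}
```

Because the DOCTYPE is rejected before any entity declaration is processed, the referenced file is never opened.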

#### When Solr is exposed

OpenNLP is not part of a default Solr installation, but the vulnerability is exploitable once the
OpenNLP-backed modules are enabled. The vulnerable dictionary-parsing code path becomes reachable
when both of the following hold:

* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
* OpenNLP analysis components or update processors are configured to load OpenNLP model or
dictionary files (the schema-configured OpenNLP analyzers — tokenizer, POS, chunker, lemmatizer,
name-finder — or the `langid` / `analysis-extras` update processors).

Any deployment that enables these modules and parses an OpenNLP model or dictionary an attacker can
influence is exploitable. A deployment that does not enable them never parses an OpenNLP dictionary
and is not exposed.

#### Mitigation

The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
modules** unless they are required. If you do use OpenNLP, load model and dictionary files only from
locations you fully control, since the bundled OpenNLP 1.9.4 does not disable external entities when
parsing dictionaries.

Every OpenNLP model and dictionary is loaded as a resource through Solr's resource loaders, so the
archive passes through a single chokepoint before OpenNLP is allowed to deserialize it. This applies
to both standalone (filesystem configset) and SolrCloud (ZooKeeper) deployments, and whether the
dictionary is loaded by the `langid` / `analysis-extras` update processors or by the
schema-configured OpenNLP analyzers.

The issue is fully resolved in Solr 10.1.0, which upgrades the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
66 changes: 66 additions & 0 deletions content/solr/vex/2026-05-04-cve-2026-42027.md
@@ -0,0 +1,66 @@
---
cve: CVE-2026-42027
category:
- solr/vex
versions: "< 10.1.0"
jars:
- opennlp-tools-1.9.4.jar
analysis:
state: exploitable
response:
- workaround_available
- update
title: "Apache OpenNLP: Arbitrary class instantiation via model manifest"
---

CVE-2026-42027 (CVSS 9.8) is an arbitrary class instantiation issue in Apache OpenNLP's
`ExtensionLoader`. The `instantiateExtension(Class, String)` method loads a class named in a
model archive's `manifest.properties` via `Class.forName()` and only performs its
`isAssignableFrom` type check *after* the class has been loaded. Because `Class.forName()`
runs the target class's static initializer at load time, an attacker who can supply a crafted
model archive can trigger the static initializer of any class on the classpath (e.g. one that
performs a JNDI lookup, outbound network I/O, or filesystem access), regardless of the
type check that follows.

The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively
via `lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no
patched 1.9 release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which consults a
package-prefix allowlist *before* calling `Class.forName()`.
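
The ordering problem can be reproduced in a few lines. The sketch below is a simplified stand-in, not the OpenNLP code: `Probe`, `loadThenCheck`, `checkThenLoad`, and the single-entry `ALLOWED_PREFIXES` list are all hypothetical. It contrasts loading a class before the type check with consulting a prefix allowlist before `Class.forName()` is called.

```java
import java.util.List;

// Hypothetical sketch, not OpenNLP or Solr code.
final class AllowlistBeforeLoadSketch {

    /** Set by Probe's static initializer so that callers can observe whether it ran. */
    static boolean initializerRan = false;

    /** Stand-in for an attacker-chosen class whose static initializer has side effects. */
    static final class Probe {
        static {
            initializerRan = true;
        }
    }

    /** Assumed allowlist for illustration; the prefixes OpenNLP actually ships are not shown. */
    static final List<String> ALLOWED_PREFIXES = List.of("opennlp.");

    static boolean isAllowed(String className) {
        return ALLOWED_PREFIXES.stream().anyMatch(className::startsWith);
    }

    /** Vulnerable order: load (and initialize) the class, then check its type. */
    static boolean loadThenCheck(String className, Class<?> expected) {
        Class<?> loaded = forName(className);       // static initializer runs here
        return expected.isAssignableFrom(loaded);   // too late to undo its side effects
    }

    /** Safer order: reject the name before the JVM is asked to load anything. */
    static Class<?> checkThenLoad(String className) {
        if (!isAllowed(className)) {
            throw new SecurityException("Class is not on the allowlist: " + className);
        }
        return forName(className);
    }

    private static Class<?> forName(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Calling `checkThenLoad(Probe.class.getName())` throws before `Probe` is initialized, whereas `loadThenCheck(Probe.class.getName(), Runnable.class)` returns `false` only after the initializer has already set `initializerRan`. Taking `Probe.class.getName()` does not itself initialize the class.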

#### When Solr is exposed

OpenNLP is not part of a default Solr installation, but the vulnerability is exploitable once the
OpenNLP-backed modules are enabled. The vulnerable code path becomes reachable when both of the
following hold:

* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
* OpenNLP analysis components or update processors are configured to load model files.

Any deployment that enables these modules and loads an OpenNLP model an attacker can influence is
exploitable. A deployment that does not enable them never invokes `ExtensionLoader` and is not
exposed.

#### Mitigation

The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
modules** unless they are required. If you do use OpenNLP, load model files only from locations you
fully control and never from untrusted or user-supplied sources.

Solr 9.11.0 reimplements the upstream allowlist defense at the application layer, because the
vulnerable jar cannot yet be upgraded (Solr must track the OpenNLP version used by
`lucene-analysis-opennlp`). Every OpenNLP model is loaded as a resource through Solr's resource
loaders, so the archive is validated at that single chokepoint before OpenNLP is allowed to
deserialize it:

* The archive is opened as a ZIP and every class name it declares — the `manifest.properties`
`factory` entry and any class referenced from embedded feature-generator XML descriptors — must
resolve under the `opennlp.` package prefix.
* A model that names a class outside that prefix is rejected and never reaches `ExtensionLoader`,
so no attacker-controlled static initializer can run.
* Validation covers both standalone (filesystem configset) and SolrCloud (ZooKeeper) deployments,
and applies whether the model is loaded by the `langid` / `analysis-extras` update processors or
by the schema-configured OpenNLP analyzers (tokenizer, POS, chunker, lemmatizer, name-finder).

Solr's own usage only ever references built-in `opennlp.*` factories, so legitimate models load
unchanged; only crafted models that try to instantiate arbitrary classes are blocked.
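
The manifest part of that check could look roughly like the following. This is a minimal sketch under stated assumptions, not Solr's implementation: `ModelArchiveCheckSketch`, `isAcceptable`, and `archiveWithManifest` are hypothetical names, the archive is handled as an in-memory byte array, and the feature-generator XML descriptors mentioned above are not inspected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not Solr code; only the manifest `factory` entry is checked.
final class ModelArchiveCheckSketch {

    static final String ALLOWED_PREFIX = "opennlp.";

    /** Returns false if the manifest names a factory outside the allowed package prefix. */
    static boolean isAcceptable(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"manifest.properties".equals(entry.getName())) {
                    continue;
                }
                Properties manifest = new Properties();
                manifest.load(zip); // reads only up to the end of the current ZIP entry
                String factory = manifest.getProperty("factory");
                if (factory != null && !factory.trim().startsWith(ALLOWED_PREFIX)) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false; // unreadable archives are rejected rather than passed on
        }
    }

    /** Demo helper: builds a ZIP archive containing only a manifest with the given body. */
    static byte[] archiveWithManifest(String manifestBody) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("manifest.properties"));
            zip.write(manifestBody.getBytes(StandardCharsets.ISO_8859_1));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] builtIn = archiveWithManifest("factory=opennlp.tools.tokenize.TokenizerFactory\n");
        byte[] crafted = archiveWithManifest("factory=com.example.Evil\n");
        System.out.println("built-in factory accepted: " + isAcceptable(builtIn));
        System.out.println("crafted factory accepted:  " + isAcceptable(crafted));
    }
}
```

The point of the design is that the check works on the raw archive bytes, so a rejected model is never handed to OpenNLP at all.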

The issue is fully resolved in Solr 10.1.0, which upgrades the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.
49 changes: 49 additions & 0 deletions content/solr/vex/2026-05-04-cve-2026-42440.md
@@ -0,0 +1,49 @@
---
cve: CVE-2026-42440
category:
- solr/vex
versions: "< 10.1.0"
jars:
- opennlp-tools-1.9.4.jar
analysis:
state: exploitable
response:
- workaround_available
- update
title: "Apache OpenNLP: Out-of-memory denial of service via crafted model file"
---

CVE-2026-42440 (CVSS 7.5) is an out-of-memory denial-of-service issue in Apache OpenNLP's binary
model reader. The `AbstractModelReader` methods `getOutcomes()`, `getOutcomePatterns()` and
`getPredicates()` read a 32-bit signed integer count field from a binary model stream and pass it
directly to an array allocation without validating it. An attacker who can supply a crafted `.bin`
model file with a count set to `Integer.MAX_VALUE` triggers an immediate `OutOfMemoryError` during
model deserialization, before any substantial data is consumed.

The vulnerable code is present in the `opennlp-tools-1.9.4.jar` that Solr pulls in transitively via
`lucene-analysis-opennlp` (Lucene 9.12.3). The 1.9.x line is end-of-life, so there is no patched 1.9
release; the issue is fixed in OpenNLP 2.5.9 (and 3.0.0-M3), which validate the count against an
upper bound (default 10,000,000, configurable via the `OPENNLP_MAX_ENTRIES` system property) before
allocating.
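
The difference between the two behaviors comes down to one bounds check before the allocation. The sketch below illustrates the idea on a simplified count-prefixed stream; it is not the `AbstractModelReader` code. The names `BoundedCountSketch`, `readOutcomes`, `tryRead`, and `encode` are hypothetical, and reading the limit with `Integer.getInteger` is an assumption about how such a property might be consulted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not OpenNLP code; the stream layout is simplified.
final class BoundedCountSketch {

    /** Upper bound on entries, using the default and property name quoted above. */
    static int maxEntries() {
        return Integer.getInteger("OPENNLP_MAX_ENTRIES", 10_000_000);
    }

    /** Reads a count-prefixed list of strings, validating the count before allocating. */
    static String[] readOutcomes(DataInputStream in) throws IOException {
        int count = in.readInt();
        if (count < 0 || count > maxEntries()) {
            throw new IOException("Implausible entry count in model stream: " + count);
        }
        String[] outcomes = new String[count]; // safe: count is bounded
        for (int i = 0; i < count; i++) {
            outcomes[i] = in.readUTF();
        }
        return outcomes;
    }

    /** Returns the number of outcomes read, or -1 if the stream was rejected. */
    static int tryRead(byte[] stream) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            return readOutcomes(in).length;
        } catch (IOException e) {
            return -1;
        }
    }

    /** Demo helper: writes a declared count followed by the given values. */
    static byte[] encode(int declaredCount, String... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(declaredCount);
            for (String value : values) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("valid stream:   " + tryRead(encode(2, "yes", "no")));
        System.out.println("crafted stream: " + tryRead(encode(Integer.MAX_VALUE)));
    }
}
```

Without the check, the crafted stream would reach `new String[Integer.MAX_VALUE]` and fail with an `OutOfMemoryError` after reading only four bytes of input.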

#### When Solr is exposed

OpenNLP is not part of a default Solr installation, but the vulnerability is exploitable once the
OpenNLP-backed modules are enabled. The vulnerable model-loading code path becomes reachable when
both of the following hold:

* The `analysis-extras` and/or `langid` modules are enabled — they are not loaded by default.
* OpenNLP analysis components or update processors are configured to load OpenNLP model files (the
schema-configured OpenNLP analyzers — tokenizer, POS, chunker, lemmatizer, name-finder — or the
`langid` / `analysis-extras` update processors).

Any deployment that enables these modules and loads an OpenNLP model an attacker can influence is
exploitable. A deployment that does not enable them never deserializes an OpenNLP model and is not
exposed.

#### Mitigation

The most reliable mitigation is to **not enable the OpenNLP-backed `analysis-extras` or `langid`
modules** unless they are required. If you do use OpenNLP, load model files only from locations you
fully control and never from untrusted or user-supplied sources.

The issue is fully resolved in Solr 10.1.0, which upgrades the bundled OpenNLP (via `lucene-analysis-opennlp`) to 2.5.9.