antlr-rust-runtime is a pure Rust runtime and metadata generator for ANTLR v4
lexers and parsers. It is a clean-room implementation written from scratch from
the public ANTLR runtime contract; it does not vendor or fork an older Rust
ANTLR runtime.
Follow the ANTLR getting-started guide and install the ANTLR tool jar. The
runtime tests currently validate against ANTLR 4.13.2.
Each ANTLR target language needs a runtime package used by generated parsers. For Rust projects, add the runtime crate:

    [dependencies]
    antlr-rust-runtime = "0.3"

The library crate is imported as antlr4_runtime:

    use antlr4_runtime::{CommonTokenStream, InputStream};

Install the companion generator binary:

    cargo install antlr-rust-runtime

This installs antlr4-rust-gen, which turns ANTLR .interp metadata into Rust
lexer and parser modules.

The current release uses a metadata-first generation path:
- run the official ANTLR tool to produce .interp files,
- run antlr4-rust-gen to emit Rust modules,
- compile those modules against antlr4_runtime.

For a split lexer/parser grammar:

    antlr4 MyGrammarLexer.g4 MyGrammarParser.g4
    antlr4-rust-gen \
      --lexer MyGrammarLexer.interp \
      --parser MyGrammarParser.interp \
      --out-dir src/generated

The checked-in ANTLR RustTarget/StringTemplate shell is kept in tool/ and
will be expanded around the same runtime contracts.

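
The metadata-first steps for a split grammar can also be scripted. The following is a minimal sketch rather than part of this project: the function names (rust_gen_args, run_step, regenerate_split_grammar) are hypothetical, it assumes antlr4 and antlr4-rust-gen are both on PATH, and it uses only the --lexer, --parser, and --out-dir flags shown above.

```rust
use std::io;
use std::process::Command;

/// Build the argument list for `antlr4-rust-gen` (hypothetical helper).
/// Only the three flags documented in this README are used.
pub fn rust_gen_args(lexer_interp: &str, parser_interp: &str, out_dir: &str) -> Vec<String> {
    [
        "--lexer", lexer_interp,
        "--parser", parser_interp,
        "--out-dir", out_dir,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Run one external command and turn a non-zero exit status into an error.
fn run_step(program: &str, args: &[String]) -> io::Result<()> {
    let status = Command::new(program).args(args).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        ))
    }
}

/// Regenerate Rust modules for a split grammar named `<grammar>Lexer.g4` and
/// `<grammar>Parser.g4` in the current directory (assumed layout).
pub fn regenerate_split_grammar(grammar: &str, out_dir: &str) -> io::Result<()> {
    // Step 1: the official ANTLR tool produces the .interp files.
    run_step(
        "antlr4",
        &[format!("{grammar}Lexer.g4"), format!("{grammar}Parser.g4")],
    )?;
    // Step 2: antlr4-rust-gen turns the .interp metadata into Rust modules.
    let args = rust_gen_args(
        &format!("{grammar}Lexer.interp"),
        &format!("{grammar}Parser.interp"),
        out_dir,
    );
    run_step("antlr4-rust-gen", &args)
}
```

Calling regenerate_split_grammar("MyGrammar", "src/generated") would run the same two commands as the shell example. Step 3, compiling against antlr4_runtime, is left to the normal cargo build.
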
antlr-ng is a TypeScript/npm
parser generator based on ANTLR 4.13.2. It does not currently ship a Rust
target, but it can produce the same .interp metadata that antlr4-rust-gen
uses.
Install it with npm or run it through npx:

    npx -y antlr-ng -Dlanguage=Java -o build/antlr --exact-output-dir true JSON.g4

The -Dlanguage=Java option selects one of antlr-ng's bundled code-generation
targets only so the tool emits grammar artifacts, including JSONLexer.interp
and JSON.interp. The Java files can be ignored; Rust code still comes from
antlr4-rust-gen:

    antlr4-rust-gen \
      --lexer build/antlr/JSONLexer.interp \
      --parser build/antlr/JSON.interp \
      --out-dir src/generated

For local tooling, antlr-ng requires Node.js 20 or newer. See the antlr-ng getting-started guide for CLI installation and option details.

Suppose you are using the JSON grammar from antlr/grammars-v4/json.
Fetch or copy JSON.g4, then generate ANTLR metadata:

    antlr4 JSON.g4

Generate Rust modules:

    antlr4-rust-gen \
      --lexer JSONLexer.interp \
      --parser JSON.interp \
      --out-dir src/generated

Declare the generated modules in your crate:

    mod generated {
        #![allow(dead_code)]
        pub mod json;
        pub mod json_lexer;
    }

Call the generated lexer and parser:

    use antlr4_runtime::{CommonTokenStream, InputStream};
    use generated::json::Json;
    use generated::json_lexer::JsonLexer;

    fn main() -> Result<(), antlr4_runtime::AntlrError> {
        let lexer = JsonLexer::new(InputStream::new(r#"{"a":1}"#));
        let tokens = CommonTokenStream::new(lexer);
        let mut parser = Json::new(tokens);
        let tree = parser.json()?;
        println!("{}", tree.text());
        Ok(())
    }

- Pure Rust runtime implementation.
- Written from scratch as a clean-room implementation.
- Supports ANTLR serialized ATN deserialization.
- Supports lexer and parser execution through generated Rust wrappers.
- Supports real split lexer/parser grammars, including Kotlin smoke builds.
- Passes every upstream ANTLR runtime-testsuite descriptor discovered by the
  harness: 357 passed, 0 failed, 0 skipped, 357 run.
- Licensed under BSD-3-Clause for compatibility with ANTLR's runtime licensing pattern and downstream open-source applications.

The runtime contains:
- IntStream and CharStream
- UTF-8 input as Unicode scalar values
- Token, CommonToken, token factories, and TokenSource
- buffered, channel-aware CommonTokenStream
- Vocabulary
- recognizer metadata and error listener plumbing
- parse tree node types, rule contexts, terminal nodes, error nodes, and walkers
- ANTLR v4 serialized ATN deserialization
- lexer ATN recognition with longest-match/rule-priority behavior and lexer actions
- parser ATN rule recognition with backtracking over token stream indices
- antlr4-rust-gen, a Rust generator that consumes ANTLR .interp metadata and emits Rust modules
- antlr4-runtime-testsuite, a harness for running upstream ANTLR runtime-test descriptors through the Rust metadata path

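
The longest-match/rule-priority behavior listed above can be illustrated with a toy lexer. This is a sketch of the selection rule only, not the runtime's ATN-based implementation: Rule, next_token, toy_rules, and the two matcher functions are hypothetical names, and the matchers are hand-written stand-ins for real lexer rules.

```rust
/// A toy token rule: a name plus a matcher that reports how many bytes
/// it accepts at the start of the input (0 means no match).
pub struct Rule {
    pub name: &'static str,
    pub matcher: fn(&str) -> usize,
}

/// Matches the keyword `if`.
fn match_keyword_if(input: &str) -> usize {
    if input.starts_with("if") { 2 } else { 0 }
}

/// Matches a run of ASCII letters; each is one byte, so count == byte length.
fn match_identifier(input: &str) -> usize {
    input.chars().take_while(|c| c.is_ascii_alphabetic()).count()
}

/// Pick the rule with the longest match; on a tie, the rule listed first wins.
pub fn next_token<'a>(rules: &[Rule], input: &'a str) -> Option<(&'static str, &'a str)> {
    let mut best: Option<(usize, usize)> = None; // (match length, rule index)
    for (index, rule) in rules.iter().enumerate() {
        let len = (rule.matcher)(input);
        // Only a strictly longer match replaces the current best, so an
        // earlier rule keeps priority when lengths are equal.
        if len > 0 && best.map_or(true, |(best_len, _)| len > best_len) {
            best = Some((len, index));
        }
    }
    best.map(|(len, index)| (rules[index].name, &input[..len]))
}

/// Rules in grammar order: the keyword is declared before the identifier.
pub fn toy_rules() -> Vec<Rule> {
    vec![
        Rule { name: "IF", matcher: match_keyword_if },
        Rule { name: "ID", matcher: match_identifier },
    ]
}
```

With these rules, the input "if" is tokenized as IF because both rules match two characters and IF is declared first, while "iffy" is tokenized as ID because the identifier match is longer.
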
See docs/kotlin-build.md for the Kotlin smoke workflow. See docs/runtime-testsuite.md for the upstream runtime-testsuite harness.
On the maintainer checkout, where the ANTLR jar and upstream runtime-testsuite
live under /tmp/antlr-cleanroom, run the full sweep with:

    cargo run --quiet --bin antlr4-runtime-testsuite

Run a specific descriptor:

    cargo run --bin antlr4-runtime-testsuite -- \
      --antlr-jar path/to/antlr-4.13.2-complete.jar \
      --descriptors path/to/antlr4/runtime-testsuite \
      --case LexerExec/KeywordID

tools/parse-bench/ benchmarks parse throughput of the generated Rust parsers
against the upstream Go runtime (github.com/antlr4-go/antlr/v4), and
optionally the reference Python runtime and tree-sitter, on real-world Kotlin,
C#, Java, and Trino SQL fixtures. See
tools/parse-bench/README.md for setup (the
ANTLR jar, the grammars-v4 sparse checkout, and the Python dependencies).
Run the Rust-vs-Go comparison across all fixture languages:

    python3 tools/parse-bench/run.py \
      --languages kotlin,csharp,java,trino \
      --runtimes rust-antlr,go-antlr \
      --quick \
      --json target/parse-bench/results.json \
      --markdown target/parse-bench/results.md

The report prints min/avg parse time and a ratio against rust-antlr for
every fixture. Drop --quick (or add --iters/--warmups) for longer,
lower-variance runs; add --runtimes rust-antlr,go-antlr,python-antlr,tree-sitter to
include the other runtimes.

Relative parse speed of this runtime versus the Go runtime, summarized as the
geometric mean of the per-fixture go ÷ rust parse-time ratios in each language
group (> 1.0 means Rust is faster than Go; < 1.0 means slower):
| Language | Fixtures | Rust vs Go (parse time) |
|---|---|---|
| Kotlin | 4 | ~10× faster |
| Java | 4 | ~0.9× (roughly on par) |
| C# | 4 | ~0.45× (Go ~2.2× faster) |
| Trino SQL | 5 | ~0.4× (Go ~2.6× faster) |

Rust is roughly 10× faster on Kotlin (expression-ladder memoization in the
generated walker) and near parity on Java. Go remains ahead on C# and Trino SQL,
which are the focus of ongoing prediction/closure optimization. Numbers are
quick-mode (--quick, best-of-min) on an Apple M3 Pro and are indicative only;
re-run the benchmark on your own hardware for authoritative figures.

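
The per-language summary described above, a geometric mean of per-fixture go ÷ rust ratios, can be reproduced from raw timings. The sketch below is illustrative and is not taken from tools/parse-bench; the function names are hypothetical, and any timings passed in are the caller's own measurements.

```rust
/// Per-fixture ratio as defined above: Go parse time divided by Rust parse
/// time. Values above 1.0 mean Rust is faster on that fixture.
pub fn go_over_rust(go_time: f64, rust_time: f64) -> f64 {
    go_time / rust_time
}

/// Geometric mean of strictly positive ratios, computed in log space to
/// avoid overflow from multiplying many values together.
/// Returns None for an empty slice or any non-positive or NaN ratio.
pub fn geometric_mean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|r| !(*r > 0.0)) {
        return None;
    }
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}
```

The geometric mean is used here instead of the arithmetic mean because it treats a 2× speedup and a 2× slowdown symmetrically: ratios of 0.5 and 2.0 average to 1.0.
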
- ANTLR: https://www.antlr.org/
- ANTLR documentation: https://github.com/antlr/antlr4/blob/dev/doc/index.md
- Grammars v4: https://github.com/antlr/grammars-v4