Terms of use
By using this project or its source code, for any purpose and in any shape or form, you grant your implicit agreement to all the following statements:
- You condemn Russia and its military aggression against Ukraine
- You recognize that Russia is an occupant that unlawfully invaded a sovereign state
- You support Ukraine's territorial integrity, including its claims over temporarily occupied territories of Crimea and Donbas
- You reject false narratives perpetuated by Russian state propaganda
To learn more about the war and how you can help, click here. Glory to Ukraine! 🇺🇦
Raffinert.FuzzySharp is a high-performance .NET fuzzy string matching library inspired by SeatGeek's FuzzyWuzzy and RapidFuzz.
It is a bit-parallel accelerated version of the original FuzzySharp, with multiple fixes to the partial_ratio implementation, lower-level distance APIs, cached scorers, and reusable extraction pipelines.
Use Raffinert.FuzzySharp when you need to:
- Score two strings for similarity.
- Find the best match in a list of choices.
- Match strings while ignoring word order, duplicate words, case, punctuation, or abbreviations.
- Reuse cached scorers for one-to-many comparisons.
- Run extraction pipelines sequentially or in parallel.
- Access fast Levenshtein, Indel, and Longest Common Subsequence distance APIs directly.
- Installation
- Compatibility
- Quick Start
- Choosing a Scorer
- Ratio Examples
- Process Extraction
- Fluent Pipeline API
- Using Different Scorers
- Distance APIs
- String Preprocessors
- Performance
- Migration Notes
- Credits
- Support
Installation
Install-Package Raffinert.FuzzySharp
or
dotnet add package Raffinert.FuzzySharp
Compatibility
Raffinert.FuzzySharp targets:
- .NET Standard 2.0 and 2.1
- .NET Framework 4.5, 4.6, 4.6.2, 4.7.2, and 4.8
- .NET Core 3.1
- .NET 6, .NET 8, .NET 9, and .NET 10
Quick Start
Find the best match from a list of choices:
var choices = new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" };
var result = Process.ExtractOne("cowboys", choices);
// (string: Dallas Cowboys, score: 90, index: 3)
Compare two strings directly:
Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97
Fuzz.WeightedRatio(
    "The quick brown fox jimps ofver the small lazy dog",
    "the quick brown fox jumps over the small lazy dog");
// 95
By default, Fuzz methods compare strings as-is. Process extraction methods use StringPreprocessor.Full by default, which normalizes whitespace, lowercases, and strips non-alphanumeric characters.
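To make the default preprocessing concrete, here is a minimal sketch of a normalizer with the three behaviors described above. `ApproximateFull` is a hypothetical helper written for illustration, not the library's implementation of StringPreprocessor.Full. In particular, replacing stripped characters with a space (rather than deleting them) is an assumption of this sketch.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for StringPreprocessor.Full (illustration only).
string ApproximateFull(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        // Keep letters and digits, lowercased; every other character becomes a space.
        sb.Append(char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ');
    }

    // Splitting with RemoveEmptyEntries collapses runs of spaces and trims both ends.
    string[] tokens = sb.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(" ", tokens);
}

Console.WriteLine(ApproximateFull("  Dallas   COWBOYS!! ")); // dallas cowboys
Console.WriteLine(ApproximateFull("new-york  mets"));        // new york mets
```

With normalization of this kind applied to both sides, a lowercase query such as "cowboys" can align with "Dallas Cowboys" regardless of case or punctuation.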
Choosing a Scorer
| Scorer | Use when |
|---|---|
| Fuzz.Ratio | You need direct similarity between two strings. |
| Fuzz.PartialRatio | One string may be a substring or close substring of the other. |
| Fuzz.TokenSortRatio | Word order should not matter. |
| Fuzz.TokenSetRatio | Duplicate words or extra common words should have less impact. |
| Fuzz.TokenInitialismRatio | You need to compare an initialism with its expanded phrase. |
| Fuzz.TokenAbbreviationRatio | You need abbreviation-aware matching. |
| Fuzz.WeightedRatio | You want a general-purpose scorer that combines several strategies. |
| Process.ExtractOne / ExtractTop / ExtractAll | You need to search a collection of choices. |
| Process.Configure() | You want reusable extraction with caching, custom scorers, or parallel execution. |
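For intuition about three of the strategies in the table, the sketch below reimplements them naively. It assumes that the plain ratio behaves like an Indel-based similarity, 100 * 2 * LCS / (len1 + len2), which is consistent with the scores documented in this README but is not taken from the library source. `LcsLength`, `IndelRatio`, `SortTokens`, and `Initialism` are hypothetical helpers, and the library's real scorers may differ in details such as tokenization and rounding.

```csharp
using System;
using System.Linq;

// Length of the longest common subsequence, using two rolling rows.
int LcsLength(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            curr[j] = a[i - 1] == b[j - 1]
                ? prev[j - 1] + 1
                : Math.Max(prev[j], curr[j - 1]);
        }
        (prev, curr) = (curr, prev); // the freshly filled row becomes "prev"
    }
    return prev[b.Length];
}

// Assumed ratio formula: matched characters on both sides over total length.
int IndelRatio(string a, string b)
{
    int total = a.Length + b.Length;
    if (total == 0) return 100;
    return (int)Math.Round(200.0 * LcsLength(a, b) / total);
}

// Token sort idea: split on spaces, sort the words, and re-join before comparing.
string SortTokens(string s) =>
    string.Join(" ", s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                      .OrderBy(t => t, StringComparer.Ordinal));

// Initialism idea: compare against the first letter of every word.
string Initialism(string phrase) =>
    string.Concat(phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(t => t[0]));

Console.WriteLine(IndelRatio("mysmilarstring", "mysimilarstring"));        // 97
Console.WriteLine(IndelRatio("mysmilarstring", "myawfullysimilarstirng")); // 72
Console.WriteLine(SortTokens("order words out of"));                       // of order out words
Console.WriteLine(SortTokens(" words out of order"));                      // of order out words
Console.WriteLine(IndelRatio("NASA", Initialism("National Aeronautics and Space Administration"))); // 89
```

Once both inputs are token-sorted they are identical, which is why word order stops mattering. The initialism line shows how an extra word ("and") adds one letter to the derived initialism and lowers the score.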
Ratio Examples
Fuzz.Ratio("mysmilarstring", "myawfullysimilarstirng");
// 72
Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97
Fuzz.PartialRatio("similar", "somewhresimlrbetweenthisstring");
// 71
Fuzz.TokenSortRatio("order words out of", " words out of order");
// 100
Fuzz.PartialTokenSortRatio("order words out of", " words out of order");
// 100
Fuzz.TokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.PartialTokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics and Space Administration");
// 89
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration");
// 100
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 53
Fuzz.PartialTokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 100
Fuzz.TokenAbbreviationRatio("bl 420", "Baseline section 420", StringPreprocessor.Full);
// 40
Fuzz.PartialTokenAbbreviationRatio("bl 420", "Baseline section 420", StringPreprocessor.Full);
// 67
Fuzz.WeightedRatio("The quick brown fox jimps ofver the small lazy dog", "the quick brown fox jumps over the small lazy dog");
// 95
Process Extraction
Find the best match(es) from a collection of choices.
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)
Process.ExtractTop("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, limit: 3);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: bing, score: 36, index: 1), ...]
// With score cutoff
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, cutoff: 40);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractSorted("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7), ...]
Extraction uses WeightedRatio and Full preprocessing by default. Override them through the method parameters to use a different scorer or preprocessor:
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" }, s => s, ScorerCache.Get<DefaultRatioScorer>());
// (string: Dallas Cowboys, score: 57, index: 3)
Extraction can operate on objects of any type. Use the extractor parameter to reduce the object to the string it should be compared on:
var events = new[]
{
    new[] { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" },
    new[] { "new york yankees vs boston red sox", "Fenway Park", "2011-05-11", "8pm" },
    new[] { "atlanta braves vs pittsburgh pirates", "PNC Park", "2011-05-11", "8pm" },
};
var query = new[] { "new york mets vs chicago cubs", "CitiField", "2017-03-19", "8pm" };
var best = Process.ExtractOneBy(query, events, strings => strings[0]);
// (value: { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" }, score: 95, index: 0)
If the query is already a string, it can be matched directly against generic choices by providing only a choice extractor:
var events = new[]
{
    new { Name = "new york mets vs chicago cubs", Venue = "CitiField" },
    new { Name = "chicago cubs vs chicago white sox", Venue = "Wrigley Field" },
    new { Name = "philadelphia phillies vs atlanta braves", Venue = "Citizens Bank Park" },
};
var query = "new york mets at chicago cubs";
var best = Process.ExtractOneBy(query, events, e => e.Name);
// best.Value.Name == "new york mets vs chicago cubs"
var top = Process.ExtractTopBy(query, events, e => e.Name, limit: 2);
Fluent Pipeline API
The Process.Configure() fluent builder creates reusable, immutable pipelines with preconfigured scoring, caching, and parallel execution.
Equivalent to the static Process methods, but reusable across multiple queries:
var pipeline = Process.Configure().Build();
var result1 = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
//(string: Dallas Cowboys, score: 90, index: 3)
var result2 = pipeline.ExtractOne(
    "chicago cubs",
    new[]
    {
        "Boston Red Sox",
        "Los Angeles Dodgers",
        "New York Yankees",
        "San Francisco Giants",
        "St. Louis Cardinals",
        "Houston Astros"
    });
// (string: San Francisco Giants, score: 45, index: 3)
var pipeline = Process.Configure()
    .WithScorer(ScorerCache.Get<DefaultRatioScorer>())
    .Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 67, index: 3)
Enable multi-threaded processing for large choice sets:
var pipeline = Process.Configure()
    .Parallel()
    .Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
With ParallelOptions for fine-grained control:
var pipeline = Process.Configure()
    .Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
    .Build();
Automatic caching creates a CachedWeightedRatioScorer per extraction call, pre-initializing internal data structures for the query string:
var pipeline = Process.Configure()
    .Cached()
    .Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
Combine caching and parallelism. Builder methods are order independent -- .Cached().Parallel() and .Parallel().Cached() produce identical results:
var pipeline = Process.Configure()
    .Cached()
    .Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
    .Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
For maximum performance when running the same query against different choice sets, provide an externally managed ICachedRatioScorer. The scorer pre-initializes once and is reused across all extraction calls:
using var scorer = new CachedWeightedRatioScorer("new york mets at atlanta braves");
var pipeline = Process.Configure()
    .Cached(scorer)
    .Parallel()
    .Build();
var results1 = pipeline.ExtractAll(choiceSet1);
var results2 = pipeline.ExtractAll(choiceSet2);
Note: External cached scorers implement IDisposable. Use using to ensure proper cleanup.
Pass a CancellationToken via ParallelOptions to cancel long-running parallel extractions:
var cts = new CancellationTokenSource();
var pipeline = Process.Configure()
    .Cached()
    .Parallel(new ParallelOptions { CancellationToken = cts.Token })
    .Build();
// Throws OperationCanceledException if cancelled
var results = pipeline.ExtractAll(query, largeChoicesList).ToList();
Using Different Scorers
Stateless scorers for use with Process static methods and the WithScorer() builder method:
var ratio = ScorerCache.Get<DefaultRatioScorer>();
var partialRatio = ScorerCache.Get<PartialRatioScorer>();
var tokenSet = ScorerCache.Get<TokenSetScorer>();
var partialTokenSet = ScorerCache.Get<PartialTokenSetScorer>();
var tokenSort = ScorerCache.Get<TokenSortScorer>();
var partialTokenSort = ScorerCache.Get<PartialTokenSortScorer>();
var tokenAbbreviation = ScorerCache.Get<TokenAbbreviationScorer>();
var partialTokenAbbrev = ScorerCache.Get<PartialTokenAbbreviationScorer>();
var weighted = ScorerCache.Get<WeightedRatioScorer>();
Pre-initialize with a query string for repeated comparisons. These implement IDisposable:
using var scorer = new CachedWeightedRatioScorer("search query");
int score = scorer.Score("candidate string");
Available cached scorers:
- CachedWeightedRatioScorer -- weighted combination (default for .Cached())
- CachedDefaultRatioScorer -- simple Levenshtein ratio
- CachedTokenSortScorer -- token sort ratio
- CachedTokenSetScorer -- token set ratio
- CachedPartialTokenSetScorer -- partial token set ratio
- CachedTokenDifferenceScorer -- token difference ratio
Distance APIs
Low-level access to the bit-parallel Levenshtein distance implementation:
// Edit distance
int distance = Levenshtein.Distance("kitten", "sitting");
// 3
// Normalized similarity (1.0 = identical, 0.0 = completely different)
double similarity = Levenshtein.NormalizedSimilarity("kitten", "sitting");
// 0.5714285714285714
// Edit operations to transform one string into another
EditOp[] ops = Levenshtein.GetEditOps("kitten", "sitting");
// [REPLACE(0, 0), REPLACE(4, 4), INSERT(6, 6)]
The Levenshtein, Indel, and LongestCommonSubsequence classes also offer an instance API for one-to-many comparisons. The constructor pre-computes a bit-parallel pattern match vector from the source string, which is then reused across all subsequent calls. This avoids rebuilding the internal data structure on every comparison, giving a significant speedup when comparing one source against many targets.
All three implement IDisposable -- use using to return pooled arrays.
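The precomputation described above can be sketched with the classic single-word bit-parallel Levenshtein algorithm (Myers, in Hyyrö's formulation). This is an illustration under stated assumptions, not the library's code: `BuildPatternMask` and `Distance` are hypothetical helpers, the sketch only handles sources of up to 64 characters, and it uses a dictionary rather than pooled arrays.

```csharp
using System;
using System.Collections.Generic;

// Built once per source string: for each character, a bit mask of the positions where it occurs.
Dictionary<char, ulong> BuildPatternMask(string source)
{
    if (source.Length > 64)
        throw new ArgumentException("This sketch handles sources of at most 64 characters.");

    var mask = new Dictionary<char, ulong>();
    for (int i = 0; i < source.Length; i++)
    {
        mask.TryGetValue(source[i], out ulong bits); // bits is 0 when the key is missing
        mask[source[i]] = bits | (1UL << i);
    }
    return mask;
}

// Reuses the precomputed mask; each target character costs a handful of word operations.
int Distance(Dictionary<char, ulong> mask, int sourceLength, string target)
{
    if (sourceLength == 0) return target.Length;

    ulong pv = ulong.MaxValue;               // positive vertical deltas
    ulong mv = 0;                            // negative vertical deltas
    ulong last = 1UL << (sourceLength - 1);  // bit of the last source character
    int score = sourceLength;

    foreach (char c in target)
    {
        mask.TryGetValue(c, out ulong eq);
        ulong xv = eq | mv;
        ulong xh = unchecked(((eq & pv) + pv) ^ pv) | eq;
        ulong ph = mv | ~(xh | pv);
        ulong mh = pv & xh;

        if ((ph & last) != 0) score++;
        if ((mh & last) != 0) score--;

        ph = (ph << 1) | 1UL;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;
    }
    return score;
}

string source = "kitten";
var patternMask = BuildPatternMask(source); // computed once
foreach (string target in new[] { "sitting", "mitten", "kitten" })
{
    Console.WriteLine($"{target}: {Distance(patternMask, source.Length, target)}");
}
// sitting: 3
// mitten: 1
// kitten: 0
```

The mask depends only on the source string, so the cost of building it is paid once no matter how many targets are compared.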
using var lev = new Levenshtein("chicago cubs vs new york mets");
int d1 = lev.DistanceFrom("new york mets vs chicago cubs");
// 22
int d2 = lev.DistanceFrom("atlanta braves vs pittsburgh pirates");
// 26
Indel distance counts only insertions and deletions (no replacements). NormalizedSimilarityWith returns a value between 0.0 (completely different) and 1.0 (identical):
using var indel = new Indel("chicago cubs");
int distance = indel.DistanceFrom("chicago white sox");
// 11
double similarity = indel.NormalizedSimilarityWith("chicago white sox");
// 0.6206896551724138
A generic variant IndelT<T> is available for comparing sequences of any IEquatable<T>:
using var indel = new IndelT<string>(new[] { "hello", "world" });
int distance = indel.DistanceFrom(new[] { "hello", "there" });
// 2
double similarity = indel.NormalizedSimilarityWith(new[] { "hello", "there" });
// 0.5
LCS distance is defined as max(len1, len2) - LCS_length:
using var lcs = new LongestCommonSubsequence("chicago cubs");
int distance = lcs.DistanceFrom("chicago white sox");
// 8
String Preprocessors
By default, Fuzz methods compare strings as-is. Pass StringPreprocessor.Full to normalize whitespace, lowercase, and strip non-alphanumeric characters before comparing:
Fuzz.Ratio("new york mets", "NEW YORK METS");
// < 100 (case sensitive)
Fuzz.Ratio("new york mets", "NEW YORK METS", StringPreprocessor.Full);
// 100 (case insensitive after preprocessing)
Process extraction methods use StringPreprocessor.Full by default. Pass StringPreprocessor.None (or a custom processor function) to override this behavior.
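Since preprocessors are described in this README as Func<string, string> delegates, a custom one can be written as an ordinary lambda. The sketch below is one possible processor for data where punctuation is meaningful (for example part numbers). `keepPunctuation` is a hypothetical name, and the commented-out call at the end is assumed usage that mirrors the three-argument Fuzz.Ratio call shown above.

```csharp
using System;

// Custom preprocessor: lowercases, trims, and collapses spaces and tabs, but keeps punctuation.
Func<string, string> keepPunctuation = s =>
    string.Join(" ", s.ToLowerInvariant()
                      .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries));

Console.WriteLine(keepPunctuation("  Part  A-100 ")); // part a-100

// Assumed usage with the library (not executed in this sketch):
// Fuzz.Ratio("Part A-100", "part  a-100", keepPunctuation);
```

A processor like this sits between comparing strings as-is and the full normalization, which strips non-alphanumeric characters such as the hyphen in "A-100".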
Performance
The distance implementations use bit-parallel algorithms designed to reduce both execution time and allocations. The table below compares naive DP Levenshtein distance calculation (baseline), original FuzzySharp, Fastenshtein, Quickenshtein, and Raffinert.FuzzySharp.
Random words of 3 to 1024 random chars (LevenshteinLarge.cs):
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| NaiveDp | 231.563 ms | 57.5403 ms | 3.1540 ms | 1.00 | 0.02 | 43500.0000 | 34500.0000 | 275312920 B | 1.000 |
| FuzzySharp | 141.820 ms | 4.0905 ms | 0.2242 ms | 0.61 | 0.01 | - | - | 1545732 B | 0.006 |
| Fastenshtein | 123.356 ms | 13.0959 ms | 0.7178 ms | 0.53 | 0.01 | - | - | 34028 B | 0.000 |
| Quickenshtein | 12.918 ms | 12.8046 ms | 0.7019 ms | 0.06 | 0.00 | - | - | 12 B | 0.000 |
| Raffinert.FuzzySharp | 4.970 ms | 0.3311 ms | 0.0181 ms | 0.02 | 0.00 | - | - | 3051 B | 0.000 |
Benchmark sources are in FuzzySharp.Benchmarks.
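For reference, a naive dynamic-programming Levenshtein distance looks roughly like the sketch below. `NaiveDistance` is a hypothetical helper and is not necessarily the code behind the NaiveDp row in LevenshteinLarge.cs. It fills a full (n + 1) x (m + 1) table, which is one plausible reason a baseline of this kind allocates heavily on long inputs.

```csharp
using System;

// Textbook Wagner-Fischer table: O(n * m) time and O(n * m) memory.
int NaiveDistance(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];

    // Transforming a prefix to or from the empty string costs its length.
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            int substitution = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,          // deletion
                         d[i, j - 1] + 1),         // insertion
                d[i - 1, j - 1] + substitution);   // match or replacement
        }
    }
    return d[a.Length, b.Length];
}

int distance = NaiveDistance("kitten", "sitting");
Console.WriteLine(distance); // 3

// Normalizing by the longer length (an assumption) gives 1 - 3/7, in line with the
// 0.5714... value shown for Levenshtein.NormalizedSimilarity earlier.
double similarity = 1.0 - (double)distance / Math.Max("kitten".Length, "sitting".Length);
Console.WriteLine(similarity);
```

For two 1024-character inputs this table holds over a million integers per comparison, whereas a bit-parallel approach processes 64 cells per machine word.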
Migration Notes
The package name is Raffinert.FuzzySharp, and the default namespace is Raffinert.FuzzySharp. This package keeps the familiar FuzzyWuzzy-style API while adding faster distance implementations, fixed partial-ratio behavior, cached scorers, and the fluent process pipeline.
- PreprocessMode-based APIs were replaced with Func<string, string> preprocessors.
- Use StringPreprocessor.Full, StringPreprocessor.None, or a custom delegate instead of PreprocessMode.Full or PreprocessMode.None.
- Generic extraction methods that use extractor delegates are now named Extract*By, such as ExtractOneBy and ExtractTopBy.
Credits
- Adam Cohen (seatgeek/fuzzywuzzy)
- Antti Haapala (python-Levenshtein)
- David Necas (python-Levenshtein)
- Jacob Bayer (original FuzzySharp library)
- Max Bachmann (RapidFuzz)
- Mikko Ohtamaa (python-Levenshtein)
- Panayiotis (Java implementation)
Support
Support the project through GitHub Sponsors or via PayPal.
See CHANGELOG.md for release history.