Raffinert.FuzzySharp

Terms of use

By using this project or its source code, for any purpose and in any shape or form, you grant your implicit agreement to all the following statements:

You condemn Russia and its military aggression against Ukraine
You recognize that Russia is an occupant that unlawfully invaded a sovereign state
You support Ukraine's territorial integrity, including its claims over temporarily occupied territories of Crimea and Donbas
You reject false narratives perpetuated by Russian state propaganda

To learn more about the war and how you can help, click here. Glory to Ukraine! 🇺🇦

Raffinert.FuzzySharp

Raffinert.FuzzySharp is a high-performance .NET fuzzy string matching library inspired by SeatGeek's FuzzyWuzzy and RapidFuzz.

It is a bit-parallel accelerated version of the original FuzzySharp, with multiple fixes to the partial_ratio implementation, lower-level distance APIs, cached scorers, and reusable extraction pipelines.

Overview

Use Raffinert.FuzzySharp when you need to:

Score two strings for similarity.
Find the best match in a list of choices.
Match strings while ignoring word order, duplicate words, case, punctuation, or abbreviations.
Reuse cached scorers for one-to-many comparisons.
Run extraction pipelines sequentially or in parallel.
Access fast Levenshtein, Indel, and Longest Common Subsequence distance APIs directly.

Installation

Install-Package Raffinert.FuzzySharp

or

dotnet add package Raffinert.FuzzySharp

Compatibility

Raffinert.FuzzySharp targets:

.NET Standard 2.0 and 2.1
.NET Framework 4.5, 4.6, 4.6.2, 4.7.2, and 4.8
.NET Core 3.1
.NET 6, .NET 8, .NET 9, and .NET 10

Quick Start

Find the best match from a list of choices:

var choices = new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" };

var result = Process.ExtractOne("cowboys", choices);
// (string: Dallas Cowboys, score: 90, index: 3)

Compare two strings directly:

Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97

Fuzz.WeightedRatio(
    "The quick brown fox jimps ofver the small lazy dog",
    "the quick brown fox jumps over the small lazy dog");
// 95

By default, Fuzz methods compare strings as-is. Process extraction methods use StringPreprocessor.Full by default, which normalizes whitespace, lowercases, and strips non-alphanumeric characters.

Choosing a Scorer

Scorer	Use when
`Fuzz.Ratio`	You need direct similarity between two strings.
`Fuzz.PartialRatio`	One string may be a substring or close substring of the other.
`Fuzz.TokenSortRatio`	Word order should not matter.
`Fuzz.TokenSetRatio`	Duplicate words or extra common words should have less impact.
`Fuzz.TokenInitialismRatio`	You need to compare an initialism with its expanded phrase.
`Fuzz.TokenAbbreviationRatio`	You need abbreviation-aware matching.
`Fuzz.WeightedRatio`	You want a general-purpose scorer that combines several strategies.
`Process.ExtractOne` / `ExtractTop` / `ExtractAll`	You need to search a collection of choices.
`Process.Configure()`	You want reusable extraction with caching, custom scorers, or parallel execution.

Ratio Examples

Simple Ratios

Run .NET fiddle

Fuzz.Ratio("mysmilarstring", "myawfullysimilarstirng");
// 72
Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97

Partial Ratio

Run .NET fiddle

Fuzz.PartialRatio("similar", "somewhresimlrbetweenthisstring");
// 71

Token Sort Ratio

Run .NET fiddle

Fuzz.TokenSortRatio("order words out of", "  words out of order");
// 100
Fuzz.PartialTokenSortRatio("order words out of", "  words out of order");
// 100
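
Both calls score 100 because token-sort scorers in FuzzyWuzzy-style libraries typically split each input on whitespace, sort the tokens, and rejoin them before comparing. The sketch below illustrates only that normalization step. `TokenSortSketch.Normalize` is a hypothetical helper, not part of Raffinert.FuzzySharp, and whitespace tokenization with ordinal sorting is an assumption.

```csharp
using System;

public static class TokenSortSketch
{
    // Hypothetical helper: split on whitespace, sort the tokens, rejoin with single spaces.
    public static string Normalize(string input)
    {
        // A null separator array makes Split break on any whitespace.
        string[] tokens = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        Array.Sort(tokens, StringComparer.Ordinal);
        return string.Join(" ", tokens);
    }

    public static void Main()
    {
        string a = Normalize("order words out of");
        string b = Normalize("  words out of order");
        Console.WriteLine(a);      // of order out words
        Console.WriteLine(a == b); // True
    }
}
```

Once both inputs normalize to the same string, any ratio computed on them is 100, regardless of the original word order or leading whitespace.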

Token Set Ratio

Run .NET fiddle

Fuzz.TokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.PartialTokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
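
A rough way to see why duplicate and extra words matter less here: in FuzzyWuzzy's token-set approach, each input is reduced to a set of unique tokens, and three strings are built from the shared tokens and each side's leftovers. The best score among the pairwise comparisons of those strings is returned. The sketch below only builds the three strings. `TokenSetSketch` is hypothetical, and the library's internals may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenSetSketch
{
    // Unique whitespace-separated tokens, kept in ordinal sort order.
    private static SortedSet<string> Tokens(string text)
    {
        return new SortedSet<string>(
            text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.Ordinal);
    }

    // Returns { shared, shared + tokens only in a, shared + tokens only in b }.
    public static string[] Build(string a, string b)
    {
        SortedSet<string> tokensA = Tokens(a);
        SortedSet<string> tokensB = Tokens(b);

        List<string> shared = tokensA.Where(t => tokensB.Contains(t)).ToList();
        List<string> onlyA = tokensA.Where(t => !tokensB.Contains(t)).ToList();
        List<string> onlyB = tokensB.Where(t => !tokensA.Contains(t)).ToList();

        return new[]
        {
            string.Join(" ", shared),
            string.Join(" ", shared.Concat(onlyA)),
            string.Join(" ", shared.Concat(onlyB)),
        };
    }

    public static void Main()
    {
        foreach (string s in Build("fuzzy was a bear", "fuzzy fuzzy fuzzy bear"))
            Console.WriteLine("\"" + s + "\"");
        // "bear fuzzy"
        // "bear fuzzy a was"
        // "bear fuzzy"
    }
}
```

The first and third strings are identical, so at least one of the pairwise comparisons is a perfect match, which is consistent with the score of 100 shown above.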

Token Initialism Ratio

Run .NET fiddle

Fuzz.TokenInitialismRatio("NASA", "National Aeronautics and Space Administration");
// 89
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration");
// 100

Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 53
Fuzz.PartialTokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 100
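
The scores above are consistent with a simple model: take the first character of each whitespace-separated token in the longer string, then compare that initialism with the shorter string using an LCS-based ratio, `2 * LCS / (len1 + len2)`. The sketch below is a reading of the numbers under that assumption, not a description of the library's internals. `InitialismSketch` and its members are hypothetical.

```csharp
using System;
using System.Linq;

public static class InitialismSketch
{
    // First character of every whitespace-separated token.
    public static string Initials(string phrase)
    {
        return new string(phrase
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token[0])
            .ToArray());
    }

    // Classic dynamic-programming LCS length using a single row.
    public static int LcsLength(string a, string b)
    {
        var row = new int[b.Length + 1];
        foreach (char ca in a)
        {
            int diagonal = 0; // previous row, column j - 1
            for (int j = 1; j <= b.Length; j++)
            {
                int above = row[j]; // previous row, column j
                row[j] = ca == b[j - 1] ? diagonal + 1 : Math.Max(above, row[j - 1]);
                diagonal = above;
            }
        }
        return row[b.Length];
    }

    // Ratio in 0..100 based on LCS length.
    public static int Ratio(string a, string b)
    {
        int total = a.Length + b.Length;
        return total == 0 ? 100 : (int)Math.Round(200.0 * LcsLength(a, b) / total);
    }

    public static void Main()
    {
        string withAnd = Initials("National Aeronautics and Space Administration");
        Console.WriteLine(withAnd);                // NAaSA
        Console.WriteLine(Ratio("NASA", withAnd)); // 89
        Console.WriteLine(Ratio("NASA", Initials("National Aeronautics Space Administration"))); // 100
    }
}
```

Under the same model, the long address example yields an 11-character initialism, and `2 * 4 / (4 + 11)` rounds to 53. The partial variant presumably scores 100 because it matches `NASA` against the best-fitting window of that initialism.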

Token Abbreviation Ratio

Run .NET fiddle

Fuzz.TokenAbbreviationRatio("bl 420", "Baseline section 420", StringPreprocessor.Full);
// 40
Fuzz.PartialTokenAbbreviationRatio("bl 420", "Baseline section 420", StringPreprocessor.Full);
// 67

Weighted Ratio

Run .NET fiddle

Fuzz.WeightedRatio("The quick brown fox jimps ofver the small lazy dog", "the quick brown fox jumps over the small lazy dog");
// 95

Process Extraction

Find the best match(es) from a collection of choices.

Run .NET fiddle

Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)

Process.ExtractTop("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, limit: 3);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]

Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: bing, score: 36, index: 1), ...]

// With score cutoff
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, cutoff: 40);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]

Process.ExtractSorted("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7), ...]

Extraction uses WeightedRatio and StringPreprocessor.Full by default. Pass a different preprocessor and scorer as method arguments to override them:

Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" }, s => s, ScorerCache.Get<DefaultRatioScorer>());
// (string: Dallas Cowboys, score: 57, index: 3)

Generic Type Extraction

Run .NET fiddle

Extraction can operate on objects of any type. Use the extractor parameter to reduce the object to the string it should be compared on:

var events = new[]
{
    new[] { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" },
    new[] { "new york yankees vs boston red sox", "Fenway Park", "2011-05-11", "8pm" },
    new[] { "atlanta braves vs pittsburgh pirates", "PNC Park", "2011-05-11", "8pm" },
};
var query = new[] { "new york mets vs chicago cubs", "CitiField", "2017-03-19", "8pm" };
var best = Process.ExtractOneBy(query, events, strings => strings[0]);
// (value: { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" }, score: 95, index: 0)

If the query is already a string, it can be matched directly against generic choices by providing only a choice extractor:

var events = new[]
{
    new { Name = "new york mets vs chicago cubs", Venue = "CitiField" },
    new { Name = "chicago cubs vs chicago white sox", Venue = "Wrigley Field" },
    new { Name = "philadelphia phillies vs atlanta braves", Venue = "Citizens Bank Park" },
};

var query = "new york mets at chicago cubs";

var best = Process.ExtractOneBy(query, events, e => e.Name);
// best.Value.Name == "new york mets vs chicago cubs"

var top = Process.ExtractTopBy(query, events, e => e.Name, limit: 2);

Fluent Pipeline API

The Process.Configure() fluent builder creates reusable, immutable pipelines with preconfigured scoring, caching, and parallel execution.

Basic Pipeline

Equivalent to the static Process methods, but reusable across multiple queries:

Run .NET fiddle

var pipeline = Process.Configure().Build();

var result1 = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)
var result2 = pipeline.ExtractOne(
    "chicago cubs",
    new[]
    {
        "Boston Red Sox",
        "Los Angeles Dodgers",
        "New York Yankees",
        "San Francisco Giants",
        "St. Louis Cardinals",
        "Houston Astros"
    });
// (string: San Francisco Giants, score: 45, index: 3)

Custom Scorer

Run .NET fiddle

var pipeline = Process.Configure()
    .WithScorer(ScorerCache.Get<DefaultRatioScorer>())
    .Build();

var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 67, index: 3)

Parallel Execution

Enable multi-threaded processing for large choice sets:

var pipeline = Process.Configure()
    .Parallel()
    .Build();

var results = pipeline.ExtractAll("goolge", largeChoicesList);

With ParallelOptions for fine-grained control:

var pipeline = Process.Configure()
    .Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
    .Build();

Cached Execution

Run .NET fiddle

Automatic caching creates a CachedWeightedRatioScorer per extraction call, pre-initializing internal data structures for the query string:

var pipeline = Process.Configure()
    .Cached()
    .Build();

var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });

Cached + Parallel

Combine caching and parallelism. Builder methods are order-independent -- .Cached().Parallel() and .Parallel().Cached() produce identical results:

var pipeline = Process.Configure()
    .Cached()
    .Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
    .Build();

var results = pipeline.ExtractAll("goolge", largeChoicesList);

External Cached Scorer

For maximum performance when running the same query against different choice sets, provide an externally managed ICachedRatioScorer. The scorer pre-initializes once and is reused across all extraction calls:

using var scorer = new CachedWeightedRatioScorer("new york mets at atlanta braves");

var pipeline = Process.Configure()
    .Cached(scorer)
    .Parallel()
    .Build();

var results1 = pipeline.ExtractAll(choiceSet1);
var results2 = pipeline.ExtractAll(choiceSet2);

Note: External cached scorers implement IDisposable. Use using to ensure proper cleanup.

CancellationToken Support

Pass a CancellationToken via ParallelOptions to cancel long-running parallel extractions:

// CancellationTokenSource is IDisposable, so dispose it when the scope ends
using var cts = new CancellationTokenSource();

var pipeline = Process.Configure()
    .Cached()
    .Parallel(new ParallelOptions { CancellationToken = cts.Token })
    .Build();

// Throws OperationCanceledException if cancelled
var results = pipeline.ExtractAll(query, largeChoicesList).ToList();

Using Different Scorers

Non-Cached Scorers (`IRatioScorer`)

Stateless scorers for use with Process static methods and the WithScorer() builder method:

var ratio              = ScorerCache.Get<DefaultRatioScorer>();
var partialRatio       = ScorerCache.Get<PartialRatioScorer>();
var tokenSet           = ScorerCache.Get<TokenSetScorer>();
var partialTokenSet    = ScorerCache.Get<PartialTokenSetScorer>();
var tokenSort          = ScorerCache.Get<TokenSortScorer>();
var partialTokenSort   = ScorerCache.Get<PartialTokenSortScorer>();
var tokenAbbreviation  = ScorerCache.Get<TokenAbbreviationScorer>();
var partialTokenAbbrev = ScorerCache.Get<PartialTokenAbbreviationScorer>();
var weighted           = ScorerCache.Get<WeightedRatioScorer>();

Cached Scorers (`ICachedRatioScorer`)

Run .NET fiddle

Pre-initialize with a query string for repeated comparisons. These implement IDisposable:

using var scorer = new CachedWeightedRatioScorer("search query");
int score = scorer.Score("candidate string");

Available cached scorers:

CachedWeightedRatioScorer -- weighted combination (default for .Cached())
CachedDefaultRatioScorer -- simple Levenshtein ratio
CachedTokenSortScorer -- token sort ratio
CachedTokenSetScorer -- token set ratio
CachedPartialTokenSetScorer -- partial token set ratio
CachedTokenDifferenceScorer -- token difference ratio

Distance APIs

Levenshtein Distance API

Run .NET fiddle

Low-level access to the bit-parallel Levenshtein distance implementation:

// Edit distance
int distance = Levenshtein.Distance("kitten", "sitting");
// 3

// Normalized similarity (1.0 = identical, 0.0 = completely different)
double similarity = Levenshtein.NormalizedSimilarity("kitten", "sitting");
// 0.5714285714285714

// Edit operations to transform one string into another
EditOp[] ops = Levenshtein.GetEditOps("kitten", "sitting");
// [REPLACE(0, 0), REPLACE(4, 4), INSERT(6, 6)]
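
For reference, the values above can be reproduced with the textbook dynamic-programming algorithm. The sketch below is a plain two-row implementation, not the library's bit-parallel one, and it assumes that normalized similarity is `1 - distance / max(len1, len2)`, which matches the 0.5714... result shown (1 - 3/7). `LevenshteinSketch` is a hypothetical name.

```csharp
using System;

public static class LevenshteinSketch
{
    // Two-row dynamic programming with unit costs for insert, delete, and replace.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    public static double NormalizedSimilarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Distance(a, b) / longest;
    }

    public static void Main()
    {
        Console.WriteLine(Distance("kitten", "sitting"));             // 3
        Console.WriteLine(NormalizedSimilarity("kitten", "sitting")); // approximately 0.5714
    }
}
```

This version costs O(n * m) time per pair, which is the baseline that bit-parallel implementations improve on.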

Instance Distance Classes

The Levenshtein, Indel, and LongestCommonSubsequence classes also offer an instance API for one-to-many comparisons. The constructor pre-computes a bit-parallel pattern match vector from the source string, which is then reused across all subsequent calls. This avoids rebuilding the internal data structure on every comparison, giving a significant speedup when comparing one source against many targets.

All three implement IDisposable -- use using to return pooled arrays.
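
To make the "pre-compute once, reuse many times" idea concrete, here is a minimal sketch of a bit-parallel LCS length computation in the style of Hyyrö's algorithm, limited to source strings of at most 64 characters so that one `ulong` holds the whole state. It is an illustration of the general technique only. `TinyBitParallelLcs` is hypothetical and is not how the library's classes are implemented or named.

```csharp
using System;
using System.Collections.Generic;

public sealed class TinyBitParallelLcs
{
    // For each character of the source, a bit mask of the positions where it occurs.
    private readonly Dictionary<char, ulong> _patternMask = new Dictionary<char, ulong>();
    private readonly ulong _lengthMask;

    public TinyBitParallelLcs(string source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (source.Length > 64)
            throw new ArgumentException("This sketch handles at most 64 characters.", nameof(source));

        _lengthMask = source.Length == 64 ? ulong.MaxValue : (1UL << source.Length) - 1;
        for (int i = 0; i < source.Length; i++)
        {
            _patternMask.TryGetValue(source[i], out ulong mask); // mask is 0 when absent
            _patternMask[source[i]] = mask | (1UL << i);
        }
    }

    // Reuses the precomputed masks, so each target costs one pass over its characters.
    public int LcsLengthWith(string target)
    {
        ulong state = ulong.MaxValue;
        foreach (char c in target)
        {
            _patternMask.TryGetValue(c, out ulong matches);
            ulong u = state & matches;
            state = unchecked((state + u) | (state - u));
        }

        // Each zero bit within the source length marks one matched character.
        ulong zeros = ~state & _lengthMask;
        int count = 0;
        while (zeros != 0) { zeros &= zeros - 1; count++; }
        return count;
    }

    public static void Main()
    {
        var lcs = new TinyBitParallelLcs("chicago cubs");
        Console.WriteLine(lcs.LcsLengthWith("chicago white sox")); // 9
        Console.WriteLine(lcs.LcsLengthWith("chicago cubs"));      // 12
    }
}
```

The constructor does the per-source work once, and every later call only walks the target, which is the shape of saving the instance API describes. Handling longer sources requires multiple machine words and carry propagation, which this sketch leaves out.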

Levenshtein Instance

Run .NET fiddle

using var lev = new Levenshtein("chicago cubs vs new york mets");

int d1 = lev.DistanceFrom("new york mets vs chicago cubs");
// 22
int d2 = lev.DistanceFrom("atlanta braves vs pittsburgh pirates");
// 26

Indel Instance

Run .NET fiddle

Indel distance counts only insertions and deletions (no replacements). NormalizedSimilarityWith returns a value between 0.0 (completely different) and 1.0 (identical):

using var indel = new Indel("chicago cubs");

int distance = indel.DistanceFrom("chicago white sox");
// 11
double similarity = indel.NormalizedSimilarityWith("chicago white sox");
// 0.6206896551724138

A generic variant IndelT<T> is available for comparing sequences of any IEquatable<T>:

using var indel = new IndelT<string>(new[] { "hello", "world" });

int distance = indel.DistanceFrom(new[] { "hello", "there" });
// 2
double similarity = indel.NormalizedSimilarityWith(new[] { "hello", "there" });
// 0.5

LongestCommonSubsequence Instance

Run .NET fiddle

LCS distance is defined as max(len1, len2) - LCS_length:

using var lcs = new LongestCommonSubsequence("chicago cubs");

int distance = lcs.DistanceFrom("chicago white sox");
// 8
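
The Indel and LCS results in these examples are both functions of the LCS length. Assuming the usual definitions (Indel distance is `len1 + len2 - 2 * LCS_length`, normalized by `len1 + len2`), the sketch below reproduces the numbers shown for "chicago cubs" and "chicago white sox" with a plain dynamic-programming LCS. `LcsRelationSketch` is a hypothetical helper, not part of the library.

```csharp
using System;

public static class LcsRelationSketch
{
    // Classic dynamic-programming LCS length using a single row.
    public static int LcsLength(string a, string b)
    {
        var row = new int[b.Length + 1];
        foreach (char ca in a)
        {
            int diagonal = 0; // previous row, column j - 1
            for (int j = 1; j <= b.Length; j++)
            {
                int above = row[j]; // previous row, column j
                row[j] = ca == b[j - 1] ? diagonal + 1 : Math.Max(above, row[j - 1]);
                diagonal = above;
            }
        }
        return row[b.Length];
    }

    // Insertions and deletions only: every character outside the LCS costs one edit.
    public static int IndelDistance(string a, string b)
    {
        return a.Length + b.Length - 2 * LcsLength(a, b);
    }

    // max(len1, len2) - LCS_length, as defined above.
    public static int LcsDistance(string a, string b)
    {
        return Math.Max(a.Length, b.Length) - LcsLength(a, b);
    }

    public static void Main()
    {
        const string a = "chicago cubs";      // 12 characters
        const string b = "chicago white sox"; // 17 characters

        Console.WriteLine(LcsLength(a, b));     // 9
        Console.WriteLine(IndelDistance(a, b)); // 11
        Console.WriteLine(LcsDistance(a, b));   // 8

        // Assumed normalization: 1 - 11 / 29, approximately 0.6207.
        Console.WriteLine(1.0 - (double)IndelDistance(a, b) / (a.Length + b.Length));
    }
}
```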

String Preprocessors

Run .NET fiddle

By default, Fuzz methods compare strings as-is. Pass StringPreprocessor.Full to normalize whitespace, lowercase, and strip non-alphanumeric characters before comparing:

Fuzz.Ratio("new york mets", "NEW YORK METS");
// < 100 (case sensitive)

Fuzz.Ratio("new york mets", "NEW YORK METS", StringPreprocessor.Full);
// 100 (case insensitive after preprocessing)

Process extraction methods use StringPreprocessor.Full by default. Pass StringPreprocessor.None (or a custom processor function) to override this behavior.
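
A custom preprocessor is just a `Func<string, string>`. The sketch below shows one possible delegate that lowercases, replaces every non-alphanumeric character with a space, and collapses whitespace. It only approximates the behavior described for StringPreprocessor.Full. Replacing rather than deleting punctuation is an assumption made here, and `PreprocessorSketch.Simple` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class PreprocessorSketch
{
    // Hypothetical custom preprocessor; not the library's implementation.
    public static readonly Func<string, string> Simple = input =>
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
            sb.Append(char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ');

        // Splitting on whitespace and rejoining trims and collapses runs of spaces.
        string[] words = sb.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    };

    public static void Main()
    {
        Console.WriteLine(Simple("  NEW-YORK   Mets!! ")); // new york mets

        // With the package referenced, the delegate could be passed wherever the
        // examples above pass StringPreprocessor.Full, for instance:
        // Fuzz.Ratio("new york mets", "NEW-YORK Mets!!", PreprocessorSketch.Simple);
    }
}
```

Keeping the preprocessor as a standalone delegate makes it easy to unit test on its own before wiring it into a scorer or an extraction call.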

Performance

The distance implementations use bit-parallel algorithms designed to reduce both execution time and allocations. The table below compares naive DP Levenshtein distance calculation (baseline), original FuzzySharp, Fastenshtein, Quickenshtein, and Raffinert.FuzzySharp.

Random words of 3 to 1024 characters (benchmark source: LevenshteinLarge.cs):

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Allocated	Alloc Ratio
NaiveDp	231.563 ms	57.5403 ms	3.1540 ms	1.00	0.02	43500.0000	34500.0000	275312920 B	1.000
FuzzySharp	141.820 ms	4.0905 ms	0.2242 ms	0.61	0.01	-	-	1545732 B	0.006
Fastenshtein	123.356 ms	13.0959 ms	0.7178 ms	0.53	0.01	-	-	34028 B	0.000
Quickenshtein	12.918 ms	12.8046 ms	0.7019 ms	0.06	0.00	-	-	12 B	0.000
Raffinert.FuzzySharp	4.970 ms	0.3311 ms	0.0181 ms	0.02	0.00	-	-	3051 B	0.000

Benchmark sources are in FuzzySharp.Benchmarks.

Migration Notes

From original FuzzySharp

The package name is Raffinert.FuzzySharp, and the default namespace is Raffinert.FuzzySharp. This package keeps the familiar FuzzyWuzzy-style API while adding faster distance implementations, fixed partial-ratio behavior, cached scorers, and the fluent process pipeline.

From Raffinert.FuzzySharp v4 to v5

PreprocessMode-based APIs were replaced with Func<string, string> preprocessors.
Use StringPreprocessor.Full, StringPreprocessor.None, or a custom delegate instead of PreprocessMode.Full or PreprocessMode.None.
Generic extraction methods that use extractor delegates are now named Extract*By, such as ExtractOneBy and ExtractTopBy.

Credits

Support

Support the project through GitHub Sponsors or via PayPal.

See CHANGELOG.md for release history.
