
feat: add switch RFC3986 / IRI-Style for @TrackLink #3077

Open
gregouzdev wants to merge 1 commit into knadh:master from gregouzdev:fix/url-iri-style

@gregouzdev


Response to Issue #3076

Summary

The root cause was identified in models/common.go (line 55): the existing regex pattern only accepted a strict subset of RFC 3986 characters (ASCII letters, digits, and a limited set of symbols). As a result, a URL containing an emoji or any other non-ASCII Unicode character stopped the match before it reached @TrackLink. The shortcut was therefore never converted into the expected {{ TrackLink ... }} template tag, and the raw, corrupted URL was left in the final email output.
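To illustrate the failure mode, here is a minimal Go sketch. The PR does not reproduce the old pattern, so `strictTrackLink` below is a hypothetical stand-in: an ASCII-only character class in the spirit of RFC 3986. Because the class cannot cross the emoji, the regexp finds no match at all, and the @TrackLink shorthand is left untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// strictTrackLink is a HYPOTHETICAL stand-in for the previous pattern: it
// only allows ASCII letters, digits and a subset of RFC 3986 symbols.
var strictTrackLink = regexp.MustCompile(`(https?://[A-Za-z0-9\-._~:/?#\[\]@!$&()*+,;=%]*)@TrackLink`)

func main() {
	ascii := `<a href="https://example.com/report?id=1@TrackLink">`
	emoji := `<a href="https://example.com/📊report@TrackLink">`

	// The ASCII URL is captured in full (whole match + capture group).
	fmt.Printf("%q\n", strictTrackLink.FindStringSubmatch(ascii))

	// The emoji is outside the character class, so the URL cannot be
	// extended up to "@TrackLink": FindStringSubmatch returns nil and the
	// shorthand would survive unconverted into the email.
	fmt.Printf("%q\n", strictTrackLink.FindStringSubmatch(emoji))
}
```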

Changes

models/common.go (line 45): configurable regex mode via LISTMONK_TRACKLINK_REGEX_MODE

The fix introduces an environment variable to control the URL matching strategy, allowing operators to choose between improved compatibility and strict legacy behaviour:

Mode | Behaviour
--- | ---
iri (default) | Captures URLs up to the first common HTML/template delimiter (", ', whitespace, <, >, {, }); handles emojis, Unicode, encoded sequences, and long query strings
unicode | Restores the previous strict regex for environments that require it

Regex comparison

Before (strict RFC 3986):

// Only matched ASCII letters, digits, and a limited set of symbols;
// broke on emojis and non-ASCII Unicode characters

After (IRI-style, default):

(https?://[^\s<>"'{}]*)@TrackLink\b

This pattern captures everything up to the first structural HTML or template delimiter, so it no longer depends on enumerating every character a real-world client URL might contain.
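A short Go sketch of the IRI-style pattern in use. The replacement string `{{ TrackLink "$1" . }}` is an assumption for illustration; the exact template tag listmonk emits may differ. The point is that the capture runs through the emoji and the accented character and stops at the closing `"` of the `href` attribute.

```go
package main

import (
	"fmt"
	"regexp"
)

// Delimiter-based pattern from this PR: capture everything after the scheme
// up to the first whitespace, <, >, ", ', { or }.
var iriTrackLink = regexp.MustCompile(`(https?://[^\s<>"'{}]*)@TrackLink\b`)

// expand rewrites each URL@TrackLink shorthand into a template tag.
// ASSUMPTION: the replacement format is illustrative only.
func expand(body string) string {
	return iriTrackLink.ReplaceAllString(body, `{{ TrackLink "$1" . }}`)
}

func main() {
	body := `<a href="https://example.com/📊report?q=café@TrackLink">Report</a>`
	fmt.Println(expand(body))
	// prints: <a href="{{ TrackLink "https://example.com/📊report?q=café" . }}">Report</a>
}
```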

Backward compatibility

This change is non-breaking. The new iri mode is the default, but the previous behaviour can be fully restored by setting:

LISTMONK_TRACKLINK_REGEX_MODE=unicode

Testing

  • URLs with emojis (e.g. 📊, 🚀)
  • URLs with non-ASCII Unicode characters (e.g. accented characters, CJK)
  • URLs with partial encoding (%20, %2F, etc.)
  • Long URLs with complex query strings
  • Standard ASCII URLs: no regression
  • Legacy unicode mode: behaviour identical to the previous implementation
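The iri-mode cases in the list above could be captured as a small table-driven check. This is a sketch with made-up sample URLs, not the PR's actual test suite, and it does not cover the legacy unicode mode.

```go
package main

import (
	"fmt"
	"regexp"
)

var iriTrackLink = regexp.MustCompile(`(https?://[^\s<>"'{}]*)@TrackLink\b`)

// Sample inputs modelled on the test list; the URLs are illustrative.
var cases = []struct{ name, in, want string }{
	{"emoji", `href="https://example.com/🚀launch@TrackLink"`, "https://example.com/🚀launch"},
	{"unicode", `href="https://example.com/東京?q=é@TrackLink"`, "https://example.com/東京?q=é"},
	{"percent", `see https://example.com/a%20b%2Fc@TrackLink now`, "https://example.com/a%20b%2Fc"},
	{"long query", `<a href='https://example.com/p?a=1&b=2&token=abc123&next=%2Fhome@TrackLink'>`, "https://example.com/p?a=1&b=2&token=abc123&next=%2Fhome"},
	{"ascii", `Visit https://example.com/docs@TrackLink today`, "https://example.com/docs"},
}

func main() {
	for _, c := range cases {
		m := iriTrackLink.FindStringSubmatch(c.in)
		// m[1] is the captured URL without the @TrackLink suffix.
		ok := m != nil && m[1] == c.want
		fmt.Printf("%-10s %v\n", c.name, ok)
	}
}
```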

@knadh

knadh commented Jun 19, 2026

Owner

There should be no os.Getenv()-level switch. This should be handled solely in the regexp.

@gregouzdev

gregouzdev commented Jun 19, 2026

Author

Good point regarding the environment switch. I agree this can be handled directly in the regexp and doesn't need an additional configuration option.

The underlying issue is that @TrackLink is not really validating URLs, but extracting the URL portion before transforming it into {{ TrackLink ... }}.

The current character-class approach requires continuously enumerating allowed URL characters. Using a delimiter-based approach is simpler because @TrackLink only needs to identify where the URL ends, not validate which characters are allowed inside it.

Furthermore, in real-world use, platforms such as SharePoint generate very long and complex URLs (many parameters, nested paths, tokens, etc.), which makes a strict, enumerated character set even less suitable.

Rather than maintaining multiple variants, this can be simplified to a single, more permissive regular expression that handles these cases: (https?://[^"'\s<>{}]+)@TrackLink
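One concrete difference between the single pattern proposed in this comment and the one in the original description is `+` versus `*`. A minimal Go sketch, compiled once as a package-level variable with no environment switch:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern from the original PR description (uses *).
	starPattern = regexp.MustCompile(`(https?://[^\s<>"'{}]*)@TrackLink\b`)
	// Single pattern proposed in the follow-up comment (uses +).
	plusPattern = regexp.MustCompile(`(https?://[^"'\s<>{}]+)@TrackLink`)
)

func main() {
	empty := `<a href="https://@TrackLink">`
	fmt.Println(starPattern.MatchString(empty)) // true: * accepts an empty URL body
	fmt.Println(plusPattern.MatchString(empty)) // false: + requires at least one character
}
```

The follow-up pattern also drops the trailing `\b`, so input such as `https://example.com@TrackLinks` would still match on its `@TrackLink` prefix; whether that matters is a judgement call for the maintainers.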

