Allow interpolated string adjacent to '=' (e.g. C(Name=$"value"))#19820
Allow interpolated string adjacent to '=' (e.g. C(Name=$"value"))#19820edgarfgp wants to merge 3 commits into
Conversation
❗ Release notes required
Warning No PR link found in some release notes, please consider adding it.
|
The lexer greedily matched '=$' as a single INFIX_COMPARE_OP, so
'C(Name=$"123")' and 'let x =$"123"' failed with FS0035 ('$' not
permitted in operator names). Add a lexer rule that matches '=$"',
consumes only the '=', and rewinds so the next scan begins at '$"' —
letting the regular interpolated-string lexer handle the rest, including
interpolation holes and triple-quoted forms.
Fix is position-agnostic (works in let-bindings, named args, record
creation/copy). Scoped to '=$"'; the '$@', '@$' and '$$' verbatim/extended
forms still require a space, as before.
03ecfa7 to
98d38a4
Compare
T-Gro
left a comment
There was a problem hiding this comment.
Clean, minimal fix at exactly the right layer. The approach is sound:
Correctness
- The 3-char rule '=' '$' '"' wins by longest-match over the 2-char =$ that the INFIX_COMPARE_OP rule would match (since " is not in op_char). This is the key insight that makes the fix safe.
- The rewind via LexemeLength <- 1 + EndPos adjustment correctly lets the regular interpolated-string lexer take over at
$", including triple-quoted forms ($ """...""" — first three chars =$" still trigger the rule, then next scan sees $""" and the existing triple-quote handling kicks in). - Negative cases (=, defining (=$), infix =$) all still route through checkExprOp and reject correctly.
API surface
- LexemeLength setter is exposed in prim-lexing.fsi but LexBuffer<'Char> is ype internal — no public API impact.
CI: All 50 check runs green.
Tests: Excellent coverage — positive cases (let-bindings, named args, records, record-copy, holes, typed holes), negative cases (non-quote, operator definition, infix usage, verbatim out-of-scope), syntax-tree baselines, and the space-separated form unchanged.
One minor suggestion (non-blocking): consider adding a component test that compiles and runs =$"""triple {x}""" alongside the syntax-tree baseline. The baseline proves parsing but not codegen/execution for that form.
Nice work!
auduchinok
left a comment
There was a problem hiding this comment.
This change feels very ad-hoc. It workarounds a specific case that has an explicit existing limitation in the language. The case arises due to non-idiomatic formatting of the code, i.e. due to the missing spaces around =.
@edgarfgp Since this PR changes what is allowed by the language, have you considered changing the rule that causes the problem in the first place instead of trying to workaround it?
The workaround is done by exposing more lexer internals and doing processing after a token is lexed. Modifying tokens like this is sometimes done in the compiler already, but for operators it's done in LexFilter. Why this approach is better than the existing one?
The tests look as if they are completely AI-generated: there's a lot of noise that hides the relevant parts of the code, and the tests effectively duplicate each other, since they only test simple adjacent = and $, while not testing how different kinds of interpolations interoperate, how operators are lexed differently now due to this change. I think we should not merge changes with such tests.
| let n = 42 | ||
| let world = "world" | ||
| let plain =$"123" | ||
| let withHole =$"hello {world}" | ||
| let withTypedHole =$"%d{n}" | ||
| if plain <> "123" then failwithf "plain: expected 123, got %s" plain | ||
| if withHole <> "hello world" then failwithf "withHole: expected 'hello world', got %s" withHole | ||
| if withTypedHole <> "42" then failwithf "withTypedHole: expected 42, got %s" withTypedHole |
There was a problem hiding this comment.
@edgarfgp Could you minimize the test, so it's clear what it tests?
| | '=' '$' '"' { | ||
| lexbuf.LexemeLength <- 1 | ||
| lexbuf.EndPos <- lexbuf.StartPos.ShiftColumnBy(1) | ||
| EQUALS } |
There was a problem hiding this comment.
@edgarfgp What other implications may this change bring? Have you considered following approach used for operators and type arguments?
Fixes #16696.
Problem
When an interpolated string immediately follows
=with no space, the lexer greedily matches=$as a singleINFIX_COMPARE_OPtoken (since$is an operator character).checkExprOpthen rejects it with FS0035 ('$' is not permitted as a character in operator names), so property/named-argument initialization with an interpolated string fails to parse.Before
After
The produced AST is identical to the spaced form
= $"...", so all downstream tooling sees the same tree.Solution
A lexer-level rule for the exact 3-character sequence
=$":It matches
=$"(winning over the 2-char operator rule via longest-match), emitsEQUALS, then rewindsLexemeLengthto 1 so the next scan begins at$". The existing interpolated-string lexer then processes the rest — including interpolation holes and triple-quoted forms. Because the fix is in the lexer, it applies in every position where= $"..."is valid (let-bindings, named args, record creation/copy), not just one grammar rule.The previously-
internalLexBuffer.LexemeLengthsetter is exposed inprim-lexing.fsi(no change to the public API surface — the type isinternal).Scope
Limited to the
$"opening sequence (single- and triple-quoted). Defining or using=$as a custom operator remains rejected (FS0035), preserving the "reserved for future use" semantics.Open question — should we also handle the remaining variants?
These forms still require a space and continue to error as before (regression tests assert this):
They're out of scope for the reported issue (which only shows the
$"form), and the workaround is a single space. They could be added with the same lexer technique (longer=$@",=@$",=$$"""rules with a larger rewind), at the cost of more lexer rules. Do we want to cover them in this PR, defer to a follow-up, or leave them as-is?