Clear delimiter buffer before each peek in isDelimiter#611
Conversation
garydgregory
left a comment
There was a problem hiding this comment.
Hello @rootvector2
Thank you for your PR.
Would you please add a test that shows why this is an issue only using the public parsing API?
|
Added |
| void testPartialMultiCharacterDelimiterAtEOFAfterMismatch() throws IOException { | ||
| final CSVFormat format = CSVFormat.DEFAULT.builder().setDelimiter("[|]").get(); | ||
| // The "[a]" peek leaves ']' in the look-ahead buffer; the trailing "[|" must not match "[|]". | ||
| try (CSVParser parser = format.parse(new StringReader("x[a][|"))) { |
There was a problem hiding this comment.
Thank you for your update.
In the future, please make magic strings like "x[a][|" a local variable such that you never have to check that the assertion below is checking for a possibly different value.
| void testPartialMultiCharacterDelimiterAtEOFAfterMismatch() throws IOException { | ||
| final CSVFormat format = CSVFormat.DEFAULT.builder().setDelimiter("[|]").get(); | ||
| // The "[a]" peek leaves ']' in the look-ahead buffer; the trailing "[|" must not match "[|]". | ||
| try (Lexer lexer = createLexer("x[a][|", format)) { |
There was a problem hiding this comment.
Thank you for your update.
In the future, please make magic strings like "x[a][|" a local variable such that you never have to check that the assertion below is checking for a possibly different value.
|
@rootvector2 |
|
makes sense, will pull magic strings like that into locals going forward. thanks for the merge. |
isDelimiterpeeks the next characters into the reuseddelimiterBuflook-ahead buffer without clearing it, so a non-matching peek earlier in the same token leaves stale chars in the trailing positions and a truncated multi-character delimiter at EOF false-matches. With delimiter[|], inputx[a][|is split intox[a]and an empty field, because the earlier[a]peek left]in the buffer and the trailing[|(only two of the three delimiter chars present before EOF) matches against that stale]. CSV-324 cleareddelimiterBufonce pernextToken, butisDelimiterpeeks repeatedly within a token, so the reset has to happen before the peek, the same wayisEscapeDelimiteralready does it; the once-per-token clear is then redundant and dropped.Found while auditing the multi-character delimiter look-ahead after CSV-324.
mvn; that'smvnon the command line by itself.