license-scanner 0.12.0 crashes when processing a real-world source package (libidn2).
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] file too large (1098942 > 1000000)
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
[ERROR] failed to normalize data: invalid input text with control characters
panic: runtime error: slice bounds out of range [9:8]
goroutine 191076 [running]:
github.com/CycloneDX/license-scanner/normalizer.(*NormalizationData).removeHTMLTags(0xc001f3e000)
github.com/CycloneDX/license-scanner/normalizer/normalizer.go:381 +0x19a
github.com/CycloneDX/license-scanner/normalizer.(*NormalizationData).NormalizeText(0xc001f3e000)
github.com/CycloneDX/license-scanner/normalizer/normalizer.go:218 +0x1ec
github.com/CycloneDX/license-scanner/identifier.IdentifyLicensesInString({_, _}, {0x1, 0x0, {{0x0, 0x0}, 0x1, 0x0, 0x0, 0x0}}, ...)
github.com/CycloneDX/license-scanner/identifier/identifier.go:99 +0xd9
github.com/CycloneDX/license-scanner/identifier.IdentifyLicensesInFile({_, _}, {0x1, 0x0, {{0x0, 0x0}, 0x1, 0x0, 0x0, 0x0}}, ...)
github.com/CycloneDX/license-scanner/identifier/identifier.go:122 +0x145
github.com/CycloneDX/license-scanner/identifier.IdentifyLicensesInDirectory.func3()
github.com/CycloneDX/license-scanner/identifier/identifier.go:168 +0x73
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x50
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0x95
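For context on the panic message: Go reports `slice bounds out of range [9:8]` when a slice expression's lower bound (9) is greater than its upper bound (8). The code at normalizer.go:381 is not reproduced here, so the following is only a minimal sketch of that failure mode and of one possible guard. `sliceMessage` and `safeSlice` are hypothetical helpers and not part of license-scanner.

```go
package main

import "fmt"

// sliceMessage evaluates s[start:end] and returns the runtime panic
// message, or "" if the slice expression succeeded.
func sliceMessage(s string, start, end int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = s[start:end]
	return ""
}

// safeSlice is a hypothetical guard: it reports ok=false instead of
// panicking when the bounds are inverted or out of range.
func safeSlice(s string, start, end int) (sub string, ok bool) {
	if start < 0 || end > len(s) || start > end {
		return "", false
	}
	return s[start:end], true
}

func main() {
	// Length 10, so end=8 is in range and only the inverted
	// bounds trigger the panic.
	text := "0123456789"
	start, end := 9, 8 // same bounds as in the reported panic

	fmt.Println(sliceMessage(text, start, end))

	_, ok := safeSlice(text, start, end)
	fmt.Println(ok)
}
```

If removeHTMLTags computes tag offsets from input it does not fully control, a check of this kind before slicing would turn the crash into an ordinary per-file error, like the other [ERROR] lines in the log. Whether that is the right fix depends on how the offsets at line 381 are derived.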
Repro:
curl -JOL 'https://ftp.gnu.org/gnu/libidn/libidn2-2.3.8.tar.gz'
tar -xf libidn2-2.3.8.tar.gz
license-scanner --dir libidn2-2.3.8
Output: the error log and panic trace shown above.