Update dependencies#112
Conversation
Reviewer's GuideUpdates dependency constraints (notably netaddr) and adjusts IP bogon detection logic and stream parsing to remain compatible, including support for non-seekable input streams, with added tests. Sequence diagram for updated parse_file stream handlingsequenceDiagram
actor User
participant PlainTextParser
participant FileEntry
participant StreamBuffer
participant GzipFile
User->>PlainTextParser: parse_file(file_entry, is_stream=True)
PlainTextParser->>FileEntry: access buffer
FileEntry-->>PlainTextParser: buffer
PlainTextParser->>StreamBuffer: assign stream_buffer
alt StreamBuffer has peek
PlainTextParser->>StreamBuffer: peek(2)
StreamBuffer-->>PlainTextParser: two_bytes
PlainTextParser->>PlainTextParser: two_bytes = two_bytes[:2]
else StreamBuffer has no peek
PlainTextParser->>StreamBuffer: read(2)
StreamBuffer-->>PlainTextParser: two_bytes
alt StreamBuffer is seekable
PlainTextParser->>StreamBuffer: seek(0)
else StreamBuffer is not seekable
PlainTextParser->>StreamBuffer: read()
StreamBuffer-->>PlainTextParser: remaining_bytes
PlainTextParser->>PlainTextParser: stream_buffer = BytesIO(two_bytes + remaining_bytes)
end
end
PlainTextParser->>PlainTextParser: normalize two_bytes to bytes
PlainTextParser->>PlainTextParser: check binascii.hexlify(two_bytes)
alt Gzip magic matches
PlainTextParser->>GzipFile: GzipFile(fileobj=stream_buffer)
GzipFile-->>PlainTextParser: file_data (decompressed)
else Not gzip
PlainTextParser->>PlainTextParser: file_data = stream_buffer
end
loop for each raw_line in file_data
PlainTextParser->>PlainTextParser: decode raw_line if needed
end
File-Level Changes
Assessment against linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `libchickadee/test/test_chickadee.py:516-514` </location>
<code_context>
ips = Chickadee.file_handler(stream, ignore_bogon=True)
self.assertDictEqual(ips, {"1.1.1.1": 1})
+ def test_file_handler_non_seekable_stream(self):
+ """Validate stream parsing when stdin is not seekable (common in Linux pipes)."""
+
+ class NonSeekableBytesIO(io.BytesIO):
+ def seekable(self):
+ return False
+
+ def seek(self, *args, **kwargs):
+ raise io.UnsupportedOperation("underlying stream is not seekable")
+
+ stream = io.TextIOWrapper(NonSeekableBytesIO(b"test 1.1.1.1 ip"))
+ ips = Chickadee.file_handler(stream, ignore_bogon=True)
+ self.assertDictEqual(ips, {"1.1.1.1": 1})
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for non-seekable *gzip* streams and peek-capable streams to fully exercise the new stream handling logic.
To more completely exercise the new `parse_file` behavior and prevent regressions, consider adding:
1) A test where the non-seekable stream contains gzip-compressed content, to cover the `BytesIO(two_bytes + stream_buffer.read())` and `GzipFile(fileobj=stream_buffer)` branches.
2) A test using a buffer that implements `peek()` (e.g., `io.BufferedReader` over `BytesIO`) to verify the non-destructive `peek` path and that full content is still parsed correctly after the header read.
This would cover both the `hasattr(stream_buffer, "peek")` and `seekable()` branches for non-seekable streams with gzip detection.
</issue_to_address>
### Comment 2
<location> `libchickadee/test/test_parser_base.py:47` </location>
<code_context>
"100::517b:deaa:fb23:5013",
] # nosec
for ip in ip_list:
- self.assertTrue(ParserBase.is_bogon(ip))
+ self.assertTrue(ParserBase.is_bogon(ip), msg=f"Failed bogon test for IP: {ip}")
def test_nonbogon(self):
</code_context>
<issue_to_address>
**issue (testing):** Extend bogon tests to explicitly cover new IPv6 site-local and non-global logic, as well as clear non-bogon examples.
`ParserBase.is_bogon` now relies on `not ip.is_global()` plus explicit checks for multicast/link-local/reserved, and adds IPv6 site-local (`fec0::/10`). The current tests don’t exercise these new paths.
Please add:
1) Bogon cases that depend on the new logic, e.g.:
- A site-local IPv6 like `"fec0::1"`.
- A non-global address that is not private/multicast/link-local/reserved, to validate the `not ip.is_global()` behavior.
2) In `test_nonbogon`, explicit global IPv4 and IPv6 addresses with `assertFalse(ParserBase.is_bogon(addr), ...)` to confirm the new condition doesn’t over-classify bogons.
This will better anchor the tests to the new implementation and guard against regressions related to issue #69.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Updates Chickadee’s dependency set and adjusts parsing logic to better handle stdin streams (including non-seekable cases) while refining bogon detection, aligning with the “Cannot parse data from stdin” issue (#69).
Changes:
- Bump dependency constraints (notably
netaddrto>=1,<2and widenpython-evtxupper bound). - Update plain-text stream parsing to avoid destructive reads / seeking on stdin-like streams.
- Refine bogon detection (including IPv6 site-local handling) and add/adjust unit tests.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Regenerates lockfile for updated dependency versions/ranges. |
pyproject.toml |
Updates declared dependency constraints for netaddr and python-evtx. |
libchickadee/parsers/plain_text.py |
Improves stdin/stream handling for gzip sniffing and non-seekable streams. |
libchickadee/parsers/__init__.py |
Updates bogon detection logic and adds IPv6 site-local network constant. |
libchickadee/test/test_chickadee.py |
Adds coverage for non-seekable stdin-like stream parsing. |
libchickadee/test/test_parser_base.py |
Improves failure messaging for bogon test cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if hasattr(stream_buffer, "seekable") and stream_buffer.seekable(): | ||
| stream_buffer.seek(0) | ||
| else: | ||
| stream_buffer = BytesIO(two_bytes + stream_buffer.read()) | ||
|
|
There was a problem hiding this comment.
When the stream buffer is non-seekable and does not support peek(), this path reads the remainder of the stream into memory to recreate the first two bytes. For large stdin inputs this can cause high memory usage and potentially hang/kill the process. Consider wrapping the underlying raw stream in a buffered reader that supports peeking, or using a small prefix buffer strategy that doesn't require materializing the full stream in RAM.
|
|


And close #69
Summary by Sourcery
Update dependency constraints and adjust IP bogon detection and stream parsing to support newer libraries and non-seekable input streams.
New Features:
Bug Fixes:
Enhancements:
Build:
Tests: