Update dependencies #112

Open
chapinb wants to merge 7 commits into main from update-dependencies

Conversation

@chapinb chapinb commented Feb 8, 2026

Owner

And close #69

Summary by Sourcery

Update dependency constraints and adjust IP bogon detection and stream parsing to support newer libraries and non-seekable input streams.

New Features:

  • Support parsing of non-seekable input streams by handling non-seekable stdin-like sources without failing.

Bug Fixes:

  • Correct bogon IP detection to use updated netaddr semantics and explicitly treat IPv6 site-local addresses as bogons.

Enhancements:

  • Refine stream gzip detection to avoid destructive reads and fall back to an in-memory buffer for non-seekable streams.

Build:

  • Relax and bump version ranges for netaddr and python-evtx in project dependencies.

Tests:

  • Add coverage for non-seekable input streams and improve failure messages for bogon IP tests.

@chapinb chapinb requested a review from Copilot February 8, 2026 20:27
@sourcery-ai

sourcery-ai Bot commented Feb 8, 2026

Reviewer's Guide

Updates dependency constraints (notably netaddr) and adjusts IP bogon detection logic and stream parsing to remain compatible, including support for non-seekable input streams, with added tests.

Sequence diagram for updated parse_file stream handling

sequenceDiagram
    actor User
    participant PlainTextParser
    participant FileEntry
    participant StreamBuffer
    participant GzipFile

    User->>PlainTextParser: parse_file(file_entry, is_stream=True)
    PlainTextParser->>FileEntry: access buffer
    FileEntry-->>PlainTextParser: buffer
    PlainTextParser->>StreamBuffer: assign stream_buffer

    alt StreamBuffer has peek
        PlainTextParser->>StreamBuffer: peek(2)
        StreamBuffer-->>PlainTextParser: two_bytes
        PlainTextParser->>PlainTextParser: two_bytes = two_bytes[:2]
    else StreamBuffer has no peek
        PlainTextParser->>StreamBuffer: read(2)
        StreamBuffer-->>PlainTextParser: two_bytes
        alt StreamBuffer is seekable
            PlainTextParser->>StreamBuffer: seek(0)
        else StreamBuffer is not seekable
            PlainTextParser->>StreamBuffer: read()
            StreamBuffer-->>PlainTextParser: remaining_bytes
            PlainTextParser->>PlainTextParser: stream_buffer = BytesIO(two_bytes + remaining_bytes)
        end
    end

    PlainTextParser->>PlainTextParser: normalize two_bytes to bytes
    PlainTextParser->>PlainTextParser: check binascii.hexlify(two_bytes)
    alt Gzip magic matches
        PlainTextParser->>GzipFile: GzipFile(fileobj=stream_buffer)
        GzipFile-->>PlainTextParser: file_data (decompressed)
    else Not gzip
        PlainTextParser->>PlainTextParser: file_data = stream_buffer
    end

    loop for each raw_line in file_data
        PlainTextParser->>PlainTextParser: decode raw_line if needed
    end
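
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the code in `libchickadee/parsers/plain_text.py`: the function name `open_sniffed` and the constant `GZIP_MAGIC` are hypothetical, and the magic bytes are compared directly instead of through `binascii.hexlify`.

```python
import gzip
import io

# The two-byte gzip magic number (hypothetical constant name).
GZIP_MAGIC = b"\x1f\x8b"


def open_sniffed(stream_buffer):
    """Return a readable object: a GzipFile if the stream is gzip, else the stream."""
    if hasattr(stream_buffer, "peek"):
        # Non-destructive look at the header; peek() may return more than 2 bytes.
        two_bytes = stream_buffer.peek(2)[:2]
    else:
        # Destructive read, so the consumed bytes have to be restored afterwards.
        two_bytes = stream_buffer.read(2)
        if hasattr(stream_buffer, "seekable") and stream_buffer.seekable():
            stream_buffer.seek(0)
        else:
            # Non-seekable: rebuild an in-memory stream from header plus remainder.
            stream_buffer = io.BytesIO(two_bytes + stream_buffer.read())

    if two_bytes == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=stream_buffer)
    return stream_buffer
```

The `BytesIO` branch is the one that reads the whole input into memory, so it is only reached when the stream offers neither `peek()` nor seeking.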

File-Level Changes

Change Details Files
Make gzip sniffing for streamed input robust to non-seekable streams.
  • Refactor streamed input handling to work with a local stream_buffer reference instead of file_entry directly.
  • Use peek(2) when available to non-destructively inspect the first two bytes of the stream.
  • Fall back to read(2) and, if the stream is not seekable, wrap remaining data in a BytesIO constructed from the inspected bytes and the rest of the stream.
  • Remove unconditional seek(0) on the original file_entry and use stream_buffer for GzipFile/file_data selection.
libchickadee/parsers/plain_text.py
Tighten and modernize bogon IP detection in line with newer netaddr behavior.
  • Import IPNetwork alongside IPAddress for IPv6 network handling.
  • Define an IPV6_SITE_LOCAL network constant for fec0::/10.
  • Change is_bogon to explicitly treat IPv6 site-local addresses as bogons and to rely on ip.is_multicast, ip.is_link_local, ip.is_reserved, and not ip.is_global instead of older helpers.
libchickadee/parsers/__init__.py
Expand tests to cover new bogon logic and non-seekable stream handling.
  • Add a specific assertion message to the bogon test loop to identify which IP fails when the test breaks.
  • Add a test using a NonSeekableBytesIO-based TextIOWrapper to validate file_handler behavior when stdin-like streams are not seekable.
libchickadee/test/test_parser_base.py
libchickadee/test/test_chickadee.py
Update dependency versions and lockfile to newer releases compatible with the new behavior.
  • Relax version constraints for netaddr to >=1.0.0,<2.0.0.
  • Relax version constraints for python-evtx to >=0.7.4,<1.0.0.
  • Regenerate uv.lock to reflect updated dependencies.
pyproject.toml
uv.lock
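
As a rough illustration of the bogon rule described for `libchickadee/parsers/__init__.py`, the sketch below applies the same conditions with the standard-library `ipaddress` module. It is a stand-in, not the PR's netaddr-based code: `ipaddress` exposes these checks as properties and may classify some edge-case ranges differently from netaddr. The `is_bogon` signature here is an assumption.

```python
import ipaddress

# Deprecated IPv6 site-local range, treated explicitly as bogon.
IPV6_SITE_LOCAL = ipaddress.ip_network("fec0::/10")


def is_bogon(ip_text):
    """Return True if the address should not appear on the public internet."""
    ip = ipaddress.ip_address(ip_text)
    # Explicit site-local check, mirroring the IPV6_SITE_LOCAL constant in the PR.
    if ip.version == 6 and ip in IPV6_SITE_LOCAL:
        return True
    # Multicast, link-local, reserved, or simply not globally routable.
    return (
        ip.is_multicast
        or ip.is_link_local
        or ip.is_reserved
        or not ip.is_global
    )
```

Under these assumptions, `is_bogon("fec0::1")` and `is_bogon("10.0.0.1")` are true, while a public resolver address such as `1.1.1.1` is not flagged.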

Assessment against linked issues

Issue Objective Addressed Explanation
#69 Fix stdin/stream handling so that chickadee can correctly parse IP addresses from piped stdin (including non-seekable streams) without raising the "'bytes' object has no attribute 'read'" error.

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `libchickadee/test/test_chickadee.py:516-514` </location>
<code_context>
         ips = Chickadee.file_handler(stream, ignore_bogon=True)
         self.assertDictEqual(ips, {"1.1.1.1": 1})

+    def test_file_handler_non_seekable_stream(self):
+        """Validate stream parsing when stdin is not seekable (common in Linux pipes)."""
+
+        class NonSeekableBytesIO(io.BytesIO):
+            def seekable(self):
+                return False
+
+            def seek(self, *args, **kwargs):
+                raise io.UnsupportedOperation("underlying stream is not seekable")
+
+        stream = io.TextIOWrapper(NonSeekableBytesIO(b"test 1.1.1.1 ip"))
+        ips = Chickadee.file_handler(stream, ignore_bogon=True)
+        self.assertDictEqual(ips, {"1.1.1.1": 1})
+

</code_context>

<issue_to_address>
**suggestion (testing):** Add coverage for non-seekable *gzip* streams and peek-capable streams to fully exercise the new stream handling logic.

To more completely exercise the new `parse_file` behavior and prevent regressions, consider adding:

1) A test where the non-seekable stream contains gzip-compressed content, to cover the `BytesIO(two_bytes + stream_buffer.read())` and `GzipFile(fileobj=stream_buffer)` branches.

2) A test using a buffer that implements `peek()` (e.g., `io.BufferedReader` over `BytesIO`) to verify the non-destructive `peek` path and that full content is still parsed correctly after the header read.

This would cover both the `hasattr(stream_buffer, "peek")` and `seekable()` branches for non-seekable streams with gzip detection.
</issue_to_address>
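
A possible starting point for the two suggested fixtures is sketched below, using only the standard library. It builds the inputs and checks their stream behaviour. It does not call `Chickadee.file_handler`, so wiring these streams into the real test class is left as an assumption.

```python
import gzip
import io


class NonSeekableBytesIO(io.BytesIO):
    """BytesIO that refuses to seek, mimicking a pipe (same idea as the PR's helper)."""

    def seekable(self):
        return False

    def seek(self, *args, **kwargs):
        raise io.UnsupportedOperation("underlying stream is not seekable")


payload = b"test 1.1.1.1 ip\n"

# Fixture 1: gzip-compressed content behind a non-seekable stream.
gz_stream = NonSeekableBytesIO(gzip.compress(payload))
header = gz_stream.read(2)  # destructive header read
rebuilt = io.BytesIO(header + gz_stream.read())  # restore the consumed bytes
gz_text = gzip.GzipFile(fileobj=rebuilt).read()

# Fixture 2: a peek-capable buffer; peeking must not consume any content.
peekable = io.BufferedReader(io.BytesIO(payload))
peeked = peekable.peek(2)[:2]
plain_text = peekable.read()
```

In a real test, each stream would be handed to the parser and the resulting dictionary compared against `{"1.1.1.1": 1}`, as in the existing non-seekable test.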

### Comment 2
<location> `libchickadee/test/test_parser_base.py:47` </location>
<code_context>
             "100::517b:deaa:fb23:5013",
         ]  # nosec
         for ip in ip_list:
-            self.assertTrue(ParserBase.is_bogon(ip))
+            self.assertTrue(ParserBase.is_bogon(ip), msg=f"Failed bogon test for IP: {ip}")

     def test_nonbogon(self):
</code_context>

<issue_to_address>
**issue (testing):** Extend bogon tests to explicitly cover new IPv6 site-local and non-global logic, as well as clear non-bogon examples.

`ParserBase.is_bogon` now relies on `not ip.is_global()` plus explicit checks for multicast/link-local/reserved, and adds IPv6 site-local (`fec0::/10`). The current tests don’t exercise these new paths.

Please add:

1) Bogon cases that depend on the new logic, e.g.:
   - A site-local IPv6 like `"fec0::1"`.
   - A non-global address that is not private/multicast/link-local/reserved, to validate the `not ip.is_global()` behavior.

2) In `test_nonbogon`, explicit global IPv4 and IPv6 addresses with `assertFalse(ParserBase.is_bogon(addr), ...)` to confirm the new condition doesn’t over-classify bogons.

This will better anchor the tests to the new implementation and guard against regressions related to issue #69.
</issue_to_address>

Comment thread libchickadee/test/test_chickadee.py
Comment thread libchickadee/test/test_parser_base.py

Copilot AI left a comment

Pull request overview

Updates Chickadee’s dependency set and adjusts parsing logic to better handle stdin streams (including non-seekable cases) while refining bogon detection, aligning with the “Cannot parse data from stdin” issue (#69).

Changes:

  • Bump dependency constraints (notably netaddr to >=1,<2 and widen python-evtx upper bound).
  • Update plain-text stream parsing to avoid destructive reads / seeking on stdin-like streams.
  • Refine bogon detection (including IPv6 site-local handling) and add/adjust unit tests.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Summary per file (File: Description)
uv.lock: Regenerates lockfile for updated dependency versions/ranges.
pyproject.toml: Updates declared dependency constraints for netaddr and python-evtx.
libchickadee/parsers/plain_text.py: Improves stdin/stream handling for gzip sniffing and non-seekable streams.
libchickadee/parsers/__init__.py: Updates bogon detection logic and adds IPv6 site-local network constant.
libchickadee/test/test_chickadee.py: Adds coverage for non-seekable stdin-like stream parsing.
libchickadee/test/test_parser_base.py: Improves failure messaging for bogon test cases.

Comment on lines +66 to +70
if hasattr(stream_buffer, "seekable") and stream_buffer.seekable():
    stream_buffer.seek(0)
else:
    stream_buffer = BytesIO(two_bytes + stream_buffer.read())

Copilot AI Feb 8, 2026

When the stream buffer is non-seekable and does not support peek(), this path reads the remainder of the stream into memory to recreate the first two bytes. For large stdin inputs this can cause high memory usage and potentially hang/kill the process. Consider wrapping the underlying raw stream in a buffered reader that supports peeking, or using a small prefix buffer strategy that doesn't require materializing the full stream in RAM.
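
One way to read the buffered-reader suggestion is sketched below. It assumes the incoming object is a binary stream that `io.BufferedReader` can wrap, and the helper name `open_stream_lazily` is hypothetical. Because `peek()` leaves the read position untouched, nothing has to be copied into a `BytesIO`.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def open_stream_lazily(raw):
    """Sniff the gzip header without seeking or buffering the whole stream."""
    # Reuse the stream if it can already peek; otherwise add a buffering layer.
    buffered = raw if hasattr(raw, "peek") else io.BufferedReader(raw)
    # peek() may return more than requested, so trim to the two magic bytes.
    head = buffered.peek(2)[:2]
    if head == GZIP_MAGIC:
        # GzipFile reads forward only, so a non-seekable source is acceptable.
        return gzip.GzipFile(fileobj=buffered)
    return buffered
```

With this shape, memory use is bounded by the reader's internal buffer rather than the size of the piped input, at the cost of one extra wrapper object per stream.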

Comment thread libchickadee/parsers/__init__.py Outdated
@sonarqubecloud

sonarqubecloud Bot commented Feb 8, 2026

Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarQube Cloud

Development

Successfully merging this pull request may close these issues.

Cannot parse data from stdin

2 participants