[codex] Reduce serializer hot-path allocations#317
Merged
Conversation
Avoid per-call set intersections in XML escaping, skip attribute dictionary copies on typed scalar and @attrs paths, avoid scalar list item attr dict allocation where metadata is already reusable, and pass namespace defaults through the fast wrapper without materializing an empty mapping.\n\nBenchmarked against 9463457 with 5 warm samples per case:\n- attrs_nested: 552.518ms -> 359.204ms mean, peak 16,590,571 -> 16,580,643 bytes\n- plain_strings: 800.088ms -> 530.016ms mean, peak 9,538,330 -> 9,528,594 bytes
Contributor
Reviewer's GuideOptimizes XML serialization hot paths to reduce per-value allocations by eliminating unnecessary data structure copies, refining attribute handling (including typed attributes and @attrs), reusing metadata, and aligning the fast-path entry point with the pure-Python implementation, with regression tests and documentation updates to lock in behavior. Sequence diagram for list item serialization with @attrs and IDssequenceDiagram
participant Client
participant dicttoxml as dicttoxml
participant convert_list as _append_convert_list
participant dict2xml as _append_dict2xml_str
participant rawitem as _append_rawitem
participant conv_dict as _append_convert_dict
Client->>dicttoxml: dicttoxml(obj, ids=True)
dicttoxml->>convert_list: _append_convert_list(items)
convert_list->>dict2xml: _append_dict2xml_str(item_dict, ids, attr_type, ...)
activate dict2xml
dict2xml->>dict2xml: detect has_custom_attrs
dict2xml->>dict2xml: val_attr = item["@attrs"] or attr
dict2xml->>dict2xml: rawitem = item
dict2xml->>rawitem: _append_rawitem(rawitem, ids, attr_type, ..., skip_attrs=True)
deactivate dict2xml
activate rawitem
rawitem->>rawitem: [rawitem is not scalar]
rawitem->>conv_dict: _append_convert_dict(rawitem, ids, ..., skip_key="@attrs")
deactivate rawitem
activate conv_dict
conv_dict->>conv_dict: iterate keys, skip "@attrs"
conv_dict-->>Client: write child elements without copying attrs
deactivate conv_dict
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #317 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 6
Lines 609 616 +7
=========================================
+ Hits 609 616 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. |
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="json2xml/dicttoxml.py" line_range="165" />
<code_context>
return f" {attrstring}"
+def make_typed_attrstring(attr: dict[str, Any], xml_type: str) -> str:
+ """Create XML attributes with a type value without copying caller attrs."""
+ if not attr:
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying new helpers and control flow (typed attributes, @attrs handling, list attr construction, and XML escaping) to reuse existing abstractions and avoid scattered special cases.
You can keep the behavioral changes while reducing the new complexity in three focused spots:
---
### 1. `make_typed_attrstring` can be simplified via `make_attrstring`
Right now `make_typed_attrstring` reimplements most of `make_attrstring`, adds branching, and makes attribute ordering harder to reason about.
You can reuse the existing abstraction and keep the “don’t mutate caller attrs” guarantee by shallow-copying:
```python
def make_typed_attrstring(attr: dict[str, Any], xml_type: str) -> str:
"""Create XML attributes with a type value without mutating caller attrs."""
if not attr:
return f' type="{xml_type}"'
typed_attr = dict(attr) # shallow copy
typed_attr["type"] = xml_type
return make_attrstring(typed_attr)
```
Usage sites (e.g. `convert_kv_valid_name`, `convert_bool_valid_name`, `convert_none_valid_name`) remain the same:
```python
attr_string = make_typed_attrstring(attr, get_xml_type(val)) if attr_type else make_attrstring(attr)
```
This removes the custom assembly logic and keeps attribute behavior centralized in `make_attrstring`.
---
### 2. Localize `@attrs` handling instead of `skip_attrs` / `skip_key` plumbing
The current flow:
- `_append_dict2xml_str` computes `has_custom_attrs`, `raw_attrs`, `val_attr`
- Passes `skip_attrs` into `_append_rawitem`
- `_append_rawitem` may call `_append_convert_dict` with `skip_key="@attrs"`
This spreads a single concern across three functions.
You can normalize the dict once in `_append_dict2xml_str` and then call `_append_rawitem`/`_append_convert_dict` without extra flags:
```python
def _append_dict2xml_str(...):
...
has_custom_attrs = "@attrs" in item
if has_custom_attrs:
raw_attrs = item["@attrs"]
val_attr = raw_attrs if isinstance(raw_attrs, dict) else dict(raw_attrs)
# strip @attrs when it's not the scalar payload
rawitem = item["@val"] if "@val" in item else {
k: v for k, v in item.items() if k != "@attrs"
}
else:
val_attr = attr
rawitem = item.get("@val", item)
...
if parentIsList and list_headers:
...
_append_rawitem(
output,
rawitem,
ids,
attr_type,
item_func,
cdata,
item_wrap,
item_name,
list_headers,
)
...
```
Then `_append_rawitem` and `_append_convert_dict` no longer need `skip_attrs` / `skip_key`:
```python
def _append_rawitem(...):
if rawitem is None:
return
if isinstance(rawitem, bool):
output.write("true" if rawitem else "false")
elif isinstance(rawitem, (str, numbers.Number)):
output.write(escape_xml(str(rawitem)))
else:
_append_convert(
output,
rawitem,
ids,
attr_type,
item_func,
cdata,
item_wrap,
item_name,
list_headers=list_headers,
)
def _append_convert_dict(...): # remove skip_key
for key, val in obj.items():
...
```
The behavior for `@attrs`/`@val` stays the same, but all the special-casing is localized to a single, easy-to-find place.
---
### 3. Factor out `attr` construction in `_append_convert_list`
The new version duplicates `attr` construction logic in each branch and repeats the `id` handling. You can compute a base attribute once per iteration and layer type-specific attrs on top:
```python
for i, item in enumerate(items):
base_attr: dict[str, Any] | None = None
if ids:
base_attr = {"id": f"{this_id}_{i + 1}"}
if isinstance(item, bool):
attr = dict(base_attr) if base_attr else {}
if item_name_attr:
attr.update(item_name_attr)
output.write(convert_bool_valid_name(item_name, item, attr_type, attr))
elif isinstance(item, (numbers.Number, str)):
attr = dict(base_attr) if base_attr else {}
if scalar_key_attr:
attr.update(scalar_key_attr)
output.write(
convert_kv_valid_name(
key=scalar_key,
val=item,
attr_type=attr_type,
attr=attr,
cdata=cdata,
)
)
elif hasattr(item, "isoformat"):
attr = dict(base_attr) if base_attr else {}
if item_name_attr:
attr.update(item_name_attr)
output.write(
convert_kv_valid_name(
key=item_name,
val=item.isoformat(),
attr_type=attr_type,
attr=attr,
cdata=cdata,
)
)
elif isinstance(item, dict):
attr = {} if not base_attr else dict(base_attr)
_append_dict2xml_str(..., attr=attr, ...)
elif isinstance(item, Sequence):
attr = {} if not base_attr else dict(base_attr)
_append_list2xml_str(..., attr=attr, ...)
elif item is None:
attr = dict(base_attr) if base_attr else {}
if item_name_attr:
attr.update(item_name_attr)
output.write(convert_none_valid_name(item_name, attr_type, attr))
...
```
This keeps the per-branch logic focused on type-specific behavior while centralizing the `id`/base-attr handling.
---
### 4. `escape_xml` early-return condition
The previous `_XML_ESCAPE_CHARS.intersection(s)` approach was shorter and more self-documenting than the expanded chain of `not in` checks. You can restore the named constant and keep the optimization:
```python
_XML_ESCAPE_CHARS = frozenset("&\"'<>")
def escape_xml(s: str | numbers.Number) -> str:
if isinstance(s, str):
if not _XML_ESCAPE_CHARS.intersection(s):
return s
...
```
This keeps the fast path readable without adding conceptual complexity.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
make_attrstring()while shallow-copying only when thetypevalue must be overlaid.@attrs/@valnormalization in dict element handling and remove the extraskip_attrs/skip_keyplumbing.@attrspairs, and fast-wrapper fallback behavior.Review Follow-Up
Addressed the Sourcery complexity review by:
make_typed_attrstring()assembly logic with a shallow copy plusmake_attrstring().@attrsonce inside_append_dict2xml_str(), then passing normal raw payloads through_append_rawitem().skip_attrsandskip_keyfrom the recursive append helpers._XML_ESCAPE_CHARS.intersection(s)for the XML escaping fast path.Benchmark
Compared branch head
00c0540against baseline commit9463457using Python 3.14.4. Payload construction happens before tracing; each case warms once, then records 5 conversion samples withtime.perf_counter()andtracemalloc.attrs_nested/ 8,000 recordsplain_strings/ 12,000 recordsPeak memory moves modestly because the final XML bytes dominate the traced peak; the main win is removing transient allocations and repeated work along recursive serializer paths.
Validation
python3 -m ruff check json2xml/dicttoxml.py json2xml/dicttoxml_fast.py tests/test_dicttoxml_unit.py tests/test_dict2xml.pypython3 -m pytest --cov=json2xml --cov-report=term-missing --cov-fail-under=100(401 passed,100.00%)lat check(passes; reports the existing missing LLM key warning for semantic search)git diff --check