Summary
Some files in the three-sixty directory are corrupted: they contain large sections of null bytes and malformed arrays, causing standard JSON parsers to fail.
Affected Files
- data/three-sixty/3835338.json
  - Line 181321: 16KB of null bytes in "location" array.
  - Structure: "location": [ [nulls] 0.0, 80.0 ],
  - Error: Unparseable by json.load(), missing closing bracket.
- data/three-sixty/3835342.json
  - Line 171856: Corrupted "visible_area" array with null bytes.
  - Structure: "visible_area": [ numbers, 83.8r" : false,
  - Error: json.JSONDecodeError on standard parse.
- data/three-sixty/3845506.json
  - Line 92794: Truncated text in JSON structure.
  - Structure: lse, (appears to be missing beginning of line)
  - Error: Expecting ',' delimiter on parse.
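To check where null bytes sit in a given file, something like the following could be used. This is only a minimal sketch: the helper name `find_null_runs` and its `min_length` parameter are hypothetical, and it reads the whole file into memory, which I am assuming is acceptable for files of this size.

```python
import re
from pathlib import Path


def find_null_runs(path, min_length=1):
    """Return (byte_offset, line_number, run_length) for each run of NUL bytes.

    Hypothetical helper for illustration; not part of the dataset tooling.
    """
    data = Path(path).read_bytes()
    runs = []
    for match in re.finditer(rb"\x00+", data):
        length = match.end() - match.start()
        if length >= min_length:
            # Line numbers are 1-based: count the newlines before the run.
            line_number = data.count(b"\n", 0, match.start()) + 1
            runs.append((match.start(), line_number, length))
    return runs
```

Calling `find_null_runs("data/three-sixty/3835338.json")` should list each run with its line number and length, which can be compared against the lines reported above.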
Reproduction
This issue can be reproduced by attempting to load any of the affected files with pandas or Python's built-in json module. For example:
# PANDAS
import pandas as pd

try:
    df = pd.read_json("data/three-sixty/3835338.json")
except Exception as e:
    print(f"FAILED: {e}")

# JSON
import json

with open("data/three-sixty/3835338.json") as f:
    json.load(f)  # Raises JSONDecodeError
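The snippet above checks a single file. To see whether other files in the directory are affected, a scan along these lines might help. It is a rough sketch: `find_unparseable` is a hypothetical name, and I am assuming the files are UTF-8 encoded and all end in `.json`.

```python
import json
from pathlib import Path


def find_unparseable(directory):
    """Map each .json file that fails to parse to its error message.

    Hypothetical helper; assumes UTF-8 files directly inside `directory`.
    """
    failures = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # Record the parser's message, which includes line and column.
            failures[path.name] = str(exc)
    return failures
```

Running `find_unparseable("data/three-sixty")` and printing the result should give a complete list of files that a standard parser rejects, along with the position of the first error in each.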
Impact
- Standard tools (Python, Pandas, Polars) cannot load these files.
- Data ingestion pipelines have to skip the files.
Next Steps
I'm happy to help look into this further and contribute fixes if that would be useful.
Thank you for maintaining this dataset :)