feat(PLT-1660): hard cut — delete old semantic type system, wire extension type hashing #182
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,41 +1,13 @@ | ||
| { | ||
| "context_key": "std:v0.1:default", | ||
| "version": "v0.1", | ||
| "description": "Initial stable release with basic Path semantic type support", | ||
| "description": "Initial stable release with extension type hashing support", | ||
| "file_hasher": { | ||
| "_class": "orcapod.hashing.file_hashers.BasicFileHasher", | ||
| "_config": { | ||
| "algorithm": "sha256" | ||
| } | ||
| }, | ||
| "semantic_registry": { | ||
| "_class": "orcapod.semantic_types.semantic_registry.SemanticTypeRegistry", | ||
| "_config": { | ||
| "converters": { | ||
| "upath": { | ||
| "_class": "orcapod.semantic_types.semantic_struct_converters.UPathStructConverter", | ||
| "_config": { | ||
| "file_hasher": {"_ref": "file_hasher"} | ||
| } | ||
| }, | ||
| "path": { | ||
| "_class": "orcapod.semantic_types.semantic_struct_converters.PythonPathStructConverter", | ||
| "_config": { | ||
| "file_hasher": {"_ref": "file_hasher"} | ||
| } | ||
| } | ||
| } | ||
| } | ||
| }, | ||
| "arrow_hasher": { | ||
| "_class": "orcapod.hashing.arrow_hashers.StarfixArrowHasher", | ||
| "_config": { | ||
| "hasher_id": "arrow_v0.1", | ||
| "semantic_registry": { | ||
| "_ref": "semantic_registry" | ||
| } | ||
| } | ||
| }, | ||
| "type_converter": { | ||
| "_class": "orcapod.semantic_types.universal_converter.UniversalTypeConverter", | ||
| "_config": { | ||
|
|
@@ -78,52 +50,61 @@ | |
| } | ||
| } | ||
| }, | ||
| "function_info_extractor": { | ||
| "function_semantic_hasher": { | ||
| "_class": "orcapod.hashing.semantic_hashing.function_info_extractors.FunctionSignatureExtractor", | ||
| "_config": { | ||
| "include_module": true, | ||
| "include_defaults": true | ||
| } | ||
| }, | ||
| "type_handler_registry": { | ||
| "_class": "orcapod.hashing.semantic_hashing.type_handler_registry.TypeHandlerRegistry", | ||
| "python_type_handler_registry": { | ||
| "_class": "orcapod.hashing.semantic_hashing.type_handler_registry.PythonTypeHandlerRegistry", | ||
| "_config": { | ||
| "handlers": [ | ||
| [{"_type": "builtins.bytes"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.BytesHandler", "_config": {}}], | ||
| [{"_type": "builtins.bytearray"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.BytesHandler", "_config": {}}], | ||
| [{"_type": "pathlib.Path"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.PathContentHandler", "_config": {"file_hasher": {"_ref": "file_hasher"}}}], | ||
| [{"_type": "upath.core.UPath"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.UPathContentHandler", "_config": {"file_hasher": {"_ref": "file_hasher"}}}], | ||
| [{"_type": "uuid.UUID"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.UUIDHandler", "_config": {}}], | ||
| [{"_type": "types.FunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}], | ||
| [{"_type": "types.BuiltinFunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}], | ||
| [{"_type": "types.MethodType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}], | ||
| [{"_type": "builtins.type"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.TypeObjectHandler", "_config": {}}], | ||
| [{"_type": "types.GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}], | ||
| [{"_type": "types.UnionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.UnionTypeHandler", "_config": {}}], | ||
| [{"_type": "typing._GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}], | ||
| [{"_type": "typing._SpecialForm"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.SpecialFormHandler", "_config": {}}], | ||
| [{"_type": "pyarrow.Table"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}], | ||
| [{"_type": "pyarrow.RecordBatch"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}] | ||
| [{"_type": "builtins.bytes"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.BytesHandler", "_config": {}}], | ||
| [{"_type": "builtins.bytearray"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.BytesHandler", "_config": {}}], | ||
| [{"_type": "pathlib.Path"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.PathHandler", "_config": {"file_hasher": {"_ref": "file_hasher"}}}], | ||
| [{"_type": "upath.core.UPath"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.UPathHandler", "_config": {"file_hasher": {"_ref": "file_hasher"}}}], | ||
| [{"_type": "uuid.UUID"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.UUIDHandler", "_config": {}}], | ||
| [{"_type": "types.FunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_semantic_hasher"}}}], | ||
| [{"_type": "types.BuiltinFunctionType"},{"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_semantic_hasher"}}}], | ||
| [{"_type": "types.MethodType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_semantic_hasher"}}}], | ||
| [{"_type": "builtins.type"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.TypeObjectHandler", "_config": {}}], | ||
| [{"_type": "types.GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}], | ||
| [{"_type": "types.UnionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.UnionTypeHandler", "_config": {}}], | ||
| [{"_type": "typing._GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}], | ||
| [{"_type": "typing._SpecialForm"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.SpecialFormHandler", "_config": {}}], | ||
| [{"_type": "pyarrow.Table"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {}}], | ||
| [{"_type": "pyarrow.RecordBatch"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {}}] | ||
| ] | ||
| } | ||
| }, | ||
| "semantic_hasher": { | ||
| "_class": "orcapod.hashing.semantic_hashing.semantic_hasher.BaseSemanticHasher", | ||
| "_class": "orcapod.hashing.semantic_hashing.semantic_hasher.SemanticAwarePythonHasher", | ||
| "_config": { | ||
| "hasher_id": "semantic_v0.1", | ||
| "type_handler_registry": { | ||
| "_ref": "type_handler_registry" | ||
| "_ref": "python_type_handler_registry" | ||
| } | ||
| } | ||
| }, | ||
| "arrow_hasher": { | ||
| "_class": "orcapod.hashing.arrow_hashers.StarfixArrowHasher", | ||
| "_config": { | ||
| "hasher_id": "arrow_v0.1", | ||
| "type_converter": {"_ref": "type_converter"}, | ||
|
Contributor
The fact that arrow_hasher takes type_converter and semantic_hasher as constructor arguments makes the relationship between arrow_hasher and semantic_hasher circular in the default context. This strongly suggests we should break the cycle by letting one of them be instantiated WITHOUT the other in its constructor. Instead, it should "optionally" accept the other (e.g. semantic_hasher) when a method is invoked on the arrow hasher.
Contributor
Author
Filed PLT-1826 to track the decoupling. The likely fix is to remove |
||
| "semantic_hasher": {"_ref": "semantic_hasher"} | ||
| } | ||
| }, | ||
| "metadata": { | ||
| "created_date": "2025-08-01", | ||
| "created_date": "2026-06-24", | ||
| "author": "OrcaPod Core Team", | ||
| "changelog": [ | ||
| "Initial release with Path semantic type support", | ||
| "Basic SHA-256 hashing for files and objects", | ||
| "Arrow logical serialization method", | ||
| "Introduced arrow_v0.1 StarfixArrowHasher using starfix ArrowDigester for cross-language-compatible Arrow hashing" | ||
| "Introduced arrow_v0.1 StarfixArrowHasher using starfix ArrowDigester for cross-language-compatible Arrow hashing", | ||
| "Hard cut: replaced shape-based SemanticTypeRegistry with extension-type hashing; renamed all hashing classes to cleaner names" | ||
| ] | ||
| } | ||
| } | ||
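The decoupling discussed in the review thread on arrow_hasher above could look roughly like the following. This is only a sketch of the general pattern (constructor injection in one direction, call-time injection in the other), not the fix tracked in PLT-1826. `SketchArrowHasher`, `SketchSemanticHasher`, `SemanticHasherLike`, `hash_rows`, and `hash_object` are hypothetical names invented for illustration and are not orcapod classes or methods; plain lists of dicts stand in for Arrow tables.

```python
import hashlib
from typing import Any, Optional, Protocol


class SemanticHasherLike(Protocol):
    """Hypothetical minimal interface, not orcapod's actual protocol."""

    def hash_object(self, obj: Any) -> str: ...


class SketchArrowHasher:
    """Stand-in arrow hasher: built WITHOUT a semantic hasher."""

    def __init__(self, hasher_id: str) -> None:
        self.hasher_id = hasher_id

    def hash_rows(
        self,
        rows: list[dict[str, Any]],
        semantic_hasher: Optional[SemanticHasherLike] = None,
    ) -> str:
        digest = hashlib.sha256(self.hasher_id.encode("utf-8"))
        for row in rows:
            for key in sorted(row):
                value = row[key]
                # Use the collaborator only if the caller supplied one.
                token = (
                    semantic_hasher.hash_object(value)
                    if semantic_hasher is not None
                    else repr(value)
                )
                digest.update(f"{key}={token};".encode("utf-8"))
        return digest.hexdigest()


class SketchSemanticHasher:
    """Stand-in semantic hasher: the only constructor-level dependency."""

    def __init__(self, arrow_hasher: SketchArrowHasher) -> None:
        self.arrow_hasher = arrow_hasher

    def hash_object(self, obj: Any) -> str:
        if isinstance(obj, list) and all(isinstance(r, dict) for r in obj):
            # Table-like input: delegate, passing ourselves at call time.
            return self.arrow_hasher.hash_rows(obj, semantic_hasher=self)
        return hashlib.sha256(repr(obj).encode("utf-8")).hexdigest()


# Construction order is now linear: arrow hasher first, then semantic hasher.
arrow = SketchArrowHasher("arrow_v0.1")
semantic = SketchSemanticHasher(arrow)

rows = [{"name": "a", "size": 1}, {"name": "b", "size": 2}]
plain_digest = arrow.hash_rows(rows)
semantic_digest = semantic.hash_object(rows)
```

With this shape, a config loader would never need to resolve a reference from the arrow hasher back to the semantic hasher at construction time, because the semantic hasher only arrives as a method argument.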
Let's add a follow-up issue so that the same handler can be registered to multiple target classes and make use of the MRO-based matching system already used in many other places in the codebase.
Filed PLT-1827 to track this. No code change in this PR.
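For reference, the follow-up idea (one handler registered against several target classes, resolved by walking the method resolution order) might be sketched as below. `MroHandlerRegistry`, `register`, `lookup`, `bytes_handler`, and `TaggedBytes` are hypothetical names chosen for illustration; this is not the existing orcapod registry API and not the design tracked in PLT-1827.

```python
import hashlib
from typing import Any, Callable

Handler = Callable[[Any], str]


class MroHandlerRegistry:
    """Hypothetical registry; illustrative only, not orcapod's API."""

    def __init__(self) -> None:
        self._handlers: dict[type, Handler] = {}

    def register(self, handler: Handler, *target_types: type) -> None:
        # One handler may be bound to any number of target classes.
        for target in target_types:
            self._handlers[target] = handler

    def lookup(self, obj: Any) -> Handler:
        # Walk the MRO so subclasses fall back to the nearest registered base.
        for cls in type(obj).__mro__:
            handler = self._handlers.get(cls)
            if handler is not None:
                return handler
        raise LookupError(f"no handler registered for {type(obj).__qualname__}")


def bytes_handler(value: Any) -> str:
    """Hash any bytes-like value by its raw content."""
    return hashlib.sha256(bytes(value)).hexdigest()


class TaggedBytes(bytes):
    """A bytes subclass that was never registered explicitly."""


registry = MroHandlerRegistry()
# A single registration call replaces the two separate config entries
# for builtins.bytes and builtins.bytearray.
registry.register(bytes_handler, bytes, bytearray)

digest_plain = registry.lookup(b"abc")(b"abc")
digest_array = registry.lookup(bytearray(b"abc"))(bytearray(b"abc"))
digest_tagged = registry.lookup(TaggedBytes(b"abc"))(TaggedBytes(b"abc"))
```

Because the lookup walks `type(obj).__mro__` from most to least specific, a more specific registration would take precedence over a base-class one if both were present.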