Register the catalog-generated UDF dispatch surface #28

Open
estebanzimanyi wants to merge 30 commits into
MobilityDB:main from
estebanzimanyi:feat/generated-dispatch

@estebanzimanyi

Member

Stacked on #26. Adds GeneratedSpatioTemporalUDFs — emitted by the generator (#27) from the MEOS-API @sqlfn/@sqlop catalog (#18) — and registers it last in create(). It provides runtime type-dispatching overlaps / stbox(geom,time) / timeSpan: each String arg is classified by MEOS type (spans/stboxes/geometries as text, only temporals as hex, so the temporal parser is never misfed) and routed to the catalog-determined backing. Closes the BerlinMOD bench's overlaps/stbox gaps with generated, serialization-safe code — no hand UDFs, no MEOS-API growth. Validated: 17/18 suite queries run clean, no OOM.

tools/codegen_spark_udfs.py emits MobilitySpark UDF-registration classes from the
MEOS-API catalog (output/meos-idl.json), resolving each SQL name to its MEOS-C
backing via the @sqlfn / @sqlop map (MEOS-API MobilityDB#18). Two modes:
- SINGLE: one backing -> a 1:1 UDF (type-marshalling: each MEOS C type <-> its
  parse-from-String / serialize-to-String form).
- DISPATCH: an overloaded SQL name / operator (overlaps via &&, stbox(geom,time),
  timeSpan) -> ONE UDF that classifies each arg by its MEOS type and routes to the
  catalog-determined backing. Classification is MEOS-driven and wire-format-safe:
  spans/stboxes/geometries travel as TEXT, only temporals as hex, so the leading
  token disambiguates ('['/'(' span, STBOX stbox, hex temporal, else geometry) and
  temporal_from_hexwkb is never fed a non-temporal. Emitted lambdas call only static
  GeneratedFunctions (no captured state -> Spark-serializable). Zero hand heuristics,
  zero new MEOS functions.
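
The leading-token rule described above can be sketched in a few lines of Python. This is only an illustration of the classification order, not the emitted Java: the function name `classify_arg`, the case-insensitive STBOX match, and the treatment of an empty string as a geometry are assumptions made for the example.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def classify_arg(text: str) -> str:
    """Classify a wire string as 'span', 'stbox', 'temporal' or 'geometry'."""
    s = text.strip()
    # Spans travel as text and open with a bound character.
    if s[:1] in ("[", "("):
        return "span"
    # Stboxes travel as text and open with the STBOX keyword
    # (matching case-insensitively is an assumption of this sketch).
    if s.upper().startswith("STBOX"):
        return "stbox"
    # Only temporals travel as hex, so an all-hex string is a temporal.
    if s and all(c in HEX_DIGITS for c in s):
        return "temporal"
    # Anything else is treated as geometry text.
    return "geometry"
```

The hex test runs after the text tests, so the hex parser is only ever offered a string that contains nothing but hex digits.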
…roup

Generalize the generator over the whole JMEOS public surface (was a 4-UDF POC):
mirror JMEOS FunctionsGenerator's marshalling conventions — temporals / spans /
sets / boxes / jsonb as hex-WKB or type text, TimestampTz as OffsetDateTime,
DateADT as int, and bool f(.., result) out-params dropped with their value
returned. Cross-check every emission against the JMEOS jar signatures (arity +
return kind) so a collapsed catalog type can never miscompile. Organize the
emitted UDFs into one class per doxygen @ingroup module (the reference-manual
structure, so a function is found in the same place across tools), excluding
meos_internal_*, and splitting oversized groups to stay under the JVM class
limits. Emits ~2200 1:1 UDFs, compiling green.

Add a dispatch pass: each portable comparison bare name (everEq/everNe/everLt/
everLe/everGt/everGe, alwaysEq.., tempEq.. — RFC #920 / contract families
everComparison/alwaysComparison/temporalComparison) is emitted ONCE wrapping its
MEOS superclass entrypoint (ever_<op>_temporal_temporal / always_<op>_temporal_
temporal / temporal_<op>), which dispatches every concrete temporal type
internally from the type-erased hex-WKB string — so Spark needs no per-type
overload and no Java type-inspection. 18 bare names, emitted into
GeneratedUdfs_portable_comparison, compiling green.
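
As a quick check on the count, the 18 bare names and their backings can be derived from the three patterns quoted above. The sketch below assumes the operator suffix is simply the lower-cased bare-name suffix; the table and function names are hypothetical.

```python
OPS = ["Eq", "Ne", "Lt", "Le", "Gt", "Ge"]

# Bare-name prefix -> backing pattern, as quoted in the commit message.
FAMILIES = {
    "ever": "ever_{op}_temporal_temporal",
    "always": "always_{op}_temporal_temporal",
    "temp": "temporal_{op}",
}


def comparison_backings() -> dict[str, str]:
    """Map each portable comparison bare name to its superclass entrypoint."""
    table = {}
    for prefix, pattern in FAMILIES.items():
        for op in OPS:
            table[prefix + op] = pattern.format(op=op.lower())
    return table
```

Three families times six operators gives the 18 names, and each one maps to a single entrypoint, which is why no per-type overload is needed on the Spark side.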

@estebanzimanyi force-pushed the feat/generated-dispatch branch from 1cb68f7 to de62e78 (June 13, 2026 12:01)
estebanzimanyi added a commit to estebanzimanyi/MobilitySpark that referenced this pull request Jun 13, 2026
Pick up the generator's dispatch pass (MobilityDB#28): adds GeneratedUdfs_portable_comparison
with the 18 contract comparison bare names (everEq..everGe / alwaysEq..alwaysGe /
tempEq..tempGe) wrapping the MEOS superclass *_temporal_temporal entrypoints.
Compiles green; the bare names register alongside the hand PortableOperatorAliasUDFs.

Three fixes so the catalog-generated UDF surface compiles against the
pin-12l (4408) JMEOS jar:
- Restore the runtime substrate MeosMemory/MeosThread/MeosNative (memory +
  native-init helpers the generated UDFs depend on). These are infrastructure,
  not hand-written UDF surface, and were dropped with the hand layers.
- Drop the legacy org.mobiltydb typo-package placeholders (Main/PowerUDF/UDT)
  that imported the long-gone jmeos.functions package.
- Fix the generator (tools/codegen_spark_udfs.py) to CLEAN its output dir before
  emitting: a prior run's classes (a function later excluded by the jar
  arity/kind cross-check, or a now-empty/renamed group) otherwise linger and
  silently break the build.
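
The clean-before-emit fix can be sketched as follows. The helper name `reset_output_dir` and the demo file name are hypothetical; the point is only that the output directory is wiped before any class is written, so nothing from a prior run survives.

```python
import shutil
import tempfile
from pathlib import Path


def reset_output_dir(out_dir: Path) -> None:
    """Delete out_dir with everything in it, then recreate it empty."""
    if out_dir.exists():
        shutil.rmtree(out_dir)
    out_dir.mkdir(parents=True)


def demo() -> list[str]:
    """Leave a stale class behind, reset, and report what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "generated-sources" / "spark"
        out.mkdir(parents=True)
        (out / "GeneratedUdfs_stale.java").write_text("class Stale {}")
        reset_output_dir(out)
        return sorted(p.name for p in out.iterdir())
```

Because the directory is fully regenerated on every run, a function that the cross-check later excludes simply never reappears.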

The whole Spark UDF surface is now GENERATED from the catalog at build time
(North Star: bindings generated from MEOS, none hand-written or committed):
- exec-maven-plugin runs tools/codegen_spark_udfs.py at generate-sources,
  reading the vendored 4408 catalog (tools/meos-idl.json) + the org.jmeos:meos
  jar's actual symbols, emitting to target/generated-sources/spark.
- build-helper-maven-plugin adds that as a source root.
- target/ is gitignored; no generated .java is committed.
'mvn clean compile' produces 195 group classes (2216 1:1 UDFs) and BUILDs green.
NOTE: the org.jmeos:meos:1.0 jar must be in the local repo (mvn install:install-file
from the JMEOS build); CI installs it before the Spark build.

GeneratedSurfaceTest registers the catalog-generated UDFs
(GeneratedSpatioTemporalUDFs.registerAll) in a real SparkSession and asserts
results across families via spark.sql against libmeos: temporal_num_instants==3,
tint_start_value==1, tint_out renders the values, tnumber_integral is finite —
the safety gate proving the generated surface binds and executes before the hand
UDF layers (MobilityDB#22/MobilityDB#24/MobilityDB#25/MobilityDB#26) are retired. Adds junit-jupiter + surefire (fork per
class, JDK17 --add-opens) and bumps jnr-ffi to 2.2.17.

NOTE: these Spark-integration tests require JDK 17 — Spark 3.4 cannot init on
JDK 21 (DirectByteBuffer.<init>(long,int) removed) and the Java-17 JMEOS jar
cannot load on JDK 11. CI must run the Spark build/test on JDK 17.

@estebanzimanyi force-pushed the feat/generated-dispatch branch from 07da65e to 6b0ddd7 (June 14, 2026 08:12)
The generator skipped every *_in WKT parser (tint_in/tfloat_in/tgeompoint_in/
geo_from_text-style) because 'char' was deferred in INTERNAL, so the generated
surface could operate on hex but not PARSE literals. jnr already marshals a single
const char* as a Java String, so map it through directly: arg_kind now emits a
StringType pass-through for 'char *'. Coverage 50% -> 53% (+98 UDFs incl. the
parsers). GeneratedSurfaceTest adds a full WKT parse->operate round-trip driven
only by generated UDFs (tint_in -> num_instants==3, tint_out renders) — 4/4 green
on JDK17. Also gitignore the default src/ generate location (build-time gen writes
to target/; a manual run with the default --out would otherwise duplicate classes).

Extends GeneratedSurfaceTest with the extended-type families (tcbuffer 2 instants
+ tcbuffer_radius non-null; tnpoint 2 instants) — widening the runtime safety net
across families before the hand UDF layers are retired. 5/5 green on JDK17.
(H3 is excluded: the installed libmeos is built H3-OFF, so th3index symbols are
absent at runtime; covered once an H3-built libmeos is available.)

- Map uint64_t <-> Java long (jnr) -> Spark LongType for both args and returns.
  Emits +61 1:1 UDFs (53%->54%), mostly the H3Index family; binding-verified by
  the jar arity/kind cross-check + compile (runtime-covered once libmeos is built
  -DH3=ON). Left 'long' UNMAPPED: the catalog uses it ambiguously for things the jar
  treats as Pointer/OffsetDateTime, which a blanket long->LongType would miscompile.
- Default --out to target/generated-sources/spark (the maven build dir), NOT the
  src/ source root: a bare 'python3 codegen' run could otherwise write generated
  classes into src/ that linger past 'mvn clean' (which only cleans target/) and
  duplicate-compile. Build-time generation owns target/.
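
A minimal sketch of the `--out` default, assuming the generator parses its options with `argparse` (the commit does not say which parser it uses, and the help text is invented):

```python
import argparse
from pathlib import Path

# The Maven build directory; 'mvn clean' removes everything under target/.
DEFAULT_OUT = Path("target/generated-sources/spark")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse generator options; --out falls back to the build directory."""
    parser = argparse.ArgumentParser(description="Emit Spark UDF classes")
    parser.add_argument(
        "--out",
        type=Path,
        default=DEFAULT_OUT,
        help="directory that receives the generated .java files",
    )
    return parser.parse_args(argv)
```

With this default, a bare run writes under target/, so a later 'mvn clean' removes whatever it produced.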

Extend the generator's dispatch pass beyond the comparison families to the
whole portable operator contract: topology (overlaps/contains/contained/
adjacent), same, time position (before/after/overbefore/overafter), and
space Y/Z all emit once over a MEOS superclass entrypoint
(*_temporal_temporal / *_tspatial_tspatial), and distance maps tdistance ->
tdistance_tgeo_tgeo, nearestApproachDistance -> nad_tgeo_geo. Space X
(left/right/overleft/overright) is the only axis-ambiguous family: a thin
UdfMarshal.axisBool classifier inspects whether arg1 is a tnumber and
selects the value-axis vs the X-axis backing -- the operator's own existing
MEOS symbols, no operator logic. 41/41 contract bare names now generated,
reproducing the complete hand PortableOperatorAliasUDFs surface.

Remove the dead hand-written MeosNative jnr binding (444 lines): the
generated UDFs and the MeosMemory/MeosThread substrate call
functions.GeneratedFunctions from the org.jmeos:meos jar, and nothing
references MeosNative -- it is leftover hand binding superseded by JMEOS.

GeneratedSurfaceTest: add portable_bare_name_dispatch_surface exercising one
operator per family (overlaps/same/overbefore/tempEq/everEq/overleft/
tdistance); 6/6 green on JDK17.

The *_as_hexwkb / *_as_ewkb serializers return char* with a trailing
size_t *size_out out-param that JMEOS swallows (it returns the buffer/String
directly), and take an `unsigned char variant` flag that JMEOS maps to byte.
Both were generator exclusions: classify() now drops a trailing non-const
size_t* (the canonical buffer-length out-param), and unsigned char maps to
ByteType (removed from the INTERNAL skip set). +4 1:1 UDFs -- temporal/span/
set/spanset _as_hexwkb -- 2398 -> 2402. This closes the last canonical-name
gap behind the BerlinMOD bench's asHexWKB usage (the other five bench names
already map: atTime->temporal_at_tstzspan, eDwithin->edwithin_tgeo_tgeo,
eIntersects->eintersects_tgeo_tgeo, trajectory->tpoint_trajectory,
nearestApproachDistance is a portable bare name).
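
The out-param rule can be illustrated in isolation. The sketch assumes parameters are represented as (C type, name) pairs, which may not match the generator's real data model, and `drop_trailing_size_out` is a hypothetical name rather than the actual classify() code.

```python
def drop_trailing_size_out(params):
    """Drop a trailing non-const `size_t *` parameter, if there is one."""
    if not params:
        return params
    ctype, _name = params[-1]
    # Normalise spacing so "size_t *" and "size_t*" compare equal;
    # "const size_t *" normalises to a different string and is kept.
    normalized = "".join(ctype.split())
    if normalized == "size_t*":
        return params[:-1]
    return params
```

For a serializer declared with (const Temporal *temp, unsigned char variant, size_t *size_out), this leaves the first two parameters, matching the shape JMEOS exposes according to the description above.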

GeneratedSurfaceTest: add as_hexwkb_family_with_swallowed_size_out_param
(a hex round-trip through temporal_as_hexwkb). 7/7 green on JDK17.
…spatch

Every catalog function carries the canonical MobilityDB SQL name in its @sqlfn
tag (numInstants, eIntersects, atTime, asHexWKB, nearestApproachDistance ...)
— the name users and the portable BerlinMOD suite actually call. Emit each UDF
under its @sqlfn name; where one @sqlfn maps several C overloads that share a
marshalled signature (eIntersects <- eintersects_tgeo_tgeo / _tgeo_geo /
_geo_tgeo), emit ONE UDF that dispatches to the matching backing by parsing each
hex-WKB / WKT arg at runtime (parse-all-then-match, leak-free on every path).

325 @sqlfn names (80 arg-kind-dispatched); 1:1 surface 2402 -> 2725 (62%). The
whole pass is data-driven from the catalog — zero hand cases — and made safe by:
- isHex guard: MEOS's hex decoder CRASHES (segfaults) on non-hex input, so a
  *_from_hexwkb parser is only tried when the String is hex; a WKT literal falls
  through to the geo_from_text candidate instead of crashing the JVM.
- subtype dedup: overloads differing only by temporal subtype (tdistance_tgeo_tgeo
  vs _tnpoint_tnpoint) parse identically and can't be told apart — keep one per
  parse-kind tuple, preferring the tgeo/geo family the suite uses; siblings stay
  reachable under their C name.
- parser-safety filter: a dispatcher is emitted only when every overload
  discriminates via hex-WKB / WKT; text *_in overloads (stbox/tbox/cbuffer/npoint/
  pose) are dropped from dispatch (so nearestApproachDistance keeps just tgeo_tgeo
  / tgeo_geo), 12 wholly-unsafe names left to their C names (logged, not guessed).
- ever/always predicate convention: <e|a><Verb> @sqlfn returning int in C is
  boolean in SQL -> marshal to Boolean (== 1).
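
The isHex guard and the arg-kind match can be sketched together. This is a simplification: the real dispatcher parses each argument through MEOS, whereas the sketch only pre-classifies strings. The even-length requirement, the kind labels, and all function names are assumptions of the example; the three eIntersects backings are the ones named above.

```python
import string

_HEX = set(string.hexdigits)


def is_hex(s: str) -> bool:
    """True only for a non-empty, even-length string of hex digits."""
    return len(s) > 0 and len(s) % 2 == 0 and all(c in _HEX for c in s)


def parse_kind(arg: str) -> str:
    """Decide which parser family may be tried on this argument."""
    return "hexwkb" if is_hex(arg) else "wkt"


# One backing per parse-kind tuple, as in the subtype dedup rule.
E_INTERSECTS = {
    ("hexwkb", "hexwkb"): "eintersects_tgeo_tgeo",
    ("hexwkb", "wkt"): "eintersects_tgeo_geo",
    ("wkt", "hexwkb"): "eintersects_geo_tgeo",
}


def pick_backing(overloads, args):
    """Return the backing whose parse kinds match, or None if none does."""
    kinds = tuple(parse_kind(a) for a in args)
    return overloads.get(kinds)


def ever_always_to_bool(c_result: int) -> bool:
    """Marshal the C int result of an ever/always predicate to a boolean."""
    return c_result == 1
```

A WKT literal never reaches a hex parser here, and a call with no matching overload yields None instead of guessing a backing.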

Drop the now-redundant portable distance pass (tdistance / nearestApproachDistance):
its single tgeo_geo backing wrongly nulled a trip-vs-trip call; the @sqlfn pass
supersedes it with correct arg-kind dispatch.

GeneratedSurfaceTest: add sqlfn_canonical_names_with_argkind_dispatch (eIntersects
tgeo/tgeo + tgeo/WKT, eDwithin 3-arg, nearestApproachDistance tgeo/tgeo, trajectory,
numInstants). 8/8 green on JDK17.

Regenerated meos-idl.json from MobilityDB ecosystem pin 14h (80ddc3d6c) with
the consolidated MEOS-API extractor (sqlfn-name-map + camelCase comparison
families + jsonb/jsonpath recovery + doxygen groups + the new SQL-arity pass).
The stale vendored catalog predated several upstream fixes; this bump flows
them into the generated Spark surface:

- eintersects_tgeo_geo now maps to @sqlfn eIntersects (was the copy-pasted
  aIntersects, MobilityDB #1200) -> eIntersects(tgeo, geometry) is restored,
  and the combined ea_* impls are tagged meos_internal (#1206) so eIntersects
  registers as the correct 2-arg UDF instead of leaking the 3-arg ea_ form.
- @ingroup doxygroups present -> the 618 genuinely-internal functions are
  excluded again (coverage reads an honest 61%, not the inflated 66% from a
  groupless catalog that leaked internals).
- sqlArity / sqlArityMax attached (MEOS-API#1) for the eventual SQL-faithful
  arity pass (trajectory/asHexWKB flag + out-param args).

GeneratedSurfaceTest 8/8 green on JDK17; eIntersects/eDwithin/nad/tempEq all
register and dispatch correctly against the refreshed surface.

The branch had no CI and a Spark 3.4 / Java 17 pom that diverged from the
repository's Java 21 / Spark 3.5 matrix (Spark 3.4 cannot initialise on
Java 21). Bump the pom to Spark 3.5.1 + Java 21, and add a Maven CI workflow
(Linux) that builds libmeos from ecosystem pin 14h (the same commit the
vendored catalog and the bundled JMEOS jar are generated against), installs
the JMEOS jar as org.jmeos:meos:1.0, then runs the catalog generator and the
GeneratedSurfaceTest. 8/8 green locally on Java 21 / Spark 3.5.1.

Bundles libs/JMEOS.jar (the generator reads its symbols at build time and the
UDFs call it at runtime; not on Maven Central).

Two defects surfaced by a Spark run of the canonical BerlinMOD suite:

1. SIGSEGV on wrong-type hex-WKB. A *_from_hexwkb parser reads the WKB layout
   for ONE type family and crashes (not returns null) on a valid-hex buffer of
   another family — e.g. a tstzspan hex (from timeSpan()) fed to
   temporal_from_hexwkb. The isHex guard only caught non-hex input. Fix: read
   the WKB type byte (byte1 = the MeosType enum value, sets generated from the
   catalog) and only call the C parser when the family matches, via type-safe
   UdfMarshal.{t,span,spanset,set}FromHex wrappers the PARSE map now uses. A
   foreign hex returns null instead of crashing, so the arg-kind dispatchers can
   safely try every candidate. Topology (overlaps/contains/contained/adjacent)
   now dispatches *_span_span vs *_temporal_temporal, so overlaps(tstzspan,
   tstzspan) routes to span_overlaps instead of crashing.

2. C-faithful arity instead of SQL. asHexWKB/trajectory were emitted at the C
   arity (the WKB variant / the linear-flag), so asHexWKB(temporal) failed
   WRONG_NUM_ARGS. Fix: consume the catalog's sqlArity (from MEOS-API#1) — expose
   the SQL-required args and supply HIDE_DEFAULT literals for the optional
   trailing flags, in both emit_single and emit_dispatch.
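
The type-byte check from defect 1 can be sketched as below. The byte position follows the description above (byte 1 holds the type value); the numeric codes are placeholders invented for the example, not real MeosType enum values, and the function names are hypothetical.

```python
# Placeholder type codes: the real sets are generated from the catalog.
TEMPORAL_CODES = frozenset({0x1A, 0x1B})
SPAN_CODES = frozenset({0x30})


def wkb_type_byte(hex_wkb: str):
    """Return the value of byte 1 of a hex-WKB string, or None if unreadable."""
    if len(hex_wkb) < 4:
        return None
    try:
        return bytes.fromhex(hex_wkb[:4])[1]
    except ValueError:
        # Not valid hex, so there is no type byte to read.
        return None


def from_hex_if_family(hex_wkb: str, family_codes, parser):
    """Call parser only when the type byte belongs to the expected family."""
    code = wkb_type_byte(hex_wkb)
    if code is None or code not in family_codes:
        return None
    return parser(hex_wkb)
```

Returning None for a foreign family is what lets a dispatcher try every candidate in turn without ever handing a parser a buffer laid out for another type.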

GeneratedSurfaceTest 8/8 green on Java 21/Spark 3.5; the probe re-run shows 0
crash markers (was 19) and asHexWKB/trajectory resolve 1-arg.
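
The arity fix from defect 2 amounts to splitting the C parameter list at sqlArity and filling the hidden tail with literals. In the sketch below the parameter names and the default literal "0" are invented for illustration; only the idea of a defaults table comes from the commit message.

```python
# Hypothetical defaults table: hidden parameter name -> literal to pass.
HIDE_DEFAULT = {"variant": "0"}


def split_sql_args(c_params, sql_arity, hidden_defaults):
    """Return (exposed SQL args, literals supplied for the hidden C args)."""
    exposed = c_params[:sql_arity]
    hidden = c_params[sql_arity:]
    # A missing default raises KeyError, so a gap is reported, not guessed.
    literals = [hidden_defaults[name] for name in hidden]
    return exposed, literals
```

For a two-parameter C function with sqlArity 1, the UDF exposes one argument and passes the literal for the second, so a one-argument SQL call resolves.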
estebanzimanyi added a commit to estebanzimanyi/MobilitySpark that referenced this pull request Jun 14, 2026
Picks up MobilityDB#28's type-safe UdfMarshal.*FromHex parsers (no SIGSEGV on a foreign
hex-WKB family) + topology span/temporal dispatch + sqlArity-faithful arity, so
the canonical BerlinMOD suite runs without the overlaps crash and asHexWKB/
trajectory resolve 1-arg.

14i folds the WKB-reader type validation (#1212) that fixes the SIGSEGV on a
wrong-family hex-WKB buffer (the overlaps(span,span) / asHexWKB-dispatch crash
the BerlinMOD probe surfaced) at the MEOS source. The vendored catalog is
unchanged (14h == 14i catalog-wise; #1211/#1212 are .c-only), so only the
runtime libmeos moves.

atTime/minusTime are polymorphic over the time arg (timestamptz / tstzspan /
tstzset / tstzspanset), but Spark cannot overload a UDF name. The @sqlfn pass now
recognizes that pattern (overloads include temporal_<op>_timestamptz AND
temporal_<op>_tstzspan) and emits ONE String-arg UDF that classifies the arg at
runtime: "[..]/(..)" -> tstzspan_in, "{..}" -> tstzset_in / tstzspanset_in, else
a bare timestamp -> pg_timestamptz_in. Every branch parses through a MEOS function,
so the time/TZ resolution is ecosystem-uniform.
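
The runtime classification of the time argument can be sketched as a function that returns the name of the MEOS parser to call. How the emitted code tells a tstzset from a tstzspanset is not stated above; the sketch assumes a spanset is a brace followed by a span bound, and `classify_time_arg` is a hypothetical name.

```python
def classify_time_arg(text: str) -> str:
    """Return the name of the MEOS parser that should read this time arg."""
    s = text.strip()
    if s.startswith(("[", "(")):
        return "tstzspan_in"
    if s.startswith("{"):
        # Assumption: "{[" or "{(" opens a spanset, anything else a set.
        inner = s[1:].lstrip()
        if inner.startswith(("[", "(")):
            return "tstzspanset_in"
        return "tstzset_in"
    # A bare timestamp.
    return "pg_timestamptz_in"
```

Every branch ends in a MEOS parser name, which mirrors the point that no timestamp parsing happens at the binding level.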

TZ is owned by MEOS (like SRID via geo_from_text): UdfMarshal.tsOdt now parses a
String timestamp via pg_timestamptz_in(text, -1) (which returns the OffsetDateTime),
not a Java/Spark offset fixup. No binding-level timestamp parsing.

GeneratedSurfaceTest: atTime over a period and over a bare timestamp both resolve.
8/8 green on Java 21 / Spark 3.5.

MEOS keeps locale/collation, session timezone, PROJ context and RNGs in
thread-local storage, so every thread that runs a UDF must run the per-thread init
guard before its first MEOS call (Spark executor threads are not the thread that
called meos_initialize()). A missing guard is exactly what crashed
MobilityDuck's table functions (varstr_cmp on a garbage thread-local locale):
the crash was flaky by worker thread and masked by ASan/valgrind/gdb thread serialization.

Every emit path already routes through MeosThread.ensureReady() (inline, or via
axisBool/restrictTime which call it first). Add a build-failing invariant so a
future emit path can't silently drop it as the surface is regenerated: after
writing the classes, scan every emitted udf().register(...) and require a guard
before the first GeneratedFunctions call, else print the offenders and exit 1.
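
A simplified version of that invariant is sketched below. It only recognises an inline guard, whereas the real check must also accept helpers such as axisBool/restrictTime that call the guard first; the splitting strategy and function names are assumptions of the example.

```python
GUARD = "MeosThread.ensureReady()"
CALL = "GeneratedFunctions."
REGISTER = "udf().register("


def unguarded_registrations(java_source: str) -> list[str]:
    """Return the UDF names whose body calls MEOS before the guard."""
    offenders = []
    # Each chunk holds one registration, up to the start of the next one.
    for chunk in java_source.split(REGISTER)[1:]:
        call_at = chunk.find(CALL)
        if call_at == -1:
            continue  # no MEOS call in this registration
        guard_at = chunk.find(GUARD)
        if guard_at == -1 or guard_at > call_at:
            name = chunk.split(",", 1)[0].strip().strip('"')
            offenders.append(name)
    return offenders


def enforce(java_source: str) -> None:
    """Fail the build when any registration lacks the guard."""
    offenders = unguarded_registrations(java_source)
    if offenders:
        print("missing per-thread guard:", ", ".join(offenders))
        raise SystemExit(1)
```

Running `enforce` over each emitted class after writing it turns a silently dropped guard into a non-zero exit.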
…enerated

There is not a single hand-registered UDF: the entire surface is the generated
GeneratedSpatioTemporalUDFs.registerAll. The MeosThread.wrap() UDF1/2/3 helpers
(and the usage example) only existed to hand-wrap registrations and have zero
callers. Remove them and the now-unused org.apache.spark.sql.api.java import,
leaving only the per-thread guard (ensureReady / NOEXIT_ERROR_HANDLER) the
generated entry points use.

Re-vendor tools/meos-idl.json from the MEOS-API catalog regenerated against
ecosystem-pin-2026-06-14l (de8b322483, composed from the deliverable PRs: @sqlfn +
comparison-family aliases + jsonb recovery + doxygroups + sql-arity), drop in the
matching 14l JMEOS jar (4466 methods, 2-arg count accessors), and bump the CI pin.

Generated surface grows to 2731 UDFs; the per-thread MEOS-init invariant passes and
GeneratedSurfaceTest is 8/8 green against libmeos 14l.

The MEOS *_tgeoarr_tgeoarr kernels take Temporal** array args (so the 1:1 and
@sqlfn passes excluded them as `**`->internal), which blocked the index-less NxN
BerlinMOD queries on Spark. Add a generic array template that covers any such
kernel, not just these:

- tgeoarr_shape() recognizes (Temporal**, int) array pairs + optional double dist +
  out-params (int *count, SpanSet ***periods); emit_tgeoarr() emits one UDF per
  @sqlfn name. Each array pair becomes a Spark array<string> (count = array length).
- Scalar-return kernels -> Double UDF (minDistance).
- int*-return kernels -> array<struct<i,j[,periods]>> consumed via LATERAL explode
  (eDwithinPairs / tDwithinPairs / aDisjointPairs). The C kernel returns 0-based C
  indices (it is the PG SETOF wrapper that makes them 1-based), and the UDF calls the
  kernel directly via JMEOS, so the indices are already 0-based array offsets.
- UdfMarshal gains asStrArray (scala Seq/WrappedArray/List -> String[]), tArrNative
  (parse hex array -> native Temporal**), and readPairs/readPairsPeriods.

GeneratedSurfaceTest covers all four (minDistance=0 self, eDwithin/tDwithin one [0,0]
pair, tDwithin periods non-null, aDisjoint none) — 9/9 green; per-thread guard verified.

Re-vendor tools/meos-idl.json from the MEOS-API catalog regenerated against
ecosystem-pin-2026-06-14m (f9ab19a31, the array kernels + debug-build fix) and bump
the CI pin. The JMEOS surface is function-identical to 14l (4466 fns, none added or
removed), so the JMEOS jar is unchanged. The array-in NxN UDFs (eDwithinPairs /
tDwithinPairs / aDisjointPairs / minDistance) and th3index are all emitted; 9/9 green.

Proves the generated array<struct> UDFs are consumed via Spark's explode (the q06/q10
shape) — explode(eDwithinPairs(...)) yields the [0,0] pair as one row, indices indexing
back into the 0-based collect_list arrays.

A bare SQL literal like 10.0 is a Spark decimal (BigDecimal), not a double, so the
canonical q06/q10 (eDwithinPairs/tDwithinPairs(..., 10.0)) hit ClassCastException on a
Double dist param. Take dist as Object and coerce with ((Number) dist).doubleValue() so
bare numeric literals work, not just CAST(.. AS DOUBLE). Test covers the bare-literal case.

Re-vendor the catalog off pin 15a (4bfc0de8d, RFC #1173 §9 operator dialect): the
comparison operators are now eEq/aEq/tEq (?=/%=/#=), plus overlaps/distance/tDistance/
tAdd/setUnion/tConcat. The generator is data-driven (reads byOperator + @sqlfn from the
catalog), so it emits the new dialect with NO code change.

Make GeneratedSurfaceTest data-driven too: the dispatch assertions now read each bare
name from the catalog's byOperator map (op("?=") etc.) instead of hard-coding eEq/tEq/
tDistance — so a dialect rename updates the test automatically instead of breaking it.
9/9 green against libmeos 15a; full byOperator conformance (every dialect name registered).
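
The data-driven lookup can be sketched as follows. The shape of the byOperator map (operator symbol to a single bare name) is an assumption, and the inline JSON stands in for the vendored catalog; the symbol-to-name pairs are the ones listed above.

```python
import json

# Stand-in for tools/meos-idl.json; the real catalog layout may differ.
CATALOG_JSON = '{"byOperator": {"?=": "eEq", "%=": "aEq", "#=": "tEq"}}'


def op(catalog: dict, symbol: str) -> str:
    """Resolve an operator symbol to the bare name the catalog assigns it."""
    return catalog["byOperator"][symbol]


catalog = json.loads(CATALOG_JSON)
ever_eq_name = op(catalog, "?=")
```

A test written against `op("?=")` keeps passing after a dialect rename, because the name it asserts on is read from the same catalog that drives the generator.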

Re-vendor the 15c catalog (d875308e) and the matching JMEOS jar (4466->4469: adds
the eintersects_tpcpoint_geo / nad_tpcpoint_geo pointcloud spatial rels). The operator
dialect is unchanged from 15a (eEq/tEq/tDistance/...); the generator + data-driven test
need no change. 9/9 green vs libmeos 15c.

Re-vendor the 15d catalog (0efd3e2a): the h3 conversion @sqlfn tags are now the canonical
empty-parens form, so geoToH3IndexSet extracts and is generated — the th3 columnar prefilter
load.sql (geoToH3IndexSet(geom, R) → geom_h3) now resolves on Spark. The C surface is unchanged
from 15c (the fix is SQL-name-only), so the JMEOS jar is reused. 9/9 green vs libmeos 15d.

15d's pinned MEOS tree did not compile: meos/test/npoint_test.c was two test
programs concatenated (duplicate main + stale pre-rename API), so the binding CI's
full-tree 'cmake --build' failed (libmeos-only local builds masked it). 15e restores
the canonical npoint_test.c; the full MEOS tree incl. test executables builds clean
(342/342, -Werror), verified locally the way CI builds. The C/IDL surface is byte-
identical to 15d (0 fns added/removed, 0 sig changes), so the catalog and JMEOS jar
are unchanged — this commit only repoints the pin tag. 9/9 green vs libmeos 15e.
…overload, hex-reject guard

Running the BerlinMOD th3 cell-set prefilter (q02/q04/q11–q17) surfaced three generator
gaps that left it non-functional or crashing:

1. Geometry args were parsed with geo_from_text(s, 0) — WKT-only, SRID 0. The canonical
   wire form is EWKT ("SRID=4326;POLYGON(...)"), so the SRID was dropped and H3
   (geo_to_h3index_set → ensure_srid_is_latlong) rejected SRID 0, leaving geom_h3 NULL and
   the whole prefilter empty. New UdfMarshal.geoFromText splits the "SRID=N;" prefix and
   passes N as geo_from_text's srid argument (plain WKT still parses at SRID 0).

2. The eEq/eNe comparison dispatch wrapped only the *_temporal_temporal superclass, so the
   prefilter's eEq(h3indexset, th3index) — whose first arg is a Set*, not a Temporal* —
   was unreachable and returned NULL. The comparison pass now folds in the non-temporal
   overloads (geo/set first arg) that share the (P,P)->Boolean signature with a distinct
   parse-tuple, emitting one parse-dispatching UDF.

3. geo_from_text also accepts hex-(E)WKB, so a temporal hex arg (a tgeompoint trip) handed
   to a (geo, tgeo) candidate was MIS-READ as a geometry — garbage npoints / free() crash.
   geoFromText now rejects any pure-hex string up front (a real WKT/EWKT always has non-hex
   chars), so a foreign WKB hex cleanly falls through to the next dispatch candidate.
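
Fixes 1 and 3 both concern how a geometry string is prepared before it reaches geo_from_text, and can be sketched together. The function name `split_ewkt`, the case-insensitive prefix match, and the restriction to non-negative SRIDs are assumptions of the example, not details taken from UdfMarshal.geoFromText.

```python
import re
import string

_HEX = set(string.hexdigits)
_SRID_PREFIX = re.compile(r"^SRID=(\d+);", re.IGNORECASE)


def split_ewkt(text: str):
    """Return (wkt, srid), or None when the string is pure hex."""
    s = text.strip()
    # A pure-hex string is WKB for some other type: reject it up front.
    if s and all(c in _HEX for c in s):
        return None
    match = _SRID_PREFIX.match(s)
    if match:
        # Strip the prefix and hand the SRID over as a separate value.
        return s[match.end():], int(match.group(1))
    # Plain WKT keeps SRID 0.
    return s, 0
```

Returning None for hex input lets the dispatcher fall through to the next candidate, while the (wkt, srid) pair preserves the SRID that the H3 conversion requires.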

Verified vs libmeos ecosystem-pin-2026-06-15e: GeneratedSurfaceTest 9/9 green; the th3
prefilter q02 now runs end-to-end (geoToH3IndexSet builds 100 region cell-sets, th3index
builds trip cells, eEq prefilters 16425 candidate pairs, eIntersects refines them).
estebanzimanyi added a commit to estebanzimanyi/MobilitySpark that referenced this pull request Jun 15, 2026
…overload, hex-reject guard

Vendored-generator sync with MobilitySpark MobilityDB#28: UdfMarshal.geoFromText (EWKT SRID-preserving,
hex-rejecting) for geo args + the eEq/eNe comparison dispatch now folds in the non-temporal
overloads (eEq(h3indexset, th3index) — the th3 cell-set prefilter). The prefilter q02/q04/
q11–q17 build geoToH3IndexSet/th3index/eEq and run end-to-end. 9/9 green vs libmeos 15e.
…oH3Cell)

15f restores the MEOS_TLS error-handler/PROJ/collation thread-safety that a favor-HEAD
merge had silently dropped — fixing the geo_from_text SIGSEGV the th3 bench hit under Spark
local[*] (now runs clean: q02 local[*] 15.5s, exit 0, vs crash on 15e). The h3 conversion
@csqlfn fix (#1204) also lands geoToH3Cell on the surface (both h3 conversions now emit).
Catalog: 4457 fns (15f folds all 93 open PRs; the 12 dropped bbox topological ops auto-fall
out of the surface). JMEOS jar reused (15f fns ⊆ 15e fns, so the existing jar covers a superset). 9/9 green vs libmeos 15f.