EVCacheValue: opt-in compact binary serialization with backwards-compatible reads#196
joegoogle123 wants to merge 1 commit into
Introduces a length-prefixed binary envelope for EVCacheValue, which EVCache wraps around values when the canonical key is hashed (see EVCacheImpl.getEVCacheKey). Compared to the default Java ObjectOutputStream encoding it is materially smaller on the wire and avoids the reflective decode path.

Wire format:
    [magic 0x0C][reserved 0x00]
    [int keyLen][key UTF-8 bytes]
    [int valLen][value bytes]
    [int flags][long ttl][long createTime]
    [...optional extension fields appended by newer writers...]

- Opt-in per app via FastProperty <app>.envelope.binary.serialization.enabled (default false). Existing Java-serialized items still decode: the reader is dual-format, so there is no wire break for clusters with in-flight cached items.
- Forward-compat for additive optional fields: append at the end, gate with buffer.hasRemaining() in the reader, supply a graceful default when absent.
- Breaking changes route through the reserved/version byte at byte 1 with reader-before-writer rollout (see class javadoc).
- Bounds-checked length prefixes return null on bogus input, matching BaseSerializingTranscoder's resilience contract.

Tests cover binary round-trip across empty/large/unicode/extremes, dual-format read, transcoder routing, malformed-input handling, and a pinned v0 byte array trip-wire so future required-field adds can't be missed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
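As a rough illustration of the wire format in the commit message, here is a minimal sketch of an encoder and a bounds-checked decoder built on `java.nio.ByteBuffer`. The class `EnvelopeCodecSketch` and its nested `Envelope` type are hypothetical stand-ins, not the PR's `EVCacheValueSerde` or `EVCacheValue`; logging and the hex dump are omitted, and big-endian integers are assumed because that is the `ByteBuffer` default.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the described layout; not the PR's actual codec.
final class EnvelopeCodecSketch {
    static final byte MAGIC = 0x0C;
    static final byte RESERVED = 0x00;

    // Minimal stand-in for the envelope fields (key, value, flags, ttl, createTime).
    static final class Envelope {
        final String key;
        final byte[] value;
        final int flags;
        final long ttl;
        final long createTime;

        Envelope(String key, byte[] value, int flags, long ttl, long createTime) {
            this.key = key;
            this.value = value;
            this.flags = flags;
            this.ttl = ttl;
            this.createTime = createTime;
        }
    }

    static byte[] serialize(Envelope e) {
        byte[] key = e.key.getBytes(StandardCharsets.UTF_8);
        // 2 header bytes + two int length prefixes + payloads + int flags + two longs.
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + key.length + 4 + e.value.length + 4 + 8 + 8);
        buf.put(MAGIC).put(RESERVED);
        buf.putInt(key.length).put(key);
        buf.putInt(e.value.length).put(e.value);
        buf.putInt(e.flags).putLong(e.ttl).putLong(e.createTime);
        return buf.array();
    }

    // Returns null on any malformed input, so corruption surfaces as a cache miss.
    static Envelope deserialize(byte[] data) {
        if (data == null || data.length < 2 || data[0] != MAGIC) {
            return null;
        }
        try {
            ByteBuffer buf = ByteBuffer.wrap(data);
            buf.get(); // magic, already checked
            buf.get(); // reserved/version byte: read and ignored
            byte[] key = readBlock(buf);
            byte[] value = readBlock(buf);
            if (key == null || value == null) {
                return null;
            }
            int flags = buf.getInt();
            long ttl = buf.getLong();
            long createTime = buf.getLong();
            // A later optional field would be read here, gated on buf.hasRemaining(),
            // with a default when absent. Trailing bytes are otherwise ignored.
            return new Envelope(new String(key, StandardCharsets.UTF_8), value, flags, ttl, createTime);
        } catch (BufferUnderflowException ex) {
            return null; // truncated fixed-width fields
        }
    }

    // Validates the length prefix against the remaining bytes before allocating.
    private static byte[] readBlock(ByteBuffer buf) {
        if (buf.remaining() < 4) {
            return null;
        }
        int len = buf.getInt();
        if (len < 0 || len > buf.remaining()) {
            return null;
        }
        byte[] out = new byte[len];
        buf.get(out);
        return out;
    }
}
```

Checking each length prefix against `buf.remaining()` before allocating is what keeps a corrupt prefix from triggering a huge allocation. Ignoring trailing bytes is what lets an older reader accept payloads from a newer writer that appended optional fields.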
    final boolean useBinarySerialization = propertyRepository.get(_appName + ".envelope.binary.serialization.enabled", Boolean.class)
            .orElseGet("evcache.envelope.binary.serialization.enabled").orElse(false).get();
    final int maxValueSize = propertyRepository.get("default.evcache.max.data.size", Integer.class).orElse(20 * 1024 * 1024).get();
    this.evcacheValueTranscoder = new EVCacheTranscoder(maxValueSize, ENVELOPE_COMPRESSION_DISABLED, useBinarySerialization);
Is it intentional to change the compression threshold from Integer.MAX_VALUE to default.evcache.max.data.size?
Sorry, let me rephrase: I mean the now-missing evcacheValueTranscoder.setCompressionThreshold(Integer.MAX_VALUE); call.
It might have been there by mistake, since I also saw default.evcache.compression.threshold in the EVCacheTranscoder constructor. I just wanted to raise that, after this change, behavior can differ on existing clusters (the threshold going from INT_MAX to the default of 120).
setCompressionThreshold is called in the constructor, so I think this preserves the existing behavior:
this.evcacheValueTranscoder = new EVCacheTranscoder(maxValueSize, ENVELOPE_COMPRESSION_DISABLED, useBinarySerialization);
Can you elaborate on how we could erroneously move away from Integer.MAX_VALUE?
I might have misread the code. One last nit: the name ENVELOPE_COMPRESSION_DISABLED seems a bit misleading (it sounds like a boolean).
    evcacheValueTranscoder.setCompressionThreshold(Integer.MAX_VALUE);
    // Whether the EVCacheValue envelope (hashed keys) is written using the compact binary format
    // instead of native Java serialization.
    final boolean useBinarySerialization = propertyRepository.get(_appName + ".envelope.binary.serialization.enabled", Boolean.class)
nit: should we also read this property in EVCacheTranscoder, like the other two properties there?
Summary
Hashed-key values are wrapped in an `EVCacheValue` envelope `(key, value, flags, ttl, createTime)` that is currently serialized with Java `ObjectOutputStream`, adding ~50–80 bytes of structural overhead per item. This PR adds a compact, length-prefixed binary format for the envelope while remaining fully backwards-compatible on reads.

📊 See `benchmark.md` for live A/B measurements: under a realistic secure configuration (`use.secure=true` + cluster `evcacheAuthZEnforce=true`) the binary envelope saves 317 bytes / 63.4% per item. No write failures and no read-latency regression in either security mode.

What changed
- `EVCacheValueSerde` class (`com.netflix.evcache.pool`): a public, final, non-instantiable codec that owns the wire format and all error handling:
  - `static byte[] serialize(EVCacheValue)`: length-prefixed binary layout `[magic 0x0C][reserved 0x00][int keyLen][key UTF-8][int valLen][value][int flags][long ttl][long createTime]`.
  - `static EVCacheValue deserialize(byte[])`: bounds-checks length prefixes before allocating. On any corruption or unexpected exception it warn-logs the failing field and a truncated hex dump (via Apache Commons `Hex.encodeHexString`, capped at 1024 bytes) and returns `null`. This matches `BaseSerializingTranscoder`'s resilience contract (corruption → cache miss → caller refills from source of truth), so a single corrupt entry never crashes a get / getBulk / async pipeline.
  - `static boolean isBinaryFormat(byte[])`: exposed for the dispatcher.
- `EVCacheTranscoder` becomes a thin dispatcher (no try/catch):
  - `serialize`: gates on `useBinarySerialization && o instanceof EVCacheValue` → `EVCacheValueSerde.serialize`; else `super.serialize` (Java).
  - `deserialize`: dispatches on `EVCacheValueSerde.isBinaryFormat` → `EVCacheValueSerde.deserialize`; else `super.deserialize`.
- `EVCacheValue` stays a pure POJO (codec moved out; constructor unchanged from pre-PR).
- `EVCacheImpl` reads a Feature Property at client construction and injects it into the (immutable) envelope transcoder.
- Reads are dual-format, dispatched on the leading byte (`0xAC 0xED` = legacy Java, `0x0C` = binary), so a new client decodes existing cache entries unchanged.

Format-flag decision (reuse `SERIALIZED` + magic byte, not a fresh flag)

The binary envelope keeps the existing `SERIALIZED` flag and is disambiguated from Java by the leading byte, rather than allocating a new `CachedData` flag. Rationale:

- `SERIALIZED` semantically still means "serialized object → `deserialize()`"; the codec choice (Java vs binary) lives inside `deserialize()`. No flag constant is reassigned or repurposed, and `decode()` branch order is untouched.
- Tools that check `SERIALIZED` (e.g. the admin inspector, cache-warmer) keep working without a new flag constant to propagate.
- An old reader that receives a binary payload flagged `SERIALIZED` throws `StreamCorruptedException` (fails loud) rather than silently decoding garbage, which is what a fresh low-byte flag would cause (`decodeString`) on old readers.
- Invariant (documented in the `EVCacheValueSerde` Javadoc): `SERIALIZED` payloads are self-describing by leading byte; a future third format must use a distinct non-colliding magic + the reserved version byte.

Reserved version byte
Byte index 1 of the binary payload is reserved (always `0x00` today). The reader reads and ignores it; it is not validated. Reason: forward-compat without an emergency reader rollout. If today's readers rejected any non-zero version, introducing a v2 in the future would require shipping reader support fleet-wide before any writer could emit a v2 byte, and a single misconfigured writer would crash all readers. By accepting any value today, future readers can branch on this byte to introduce breaking format changes backwards-compatibly.

Feature Property (rollout gate)

- `<appName>.envelope.binary.serialization.enabled` (global fallback `evcache.envelope.binary.serialization.enabled`), default `false`.
- Read at client construction and injected into the immutable envelope transcoder (`envelopeTranscoder` in `EVCacheMemcachedClient`).

Compatibility
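The dual-format read path depends only on the first bytes of a `SERIALIZED` payload. The sketch below shows one way that sniffing step could look. `FormatSniffSketch`, its `Format` enum and the `javaSerialize` helper are hypothetical names for illustration, not the PR's `EVCacheValueSerde.isBinaryFormat`. Per the description, the real dispatcher falls through to `super.deserialize` for anything that is not binary, whereas this sketch reports `UNKNOWN` to keep the three cases visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch of leading-byte format detection; not the PR's code.
final class FormatSniffSketch {
    enum Format { JAVA_SERIALIZED, BINARY_ENVELOPE, UNKNOWN }

    static Format sniff(byte[] data) {
        if (data == null || data.length < 2) {
            return Format.UNKNOWN;
        }
        // ObjectOutputStream streams begin with the stream magic 0xAC 0xED.
        if ((data[0] & 0xFF) == 0xAC && (data[1] & 0xFF) == 0xED) {
            return Format.JAVA_SERIALIZED;
        }
        // Binary envelope: magic 0x0C at byte 0. Byte 1 is the reserved/version
        // byte and is deliberately not validated, so any value is accepted.
        if (data[0] == 0x0C) {
            return Format.BINARY_ENVELOPE;
        }
        return Format.UNKNOWN;
    }

    // Helper that produces a legacy-style payload via standard Java serialization.
    static byte[] javaSerialize(Serializable o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `0x0C` can never be the first byte of an `ObjectOutputStream` stream, the two formats cannot be confused. Accepting any value at byte 1 mirrors the read-and-ignore rule for the reserved version byte.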
Benchmarks
Full results in `benchmark.md`. Four independent measurements:

- `EVCacheValueSerdeTest.measureBinaryVsJavaOverhead` (unit)
- `EvCacheBaselineOverheadSmokeTest` (consumer app → live cluster)
- … `use.secure=false`)

Secure mode shows ~2× larger savings because the auth/identity fields in `EVCacheValue` blow up the OOS class-descriptor cost. Latency was within noise in both directions in both modes; no write failures and no read regression.

Testing
`EVCacheValueSerdeTest`: 17 cases via the public `EVCacheTranscoder.encode/decode` API:

- magic `0x0C`, reserved byte `0x00`
- flag off writes Java (`0xAC 0xED`) but reads both formats
- legacy `ObjectOutputStream`-serialized envelope still decodes
- non-`EVCacheValue` passthrough (ArrayList stays on the Java path even with binary flag on)

Plus two new measurement tests that produce the benchmark numbers:

- `measureBinaryVsJavaOverhead`: prints raw + gzipped envelope sizes for binary vs OOS across 7 key/value shapes
- `measureOnRealMemcachedNode`: env-gated (`MEMCACHED_HOST=...`); writes N items in each format to a real memcached node and reads back the `STAT bytes` / `STAT curr_items` delta

Full evcache-core suite (`./gradlew :evcache-core:test`): 28/28 green (EVCacheValueSerdeTest 17, NodeLocatorLookupTest 3, MockEVCacheTest 7, plus runtime tests in other modules).

Chunked-payload integration is not covered by an automated test in this PR. Chunking lives in `EVCacheClient.createChunks` / `assembleChunks`, which are content-opaque (byte copy + CRC + manifest) and require a live client to exercise. The binary format introduces no new chunking risk by construction: `assembleChunks` reassembles bytes byte-for-byte and CRC-checks them against the manifest before handing the result to the transcoder.

🤖 Generated with Claude Code