[salesforce] Order EventLogFile queries by the cursor field (CreatedDate) #19954
Open
shmsr wants to merge 3 commits into
…ate)

The Apex, Login, and Logout EventLogFile queries sorted results by `LogDate` while tracking the collection cursor on `CreatedDate`. Those two fields are not correlated for EventLogFile records: Salesforce can create a log for an earlier `LogDate` period after one for a later period, so the last record in `LogDate` order frequently does not carry the maximum `CreatedDate`. Because the input watermarks the cursor from the last processed record, the stored `event_log_file.last_event_time` could be set below the newest `CreatedDate` already ingested, so the next poll re-collected data it had already fetched.

Order these queries by `CreatedDate` (the cursor field), matching the already-correct SetupAuditTrail query, so the watermark only advances.

The change is limited to the ORDER BY clause; the WHERE filter and cursor field are unchanged, so existing persisted cursors remain valid and no data is skipped on upgrade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
✅ Elastic Docs Style Checker (Vale): No issues found on modified lines! The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to the Elastic style guide for Vale.

✅ All changelog entries have the correct PR link.

💚 Build Succeeded

cc @shmsr
What does this PR do?
Orders the Apex, Login, and Logout `EventLogFile` queries by `CreatedDate` (the cursor field) instead of `LogDate`, in both the `default` and `value` queries.

Background: two unrelated timestamps
Every `EventLogFile` record has two datetime fields that mean different things:

- `LogDate`: the period the log covers (for example, the start of the hour or day).
- `CreatedDate`: when Salesforce actually generated the file. This lags `LogDate`, and the lag is variable (minutes to days), so `CreatedDate` order does not follow `LogDate` order.

These data streams track the collection cursor on `CreatedDate` (`cursor.field: CreatedDate`) and resume with `WHERE CreatedDate > <cursor>`, but the queries sorted the results by `LogDate` (`ORDER BY LogDate ASC`).

The bug, with an example
The input watermarks the cursor from the last record it processes in a page. When the page is ordered by `LogDate`, that last record is not necessarily the one with the greatest `CreatedDate`.

Consider two log files where the one covering the later period happened to be generated first:

| Record | `LogDate` | `CreatedDate` |
|---|---|---|
| #1 | `2026-06-22T00:00:00Z` | `2026-06-24T11:28:06Z` |
| #2 | `2026-06-23T00:00:00Z` | `2026-06-24T11:08:05Z` |

`ORDER BY LogDate ASC` returns them as #1 then #2. The input processes #1 (setting the watermark to `11:28:06`), then #2 (overwriting the watermark with `11:08:05`). The stored `last_event_time` therefore ends at `2026-06-24T11:08:05Z`, earlier than a record it has already ingested (`11:28:06`).

On the next poll, the query resumes with `WHERE CreatedDate > 2026-06-24T11:08:05Z`, so record #1 (`11:28:06`) matches again and is re-collected. The cursor effectively lags behind the data and re-fetches already-ingested files on each poll.

This is not a contrived case: sorting real `EventLogFile` results by `LogDate` produces multiple such `CreatedDate` inversions per page whenever files are generated slightly out of period order (common for hourly logs).

The fix
Order by the cursor field (`ORDER BY CreatedDate ASC`) so the last record in each page always carries the maximum `CreatedDate`.

With the example above, this returns #2 then #1, so the watermark ends at `2026-06-24T11:28:06Z` (the true maximum) and the next poll (`CreatedDate > 11:28:06`) does not re-collect either file. This matches the already-correct `SetupAuditTrail` query, which orders by its cursor field.

Why is it important?
Prevents repeated re-collection of already-ingested `EventLogFile` data and ensures the collection cursor never moves backward.

Compatibility / upgrade safety
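The change is limited to the `ORDER BY` clause. The `WHERE` filter and `cursor.field` are unchanged, so:

- Existing persisted cursors remain valid: they are still `CreatedDate` values and are interpreted identically.
- No data is skipped on upgrade.

Checklist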
- I have added an entry to my package's `changelog.yml` file.

How to test this PR locally
1. Run `elastic-package lint` in `packages/salesforce`.
2. Run `elastic-package test system -v` for the `apex`, `login`, and `logout` data streams (the mock server query matchers in `_dev/deploy/docker/files/config.yml` are updated to match the new ordering).