Skip to content

Parameterize offline-storage deletes and harden kill-switch response handling#1467

Open
bmehta001 wants to merge 19 commits into
microsoft:mainfrom
bmehta001:bhamehta/offline-storage-parameterized-deletes
Open

Parameterize offline-storage deletes and harden kill-switch response handling#1467
bmehta001 wants to merge 19 commits into
microsoft:mainfrom
bmehta001:bhamehta/offline-storage-parameterized-deletes

Conversation

@bmehta001

@bmehta001 bmehta001 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Builds OfflineStorage_SQLite::DeleteRecords as a parameterized prepared statement and tightens kill-switch response handling.

Changes

  • OfflineStorage_SQLite::DeleteRecords now builds a parameterized statement: column names are taken from a fixed whitelist and values are bound via sqlite3_bind_* on a single prepared statement instead of being concatenated into the SQL text. This is more robust and avoids quoting/escaping edge cases for values such as tenant tokens. Prepare/bind/execute results are checked and logged (with sqlite3_errmsg) on failure, and a non-numeric value for an integer filter column aborts the delete instead of coercing to 0.
  • DeleteRecords no longer issues a DELETE with an empty predicate, and fails closed (deletes nothing) on an unrecognized filter column.
  • KillSwitchManager::handleResponse rejects kill-token values containing control characters (and over-long values) before storing them, while keeping opaque tokens — which may legitimately contain spaces/quotes — killable.
  • KillSwitchManager::handleResponse parses the Retry-After and kill-duration response headers through a non-throwing helper that accepts only an RFC 7231 1*DIGIT value surrounded by HTTP OWS (SP/HTAB) and clamps the result to a safe maximum, so a malformed value (the HTTP-date form of Retry-After, a leading sign, stray CR/LF, or an overflow-inducing magnitude) is ignored rather than propagating out of the SDK worker thread or wrapping the computed expiry time.
  • Adds OfflineStorageTests_SQLite regression tests for the parameterized DeleteRecords: deletion by tenant_token (including unusual characters and a SQL-injection attempt), deletion by a numeric column, multi-column AND filters, and fail-closed behavior on an empty filter, an invalid numeric value, and an unrecognized column.
  • Adds KillSwitchManagerTests covering valid input plus malformed Retry-After / kill-duration values (non-numeric, HTTP-date, out-of-range, trailing/leading garbage, leading sign, huge/clamped), control-character vs opaque kill-tokens, and :all suffix stripping. These tests now also build and run on the Android and iOS native test targets.

Validation

  • KillSwitchManagerTests — 17 tests (the full new suite): valid Retry-After/kill-token input; malformed Retry-After/kill-duration (non-numeric, HTTP-date, out-of-range, trailing garbage, trailing/leading OWS, trailing CR/LF, leading sign, huge-clamped); control-character vs opaque kill-tokens; and :all suffix stripping. Now built/run on the host unit tests, Android, and iOS.
  • OfflineStorageTests_SQLite — 32 tests total, of which 7 exercise the parameterized DeleteRecords: SQL-injection attempt, delete-by-tenant_token (incl. unusual chars), delete-by-numeric-column, multi-column AND, empty filter, invalid numeric filter, and fail-closed on an unrecognized column.
  • The core parameterized-delete subset was validated locally on Windows x64 Debug; the full suites (including the additions above) run in CI across the host unit-test platforms, with the kill-switch tests additionally covered on Android and iOS.

Build OfflineStorage_SQLite::DeleteRecords as a parameterized prepared statement: column names come from a fixed whitelist and values are bound via sqlite3_bind_* on a single statement instead of being concatenated into the SQL text. This is more robust and avoids quoting/escaping edge cases for values such as tenant tokens. DeleteRecords also no longer issues a DELETE with an empty predicate.

Validate kill-token values against the expected tenant-token character set before storing them, and add OfflineStorageTests_SQLite regression tests covering deletion by tenant_token (including values with unusual characters).

Validated: full OfflineStorageTests_SQLite suite (27 tests) passes on Windows x64 Debug, including the new tests.

Files changed:

- lib/offline/OfflineStorage_SQLite.cpp

- lib/offline/KillSwitchManager.hpp

- tests/unittests/OfflineStorageTests_SQLite.cpp

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 requested a review from a team as a code owner June 8, 2026 07:01
The Retry-After and kill-duration response headers were parsed with
std::stoi without guarding against non-numeric or out-of-range input.
Because the SDK worker thread executes queued tasks without an exception
guard, an unparseable header value (for example the standards-compliant
HTTP-date form of Retry-After permitted by RFC 7231) could let the
exception escape the worker thread and terminate the process. Route both
parses through a non-throwing helper that ignores values it cannot parse.

Add KillSwitchManager unit tests covering valid input plus malformed
Retry-After / kill-duration values and a rejected malformed kill-token.

Files changed:
- lib/offline/KillSwitchManager.hpp
- tests/unittests/KillSwitchManagerTests.cpp (new)
- tests/unittests/CMakeLists.txt
- tests/unittests/UnitTests.vcxproj
- tests/unittests/UnitTests.vcxproj.filters

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens offline-storage deletion and kill-switch handling by switching OfflineStorage_SQLite::DeleteRecords to a parameterized SQLite prepared statement, avoiding concatenation of untrusted values into SQL and preventing accidental table-wide deletes.

Changes:

  • Reworks OfflineStorage_SQLite::DeleteRecords to use a whitelisted set of column names and sqlite3_bind_* value binding, and to bail out when no valid predicate is provided.
  • Makes KillSwitchManager::handleResponse resilient to malformed Retry-After / kill-duration values and rejects malformed kill-token tenant IDs.
  • Adds unit tests covering deletion by tenant_token (including SQL-injection-shaped input) and new KillSwitchManager header parsing behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/unittests/UnitTests.vcxproj.filters Adds the new KillSwitchManager unit test source to VS project filters.
tests/unittests/UnitTests.vcxproj Adds the new KillSwitchManager unit test source to the unit test project build.
tests/unittests/OfflineStorageTests_SQLite.cpp Adds regression tests validating tenant-token deletes and SQL-injection-shaped input handling.
tests/unittests/KillSwitchManagerTests.cpp Introduces coverage for Retry-After, kill-token, and kill-duration parsing/validation.
tests/unittests/CMakeLists.txt Adds KillSwitchManager tests to the CMake unit test target sources.
lib/offline/OfflineStorage_SQLite.cpp Implements parameterized DELETE statement creation/binding and avoids empty-predicate deletes.
lib/offline/KillSwitchManager.hpp Adds safe duration parsing helper and tenant-token character validation before storing kill tokens.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/offline/OfflineStorage_SQLite.cpp Outdated
Comment thread lib/offline/OfflineStorage_SQLite.cpp
Comment thread lib/offline/OfflineStorage_SQLite.cpp Outdated
Comment thread lib/offline/KillSwitchManager.hpp
@bmehta001 bmehta001 changed the title Use parameterized queries for offline storage deletes Parameterize offline-storage deletes and harden kill-switch response handling Jun 8, 2026
bmehta001 and others added 3 commits June 8, 2026 09:48
Describe what the token validation and response-header parsing do without
spelling out threat-model detail. No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Log the rejected column name when DeleteRecords ignores an
  unrecognized filter column, to aid diagnosing filter-key typos.
- Check sqlite3_bind_* return codes and abort the delete (logging
  sqlite3_errmsg) instead of executing with unbound parameters.
- Treat a non-numeric value for an integer filter column as an invalid
  filter and abort, rather than coercing to 0 and deleting rows that
  happen to match 0.
- Check the result of SqliteStatement::execute() and log on failure so
  a failed DELETE is not silent.
- KillSwitchManager::tryParseSeconds sets outSeconds=0 on parse failure
  so callers cannot read a stale value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Self-review follow-up: MemoryStorage's matcher treats an unknown filter
column as "no match" and deletes nothing, but the SQLite path dropped the
unknown column and ran the remaining predicate. That could delete more
rows than intended and diverge from MemoryStorage even though
OfflineStorageHandler sends the same filter to both storages. Abort the
delete (remove nothing) when any filter column is outside the whitelist,
and add a regression test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread lib/offline/KillSwitchManager.hpp
tryParseSeconds used std::stoll without checking how many characters were
consumed, so a numeric prefix followed by garbage (e.g. "120; foo") was
accepted as 120 - contradicting the "ignore malformed values" contract and
potentially activating Retry-After / kill-duration from a malformed header.
Validate full consumption (allowing only trailing whitespace) and add
regression tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread lib/offline/KillSwitchManager.hpp Outdated
The header uses std::vector (handleResponse) and std::list (getTokensList)
but relied on transitive includes. Include them directly so the header is
self-contained and not sensitive to include order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread lib/offline/KillSwitchManager.hpp Outdated
bmehta001 and others added 2 commits June 8, 2026 18:57
isValidTenantToken enforced an alnum + [-_.] allowlist, which would silently
drop legitimate kill-tokens: tenant tokens are opaque elsewhere in the SDK
(they may contain spaces/quotes) and the offline-storage DELETE is already
parameterized, so any value is safe to act on. Over-restricting meant a
stored tenant token could never be killed if the collector returned it
verbatim.

Relax the validator to reject only genuinely unsafe content -- control
characters (including CR/LF, to avoid log injection) and over-long values --
while accepting any other byte. Update tests: a control-char token is
rejected; an opaque token with quotes/spaces remains killable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread lib/offline/KillSwitchManager.hpp Outdated
bmehta001 and others added 2 commits June 9, 2026 11:39
Copilot (KillSwitchManager.hpp:192): tryParseSeconds tolerated any
std::isspace() whitespace (incl. CR/LF, form-feed, vertical-tab) after the
number, which is broader than HTTP OWS (SP / HTAB per RFC 7230). A value like
"120\r\n" was accepted as 120, contradicting the intent to reject malformed
header values.
  -> Replaced std::isspace with an explicit SP/HTAB check, so CR/LF and other
     control whitespace are treated as malformed and the header is ignored.
     Removed the now-unused <cctype> include. Added regression test
     handleResponse_RetryAfterWithTrailingCRLF_IsIgnored. The existing
     TrailingWhitespace test already used "120  " (spaces), so it still passes.
  Verified at lib/offline/KillSwitchManager.hpp:182-191; logic confirmed with an
  isolated compile (10/10 cases: "120  "->120, "120\t"->120, "120\r\n"->rejected).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 requested a review from Copilot June 9, 2026 16:43

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread lib/offline/KillSwitchManager.hpp Outdated
Copilot (KillSwitchManager.hpp:181): std::stoll skips leading C-locale
whitespace (incl. CR/LF) and accepts a leading '+'/'-' sign, so "+120" and
"\r\n120" parsed even though trailing CR/LF was rejected -- an asymmetric,
non-RFC7231 acceptance.
  -> Rewrote tryParseSeconds to trim only HTTP OWS (SP / HTAB) from both ends and
     require the remainder to be 1*DIGIT before calling std::stoll, so leading and
     trailing handling are symmetric and signs / CR-LF / garbage are all rejected.
     Added tests for leading OWS (accepted) and leading sign (ignored). Validated
     all edge cases via isolated compile (15/15: "  120"->120, "+120"->rejected,
     "\r\n120"->rejected, overflow->caught, no crash).
  Verified at lib/offline/KillSwitchManager.hpp.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread lib/offline/OfflineStorage_SQLite.cpp
Comment thread lib/offline/KillSwitchManager.hpp
Comment thread lib/offline/KillSwitchManager.hpp
Copilot round 3 on b7abaee:
- KillSwitchManager.hpp:49 / :87 (overflow): getUtcSystemTime() + timeinSecs
  could overflow (signed-overflow UB) for a huge but in-range Retry-After /
  kill-duration, wrapping the computed expiry time. Both values flow through
  tryParseSeconds, so clamp its result to ~100 years (well below INT64_MAX and
  above any legitimate value); the timestamp addition can no longer overflow.
  Added test handleResponse_HugeRetryAfter_IsClampedNotWrapped.
- OfflineStorage_SQLite.cpp:459 (null handle): added an explicit
  stmt.handle()==nullptr guard before binding. sqlite3_bind_* on a null stmt
  already returns SQLITE_MISUSE (caught by the existing rc check), so this was
  not a crash; the guard makes the intent explicit and logs a clearer "failed
  to prepare" error. Verified SqliteStatement::handle() returns m_stmt
  (SQLiteWrapper.hpp:806).

Clamp validated via isolated compile (now + clamped no longer overflows).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread tests/unittests/KillSwitchManagerTests.cpp
Comment thread lib/offline/OfflineStorage_SQLite.cpp
Copilot (OfflineStorage_SQLite.cpp:462): the prepare-failure log added last
round omitted the underlying SQLite error text, unlike the bind/execute error
paths in the same function. Append g_sqlite3Proxy->sqlite3_errmsg(*m_db) so a
prepare failure (OOM, schema issues) is as diagnosable as the others.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

bmehta001 and others added 2 commits June 9, 2026 15:03
The comments justified accepting opaque tokens via the parameterized DELETE but
overstated ("any value is safe") and attributed the control-char rejection to
log injection, even though the kill-token value is not logged in this path.
Reword to the verified reasoning: every token sink (parameterized SQLite bind /
Room DAO DELETE, in-memory string compare) handles raw bytes safely, so opaque
tokens must remain killable; control characters and over-long values are rejected
as defensive hygiene -- notably an embedded NUL would be truncated by the Room
NewStringUTF(c_str()) call and act on the wrong token.

Comment-only change; no behavior or test impact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…te form

Expand the tryParseSeconds comment to explain the deliberate choice to accept
only delay-seconds and ignore the (RFC-legal) HTTP-date form of Retry-After:
the target collector sends delay-seconds; an absolute date would have to be
evaluated against an untrusted device clock (the SDK has a separate clock-skew
channel); a three-format date parser on the no-exception-guard worker thread
adds risk for no practical benefit; and ignoring an unparseable value degrades
safely to the SDK's own retry back-off.

Comment-only change; no behavior or test impact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
// token), and it keeps the value safe if it ever reaches a log/display sink.
static bool isValidTenantToken(const std::string& token)
{
if (token.empty() || token.size() > 256)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an authoritative 256-byte max for tenant tokens? I don't see the same limit enforced when apps create loggers, so this could make a valid stored tenant impossible to kill if its token is longer than 256 bytes.

@bmehta001 bmehta001 Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have messaged the Collector team to ask - I was just keeping a large upper bound bc my token is 74 bytes, but I don't actually know if very long tokens could be possible.

Comment thread tests/unittests/CMakeLists.txt
Comment thread tests/unittests/KillSwitchManagerTests.cpp
Comment thread tests/unittests/OfflineStorageTests_SQLite.cpp
Comment thread lib/offline/KillSwitchManager.hpp

@lalitb lalitb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work. I left a few non-blocking comments, but I think it would be good to confirm the tenant token length limit before merging.

bmehta001 and others added 2 commits June 10, 2026 16:23
…chManagerTests on Android/iOS

Addresses lalitb's review suggestions on microsoft#1467:
- KillSwitchManagerTests: add huge-value (clamped, not wrapped) and out-of-range
  kill-duration cases, exercising the addToken() expiry path that kill-duration
  takes (mirrors the existing Retry-After coverage).
- OfflineStorageTests_SQLite: add fail-closed DeleteRecords regression tests -
  an empty filter (no predicate -> deletes nothing) and an invalid numeric value
  ("0 OR 1=1" for retry_count -> rejected, not coerced to 0 and matched).
- Register KillSwitchManagerTests.cpp in the Android native test list
  (lib/android_build/app/src/main/cpp/CMakeLists.txt) and the iOS unittests
  xcodeproj, so the platform-independent kill-switch tests run on those targets too.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comment thread tests/unittests/KillSwitchManagerTests.cpp
Comment thread tests/unittests/OfflineStorageTests_SQLite.cpp
- KillSwitchManagerTests: add handleResponse_KillTokenWithAllSuffix_BlocksBaseToken,
  covering the ':all' suffix stripping (kill-tokens "<tenant>:all" -> blocks <tenant>).
- OfflineStorageTests_SQLite: add DeleteRecordsByNumericColumnDeletesOnlyMatching
  (positive int64-bind path via latency) and DeleteRecordsMultiColumnFilterDeletesOnlyRowsMatchingAll
  (AND semantics across two whitelisted columns).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment on lines 65 to 69
if (pos != std::string::npos)
{
// Strip suffix and assume ':all' events of that tenant are killed
token.erase(pos, token.length() - pos);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants