Parameterize offline-storage deletes and harden kill-switch response handling#1467
Parameterize offline-storage deletes and harden kill-switch response handling#1467bmehta001 wants to merge 19 commits into
Conversation
Build OfflineStorage_SQLite::DeleteRecords as a parameterized prepared statement: column names come from a fixed whitelist and values are bound via sqlite3_bind_* on a single statement instead of being concatenated into the SQL text. This is more robust and avoids quoting/escaping edge cases for values such as tenant tokens. DeleteRecords also no longer issues a DELETE with an empty predicate. Validate kill-token values against the expected tenant-token character set before storing them, and add OfflineStorageTests_SQLite regression tests covering deletion by tenant_token (including values with unusual characters). Validated: full OfflineStorageTests_SQLite suite (27 tests) passes on Windows x64 Debug, including the new tests. Files changed: - lib/offline/OfflineStorage_SQLite.cpp - lib/offline/KillSwitchManager.hpp - tests/unittests/OfflineStorageTests_SQLite.cpp Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Retry-After and kill-duration response headers were parsed with std::stoi without guarding against non-numeric or out-of-range input. Because the SDK worker thread executes queued tasks without an exception guard, an unparseable header value (for example the standards-compliant HTTP-date form of Retry-After permitted by RFC 7231) could let the exception escape the worker thread and terminate the process. Route both parses through a non-throwing helper that ignores values it cannot parse. Add KillSwitchManager unit tests covering valid input plus malformed Retry-After / kill-duration values and a rejected malformed kill-token. Files changed: - lib/offline/KillSwitchManager.hpp - tests/unittests/KillSwitchManagerTests.cpp (new) - tests/unittests/CMakeLists.txt - tests/unittests/UnitTests.vcxproj - tests/unittests/UnitTests.vcxproj.filters Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens offline-storage deletion and kill-switch handling by switching OfflineStorage_SQLite::DeleteRecords to a parameterized SQLite prepared statement, avoiding concatenation of untrusted values into SQL and preventing accidental table-wide deletes.
Changes:
- Reworks
OfflineStorage_SQLite::DeleteRecordsto use a whitelisted set of column names andsqlite3_bind_*value binding, and to bail out when no valid predicate is provided. - Makes
KillSwitchManager::handleResponseresilient to malformedRetry-After/kill-durationvalues and rejects malformed kill-token tenant IDs. - Adds unit tests covering deletion by
tenant_token(including SQL-injection-shaped input) and newKillSwitchManagerheader parsing behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unittests/UnitTests.vcxproj.filters | Adds the new KillSwitchManager unit test source to VS project filters. |
| tests/unittests/UnitTests.vcxproj | Adds the new KillSwitchManager unit test source to the unit test project build. |
| tests/unittests/OfflineStorageTests_SQLite.cpp | Adds regression tests validating tenant-token deletes and SQL-injection-shaped input handling. |
| tests/unittests/KillSwitchManagerTests.cpp | Introduces coverage for Retry-After, kill-token, and kill-duration parsing/validation. |
| tests/unittests/CMakeLists.txt | Adds KillSwitchManager tests to the CMake unit test target sources. |
| lib/offline/OfflineStorage_SQLite.cpp | Implements parameterized DELETE statement creation/binding and avoids empty-predicate deletes. |
| lib/offline/KillSwitchManager.hpp | Adds safe duration parsing helper and tenant-token character validation before storing kill tokens. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Describe what the token validation and response-header parsing do without spelling out threat-model detail. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Log the rejected column name when DeleteRecords ignores an unrecognized filter column, to aid diagnosing filter-key typos. - Check sqlite3_bind_* return codes and abort the delete (logging sqlite3_errmsg) instead of executing with unbound parameters. - Treat a non-numeric value for an integer filter column as an invalid filter and abort, rather than coercing to 0 and deleting rows that happen to match 0. - Check the result of SqliteStatement::execute() and log on failure so a failed DELETE is not silent. - KillSwitchManager::tryParseSeconds sets outSeconds=0 on parse failure so callers cannot read a stale value. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Self-review follow-up: MemoryStorage's matcher treats an unknown filter column as "no match" and deletes nothing, but the SQLite path dropped the unknown column and ran the remaining predicate. That could delete more rows than intended and diverge from MemoryStorage even though OfflineStorageHandler sends the same filter to both storages. Abort the delete (remove nothing) when any filter column is outside the whitelist, and add a regression test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tryParseSeconds used std::stoll without checking how many characters were consumed, so a numeric prefix followed by garbage (e.g. "120; foo") was accepted as 120 - contradicting the "ignore malformed values" contract and potentially activating Retry-After / kill-duration from a malformed header. Validate full consumption (allowing only trailing whitespace) and add regression tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The header uses std::vector (handleResponse) and std::list (getTokensList) but relied on transitive includes. Include them directly so the header is self-contained and not sensitive to include order. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
isValidTenantToken enforced an alnum + [-_.] allowlist, which would silently drop legitimate kill-tokens: tenant tokens are opaque elsewhere in the SDK (they may contain spaces/quotes) and the offline-storage DELETE is already parameterized, so any value is safe to act on. Over-restricting meant a stored tenant token could never be killed if the collector returned it verbatim. Relax the validator to reject only genuinely unsafe content -- control characters (including CR/LF, to avoid log injection) and over-long values -- while accepting any other byte. Update tests: a control-char token is rejected; an opaque token with quotes/spaces remains killable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot (KillSwitchManager.hpp:192): tryParseSeconds tolerated any
std::isspace() whitespace (incl. CR/LF, form-feed, vertical-tab) after the
number, which is broader than HTTP OWS (SP / HTAB per RFC 7230). A value like
"120\r\n" was accepted as 120, contradicting the intent to reject malformed
header values.
-> Replaced std::isspace with an explicit SP/HTAB check, so CR/LF and other
control whitespace are treated as malformed and the header is ignored.
Removed the now-unused <cctype> include. Added regression test
handleResponse_RetryAfterWithTrailingCRLF_IsIgnored. The existing
TrailingWhitespace test already used "120 " (spaces), so it still passes.
Verified at lib/offline/KillSwitchManager.hpp:182-191; logic confirmed with an
isolated compile (10/10 cases: "120 "->120, "120\t"->120, "120\r\n"->rejected).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot (KillSwitchManager.hpp:181): std::stoll skips leading C-locale
whitespace (incl. CR/LF) and accepts a leading '+'/'-' sign, so "+120" and
"\r\n120" parsed even though trailing CR/LF was rejected -- an asymmetric,
non-RFC7231 acceptance.
-> Rewrote tryParseSeconds to trim only HTTP OWS (SP / HTAB) from both ends and
require the remainder to be 1*DIGIT before calling std::stoll, so leading and
trailing handling are symmetric and signs / CR-LF / garbage are all rejected.
Added tests for leading OWS (accepted) and leading sign (ignored). Validated
all edge cases via isolated compile (15/15: " 120"->120, "+120"->rejected,
"\r\n120"->rejected, overflow->caught, no crash).
Verified at lib/offline/KillSwitchManager.hpp.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot round 3 on b7abaee: - KillSwitchManager.hpp:49 / :87 (overflow): getUtcSystemTime() + timeinSecs could overflow (signed-overflow UB) for a huge but in-range Retry-After / kill-duration, wrapping the computed expiry time. Both values flow through tryParseSeconds, so clamp its result to ~100 years (well below INT64_MAX and above any legitimate value); the timestamp addition can no longer overflow. Added test handleResponse_HugeRetryAfter_IsClampedNotWrapped. - OfflineStorage_SQLite.cpp:459 (null handle): added an explicit stmt.handle()==nullptr guard before binding. sqlite3_bind_* on a null stmt already returns SQLITE_MISUSE (caught by the existing rc check), so this was not a crash; the guard makes the intent explicit and logs a clearer "failed to prepare" error. Verified SqliteStatement::handle() returns m_stmt (SQLiteWrapper.hpp:806). Clamp validated via isolated compile (now + clamped no longer overflows). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot (OfflineStorage_SQLite.cpp:462): the prepare-failure log added last round omitted the underlying SQLite error text, unlike the bind/execute error paths in the same function. Append g_sqlite3Proxy->sqlite3_errmsg(*m_db) so a prepare failure (OOM, schema issues) is as diagnosable as the others. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The comments justified accepting opaque tokens via the parameterized DELETE but
overstated ("any value is safe") and attributed the control-char rejection to
log injection, even though the kill-token value is not logged in this path.
Reword to the verified reasoning: every token sink (parameterized SQLite bind /
Room DAO DELETE, in-memory string compare) handles raw bytes safely, so opaque
tokens must remain killable; control characters and over-long values are rejected
as defensive hygiene -- notably an embedded NUL would be truncated by the Room
NewStringUTF(c_str()) call and act on the wrong token.
Comment-only change; no behavior or test impact.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…te form Expand the tryParseSeconds comment to explain the deliberate choice to accept only delay-seconds and ignore the (RFC-legal) HTTP-date form of Retry-After: the target collector sends delay-seconds; an absolute date would have to be evaluated against an untrusted device clock (the SDK has a separate clock-skew channel); a three-format date parser on the no-exception-guard worker thread adds risk for no practical benefit; and ignoring an unparseable value degrades safely to the SDK's own retry back-off. Comment-only change; no behavior or test impact. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| // token), and it keeps the value safe if it ever reaches a log/display sink. | ||
| static bool isValidTenantToken(const std::string& token) | ||
| { | ||
| if (token.empty() || token.size() > 256) |
There was a problem hiding this comment.
Do we have an authoritative 256-byte max for tenant tokens? I don't see the same limit enforced when apps create loggers, so this could make a valid stored tenant impossible to kill if its token is longer than 256 bytes.
There was a problem hiding this comment.
I have messaged the Collector team to ask - I was just keeping a large upper bound bc my token is 74 bytes, but I don't actually know if very long tokens could be possible.
lalitb
left a comment
There was a problem hiding this comment.
LGTM. Nice work. I left a few non-blocking comments, but I think it would be good to confirm the tenant token length limit before merging.
…e-parameterized-deletes
…chManagerTests on Android/iOS Addresses lalitb's review suggestions on microsoft#1467: - KillSwitchManagerTests: add huge-value (clamped, not wrapped) and out-of-range kill-duration cases, exercising the addToken() expiry path that kill-duration takes (mirrors the existing Retry-After coverage). - OfflineStorageTests_SQLite: add fail-closed DeleteRecords regression tests - an empty filter (no predicate -> deletes nothing) and an invalid numeric value ("0 OR 1=1" for retry_count -> rejected, not coerced to 0 and matched). - Register KillSwitchManagerTests.cpp in the Android native test list (lib/android_build/app/src/main/cpp/CMakeLists.txt) and the iOS unittests xcodeproj, so the platform-independent kill-switch tests run on those targets too. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- KillSwitchManagerTests: add handleResponse_KillTokenWithAllSuffix_BlocksBaseToken, covering the ':all' suffix stripping (kill-tokens "<tenant>:all" -> blocks <tenant>). - OfflineStorageTests_SQLite: add DeleteRecordsByNumericColumnDeletesOnlyMatching (positive int64-bind path via latency) and DeleteRecordsMultiColumnFilterDeletesOnlyRowsMatchingAll (AND semantics across two whitelisted columns). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| if (pos != std::string::npos) | ||
| { | ||
| // Strip suffix and assume ':all' events of that tenant are killed | ||
| token.erase(pos, token.length() - pos); | ||
| } |
Builds
OfflineStorage_SQLite::DeleteRecordsas a parameterized prepared statement and tightens kill-switch response handling.Changes
OfflineStorage_SQLite::DeleteRecordsnow builds a parameterized statement: column names are taken from a fixed whitelist and values are bound viasqlite3_bind_*on a single prepared statement instead of being concatenated into the SQL text. This is more robust and avoids quoting/escaping edge cases for values such as tenant tokens. Prepare/bind/execute results are checked and logged (withsqlite3_errmsg) on failure, and a non-numeric value for an integer filter column aborts the delete instead of coercing to0.DeleteRecordsno longer issues aDELETEwith an empty predicate, and fails closed (deletes nothing) on an unrecognized filter column.KillSwitchManager::handleResponserejects kill-token values containing control characters (and over-long values) before storing them, while keeping opaque tokens — which may legitimately contain spaces/quotes — killable.KillSwitchManager::handleResponseparses theRetry-Afterandkill-durationresponse headers through a non-throwing helper that accepts only an RFC 72311*DIGITvalue surrounded by HTTP OWS (SP/HTAB) and clamps the result to a safe maximum, so a malformed value (the HTTP-date form ofRetry-After, a leading sign, stray CR/LF, or an overflow-inducing magnitude) is ignored rather than propagating out of the SDK worker thread or wrapping the computed expiry time.OfflineStorageTests_SQLiteregression tests for the parameterizedDeleteRecords: deletion bytenant_token(including unusual characters and a SQL-injection attempt), deletion by a numeric column, multi-columnANDfilters, and fail-closed behavior on an empty filter, an invalid numeric value, and an unrecognized column.KillSwitchManagerTestscovering valid input plus malformedRetry-After/kill-durationvalues (non-numeric, HTTP-date, out-of-range, trailing/leading garbage, leading sign, huge/clamped), control-character vs opaque kill-tokens, and:allsuffix stripping. These tests now also build and run on the Android and iOS native test targets.Validation
KillSwitchManagerTests— 17 tests (the full new suite): validRetry-After/kill-token input; malformedRetry-After/kill-duration(non-numeric, HTTP-date, out-of-range, trailing garbage, trailing/leading OWS, trailing CR/LF, leading sign, huge-clamped); control-character vs opaque kill-tokens; and:allsuffix stripping. Now built/run on the host unit tests, Android, and iOS.OfflineStorageTests_SQLite— 32 tests total, of which 7 exercise the parameterizedDeleteRecords: SQL-injection attempt, delete-by-tenant_token(incl. unusual chars), delete-by-numeric-column, multi-columnAND, empty filter, invalid numeric filter, and fail-closed on an unrecognized column.