fix(disk buffer): recover from decode errors during initialization seek #25691
apurvanisal5 wants to merge 1 commit into
All contributors have signed the CLA ✍️ ✅
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02dc8b9b1b
    self,
    ReaderError::Checksum { .. }
        | ReaderError::Deserialization { .. }
        | ReaderError::Decode { .. }
Keep decode failures from rolling the data file
When a record's payload fails to decode but its length, archive, and checksum are valid, only that record has been consumed and the reader can continue to the next length-delimited record while gap accounting handles the missing ID. Adding Decode to is_bad_read() also affects the normal BufferReader::next() path, which calls roll_to_next_data_file() for any bad read; a single undecodable record in the middle of the current file therefore abandons later valid, unacknowledged records in that same file (the new initialization test creates this layout but only checks that reopen succeeds). This turns the startup recovery into avoidable event loss or a stalled reader for buffers with mixed bad/good records in one data file.
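The difference between rolling the file and skipping a single record can be sketched with a toy model. This is not Vector's reader: `Record`, `drain`, and the `roll_on_decode_error` flag are hypothetical names, and the only assumption taken from the comment above is that a bad read abandons the rest of the current data file.

```rust
/// Hypothetical stand-in for a framed record whose length and checksum are valid.
struct Record {
    id: u64,
    /// False when the payload cannot be decoded.
    decodable: bool,
}

/// Reads every data file in order and returns the IDs that reach the consumer.
/// When `roll_on_decode_error` is true, an undecodable record abandons the rest
/// of the current file (the "bad read" path described in the review comment).
/// When it is false, only the undecodable record itself is skipped.
fn drain(files: &[Vec<Record>], roll_on_decode_error: bool) -> Vec<u64> {
    let mut delivered = Vec::new();
    for file in files {
        for record in file {
            if record.decodable {
                delivered.push(record.id);
            } else if roll_on_decode_error {
                // Roll to the next data file: later records in this file are lost.
                break;
            }
            // Otherwise fall through and keep reading the same file.
        }
    }
    delivered
}
```

With a first file holding records 1 (good), 2 (undecodable), and 3 (good), and a second file holding record 4, rolling delivers 1 and 4 while skipping delivers 1, 3, and 4.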
I have read the CLA Document and I hereby sign the CLA
recheck
Add ReaderError::Decode to is_bad_read() so seek_to_next_record during buffer startup skips decode failures like checksum and partial_write errors, instead of failing topology build with InvalidProtobufPayload.
Force-pushed from 02dc8b9 to c7d57e2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7d57e284f
    self,
    ReaderError::Checksum { .. }
        | ReaderError::Deserialization { .. }
        | ReaderError::Decode { .. }
Don't consume the next record after a seek-time decode error
When a decode error occurs during seek_to_next_record, next() has already consumed that framed record but returns before track_read advances last_reader_record_id (the decode happens at reader.read_record(token)?). Classifying Decode as a bad read makes the seek loop swallow the error and call next() again; with the new test's layout (bad record #1, good record #2, ledger_last = 1), the loop then reads and drops record #2 during initialization even though it was the first unacknowledged record, so the buffer reopens with user data silently skipped.
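The sequence described above can be reproduced in a small model. This is only a sketch under the control flow the comment states (the frame is consumed before decoding, and `last_reader_record_id` advances only after a successful decode). `ToyReader`, `ToyRecord`, `ReadError`, and the `decode_is_bad_read` flag are hypothetical and do not mirror Vector's actual types.

```rust
#[derive(Debug, PartialEq)]
enum ReadError {
    Decode,
}

struct ToyRecord {
    id: u64,
    decodable: bool,
}

struct ToyReader {
    records: Vec<ToyRecord>,
    cursor: usize,
    last_reader_record_id: u64,
}

impl ToyReader {
    fn new(records: Vec<ToyRecord>) -> Self {
        ToyReader { records, cursor: 0, last_reader_record_id: 0 }
    }

    /// Consumes one framed record. A decode failure returns early, before the
    /// tracking step that would advance `last_reader_record_id`.
    fn next(&mut self) -> Result<Option<u64>, ReadError> {
        let (id, decodable) = match self.records.get(self.cursor) {
            Some(record) => (record.id, record.decodable),
            None => return Ok(None),
        };
        self.cursor += 1; // the frame is consumed either way
        if !decodable {
            return Err(ReadError::Decode);
        }
        self.last_reader_record_id = id;
        Ok(Some(id))
    }

    /// Reads forward until the reader has caught up with the ledger.
    /// `decode_is_bad_read` models whether decode errors are swallowed.
    fn seek_to_next_record(
        &mut self,
        ledger_last: u64,
        decode_is_bad_read: bool,
    ) -> Result<(), ReadError> {
        while self.last_reader_record_id < ledger_last {
            match self.next() {
                Ok(Some(_)) => {}
                Ok(None) => break,
                Err(ReadError::Decode) if decode_is_bad_read => {}
                Err(e) => return Err(e),
            }
        }
        Ok(())
    }
}
```

With records #1 (undecodable) and #2 (good) and `ledger_last = 1`, swallowing the decode error leaves the loop condition true, so the seek reads record #2 as well. The first read after initialization then finds nothing, even though record #2 was never acknowledged.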
Problem
When a disk buffer (v2) contains a record that fails protobuf decode on restart, seek_to_next_record() during buffer initialization returns an error immediately because ReaderError::Decode was not classified as a "bad read".
Customer-visible symptom:
error occurred when building buffer failed to seek to position where reader left off failed to decoded record: InvalidProtobufPayload
This can cause collectors to CrashLoop until the buffer directory is manually deleted.
Solution
Add ReaderError::Decode to is_bad_read() so decode failures during initialization seek follow the same recovery path as checksum, deserialization, and partial_write errors.
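As a rough illustration of the classification change, the sketch below uses a simplified error enum. The variant fields and the `Io` variant are assumptions made for the example. Only the variant names Checksum, Deserialization, and Decode, and the `is_bad_read` method name, come from this pull request, and `PartialWrite` is inferred from the mention of partial_write errors.

```rust
/// Simplified, hypothetical version of the reader error type.
/// Field names are illustrative and do not match the real definition.
enum ReaderError {
    Checksum { calculated: u32, actual: u32 },
    Deserialization { reason: String },
    PartialWrite,
    Decode { reason: String },
    /// Hypothetical non-recoverable error, included for contrast.
    Io { reason: String },
}

impl ReaderError {
    /// Returns true for errors that the recovery path treats as a bad read
    /// instead of propagating as a hard failure.
    fn is_bad_read(&self) -> bool {
        matches!(
            self,
            ReaderError::Checksum { .. }
                | ReaderError::Deserialization { .. }
                | ReaderError::PartialWrite
                | ReaderError::Decode { .. }
        )
    }
}
```

In this sketch a `Decode` error is now reported as a bad read, while the hypothetical `Io` error still propagates.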
Test
reader_recovers_from_decode_error_during_initialization_seek
cargo test -p vector-buffers disk_v2: all pass
References