fix: don't force redownload chunks that have already completed or are still in progress #830
Codecov Report
❌ Patch coverage is …
Coverage diff:
| | main | #830 | +/- |
|---|---|---|---|
| Coverage | 81% | 81% | |
| Files | 54 | 54 | |
| Lines | 7617 | 7628 | +11 |
| Hits | 6144 | 6159 | +15 |
| Misses | 1473 | 1469 | -4 |
Pull request overview
Fixes two race conditions in PrepareChunksThread._force_download() that could destroy completed chunks or cause redundant deletions/redownloads when multiple workers queue force-downloads for the same chunk.
Changes:
- In `_force_download()`, attempt `FileLock(<chunk>.lock, timeout=0)` first; if held by an active downloader, defer; otherwise double-check file size before deleting and redownloading.
- Add `ChunksConfig.download_filepath()` to expose the raw (pre-decompression) on-disk chunk path that matches the downloader's lock path.
- Add two unit tests covering the "already complete" and "lock held by another worker" cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/litdata/streaming/reader.py | Restructures _force_download to acquire the downloader lock, recheck size, then delete/redownload; defers on Timeout. |
| src/litdata/streaming/config.py | Adds download_filepath() helper returning the raw (compressed) chunk path used as lock basename. |
| tests/streaming/test_reader.py | Adds tests for skipping complete chunks and deferring when the download lock is held. |
What does this PR do?
This fixes two separate issues:
- `force_download` can delete a chunk that had successfully finished downloading. This happens because `force_download` unconditionally deletes the chunk on disk, even if the chunk finished downloading between the time `force_download` was enqueued and the time it ran. This is fixed in e0f46b4 by double-checking the downloaded file size before deleting it.
- `force_download` can be queued multiple times across different workers, so a chunk can be deleted and redownloaded multiple times, even if a download is already in progress. This is fixed in b36f21b by attempting to acquire the chunk's `.lock` file first, which is otherwise only held by the downloader, so `force_download` becomes a no-op if the chunk is currently being downloaded.
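
For readers who want to see the shape of the two fixes together, here is a minimal sketch of the "try the lock, then re-check the size" pattern. It is not the code from this PR: `try_lock`, `LockHeld`, `force_download`, the `download` callback, and the returned labels are all hypothetical names. To stay standard-library only, `fcntl.flock` with `LOCK_NB` stands in for `FileLock(<chunk>.lock, timeout=0)`, so the sketch is POSIX-only. Running the download while the lock is still held is also an assumption of this sketch.

```python
import fcntl
import os
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when someone else already holds the lock file."""


@contextmanager
def try_lock(lock_path):
    """Non-blocking exclusive lock (stand-in for FileLock(lock_path, timeout=0))."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as err:
            raise LockHeld(lock_path) from err
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the flock


def force_download(chunk_path, expected_size, download):
    """Hypothetical helper: returns 'deferred', 'skipped', or 'redownloaded'."""
    try:
        with try_lock(chunk_path + ".lock"):
            exists = os.path.exists(chunk_path)
            # Re-check under the lock: the chunk may have completed after
            # the force-download request was queued.
            if exists and os.path.getsize(chunk_path) == expected_size:
                return "skipped"
            if exists:
                os.remove(chunk_path)  # incomplete or corrupt: safe to delete
            download(chunk_path)
            return "redownloaded"
    except LockHeld:
        # An active downloader owns the lock, so do nothing for now.
        return "deferred"
```

With this shape, a request that arrives after the chunk has completed returns `"skipped"`, and one that arrives while the lock is held returns `"deferred"`, mirroring the two cases the new unit tests are described as covering.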