
fix: don't force redownload chunks that have already completed or are still in progress #830

Merged: tchaton merged 4 commits into Lightning-AI:main from owbone:main on May 27, 2026

owbone (Collaborator) commented May 27, 2026
Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This fixes two separate issues:

  1. force_download can delete a chunk that has already finished downloading. This happens because force_download unconditionally deletes the chunk on disk, even if the download completed between the time force_download was enqueued and the time it ran. This is fixed in e0f46b4 by double-checking the downloaded file size before deleting the chunk.

  2. force_download can be queued multiple times across different workers, so a chunk can be deleted and redownloaded several times, even while a download is already in progress. This is fixed in b36f21b by first trying to acquire the chunk's .lock file, which is otherwise held only by the downloader. If the lock is already held, the chunk is currently being downloaded and force_download becomes a no-op.
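The two fixes can be sketched together. This is a minimal, hypothetical Python sketch, not the code merged in this PR: the PR uses filelock's FileLock(<chunk>.lock, timeout=0), while the sketch stays dependency-free by using a stand-in `try_lock` built on `fcntl.flock` with `LOCK_EX | LOCK_NB` (Unix only). The names `force_download`, `expected_size`, and `download` are illustrative assumptions.

```python
import contextlib
import fcntl
import os
from typing import Callable, Iterator


@contextlib.contextmanager
def try_lock(lock_path: str) -> Iterator[bool]:
    """Hypothetical stand-in for FileLock(lock_path, timeout=0): one non-blocking attempt.

    Yields True if the exclusive lock was acquired, False if someone else holds it.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    acquired = False
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            pass  # another open file description holds the lock
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def force_download(chunk_path: str, expected_size: int, download: Callable[[str], None]) -> str:
    """Hypothetical sketch of the fixed force-download flow; returns what it did."""
    with try_lock(chunk_path + ".lock") as acquired:
        if not acquired:
            # Fix 2: the lock is otherwise held only by the downloader, so a
            # download is already in progress and this request is a no-op.
            return "deferred"
        # Fix 1: re-check the size; the chunk may have finished downloading
        # between when this request was enqueued and now.
        if os.path.exists(chunk_path) and os.path.getsize(chunk_path) == expected_size:
            return "already-complete"
        if os.path.exists(chunk_path):
            os.remove(chunk_path)  # size mismatch, e.g. a partial file
        # Assumption: `download` does not take the lock itself, so holding it
        # here keeps other force-download requests out until the file is rewritten.
        download(chunk_path)
        return "redownloaded"
```

Neither check is sufficient alone: a size check without the lock lets a second worker delete a file that is still being written, and the lock without a size check lets a request that runs after the downloader has released the lock delete a completed chunk.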

codecov-commenter commented May 27, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 68.42105% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 81%. Comparing base (5213544) to head (cd05579).
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
```diff
@@         Coverage Diff         @@
##           main   #830   +/-   ##
===================================
  Coverage    81%    81%
===================================
  Files        54     54
  Lines      7617   7628   +11
===================================
+ Hits       6144   6159   +15
+ Misses     1473   1469    -4
```
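As a quick arithmetic check of the figures above, here is a small Python sketch. Base and head coverage come out to about 80.66% and 80.74%, and both round to the reported 81%. A patch coverage of 68.42105% with 6 missed lines corresponds to 13 of 19 changed lines covered. The 19-line total is inferred from those two numbers; the report does not state it.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Percentage of tracked lines that are hit by tests."""
    return 100.0 * hits / lines


# Figures copied from the Codecov diff: (hits, misses, lines)
base = (6144, 1473, 7617)
head = (6159, 1469, 7628)

for label, (hits, misses, lines) in (("base", base), ("head", head)):
    # Every tracked line is either a hit or a miss.
    assert hits + misses == lines
    print(f"{label}: {coverage_pct(hits, lines):.2f}%")

# Inference: 6 missed lines at 68.42105% coverage implies 6 / (1 - 0.6842105) = 19 changed lines.
missed = 6
patch_lines = round(missed / (1 - 0.6842105))
patch_hits = patch_lines - missed
print(f"patch: {patch_hits}/{patch_lines} = {coverage_pct(patch_hits, patch_lines):.5f}%")
```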

Copilot AI (Contributor) left a comment

Pull request overview

Fixes two race conditions in PrepareChunksThread._force_download() that could destroy completed chunks or cause redundant deletions/redownloads when multiple workers queue force-downloads for the same chunk.

Changes:

  • In _force_download(), attempt FileLock(<chunk>.lock, timeout=0) first; if the lock is held by an active downloader, defer; otherwise double-check the file size before deleting and redownloading.
  • Add ChunksConfig.download_filepath() to expose the raw (pre-decompression) on-disk chunk path that matches the downloader's lock path.
  • Add two unit tests covering the "already complete" and "lock held by another worker" cases.
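The lock only prevents the race if force_download and the downloader lock the same file, which is why the lock path is derived from the raw (pre-decompression) download path. A hypothetical Python sketch of that pairing follows. It assumes, for illustration only, that a compressed chunk is downloaded as `<chunk>.<compression>` and later decompressed to `<chunk>`. These free functions stand in for ChunksConfig.download_filepath() and do not reproduce its real signature.

```python
import os
from typing import Optional


def download_filepath(cache_dir: str, chunk_filename: str, compression: Optional[str] = None) -> str:
    """Hypothetical: where the downloader writes the raw (possibly compressed) bytes."""
    raw_name = f"{chunk_filename}.{compression}" if compression else chunk_filename
    return os.path.join(cache_dir, raw_name)


def decompressed_filepath(cache_dir: str, chunk_filename: str) -> str:
    """Hypothetical: where the decompressed chunk ends up."""
    return os.path.join(cache_dir, chunk_filename)


def lock_path(raw_path: str) -> str:
    """The downloader and force_download must both derive the lock from the raw path."""
    return raw_path + ".lock"
```

Under this assumed naming, a lock derived from the decompressed path would be a different file from the downloader's lock whenever compression is used, so the two would never contend. Without compression, the two paths coincide.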

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/litdata/streaming/reader.py | Restructures _force_download to acquire the downloader lock, recheck size, then delete/redownload; defers on Timeout. |
| src/litdata/streaming/config.py | Adds download_filepath() helper returning the raw (compressed) chunk path used as lock basename. |
| tests/streaming/test_reader.py | Adds tests for skipping complete chunks and deferring when the download lock is held. |


Comment thread src/litdata/streaming/reader.py
tchaton merged commit 4b8c43d into Lightning-AI:main on May 27, 2026
70 of 77 checks passed

Labels: None yet
Projects: None yet


7 participants