Releases: Lightning-AI/litData

v0.2.63

@owbone owbone released this 28 May 11:31
86a79c9

What's Changed

  • chore(deps): update coverage requirement from ==7.12.* to ==7.13.* by @dependabot[bot] in #788
  • fix(cli): fix missing __init__.py in cli/handler package by @nishantb06 in #799
  • chore(deps): bump actions/download-artifact from 7 to 8 in the gha-updates group by @dependabot[bot] in #796
  • feat: add support for video deserialization with torchcodec when torchvision>0.25 by @deependujha in #802
  • chore(deps): update pytest requirement from ==8.4.* to ==9.0.* by @dependabot[bot] in #795
  • fix: mypy errors by @deependujha in #806
  • chore(deps): bump lightning-sdk from 2025.12.17 to 2026.2.6 by @dependabot[bot] in #794
  • chore(deps): bump cryptography from 45.0.7 to 46.0.6 by @dependabot[bot] in #804
  • chore(deps): bump pytest-cov from 7.0.0 to 7.1.0 by @dependabot[bot] in #805
  • Fix circular import by @vini-fda in #812
  • chore(deps): bump the gha-updates group across 1 directory with 3 updates by @dependabot[bot] in #808
  • fix race condition by defaulting to local mount prior to R2 fetch by @lianakoleva in #821
  • chore: bump to 0.2.62 for release by @lianakoleva in #822
  • fix: don't force redownload chunks that have already completed or are still in progress by @owbone in #830
  • Fix row index miscalculation in ParquetLoader by @vini-fda in #810
  • Bump version to 0.2.63 by @owbone in #831

New Contributors

Full Changelog: v0.2.61...v0.2.63

LitData v0.2.61

@dhedey dhedey released this 20 Feb 10:42
83fae9a

Lightning AI ⚡ is excited to announce the release of LitData v0.2.61

Highlights

Fixes a regression in 0.2.60 when writing to Lightning Storage (#791)

What's Changed

  • chore(deps): bump mosaicml-streaming from 0.11.0 to 0.13.0 by @dependabot[bot] in #789
  • fix: Ensure uint64 fields are handled correctly in _create_dataset by @dhedey in #791
  • fix: Fixes various file/lock delete failures on windows to allow us to unpin lockfile by @dhedey in #792
  • chore: Bump to 0.2.61 for release by @dhedey in #793

New Contributors

Full Changelog: v0.2.60...v0.2.61

Weekly Release 0.2.60

@tchaton tchaton released this 28 Jan 14:26
016caed

What's Changed

  • fixed r2 refetch interval by @vlad-heidi in #777
  • Fix StreamingDataset len after drop_last update by @MagellaX in #778
  • chore(deps): update sphinx requirement from <7.0,>=6.0 to >=6.0,<9.0 by @dependabot[bot] in #763
  • chore(deps): bump pytest-rerunfailures from 16.0.1 to 16.1 by @dependabot[bot] in #764
  • chore(deps): bump the gha-updates group across 1 directory with 3 updates by @dependabot[bot] in #774
  • [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #779
  • fix: lint errors (UP007, UP045, UP006 & UP035) by @bhimrazy in #754
  • chore(deps): update coverage requirement from ==7.10.* to ==7.12.* by @dependabot[bot] in #762
  • chore: add & simplify concurrency setting to CI testing workflow by @bhimrazy in #780
  • Fix ParallelStreamingDataset with resume=True not resuming after loading a state dict when breaking early by @philgzl in #771
  • Bump SDK by @tchaton in #783
  • chore(deps): bump JamesIves/github-pages-deploy-action from 4.7.6 to 4.8.0 in the gha-updates group by @dependabot[bot] in #782
  • feat(litdata): Better support for filestore & co by @tchaton in #785
  • chore(litdata): Pre-release version bump 0.2.60 by @tchaton in #786

New Contributors

Full Changelog: v0.2.59...v0.2.60

LitData v0.2.59

@pwgardipee pwgardipee released this 13 Dec 00:24
5913181

Lightning AI ⚡ is excited to announce the release of LitData v0.2.59

Changes

Added
  • add CHANGELOG.md to track project updates by @deependujha in #733
  • feat: add support to disable external version checks by @sanggusti in #737
  • feat: Add Python 3.14 zstd builtin support by @bhimrazy in #749
  • feat: add align_chunking option to preserve deterministic chunk boundaries across workers by @deependujha in #768
Changed
  • pin: torchaudio to >=2.7.0,<2.9 by @deependujha in #738
  • ref(test): remove torchaudio dependency and update audio processing to just use soundfile by @bhimrazy in #739
Fixed
  • fix(ci): failing link checks by @bhimrazy in #748
  • fix: ZstdError handling for Python <3.14 & >=3.14 compatibility by @bhimrazy in #767
  • Fix ParallelStreamingDataset with resume=True not resuming after the second epoch when breaking early by @philgzl in #761
Chores
  • chore(deps): update transformers requirement from <4.53.0 to <4.57.0 by @dependabot[bot] in #723
  • chore(deps): bump lightning-sdk from 2025.8.1 to 2025.9.30 by @dependabot[bot] in #724
  • chore(deps): bump pytest-cov from 6.2.1 to 7.0.0 by @dependabot[bot] in #725
  • chore(deps): bump astral-sh/setup-uv from 6 to 7 in the gha-updates group by @dependabot[bot] in #735
  • chore(deps): update transformers requirement from <4.57.0 to <4.58.0 by @dependabot[bot] in #746
  • chore(deps): bump pytest-rerunfailures from 15.1 to 16.0.1 by @dependabot[bot] in #745
  • chore(deps): bump actions/download-artifact from 5 to 6 in the gha-updates group by @dependabot[bot] in #741
  • docs: add anchor links to feature sections in README for easy referencing by @VijayVignesh1 in #743
  • chore(ci): add Python 3.14 to the testing matrix by @bhimrazy in #747
  • chore: drop support for Python 3.9 (EOL) by @bhimrazy in #751
  • chore(deps): bump JamesIves/github-pages-deploy-action from 4.7.3 to 4.7.4 in the gha-updates group by @dependabot[bot] in #750

Full Changelog: v0.2.58...v0.2.59

πŸ§‘β€πŸ’» Contributors

We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make LitData better for everyone, nice job!

New Contributors

Thank you ❀️ and we hope you'll keep them coming!

Release 0.2.58

@tchaton tchaton released this 07 Oct 12:17
dad316e

What's Changed

Full Changelog: v0.2.57...v0.2.58

Release 0.2.57

@tchaton tchaton released this 06 Oct 20:31
695a314

What's Changed

Full Changelog: v0.2.56...v0.2.57

v0.2.56

@tchaton tchaton released this 23 Sep 02:49
df92bf8

What's Changed

New Contributors

Full Changelog: v0.2.55...v0.2.56

LitData v0.2.55

@pwgardipee pwgardipee released this 19 Sep 15:47
f990376

Lightning AI ⚡ is excited to announce the release of LitData v0.2.55

Highlights

[Fixed] Writing compressed data to a lightning_storage folder

This release focuses on fixing errors when writing compressed output data to a lightning_storage folder. Previously, a code snippet like the following would break.

from litdata import StreamingDataset, StreamingDataLoader, optimize
import time

def should_keep(data):
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    output_dir = "/teamspace/lightning_storage/my-folder-1/output"
    optimize(
        fn=should_keep,
        inputs=list(range(500)),
        output_dir=output_dir,
        chunk_bytes="64MB",
        num_workers=4,
        compression="zstd",  # Previously, this would cause an error
    )
    time.sleep(20)
    dataset = StreamingDataset(output_dir)
    dataloader = StreamingDataLoader(dataset, batch_size=32, num_workers=4)
    for _ in dataloader:
        # process code here
        pass
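
As a rough illustration of what the example above asks optimize() to store, the filter can be run on its own in plain Python. This sketch does not call LitData. It assumes, as the example implies, that each item yielded by fn becomes one stored sample; the kept name is only for illustration.

```python
def should_keep(data):
    # Same generator as in the example: yield even inputs, yield nothing for odd ones.
    if data % 2 == 0:
        yield data


inputs = list(range(500))

# Flatten whatever each call yields. This only mimics (as an assumption)
# how the yielded items would be collected; it is not LitData's own code path.
kept = [item for value in inputs for item in should_keep(value)]

print(len(kept), kept[:5])  # 250 [0, 2, 4, 6, 8]
```

Because should_keep is a generator, an odd input simply produces no output rather than a None sample, which is why only half of the 500 inputs remain.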

Changes

Fixed
  • Fix errors when using compression and r2 in optimize() by @pwgardipee in #715
Changed
Chores
  • chore(ci): Add step to minimize uv cache in CI workflow by @bhimrazy in #713

Full Changelog: v0.2.54...v0.2.55

πŸ§‘β€πŸ’» Contributors

We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make LitData better for everyone, nice job!

Key Contributors

@pwgardipee @bhimrazy

Thank you ❀️ and we hope you'll keep them coming!

LitData v0.2.54

@pwgardipee pwgardipee released this 10 Sep 14:02
b50d428

Lightning AI ⚡ is excited to announce the release of LitData v0.2.54

Highlights

Lightning AI Storage - Direct download

Lightning Studios have special directories for data connections that are available to an entire teamspace. LitData functions that reference those directories see a significant performance increase because uploads and downloads go directly to and from the bucket that backs the folder. LitData already supported folder types such as S3 and GCS folders, and this release adds support for the recently launched lightning_storage folders.

For example, the following code downloads data directly from the my-bucket-1 Lightning Storage bucket.

from litdata import StreamingDataset

if __name__ == "__main__":
    data_dir = "/teamspace/lightning_storage/my-bucket-1/data"

    dataset = StreamingDataset(data_dir)

    for sample in dataset:
        print(sample)

References to any of the following directories will work similarly:

  1. /teamspace/lightning_storage/...
  2. /teamspace/s3_connections/...
  3. /teamspace/gcs_connections/...
  4. /teamspace/s3_folders/...
  5. /teamspace/gcs_folders/...
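
The directory list above can be expressed as a small path check. This is a minimal sketch using only the standard library; the helper name is_data_connection_path and the DATA_CONNECTION_DIRS constant are hypothetical and not part of LitData, and the check only mirrors the listed prefixes rather than LitData's actual resolution logic.

```python
from pathlib import PurePosixPath

# Folder types taken from the list above. The constant and the helper
# below are hypothetical names used for illustration only.
DATA_CONNECTION_DIRS = (
    "lightning_storage",
    "s3_connections",
    "gcs_connections",
    "s3_folders",
    "gcs_folders",
)


def is_data_connection_path(path: str) -> bool:
    """Return True if path looks like /teamspace/<folder type>/<name>/..."""
    parts = PurePosixPath(path).parts
    # Expected layout: "/", "teamspace", <folder type>, <connection name>, ...
    return (
        len(parts) >= 4
        and parts[0] == "/"
        and parts[1] == "teamspace"
        and parts[2] in DATA_CONNECTION_DIRS
    )


print(is_data_connection_path("/teamspace/lightning_storage/my-bucket-1/data"))
print(is_data_connection_path("/tmp/data"))
```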

Changes

Added
Changed

Full Changelog: v0.2.53...v0.2.54

πŸ§‘β€πŸ’» Contributors

We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make LitData better for everyone, nice job!

Key Contributors

@pwgardipee

Thank you ❀️ and we hope you'll keep them coming!

LitData v0.2.53

@pwgardipee pwgardipee released this 09 Sep 14:42
8a8e651

Lightning AI ⚡ is excited to announce the release of LitData v0.2.53

Highlights

Lightning AI Storage - Direct download and upload

Lightning Studios have special directories for data connections that are available to an entire teamspace. LitData functions that reference those directories see a significant performance increase because uploads and downloads go directly to and from the bucket that backs the folder. LitData already supported folder types such as S3 and GCS folders, and this release adds support for the recently launched lightning_storage folders.

For example, output artifacts from this code are uploaded directly to the my-data-1 Lightning Storage bucket.

from litdata import optimize

def should_keep(data):
    if data % 2 == 0:
        yield data

if __name__ == "__main__":
    optimize(
        fn=should_keep,
        inputs=list(range(1000)),
        output_dir="/teamspace/lightning_storage/my-data-1/output",
        chunk_bytes="64MB",
        num_workers=1
    )

Similarly, this example code downloads data directly from the my-bucket-1 Lightning Storage bucket.

from litdata import StreamingRawDataset

if __name__ == "__main__":
    data_dir = "/teamspace/lightning_storage/my-bucket-1/data"

    raw_dataset = StreamingRawDataset(data_dir)

    data = list(raw_dataset)
    print(data)

References to any of the following directories will work similarly:

  1. /teamspace/lightning_storage/...
  2. /teamspace/s3_connections/...
  3. /teamspace/gcs_connections/...
  4. /teamspace/s3_folders/...
  5. /teamspace/gcs_folders/...
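
The examples in this release name the bucket after the path component that follows the folder type (my-data-1 in /teamspace/lightning_storage/my-data-1/output). A minimal sketch of that naming convention, using only the standard library, might look like the following. ConnectionPath and split_connection_path are hypothetical names, and how LitData itself resolves these paths is not shown here.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ConnectionPath(NamedTuple):
    # Hypothetical container for illustration; not a LitData type.
    folder_type: str  # e.g. "lightning_storage"
    name: str         # e.g. "my-data-1"
    relative: str     # path inside the connection, e.g. "output"


def split_connection_path(path: str) -> ConnectionPath:
    """Split /teamspace/<folder type>/<name>/<rest> into its components."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[:2] != ("/", "teamspace"):
        raise ValueError(f"not a teamspace data connection path: {path!r}")
    return ConnectionPath(parts[2], parts[3], "/".join(parts[4:]))


info = split_connection_path("/teamspace/lightning_storage/my-data-1/output")
print(info.folder_type, info.name, info.relative)  # lightning_storage my-data-1 output
```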

Changes

Added
  • Add support for resolving directories in /teamspace/lightning_storage by @bhimrazy in #695
  • Add support for direct upload to r2 buckets by @pwgardipee in #705
  • Add readme docs for references to data connection dirs by @pwgardipee in #708
Changed
  • Remove unnecessary fixed sleep by adding predicate-based path check by @Red-Eyed in #700
  • ref(resolver): Refactors data connection resolution by adding a helper function and eliminating code duplication by @bhimrazy in #706
Chores
  • chore(deps): bump actions/first-interaction from 2 to 3 in the gha-updates group by @dependabot[bot] in #693
  • chore(deps): update coverage requirement from ==7.8.* to ==7.10.* by @dependabot[bot] in #701
  • chore(deps): bump pytest-random-order from 1.1.1 to 1.2.0 by @dependabot[bot] in #703
  • chore(deps): bump cryptography from 45.0.4 to 45.0.7 by @dependabot[bot] in #704
  • chore(deps): bump the gha-updates group with 3 updates by @dependabot[bot] in #707

Full Changelog: v0.2.52...v0.2.53

πŸ§‘β€πŸ’» Contributors

We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make LitData better for everyone, nice job!

Key Contributors

@bhimrazy, @pwgardipee

New Contributors

Thank you ❀️ and we hope you'll keep them coming!