ref(resolver): Refactors data connection resolution by adding a helper function and eliminating code duplication.#706

Merged
tchaton merged 2 commits into Lightning-AI:main from bhimrazy:ref/resolver-data-connection
Sep 8, 2025

@bhimrazy bhimrazy commented Sep 5, 2025

Collaborator

What does this PR do?

Refactors data connection resolution by adding a helper function and eliminating code duplication.

idea from: #705 (comment)

What Changed

  • Added _resolve_data_connection() helper with @lru_cache(maxsize=5)
  • Refactored 5 resolver functions to use shared helper
  • Eliminated ~50 lines of duplicated code
  • Added comprehensive tests
  • Improved documentation

Review Notes

  • Consider whether we want caching for data connections at all (may want to remove @lru_cache if not needed)
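
One way to weigh that question is to look at what the cache trades away. The sketch below is illustrative only and assumes a hypothetical mutable `backend` dict in place of the real service: a cached resolver keeps returning the old value after the connection changes, until `cache_clear()` is called.

```python
from functools import lru_cache

# Hypothetical mutable backend standing in for the remote service.
backend = {"conn-a": "bucket-old"}


@lru_cache(maxsize=5)
def resolve_cached(name: str) -> str:
    """Cached lookup, mirroring the decorator used in this PR."""
    return backend[name]


def resolve_uncached(name: str) -> str:
    """Same lookup without caching."""
    return backend[name]


first = resolve_cached("conn-a")     # miss: reads the backend
backend["conn-a"] = "bucket-new"     # the connection is re-pointed elsewhere
stale = resolve_cached("conn-a")     # hit: still the old value
fresh = resolve_uncached("conn-a")   # always reads the current value
info = resolve_cached.cache_info()   # hit/miss counters for inspection

resolve_cached.cache_clear()         # explicit invalidation
after_clear = resolve_cached("conn-a")
```

If connections rarely change within a process and the lookup is a network call, the cache saves round trips; if they can change mid-run, the uncached form or an explicit invalidation point is the safer choice.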

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

- Add _resolve_data_connection() with @lru_cache(maxsize=5)
- Refactor 5 resolver functions to use shared helper
- Eliminate ~50 lines of duplicated code
- Improve docstring with better documentation
- Add comprehensive test for _resolve_data_connection helper
- Use different connection names to avoid cache conflicts
@bhimrazy bhimrazy self-assigned this Sep 5, 2025
@bhimrazy bhimrazy added the enhancement New feature or request label Sep 5, 2025
gitguardian Bot commented Sep 5, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 5685611 | Triggered | Generic High Entropy Secret | 24089e9 | tests/streaming/test_resolver.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copilot AI left a comment

Contributor

Pull Request Overview

This PR refactors data connection resolution by introducing a shared helper function to eliminate code duplication across multiple resolver functions. The change consolidates ~50 lines of duplicated code into a single cached helper function.

  • Adds _resolve_data_connection() helper function with LRU caching
  • Refactors 5 resolver functions to use the shared helper instead of duplicating connection resolution logic
  • Adds comprehensive test coverage for the new helper function and cache behavior

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/litdata/streaming/resolver.py | Adds cached _resolve_data_connection() helper and refactors 5 resolver functions to use it |
| tests/streaming/test_resolver.py | Adds comprehensive tests for the new helper function and updates existing tests with cache clearing |

Comment thread src/litdata/streaming/resolver.py
Comment thread src/litdata/streaming/resolver.py
codecov Bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84%. Comparing base (76d3bee) to head (24089e9).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #706   +/-   ##
===================================
- Coverage    84%    84%   -0%     
===================================
  Files        52     52           
  Lines      7103   7078   -25     
===================================
- Hits       5980   5957   -23     
+ Misses     1123   1121    -2     
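
The rounded figures in the report can be reproduced from the raw counts. The short calculation below is a sketch for checking the arithmetic: the hit and line counts are taken from the diff above, while the patch size of 26 lines is an inference from "92.30769% with 2 lines missing" (24 of 26), not a number the report states.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Project coverage before and after the PR (counts from the Codecov diff).
base = coverage_pct(5980, 7103)   # main
head = coverage_pct(5957, 7078)   # this PR
delta = head - base               # small negative change, shown as "-0%"

# Patch coverage: 2 uncovered lines at 92.30769% implies 26 changed lines.
# The 26 is inferred, not reported.
patch_lines = 26
patch = coverage_pct(patch_lines - 2, patch_lines)

print(f"base={base:.2f}% head={head:.2f}% delta={delta:+.2f}% patch={patch:.5f}%")
```

Both project figures round to 84%, which is why the report shows a change of "-0%" even though the unrounded coverage dropped by a few hundredths of a percentage point.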
@tchaton tchaton merged commit 5ce2ee3 into Lightning-AI:main Sep 8, 2025
36 checks passed
@bhimrazy bhimrazy deleted the ref/resolver-data-connection branch September 8, 2025 12:05
4 participants