ref(resolver): Refactors data connection resolution by adding a helper function and eliminating code duplication. #706
- Add _resolve_data_connection() with @lru_cache(maxsize=5)
- Refactor 5 resolver functions to use shared helper
- Eliminate ~50 lines of duplicated code
- Improve docstring with better documentation
- Add comprehensive test for _resolve_data_connection helper
- Use different connection names to avoid cache conflicts
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 5685611 | Triggered | Generic High Entropy Secret | 24089e9 | tests/streaming/test_resolver.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation.
Pull Request Overview
This PR refactors data connection resolution by introducing a shared helper function to eliminate code duplication across multiple resolver functions. The change consolidates ~50 lines of duplicated code into a single cached helper function.
- Adds _resolve_data_connection() helper function with LRU caching
- Refactors 5 resolver functions to use the shared helper instead of duplicating connection resolution logic
- Adds comprehensive test coverage for the new helper function and cache behavior
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/litdata/streaming/resolver.py | Adds cached _resolve_data_connection() helper and refactors 5 resolver functions to use it |
| tests/streaming/test_resolver.py | Adds comprehensive tests for the new helper function and updates existing tests with cache clearing |
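The need for cache clearing and distinct connection names in the tests follows from how `functools.lru_cache` behaves: a cached result outlives any later change to whatever the function read. A small sketch of that behavior, using a hypothetical `fake_backend` dictionary and `lookup_connection` function that are not part of this PR:

```python
from functools import lru_cache

# Hypothetical backing store that a test might replace between cases.
fake_backend = {"conn-a": "s3://bucket-one"}


@lru_cache(maxsize=5)
def lookup_connection(name: str) -> str:
    """Cached lookup keyed on the connection name."""
    return fake_backend[name]


first = lookup_connection("conn-a")

# Changing the backend does not invalidate the cached entry.
fake_backend["conn-a"] = "s3://bucket-two"
stale = lookup_connection("conn-a")

# Option 1: clear the cache so the next call reads the backend again.
lookup_connection.cache_clear()
fresh = lookup_connection("conn-a")

# Option 2: use a name that has never been looked up, so no cached entry exists.
fake_backend["conn-b"] = "s3://bucket-three"
other = lookup_connection("conn-b")
```

Either approach keeps one test from observing another test's cached value; clearing the cache in setup is the more robust of the two when tests reuse names.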
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           main   #706   +/-   ##
===================================
- Coverage    84%    84%    -0%
===================================
  Files        52     52
  Lines      7103   7078    -25
===================================
- Hits       5980   5957    -23
+ Misses     1123   1121     -2
What does this PR do?
Refactors data connection resolution by adding a helper function and eliminating code duplication.
What Changed
_resolve_data_connection() helper with @lru_cache(maxsize=5)
Review Notes
@lru_cache if not needed)
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃