Skip to content

[Fix] Fix msdatasets split issue #1704

Merged: wangxingjun778 merged 6 commits into modelscope:master from wangxingjun778:fix/plugin on May 5, 2026.


@wangxingjun778 (Member) commented on May 4, 2026:

Fixes & Refactoring

  • Split-based Filtering: Implemented support for loading specific data splits in datasets.
  • API Updates: Updated load_dataset and load_dataset_builder functions to accept and handle the split parameter.
  • Utility Functions: Added new utilities for:
    • Parsing split specifications.
    • Filtering data files based on split criteria.
  • Validation Logic: Added validation to ensure requested splits exist in the dataset metadata before loading.
  • Code Structure: Consolidated redundant inline imports to the top level of modules to improve readability and adhere to best practices (based on reviewer feedback).
  • msdatasets: Fixed an engine config key error and improved the robustness of preview loading.
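The description above does not include the code itself. What follows is a minimal sketch of how split parsing, split-based file filtering, and pre-load validation could fit together. It is not the code merged in this PR. The helper names `parse_split_names` and `filter_data_files_by_split` are hypothetical. The sketch assumes two things: data files are grouped in a dict keyed by split name, as in the Hugging Face `datasets` `data_files` convention, and split specs follow the `datasets` style of `+`-joined names with optional `[...]` slices.

```python
import re
from typing import Dict, List, Optional

# Hypothetical helper: matches a trailing slice suffix such as "[:10%]"
# so that "train[:10%]" resolves to the base split name "train".
_SLICE_RE = re.compile(r"\[.*\]$")


def parse_split_names(split: Optional[str]) -> List[str]:
    """Return the base split names referenced by a split spec (empty if None)."""
    if split is None:
        return []
    names: List[str] = []
    for part in split.split("+"):  # "train+test" -> ["train", "test"]
        name = _SLICE_RE.sub("", part.strip())
        if not name:
            raise ValueError(f"Invalid split specification: {split!r}")
        if name not in names:  # keep order, drop duplicates
            names.append(name)
    return names


def filter_data_files_by_split(
    data_files: Dict[str, List[str]],
    split: Optional[str],
) -> Dict[str, List[str]]:
    """Keep only the data files belonging to the requested splits.

    Raises ValueError when a requested split is absent from the metadata,
    so the caller fails before any files are downloaded.
    """
    requested = parse_split_names(split)
    if not requested:
        return dict(data_files)  # no split requested: keep every split
    missing = [name for name in requested if name not in data_files]
    if missing:
        raise ValueError(
            f"Split(s) {missing} not found; available: {sorted(data_files)}"
        )
    return {name: data_files[name] for name in requested}


if __name__ == "__main__":
    files = {"train": ["train-00000.parquet"], "test": ["test-00000.parquet"]}
    print(filter_data_files_by_split(files, "train[:10%]"))
    # {'train': ['train-00000.parquet']}
```

In this sketch, validation runs before any file is touched. A typo such as `split="validaton"` therefore fails immediately with the list of available splits, instead of surfacing later as a download or builder error.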

@gemini-code-assist (Contributor, bot) left a comment:

Code Review

This pull request implements split-based filtering for datasets, enabling users to load specific data splits. It introduces utility functions for parsing split specifications and filtering data files, along with validation logic to ensure requested splits are present in the metadata. The load_dataset and load_dataset_builder functions were updated to support the split parameter. Reviewer feedback focuses on improving code structure by removing redundant inline imports and consolidating them at the top level.

Review comment threads (3) on modelscope/msdatasets/utils/hf_datasets_util.py (one marked outdated).
@wangxingjun778 merged commit 13064d9 into modelscope:master on May 5, 2026 (2 checks passed).

Development

Successfully merging this pull request may close these issues:

  • Request: add a manual trigger to redeploy the dataset data preview
