
BUG: fix max_rows and chunked string/datetime reading in loadtxt#26836

Merged
charris merged 1 commit into numpy:maintenance/2.0.x from charris:backport-26762 on Jul 3, 2024
@charris (Member) commented Jul 3, 2024

Backport of #26762.

Within _read(), which loadtxt() calls, large files are read in chunks of at most _loadtxt_chunksize=50000 lines by calling _load_from_filelike() repeatedly.
When max_rows exceeded _loadtxt_chunksize, _load_from_filelike() was still called with max_rows=max_rows instead of max_rows=chunk_size, where chunk_size can never be greater than _loadtxt_chunksize. As a result, each call could load up to max_rows lines while the surrounding loop assumed it had loaded at most _loadtxt_chunksize lines.
The argument max_rows=max_rows at line 1058 has been changed to max_rows=chunk_size, and a test function has been added to numpy/lib/tests/test_loadtxt.py to check loadtxt() for different sizes.
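
The mismatch described above can be reproduced in a small, self-contained sketch. This is not the NumPy implementation: read_block() and read_chunked() are hypothetical stand-ins for _load_from_filelike() and the chunk loop in _read(), CHUNK_SIZE is a tiny stand-in for _loadtxt_chunksize, and the loop bookkeeping is an assumption made for illustration.

```python
import io
from itertools import islice

CHUNK_SIZE = 5  # small stand-in for _loadtxt_chunksize (50000 in NumPy)


def read_block(stream, max_rows):
    """Hypothetical stand-in for _load_from_filelike(): read up to max_rows lines."""
    return [line.rstrip("\n") for line in islice(stream, max_rows)]


def read_chunked(stream, max_rows, fixed=True):
    """Read at most max_rows lines, one chunk at a time (illustrative only)."""
    rows = []
    remaining = max_rows
    while remaining > 0:
        chunk_size = min(CHUNK_SIZE, remaining)
        # Buggy variant: request max_rows lines although the bookkeeping
        # below assumes that no more than chunk_size lines were read.
        request = chunk_size if fixed else max_rows
        block = read_block(stream, request)
        rows.extend(block)
        if len(block) < chunk_size:
            break  # the stream ended early
        remaining -= chunk_size
    return rows


text = "".join(f"{i}\n" for i in range(20))
fixed_rows = read_chunked(io.StringIO(text), max_rows=12, fixed=True)
buggy_rows = read_chunked(io.StringIO(text), max_rows=12, fixed=False)
print(len(fixed_rows), len(buggy_rows))  # prints: 12 20
```

With the per-call limit set to chunk_size, the sketch returns exactly the 12 requested rows. With max_rows passed through unchanged, it reads past the limit because the loop counts only chunk_size rows per call.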

Closes #26754.

  • fixed bug at line 1058 in file numpy/lib/_npyio_impl.py; in function _read(), called by loadtxt(), when files are read in chunks to reduce memory overhead, max_rows lines were requested on every call, including when max_rows>_loadtxt_chunksize, so chunks of the wrong size were loaded. A test has been added in numpy/lib/tests/test_loadtxt.py to check the size of the loaded array for different values of max_rows, both less than and greater than _loadtxt_chunksize.

  • changed numpy/lib/tests/test_loadtxt.py; added further tests in the functions test_maxrows_exceeding_chunksize() and test_parametric_unit_discovery() to check that loadtxt() correctly loads files both as a whole and in chunks. It seems that the function _load_from_filelike() works well with file-like streams, but not with file objects.

  • changed the value of the filelike variable in file numpy/lib/_npyio_impl.py at line 1045; the file was converted to an iterable, but this was not accounted for, so _load_from_filelike() was not able to read the stream properly to the end.

  • I forgot to add the new version of test_loadtxt.py with the updated test functions for reading files in chunks...

  • within file numpy/lib/tests/test_loadtxt.py I reduced the size of the arrays within function test_maxrows_exceeding_chunksize()

  • add max_rows=10 in the call of loadtxt() within function test_field_growing_cases() to avoid memory allocation issues when the line grows too much.

  • Update numpy/lib/tests/test_loadtxt.py



Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
@charris charris added this to the 2.0.1 release milestone Jul 3, 2024
@charris charris merged commit 00ee746 into numpy:maintenance/2.0.x Jul 3, 2024
@charris charris deleted the backport-26762 branch July 3, 2024 02:01