Issue40564
Created on 2020-05-08 15:47 by bustawin, last changed 2020-09-09 23:04 by jaraco.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| zipfile.zip | bustawin, 2020-05-08 15:47 | |||
| Messages (8) | |||
|---|---|---|---|
| msg368449 - (view) | Author: Xavier (bustawin) | Date: 2020-05-08 15:47 | |
Given a .zip file with more than one inner file, when reading those inner files using zipfile.Path the zip module closes the .zip file prematurely, causing an error.
Given the following code (example zipfile is attached, but any should work).
with zipfile.ZipFile('foo.zip') as file:
for name in ['file-1.txt', 'file-2.txt']:
p = zipfile.Path(file, name)
with p.open() as inner:
print(list(inner)) # ValueError: seek of closed file
But the following code does not fail:
with zipfile.ZipFile('foo.zip') as file:
for name in ['file-1.txt', 'file-2.txt']:
with file.open(name) as inner:
print(list(inner)) # Works!
Python 3.8.2 macOS 10.15.4.
|
|||
| msg375919 - (view) | Author: Karthikeyan Singaravelan (xtreak) * ![]() |
Date: 2020-08-26 08:12 | |
Bisection show this to be caused due to e5bd73632e77dc5ab0cab77e48e94ca5e354be8a . See also issue41640 for a similar report. |
|||
| msg375961 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-08-26 21:22 | |
I've attempted to replicate the issue in the [zipp](https://github.com/jaraco/zipp) test suite with this test: ```diff diff --git a/test_zipp.py b/test_zipp.py index a6fbf39..dc7c8aa 100644 --- a/test_zipp.py +++ b/test_zipp.py @@ -259,3 +259,10 @@ class TestPath(unittest.TestCase): def test_implied_dirs_performance(self): data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)] zipp.CompleteDirs._implied_dirs(data) + + def test_read_does_not_close(self): + for alpharep in self.zipfile_alpharep(): + for rep in range(2): + p_ = zipp.Path(alpharep, 'a.txt') + with p_.open() as inner: + print(list(inner)) ``` But the test passes. |
|||
| msg375962 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-08-26 21:44 | |
I am able to replicate the failure using the ondisk fixture:
```diff
diff --git a/test_zipp.py b/test_zipp.py
index a6fbf39..539d0a9 100644
--- a/test_zipp.py
+++ b/test_zipp.py
@@ -259,3 +259,11 @@ class TestPath(unittest.TestCase):
def test_implied_dirs_performance(self):
data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
zipp.CompleteDirs._implied_dirs(data)
+
+ def test_read_does_not_close(self):
+ for alpharep in self.zipfile_ondisk():
+ with zipfile.ZipFile(alpharep) as file:
+ for rep in range(2):
+ p_ = zipp.Path(file, 'a.txt')
+ with p_.open() as inner:
+ print(list(inner))
```
|
|||
| msg375963 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-08-26 21:55 | |
I suspect the issue lies in how the CompleteDirs.make [replaces one instance with another](https://github.com/jaraco/zipp/blob/8ad959e61cd4be40baab93528775c2bf03c8f4e1/zipp.py#L112-L114) in order to provide a complete list of implied directories and to memoize the names lookup. Because constructing a zipfile.Path object around a zipfile.ZipFile copies the underlying state, closing one will have the affect of closing the other. I believe this issue is the same root issue as issue41350. |
|||
| msg375964 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-08-26 22:06 | |
I see a few options here: - Implement CompleteDirs/FastLookup as adapters instead of subclasses, allowing the original ZipFile object to represent the state in a single place. This approach would likely be slower due to the indirection on all operations through the wrapper. - Instead of constructing a new object and copying the state, CompleteDirs.make could mutate the existing ZipFile class, replacing `source.__class__` with the new class. This approach is messy and the caller would still need to be aware that this change could be applied to the zipfile object. - Consider this use-case unsupported and document that any ZipFile object passed into Path is no longer viable and should not be referenced for another purpose. - Eliminate the class-based performance optimizations and replace them with some functional/imperative form. This approach may provide less separation of concerns. - Disable the close-on-delete behavior for the subclasses when state is copied from another. |
|||
| msg375965 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-08-26 22:26 | |
Implementing that last option:
```
diff --git a/zipp.py b/zipp.py
index 697d4a9..f244cba 100644
--- a/zipp.py
+++ b/zipp.py
@@ -111,6 +111,7 @@ class CompleteDirs(zipfile.ZipFile):
res = cls.__new__(cls)
vars(res).update(vars(source))
+ res.close = lambda: None
return res
```
Does appear to address the issue. I'm not super happy about the implementation, though.
|
|||
| msg376668 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-09-09 23:04 | |
In jaraco/zipp, I've implemented three of the options above: - https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-1 - https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-2 - https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-5 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2020-09-09 23:04:27 | jaraco | set | messages: + msg376668 |
| 2020-08-29 12:59:33 | xtreak | link | issue41640 superseder |
| 2020-08-26 22:26:58 | jaraco | set | messages: + msg375965 |
| 2020-08-26 22:06:56 | jaraco | set | messages: + msg375964 |
| 2020-08-26 21:55:17 | jaraco | set | messages: + msg375963 |
| 2020-08-26 21:44:02 | jaraco | set | messages: + msg375962 |
| 2020-08-26 21:22:55 | jaraco | set | messages: + msg375961 |
| 2020-08-26 08:12:56 | xtreak | set | nosy:
+ xtreak messages: + msg375919 |
| 2020-05-08 16:52:16 | brett.cannon | set | nosy:
+ jaraco |
| 2020-05-08 15:47:01 | bustawin | create | |
