[Windows] OneDrive SharePoint sync folder - incorrect values returned by os.listdir and win32file.FindFilesW #102993
Comments
|
OneDrive uses placeholder reparse points, which by default are disguised as regular files and directories if the process executable is outside of the "%SystemRoot%" tree. Placeholder reparse points are thus exposed to "cmd.exe" since it's inside "%SystemRoot%". You can check this via CPython has opted to use the default setting that disguises placeholder reparse points. It could be that the filesystem filter driver that handles OneDrive reparse points is failing in some way when placeholders are disguised. To rule out placeholder disguising as the cause of the different behavior, you could ask the SO user to try running the following code before calling
import ctypes
ntdll = ctypes.WinDLL('ntdll')
PHCM_EXPOSE_PLACEHOLDERS = 2
ntdll.RtlSetProcessPlaceholderCompatibilityMode.argtypes = (ctypes.c_char,)
ntdll.RtlSetProcessPlaceholderCompatibilityMode(PHCM_EXPOSE_PLACEHOLDERS)
Apparently my old ctypes code that calls |
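To check whether the mode change actually took effect, the return value of the call can be inspected. The following is a minimal sketch, not code from this thread. It assumes a Windows version where ntdll exports both RtlSetProcessPlaceholderCompatibilityMode and RtlQueryProcessPlaceholderCompatibilityMode, and it assumes the PHCM_* values and the "returns the previous mode or a negative error code" behavior as documented for ntifs.h. The helper names describe_mode and expose_placeholders are hypothetical. On other platforms the helper returns None.

```python
import ctypes
import sys

# PHCM_* values as documented for ntifs.h (an assumption, not from the thread).
PHCM_NAMES = {
    0: 'PHCM_APPLICATION_DEFAULT',
    1: 'PHCM_DISGUISE_PLACEHOLDER',
    2: 'PHCM_EXPOSE_PLACEHOLDERS',
    -1: 'PHCM_ERROR_INVALID_PARAMETER',
    -2: 'PHCM_ERROR_NO_TEB',
}
PHCM_EXPOSE_PLACEHOLDERS = 2


def describe_mode(mode):
    """Map a PHCM_* value to its name (hypothetical helper)."""
    return PHCM_NAMES.get(mode, 'unknown ({})'.format(mode))


def expose_placeholders():
    """Switch the process to PHCM_EXPOSE_PLACEHOLDERS.

    Returns (previous_mode, current_mode), or None when not on Windows.
    """
    if sys.platform != 'win32':
        return None
    ntdll = ctypes.WinDLL('ntdll')
    set_mode = ntdll.RtlSetProcessPlaceholderCompatibilityMode
    # CHAR is a signed 8-bit value; c_byte makes the result a plain int.
    set_mode.argtypes = (ctypes.c_byte,)
    set_mode.restype = ctypes.c_byte
    query_mode = ntdll.RtlQueryProcessPlaceholderCompatibilityMode
    query_mode.argtypes = ()
    query_mode.restype = ctypes.c_byte
    previous = set_mode(PHCM_EXPOSE_PLACEHOLDERS)
    if previous < 0:
        raise OSError('could not set placeholder mode: '
                      + describe_mode(previous))
    return previous, query_mode()
```

If this runs before the directory listing and the second element of the result is 2, placeholders should be exposed for the rest of the process lifetime, which is the condition the test above is meant to establish.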
|
I am far from the Windows internals expertise required to parse all of the above. @GordonAitchJay and @eryksun : would you have a layman's summary of what this means for usability of Python and OneDrive, any possible workarounds, and what the ETA might look like? I just started running into this issue this week, also with R. I wrote it off as a fluke until I ran into it with Python as well. I can do |
|
@eryksun Thank you for your insight. It's very interesting! What are the benefits of placeholder reparse points being disguised as regular files and directories if the process executable is outside of the "%SystemRoot%" tree? It's clearly a deliberate decision.
@jwhendy As far as I know, you're only the second Python user to have encountered this problem. I can't replicate it. Which directories do you have this problem with? It doesn't appear to be all directories managed by OneDrive, at least not all the time. Instead of using Please follow @eryksun's suggestion above which will prevent placeholder reparse points from being disguised as regular files and directories. Before calling
import ctypes
ntdll = ctypes.WinDLL('ntdll')
PHCM_EXPOSE_PLACEHOLDERS = 2
ntdll.RtlSetProcessPlaceholderCompatibilityMode.argtypes = (ctypes.c_char,)
ntdll.RtlSetProcessPlaceholderCompatibilityMode(PHCM_EXPOSE_PLACEHOLDERS)
Does @eryksun Assuming the call to I don't think there will be an ETA for a fix, since this similar issue raised by @eryksun was rejected because I suppose it would be possible to change the implementation of |
This is explained in the documentation. Some programs mistakenly handle all reparse points as if they're symbolic links, instead of checking the name-surrogate bit [*] of the reparse tag using the macro
A downside to exposing placeholder reparse points is that
It's unlikely that Python's standard library will ever call NTAPI system calls directly, such as The I've actually wanted to switch to using
[*] Here is some optional background information on the two types of name-surrogate reparse points that are commonly used in the NTFS and ReFS filesystems, and how Python supports them. It's off topic, but I think it's important to understanding the overall system of reparse points, which is a complex subject that's limited to just Windows.
When passed |
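The name-surrogate check mentioned above comes down to testing one bit of the reparse tag. Here is a minimal sketch, with the tag values written out as literals from the Windows SDK headers. The cloud tag is included only as an assumed representative of the placeholder tags used for cloud files, and is_name_surrogate is a hypothetical helper name.

```python
# Reparse tag values from the Windows SDK headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junctions and volume mount points
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # symbolic links
IO_REPARSE_TAG_CLOUD = 0x9000001A        # a cloud-files placeholder tag

# Bit 29 of the tag marks a name surrogate, i.e. a reparse point that
# stands in for another named entity in the system.
NAME_SURROGATE_BIT = 0x20000000


def is_name_surrogate(tag):
    """Return True if the reparse tag has the name-surrogate bit set."""
    return bool(tag & NAME_SURROGATE_BIT)


for tag_name, tag in (('mount point', IO_REPARSE_TAG_MOUNT_POINT),
                      ('symlink', IO_REPARSE_TAG_SYMLINK),
                      ('cloud placeholder', IO_REPARSE_TAG_CLOUD)):
    print('{:18} {:#010x} name surrogate: {}'.format(
        tag_name, tag, is_name_surrogate(tag)))
```

A program that treats every reparse point as a link would misclassify the cloud placeholder, since only the first two tags have the bit set.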
|
Here's a first draft ctypes-based prototype of If you can reproduce the reported problem with OneDrive and
import os
import stat
import msvcrt
import collections
import ctypes
from ctypes import wintypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
ERROR_INVALID_FUNCTION = 1
ERROR_NO_MORE_FILES = 18
ERROR_NOT_SUPPORTED = 50
ERROR_INVALID_PARAMETER = 87
ERROR_MORE_DATA = 234
ERROR_DIRECTORY = 267
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
FILE_TYPE_DISK = 1
FILE_READ_DATA = 1
FILE_SHARE_READ = 1
OPEN_EXISTING = 3
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000
O_OBTAIN_DIR = 0x2000 # os.open() flag that opens with backup semantics
FileBasicInfo = 0
FileIdBothDirectoryInfo = 10
FileIdBothDirectoryRestartInfo = 11
FileFullDirectoryInfo = 14
FileFullDirectoryRestartInfo = 15
FILE_INFO_BY_HANDLE_CLASS = wintypes.ULONG
LPSECURITY_ATTRIBUTES = wintypes.LPVOID
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = (
    wintypes.LPCWSTR,       # In lpFileName
    wintypes.DWORD,         # In dwDesiredAccess
    wintypes.DWORD,         # In dwShareMode
    LPSECURITY_ATTRIBUTES,  # In_opt lpSecurityAttributes
    wintypes.DWORD,         # In dwCreationDisposition
    wintypes.DWORD,         # In dwFlagsAndAttributes
    wintypes.HANDLE)        # In_opt hTemplateFile
kernel32.GetFileInformationByHandleEx.argtypes = (
    wintypes.HANDLE,            # hFile
    FILE_INFO_BY_HANDLE_CLASS,  # FileInformationClass
    wintypes.LPVOID,            # lpFileInformation
    wintypes.DWORD)             # dwBufferSize
stat_result = collections.namedtuple('stat_result',
    ('st_mode', 'st_ino', 'st_dev', 'st_nlink', 'st_uid', 'st_gid', 'st_size',
     'st_atime', 'st_mtime', 'st_ctime', 'st_btime', 'st_atime_ns',
     'st_mtime_ns', 'st_ctime_ns', 'st_btime_ns', 'st_change_time',
     'st_change_time_ns', 'st_file_attributes', 'st_reparse_tag'))
class FILE_BASIC_INFO(ctypes.Structure):
    _fields_ = (('CreationTime', wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime', wintypes.LARGE_INTEGER),
                ('ChangeTime', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.DWORD))

class FILE_BASE_DIR_INFO(ctypes.Structure):
    __slots__ = ()

    @property
    def FileName(self):
        length = self._FileNameLength
        if not length:
            return ''
        addr = ctypes.addressof(self) + type(self)._FileName.offset
        size = length // ctypes.sizeof(wintypes.WCHAR)
        return (wintypes.WCHAR * size).from_address(addr).value

    @property
    def EaSize(self):
        # Since a reparse point cannot have extended attributes, the EaSize
        # field is reused to store the reparse tag if the entry is a reparse
        # point. This behavior is documented in [MS-FSCC].
        # https://learn.microsoft.com/openspecs/windows_protocols/ms-fscc/e8d926d1-3a22-4654-be9c-58317a85540b
        if not (self.FileAttributes & stat.FILE_ATTRIBUTE_REPARSE_POINT):
            return self._EaSize
        return 0

    @property
    def ReparseTag(self):
        # See the comment about EaSize.
        if self.FileAttributes & stat.FILE_ATTRIBUTE_REPARSE_POINT:
            return self._EaSize
        return 0

class FILE_FULL_DIR_INFO(FILE_BASE_DIR_INFO):
    __slots__ = ()
    _fields_ = (('_NextEntryOffset', wintypes.DWORD),
                ('_FileIndex', wintypes.DWORD),
                ('CreationTime', wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime', wintypes.LARGE_INTEGER),
                ('ChangeTime', wintypes.LARGE_INTEGER),
                ('EndOfFile', wintypes.LARGE_INTEGER),
                ('AllocationSize', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.DWORD),
                ('_FileNameLength', wintypes.DWORD),
                ('_EaSize', wintypes.DWORD),
                ('_FileName', wintypes.WCHAR * 1))

class FILE_ID_BOTH_DIR_INFO(FILE_BASE_DIR_INFO):
    __slots__ = ()
    _fields_ = (('_NextEntryOffset', wintypes.DWORD),
                ('_FileIndex', wintypes.DWORD),
                ('CreationTime', wintypes.LARGE_INTEGER),
                ('LastAccessTime', wintypes.LARGE_INTEGER),
                ('LastWriteTime', wintypes.LARGE_INTEGER),
                ('ChangeTime', wintypes.LARGE_INTEGER),
                ('EndOfFile', wintypes.LARGE_INTEGER),
                ('AllocationSize', wintypes.LARGE_INTEGER),
                ('FileAttributes', wintypes.DWORD),
                ('_FileNameLength', wintypes.DWORD),
                ('_EaSize', wintypes.DWORD),
                ('_ShortNameLength', wintypes.BYTE),
                ('_ShortName', wintypes.WCHAR * 12),
                ('FileId', wintypes.LARGE_INTEGER),
                ('_FileName', wintypes.WCHAR * 1))

class DirEntry:
    __slots__ = ('_dirpath', '_info')

    def __init__(self, dirpath, info):
        self._dirpath = dirpath
        self._info = info

    def __repr__(self):
        return '<{} {!r}>'.format(self.__class__.__name__, self.name)

    @classmethod
    def _listbuf(cls, buf, info_class, dirpath):
        result = []
        if info_class == FileIdBothDirectoryInfo:
            info_struct = FILE_ID_BOTH_DIR_INFO
        elif info_class == FileFullDirectoryInfo:
            info_struct = FILE_FULL_DIR_INFO
        else:
            raise ValueError('unsupported information class')
        base_size = ctypes.sizeof(info_struct) - ctypes.sizeof(wintypes.WCHAR)
        offset = 0
        while True:
            tmp = info_struct.from_buffer(buf, offset)
            if tmp._FileNameLength and tmp.FileName not in ('.', '..'):
                info = info_struct()
                size = base_size + tmp._FileNameLength
                ctypes.resize(info, size)
                ctypes.memmove(ctypes.byref(info), ctypes.byref(tmp), size)
                entry = cls(dirpath, info)
                result.append(entry)
            if tmp._NextEntryOffset:
                offset += tmp._NextEntryOffset
            else:
                break
        return result

    def _is_name_surrogate(self):
        return bool(self._info.ReparseTag & 0x20000000)

    def _is_reparse_point(self):
        return bool(self._info.FileAttributes &
                    stat.FILE_ATTRIBUTE_REPARSE_POINT)

    @property
    def name(self):
        if isinstance(self._dirpath, bytes):
            return os.fsencode(self._info.FileName)
        return self._info.FileName

    @property
    def path(self):
        return os.path.join(self._dirpath, self.name)

    def stat(self, follow_symlinks=True):
        def nt_time_as_posix_ns(t):
            if t == 0:
                return 0
            # NT has an epoch of 1601, and its time unit is 100 ns.
            return (t - 116444736000000000) * 100
        if (self._is_reparse_point() and
                (follow_symlinks or not self._is_name_surrogate())):
            return os.stat(self.path)
        if self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK:
            mode = stat.S_IFLNK
        elif self._info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY:
            mode = stat.S_IFDIR
        else:
            pipe_paths = ('\\\\.\\pipe', '\\\\?\\pipe')
            drive = os.path.splitdrive(os.fsdecode(self._dirpath))[0]
            if drive and os.path.normcase(drive) in pipe_paths:
                mode = stat.S_IFIFO
            else:
                mode = stat.S_IFREG
        file_id = getattr(self._info, 'FileId', 0)
        atime_ns = nt_time_as_posix_ns(self._info.LastAccessTime)
        mtime_ns = nt_time_as_posix_ns(self._info.LastWriteTime)
        # BUGBUG: POSIX st_ctime should be the metadata change time, and
        # st_btime should be the creation (birth) time. But Python
        # follows the Windows C runtime implementation, which back in the
        # days of MS-DOS in the 1980s, before there was even a POSIX
        # standard, chose to redefine Unix st_ctime as the creation time.
        # They should have added a new field for the creation time, and
        # they should have ignored st_ctime until they had a filesystem
        # that supported it, i.e. NTFS on Windows NT in 1993.
        ctime_ns = nt_time_as_posix_ns(self._info.CreationTime)
        btime_ns = nt_time_as_posix_ns(self._info.CreationTime)
        change_time_ns = nt_time_as_posix_ns(self._info.ChangeTime)
        return stat_result(
            st_mode=mode,
            st_ino=file_id,
            st_dev=0,
            st_nlink=0,
            st_uid=0,
            st_gid=0,
            st_size=self._info.EndOfFile,
            st_atime=atime_ns // 10**9,
            st_mtime=mtime_ns // 10**9,
            st_ctime=ctime_ns // 10**9,
            st_btime=btime_ns // 10**9,
            st_atime_ns=atime_ns,
            st_mtime_ns=mtime_ns,
            st_ctime_ns=ctime_ns,
            st_btime_ns=btime_ns,
            st_change_time=change_time_ns // 10**9,
            st_change_time_ns=change_time_ns,
            st_file_attributes=self._info.FileAttributes,
            st_reparse_tag=self._info.ReparseTag)

    def inode(self):
        if (not hasattr(self._info, 'FileId') or
                (self._is_reparse_point() and not self._is_name_surrogate())):
            return os.stat(self.path).st_ino
        return self._info.FileId

    def is_dir(self, follow_symlinks=True):
        if self._is_reparse_point():
            if follow_symlinks or not self._is_name_surrogate():
                return os.path.isdir(self.path)
            if self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK:
                return False
        if self._info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY:
            return True
        return False

    def is_file(self, follow_symlinks=True):
        if self._is_reparse_point():
            if follow_symlinks or not self._is_name_surrogate():
                return os.path.isfile(self.path)
            if self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK:
                return False
        if self._info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY:
            return False
        pipe_paths = ('\\\\.\\pipe', '\\\\?\\pipe')
        drive = os.path.splitdrive(os.fsdecode(self._dirpath))[0]
        if drive and os.path.normcase(drive) in pipe_paths:
            return False
        return True

    def is_symlink(self):
        return self._info.ReparseTag == stat.IO_REPARSE_TAG_SYMLINK

    def is_junction(self):
        return self._info.ReparseTag == stat.IO_REPARSE_TAG_MOUNT_POINT

def scandir(path=None):
    """Return an iterator of DirEntry objects for given path."""
    if path is None:
        path = os.getcwd()

    def isdir():
        info = FILE_BASIC_INFO()
        if kernel32.GetFileInformationByHandleEx(
                hFile, FileBasicInfo, ctypes.byref(info),
                ctypes.sizeof(info)):
            return info.FileAttributes & stat.FILE_ATTRIBUTE_DIRECTORY
        return False

    def readdir():
        nonlocal info_class
        if kernel32.GetFileInformationByHandleEx(
                hFile, info_class, buf, ctypes.sizeof(buf)):
            return True
        error = ctypes.get_last_error()
        if error == ERROR_NO_MORE_FILES:
            return False
        elif (info_class == FileIdBothDirectoryRestartInfo and
              error in (ERROR_INVALID_FUNCTION,
                        ERROR_NOT_SUPPORTED,
                        ERROR_INVALID_PARAMETER)):
            info_class = FileFullDirectoryRestartInfo
            return readdir()
        elif error == ERROR_MORE_DATA:
            ctypes.resize(buf, ctypes.sizeof(buf) * 2)
            return readdir()
        raise ctypes.WinError(error)

    def ScandirIterator():
        try:
            while True:
                yield from DirEntry._listbuf(buf, info_class, dirpath)
                if not readdir():
                    break
        finally:
            if close:
                os.close(fd)

    close = False
    try:
        if isinstance(path, int):
            fd = path
            hFile = msvcrt.get_osfhandle(fd)
            if kernel32.GetFileType(hFile) != FILE_TYPE_DISK:
                raise ValueError('if path is a file descriptor, it must '
                                 'refer to a file on a volume device')
            dirpath = ''
        else:
            path = os.fspath(path)
            hFile = kernel32.CreateFileW(
                os.fsdecode(path), FILE_READ_DATA, FILE_SHARE_READ,
                None, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, None)
            if hFile == INVALID_HANDLE_VALUE:
                raise ctypes.WinError(ctypes.get_last_error())
            try:
                fd = msvcrt.open_osfhandle(hFile, os.O_RDONLY)
            except:
                kernel32.CloseHandle(hFile)
                raise
            close = True
            dirpath = path
        if not isdir():
            raise ctypes.WinError(ERROR_DIRECTORY)
        buf = (ctypes.c_char * 65536)()
        info_class = FileIdBothDirectoryRestartInfo
        readdir()
        if info_class == FileIdBothDirectoryRestartInfo:
            info_class = FileIdBothDirectoryInfo
        elif info_class == FileFullDirectoryRestartInfo:
            info_class = FileFullDirectoryInfo
    except:
        if close:
            os.close(fd)
        raise
    return ScandirIterator()

def listdir(path=None):
    """Return a list containing the names of the files in the directory."""
    return [e.name for e in scandir(path)]
|
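The buffer walk in the prototype can be tried without Windows by building a synthetic buffer. This is only a sketch: it packs entries by hand in the FILE_FULL_DIR_INFO layout (FileName at offset 68, entries 8-byte aligned) and follows NextEntryOffset the way _listbuf does, including the reuse of EaSize as the reparse tag. The names pack_entry and walk_entries are hypothetical helpers, and a real buffer would come from GetFileInformationByHandleEx.

```python
import struct

# Fixed-size part of FILE_FULL_DIR_INFO, little-endian, no padding:
# NextEntryOffset, FileIndex, six 64-bit time/size fields,
# FileAttributes, FileNameLength, EaSize. FileName follows at offset 68.
HEADER = struct.Struct('<II6qIII')
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_REPARSE_POINT = 0x400


def pack_entry(name, attributes=0, ea_size=0, last=False):
    """Build one synthetic directory entry (hypothetical test helper)."""
    raw_name = name.encode('utf-16-le')
    size = HEADER.size + len(raw_name)
    padded = (size + 7) & ~7  # round up to the 8-byte entry alignment
    next_offset = 0 if last else padded
    header = HEADER.pack(next_offset, 0, 0, 0, 0, 0, 0, 0,
                         attributes, len(raw_name), ea_size)
    return (header + raw_name).ljust(padded, b'\0')


def walk_entries(buf):
    """Yield (name, attributes, reparse_tag), skipping '.' and '..'."""
    offset = 0
    while True:
        fields = HEADER.unpack_from(buf, offset)
        next_offset, attributes = fields[0], fields[8]
        name_len, ea_size = fields[9], fields[10]
        start = offset + HEADER.size
        name = bytes(buf[start:start + name_len]).decode('utf-16-le')
        # EaSize holds the reparse tag when the entry is a reparse point.
        tag = ea_size if attributes & FILE_ATTRIBUTE_REPARSE_POINT else 0
        if name not in ('.', '..'):
            yield name, attributes, tag
        if not next_offset:
            break
        offset += next_offset


sample = (pack_entry('.', FILE_ATTRIBUTE_DIRECTORY) +
          pack_entry('a.txt') +
          pack_entry('link', FILE_ATTRIBUTE_REPARSE_POINT, 0xA000000C,
                     last=True))
for name, attributes, tag in walk_entries(sample):
    print(name, hex(attributes), hex(tag))
```

A zero NextEntryOffset marks the last entry in a buffer, which is why the loop checks it only after handling the current entry.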
This is an issue experienced by a user on StackOverflow, so please excuse the lack of details and MRE. I'm hoping a Windows internals expert and/or a OneDrive dev can shed light on the situation.
Why does os.walk() (Python) ignore a OneDrive directory depending on the number of files in it?
The user has a directory which is a sync/shortcut of a SharePoint folder containing 897 files (all files can be opened, they are downloaded, not on-demand). When calling os.listdir with this directory, an exception is raised: OSError: [WinError 87] The parameter is incorrect:. However, if 2 files are deleted, it returns all the files (besides the 2 which were deleted). If the directory is copied somewhere outside the purview of OneDrive, os.listdir returns all 897 files.
Calling win32file.FindFilesW behaves the same as os.listdir. With 897 files it raises an exception: error: (87, 'FindNextFileW', 'The parameter is incorrect.'). After deleting 2 files, it returns all the files.
When calling win32file.FindFilesIterator when the directory has all 897 files, 443 files are yielded before the error occurs. glob.glob() is the same but doesn't yield "." or ".." (as expected). Strangely, if only 1 file is deleted, win32file.FindFilesIterator yields only 25 files!
If the directory is copied to the local OneDrive root directory, os.listdir initially works (when OneDrive had just started uploading the files). However, after a couple of minutes, once a number of the files have been uploaded, os.listdir results in OSError: [WinError 87] The parameter is incorrect: again. Even before all files have synced, win32file.FindFilesIterator yields only 443 files again.
Explorer always shows the full list of files, and so does cmd's dir, and powershell's ls and gci.
Calling NtQueryDirectoryFile directly with ctypes always shows the full list of files.
I'm fairly sceptical that CPython is at fault here, but I find it utterly bizarre that cmd's dir, and powershell's ls and gci work, which all call FindNextFileW, yet when CPython calls the same function it predictably returns prematurely.
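For anyone trying to reproduce this, it helps to record how many entries are returned before the failure, since the counts reported above (443, 25) vary with the directory contents. A minimal sketch using only os.scandir follows; probe_directory is a hypothetical helper name and the path in the usage note is a placeholder.

```python
import os


def probe_directory(path):
    """Count the entries os.scandir yields before it stops or fails.

    Returns (count, error), where error is the OSError that interrupted
    the listing, or None if the listing completed.
    """
    count = 0
    error = None
    try:
        with os.scandir(path) as entries:
            for _ in entries:
                count += 1
    except OSError as exc:
        error = exc
    return count, error
```

Calling it as count, error = probe_directory(r'C:\path\to\synced\folder') and printing count together with getattr(error, 'winerror', None) would show whether the listing stops with error 87 and after how many entries.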