This is the mail archive of the cygwin-patches mailing list for the Cygwin project.



Re: Cygwin Filesystem Performance degradation 1.7.5 vs 1.7.7, and methods for improving performance


Hi,

> There's also the problem of handling NFS shares.  However, I just had an
> idea how to speed up symlink_info::check without neglecting NFS shares.
> This will take some time, though since it turns a lot of code upside
> down.  Stay tuned.

This sounds great! Cygwin filesystem performance is a very important issue, and any improvement is more than welcome!

> I don't understand how you think this should work. The filter expression
> given to NtQueryDirectoryFile is either a constant string and has to match
> the filename exactly, or it contains wildcards. This is documented
> behaviour: http://msdn.microsoft.com/en-us/library/ff567047%28VS.85%29.aspx
> So, "foo" works, "foo*" works, but a list like "foo foo.exe foo.lnk"
> does not.


There are two options for stat() and other places that need file info (such as symlink_info::check):

1) CreateFile(the_dir), then NtQueryDirectoryFile("foo*") and retrieve all the info (including the hardlink), filter the results in user mode (keeping "foo", "foo.exe", "foo.lnk"), and then call CloseHandle().

2) CreateFile(the_dir), NtQueryDirectoryFile("foo"), NtQueryDirectoryFile("foo.exe"), NtQueryDirectoryFile("foo.lnk"), CloseHandle(). The calls to NtQueryDirectoryFile() should be made with RestartScan=1, so that the the_dir handle can be reused. ReturnSingleEntry=1 can also be set to improve performance.

By contrast, this is what Cygwin does today:
3) CreateFile("foo"), NtQueryInformationFile(), CloseHandle() (and repeat this for "foo.exe" and "foo.lnk")
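For illustration, the user-mode filtering step in #1 could look roughly like the sketch below. It is plain portable C, not Cygwin code: is_wanted_entry() and eq_nocase_n() are hypothetical names, `entry` stands in for one file name taken from the result of a "foo*" directory query, and the comparison is ASCII-only (real code would need proper Unicode case folding).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n characters of a and b, ignoring ASCII case.
   The caller must guarantee that both strings have at least n characters. */
static int eq_nocase_n(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: return 1 if `entry` is `base` itself or `base`
   followed by ".exe" or ".lnk", and 0 for any other name that merely
   happened to match the "base*" wildcard. */
int is_wanted_entry(const char *base, const char *entry)
{
    static const char *const suffixes[] = { "", ".exe", ".lnk" };
    size_t base_len, rest_len, i;

    assert(base != NULL && entry != NULL);

    base_len = strlen(base);
    if (strlen(entry) < base_len || !eq_nocase_n(entry, base, base_len))
        return 0;

    /* Whatever follows the base name must be exactly one known suffix. */
    rest_len = strlen(entry + base_len);
    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        if (rest_len == strlen(suffixes[i])
            && eq_nocase_n(entry + base_len, suffixes[i], rest_len))
            return 1;
    }
    return 0;
}
```

With base "foo", this keeps "foo", "FOO.EXE" and "foo.lnk" and drops names such as "foobar" or "foo.txt" that the wildcard also returns.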


I ran some performance tests comparing #1, #2 and #3.

I found that #1 and #2 are both around 10x to 100x (!!!) faster than #3.

I looked into why, and found that #1 and #2 don't modify the access time of the file, whereas #3 does. This alone causes a huge performance penalty (and it also violates the POSIX standard: stat("foo") should not update the atime of "foo").
Another reason is that the kernel NTFS driver automatically performs read-ahead on the file, so just calling stat("foo") (which calls CreateFile("foo") in #3) causes the first 64K of "foo" to be read from disk, slowing things down tremendously. Think of "ls /bin" with 3500 files: NTFS reads the first 64K of each of the 3500 files! No wonder it takes so long...
Yet another reason why #3 is so much slower than #1 and #2 is antivirus software: nearly all Windows users install an AV (or use the MS AV on Win7). These trap and monitor every CreateFile() on a regular file (but not on a directory). Therefore CreateFile() on a regular file can take a lot longer than CreateFile() on a directory.


I would suggest using #2 over #1, since it's simpler code-wise, and I did not see any serious performance difference between the two.

Yoni


On 14/9/2010 12:05 PM, Corinna Vinschen wrote:
> On Sep 13 13:28, Yoni Londner wrote:
>> Hi,
>>
>>> However, isn't that kind of a chicken/egg situation?  If you want to
>>> reuse the content of the FILE_BOTH{_ID}_DIRECTORY_INFORMATION structure
>>> from a previous call to readdir, you would have to call the
>>
>> I am not talking about reusing info from a previous readdir.
>>
>> Every single file cygwin tries to access, it does it in a loop,
>> trying afterwards to check for *.lnk file.
>>
>> Using the directory query operations, it is possible to get this
>> info faster:
>> instead of getting file info for FOO and then for "FOO.lnk",
>> Cygwin can query the directory info for "FOO FOO.LNK" (for the file
>> requested, plus its possible symlink file).
>
> I don't understand how you think this should work. The filter expression
> given to NtQueryDirectoryFile is either a constant string and has to match
> the filename exactly, or it contains wildcards. This is documented
> behaviour: http://msdn.microsoft.com/en-us/library/ff567047%28VS.85%29.aspx
> So, "foo" works, "foo*" works, but a list like "foo foo.exe foo.lnk"
> does not.
>
> There's also the problem of handling NFS shares.  However, I just had an
> idea how to speed up symlink_info::check without neglecting NFS shares.
> This will take some time, though since it turns a lot of code upside
> down.  Stay tuned.
>
>
> Corinna



