This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: 1.7.7: rm -rf sometimes fails - race condition?


On Dec 10 22:30, Steven Hartland wrote:
> 
> ----- Original Message ----- From: "Christopher Faylor"
> 
> >>This looks like either a premature return from a syscall or libcall, or like a
> >>genuine race in the system.
> >>
> >>Has anyone seen similar things?
> >
> >Yes and you seem to have nailed the problem - it happens when a virus checker
> >hooks into a syscall and allows it to return before completion.  I don't think
> >we want to modify Cygwin to not trust success return values from system calls.
> 
> Is this the age old delete on close raising its ugly head again?
> 
> So the rm kicks in a file is shared locked, rm uses the cygwin unlink code
> which "schedules" the file for deletion and returns success without actually
> succeeding, hence when it comes to delete the parent dir it fails as the file
> actually still exists.

You're over-simplifying a bit.

First of all, the underlying problem is a semantic one.  Neither
UNIX nor Windows removes a file from the disk while it is still in use
by some process.  However, and that's the important difference, while
UNIX unlink() removes the file's dir entry in the parent directory,
Windows does not remove the dir entry of the file as long as there's
still an open handle.

So on UNIX, if you unlink the only file in a directory (even if it's in
use), the directory is actually empty afterwards.  Hence a subsequent
rmdir(parent-dir) call succeeds, if you have permissions to do so.
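
To make the UNIX side concrete, here is a minimal sketch in C.  The
function name and the /tmp path are made up for illustration; it assumes
a POSIX system with a writable /tmp.  It keeps a descriptor open on the
only file in a directory, unlinks the file, and then checks whether
rmdir() on the parent succeeds.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if rmdir() on the parent directory
   succeeds while the unlinked file is still held open, -1 otherwise. */
int unlink_open_file_then_rmdir(void)
{
    char dir[64];
    char path[96];
    int fd, rc;

    /* Assumed scratch location; the pid keeps the name unique enough
       for a demo. */
    snprintf(dir, sizeof dir, "/tmp/unlink-demo-%ld", (long)getpid());
    if (mkdir(dir, 0700) != 0)
        return -1;

    snprintf(path, sizeof path, "%s/inuse", dir);
    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }

    /* Remove the dir entry; the file data lives on until fd is closed. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    /* The file is still in use here, yet the directory is already empty. */
    rc = rmdir(dir);

    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Calling unlink_open_file_then_rmdir() from main() and checking for a 0
return is enough to try it.  With native Windows semantics the equivalent
sequence is where the directory removal fails.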

Now, on Windows, on the native NT API level, there are three different
ways to delete a file (NtDeleteFile, Set delete disposition, delete on
close), but none of them removes the file's entry in the parent
directory as long as the file is still in use.  Consequently the parent
directory doesn't count as empty, and a subsequent try to delete the
parent dir fails with STATUS_DIRECTORY_NOT_EMPTY.

As you can see, it has nothing to do with "delete-on-close".  Cygwin
doesn't use that method anyway, except in one very specific corner
case.

So, what Cygwin tries to do in the first place is to move files in use
into the recycle bin.  However, on Windows you need DELETE access rights
to be able to do so.  And this doesn't work for remote drives.  On
remote drives we can only try to rename the file to some temporary
filename and hope for the best.  Afterwards Cygwin sets the delete
disposition flag and returns success if setting the disposition flag
succeeded.  After all, that's the maximum possible on Windows, and for
all we can tell the file has been deleted.  The fact that the directory
entry lingers until the last handle to the file has been closed is
something Cygwin has no control over.

If you want to see the whole mess, including lots of comments explaining
all the problems we experienced the long and hard way, see 
http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/cygwin/syscalls.cc?cvsroot=src
Search for the function called "unlink_nt".  It's the underlying
function called from unlink() and rmdir().


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
