This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: untarring symlinks with ../ fails randomly, slightly OT


On 04/07/2011 8:21 AM, Ryan Johnson wrote:
On 04/07/2011 7:33 AM, Corinna Vinschen wrote:
On Jul 4 06:56, Ryan Johnson wrote:
On 04/07/2011 6:46 AM, Corinna Vinschen wrote:
On Jul 4 11:15, Wolf Geldmacher wrote:
As an aside:
    I also used to have some trouble with "rm -rf" of a directory
    hierarchy failing more or less reproducibly (like: 80% of the
    time) because files were presumably still "in use". Repeating
    the command several times would succeed, though.

    Downgrading from cygwin1.dll/1.7.9.1 to cygwin1.dll/1.7.8.1
    seems to have solved that issue as well - still have to see
    the first "retry to delete".

This may or may not be related to the original report, as it also reeks
of a race condition during file/directory operations.
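Repeating the command until it succeeds, as described above, amounts to a bounded retry. Below is a minimal C++17 sketch of that stopgap using std::filesystem. It assumes the failures are transient (for example, another process briefly holding a handle). The name `remove_all_with_retry`, the attempt count and the delay are hypothetical choices, and the sketch only papers over the race rather than explaining it.

```cpp
#include <chrono>
#include <filesystem>
#include <system_error>
#include <thread>

namespace fs = std::filesystem;

// Hypothetical helper: call fs::remove_all up to `attempts` times,
// sleeping `delay` between tries. Assumes errors such as
// "Directory not empty" are transient. Returns true once a removal
// attempt completes without error (also true if `p` never existed).
bool remove_all_with_retry(const fs::path& p, int attempts = 5,
                           std::chrono::milliseconds delay =
                               std::chrono::milliseconds(100)) {
    for (int i = 0; i < attempts; ++i) {
        std::error_code ec;
        fs::remove_all(p, ec);  // non-throwing overload; failure lands in ec
        if (!ec) {
            return true;
        }
        if (i + 1 < attempts) {
            std::this_thread::sleep_for(delay);  // give the other process time
        }
    }
    return false;
}
```

For example, `remove_all_with_retry("Python-2.6.6")` would make up to five attempts, 100 ms apart, before giving up.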
I can reproduce neither the tar problem nor the rm problem. I tried this
under 2008R2, which is basically the same as your W7 64-bit. I used local
and remote drives to test the issue, but to no avail.


Are you sure this isn't a BLODA problem which is triggered by the
changes in 1.7.9?

I just took a look through the changes between 1.7.8 and 1.7.9, and
the list of changes which affect filesystem access is pretty small:

[snip]

So, is it possible that the request for WRITE_DAC access in the call to
NtCreateFile triggers some hiccup of your virus checker? It could easily
explain both effects.
I have also seen the rm -rf problem occasionally on my w7-64
machine, and I don't think anything from BLODA is installed.
Also with 1.7.8?  Given the small number of FS-related changes, it's
very unlikely that they would cause a difference between 1.7.8 and
1.7.9.

However, I haven't noticed the issue since disabling the search
indexer on my machine. I did this on the hunch that I often delete
large directory trees which aren't very old (e.g. after
untar/configure/make of some source package), and that it wouldn't
be a big surprise if indexing and cygwin's rm don't mix for whatever
reason.
Hard to imagine that setting the WRITE_DAC flag would interfere with the
search indexer.  On second thought, the flag is only set if a file does
not exist yet and NtCreateFile gets called to create the file.  That
makes it especially unlikely that this would affect unlinking.

However, given that you can reproduce the issue, could you test the
scenario again?  If the issue occurs, can you disable the following code
in fhandler.cc and see if it changes anything?

    /* fhandler.cc, lines 616-621 */
    else if (!exists () && has_acls ())
      /* If we are about to create the file and the filesystem supports
         ACLs, we will overwrite the DACL after the call to NtCreateFile.
         This requires a handle with additional WRITE_DAC access,
         otherwise set_file_sd has to open the file again. */
      access |= WRITE_DAC;


Sorry, I have no idea which version of the dll I had at the time. It was at least a month ago, maybe more.

However, I was wrong about not having seen the problem since then. Picking a random source dir to blow away:
$ rm -rf Python-2.6.6
rm: cannot remove `Python-2.6.6/Lib/lib2to3/tests': Directory not empty
$ rm -rf Python-2.6.6
$

This seems to happen more than half the time (different non-empty dir every time). Naturally, running under strace makes the problem go away (it doesn't help that strace kills stderr, where any error messages might have gone).
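Since rm names only the directory it could not remove, it may help to record exactly which entries survived a failed run, before the second rm clears them, and compare across runs. Below is a minimal C++17 sketch using std::filesystem. The helper name `list_leftovers` is hypothetical. It only shows what is left on disk, not which process is holding it open (that would still need an lsof-like tool).

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical diagnostic: after a failed recursive delete, collect every
// path still present under `root` (relative to root, '/'-separated),
// sorted so the output of different runs can be diffed. Errors during
// the walk stop the walk instead of throwing.
std::vector<std::string> list_leftovers(const fs::path& root) {
    std::vector<std::string> out;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    if (ec) {
        return out;  // root itself is gone or unreadable
    }
    for (; it != fs::recursive_directory_iterator(); it.increment(ec)) {
        if (ec) {
            break;  // e.g. an entry vanished while we were walking
        }
        out.push_back(it->path().lexically_relative(root).generic_string());
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

Called from a small driver right after the first `rm -rf` fails, this would show whether the survivors have anything in common (for instance, recently written files).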


Running the following command 10x:

$ tar -xaf Python-2.6.6.tar.bz2 && sleep 3 && (rm -rf Python-2.6.6 || (echo 'Retrying...' && rm -rf Python-2.6.6))

Out of ten runs, six had no error, two had one error, and one each had two and three errors.

I'm currently updating and rebuilding my cygwin sources to try out your patch...
Updated, built, and reproduced, with and without the patch. If anything it's more common in my dev build -- it happened on the first try both times.

Any idea how to debug this? We need some instantaneous version of lsof or something...

Ryan

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

