This is the mail archive of the cygwin@sources.redhat.com mailing list for the Cygwin project.



Make hung in WaitForMultipleObjects inside Cygwin


We're seeing occasional hangs in our builds: Make blocks in a call
to WaitForMultipleObjects inside cygwin1.dll, but we don't know what
it's waiting for, because it in fact has no child processes to wait on.
I determined where Make was hung by attaching to it with the Developer
Studio debugger.

It's probably significant that we run Make with "-j2" on a dual
processor machine.  We are using cygwin-1.1.8-2 and make-3.79.1-2.

When I exited from the debugger, the Make process and its parent also
exited, but *its* parent, another Make process, stayed hung.  That
process showed "I" in the first column of the output of "ps" (the
lowermost hung process, to which I'd previously attached, did not).
It, too, was hung in WaitForMultipleObjects inside cygwin1.dll.  This
time, when I exited from the debugger, my build restarted.

We're not sure what the best way to proceed is in terms of debugging
this.  We're going to try running builds over and over under strace
until the problem recurs, but this approach is problematic because (a)
strace produces a huge amount of output, (b) it's entirely possible
that strace will change the timing of the build enough that the
problem won't happen, and (c) strace slows down the build, which means
it'll take longer for us to get the problem to happen under strace (our
builds take an hour on our fastest machine, so we can't churn them out
particularly quickly when debugging something like this).
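
One way to keep the output manageable is to rerun the build in a loop,
overwrite the log on each run, and stop at the first run that exceeds a
time limit.  The following is only a minimal sketch of that idea in
Python; the function name run_until_hang and all of its parameters are
hypothetical, and it assumes that "a run that takes longer than
timeout_s" is an acceptable definition of a hang.

```python
import subprocess


def run_until_hang(cmd, timeout_s, max_runs, log_path):
    """Rerun cmd until one run exceeds timeout_s.

    Returns (run_number, process) for the first run that times out,
    or None if every run finished within the limit.
    """
    for run in range(1, max_runs + 1):
        # Reopen the log in write mode so that only the output of the
        # most recent run is kept on disk.
        with open(log_path, "wb") as log:
            proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
            try:
                proc.wait(timeout=timeout_s)
            except subprocess.TimeoutExpired:
                # Deliberately leave the process running so that a
                # debugger can still be attached to it.
                return run, proc
    return None
```

Here cmd would be the build command, for example the Make invocation
wrapped in strace, and timeout_s would be set somewhat above the normal
one-hour build time.  This does nothing about strace changing the
timing of the build; it only limits how much output has to be kept.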

Another approach I'm considering is to compile both cygwin1.dll and
make with debugging symbols and install them without stripping.  If I
do that, and then Make hangs and I attach to it with gdb, will I
be able to get a useful backtrace showing which call to
WaitForMultipleObjects is hanging and what the various variables
related to that call are?

Please offer any advice you can about how to proceed in debugging this
so that we can track down the problem and fix it.

Thanks,

  Jonathan Kamens


