This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Program randomly hangs after upgrading to cygwin 1.5.7


Hi!

#!/bin/sh
strace -o /tmp/dmake-trace-$$ --flush-period=10 dmake $*

Nice idea, I did something similar yesterday evening. I had to only keep the last strace output, because each file was in the order of 50MB to 500MB and I implemented a counter to get an idea for the number of started dmake processes. At the moment we are at 580, it is still running.

I caught one hanging process, but the strace is weird. First of all, in this case not dmake is hanging, but a tcsh process which is started by the dmake process. See the end of the following unsolicited strace log where it hangs. But the really weird thing are the ~500MB zeros in that strace log. See this <http://qa.openoffice.org/issues/show_bug.cgi?id=25037> Openoffice issue for the compressed complete strace until the freeze.

Here comes the end of the strace:

<snip>
76 4083280 [main] tcsh 1336 wait4: calling proc_subproc, pid -1, options 1
144 4083424 [main] tcsh 1336 proc_subproc: args: 4, 1628375144
77 4083501 [main] tcsh 1336 proc_subproc: wval->pid -1, wval->options 1
72 4083573 [main] tcsh 1336 checkstate: nchildren 0, nzombies 0
78 4083651 [main] tcsh 1336 checkstate: checking alive children
77 4083728 [main] tcsh 1336 checkstate: returning 0
71 4083799 [main] tcsh 1336 proc_subproc: waiting thread found no children
94 4083893 [main] tcsh 1336 proc_subproc: finished processing
terminated/stopped child
72 4083965 [main] tcsh 1336 proc_subproc: returning 1
69 4084034 [main] tcsh 1336 wait4: intpid -1, status 0x22694C, w->status 0,
options 1, res -1
82 4084116 [main] tcsh 1336 wait4: *** errno = 10
70 4084186 [main] tcsh 1336 set_signal_mask: oldmask 0x80002, newmask 0x2,
mask_bits 0x80000
75 4084261 [main] tcsh 1336 sig_send: returning 0x0 from sending signal -33
96 4084357 [main] tcsh 1336 handle_sigsuspend: oldmask 0x80002, newmask 0x2
42874012 46958369 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900212535 947170904 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900286982 1847457886 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900318304 -1547191106 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900220341 -646970765 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900171768 253201003 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900133829 1153334832 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900124349 2053459181 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900158918 -1341349197 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900151800 -441197397 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
358986345 -82211052 [win] tcsh 1336 wndproc 537 unknown 7 0
7273 -82203779 [win] tcsh 1336 wndproc 537 unknown 7 0
38318101 -43885678 [win] tcsh 1336 wndproc 537 unknown 7 0
502845672 458959994 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
126132228 585092222 [win] tcsh 1336 wndproc 537 unknown 7 0
2034 585094256 [win] tcsh 1336 wndproc 537 unknown 7 0
774006971 1359101227 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
847905325 -2087960744 [win] tcsh 1336 wndproc 537 unknown 7 0
52207351 -2035753393 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900501277 -1135252116 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900135003 -235117113 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900157738 665040625 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900145818 1565186443 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900164881 -1829615972 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900171339 -929444633 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900154244 -29290389 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900124088 870833699 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0
900134268 1770967967 [win] tcsh 1336 wndproc 30 WM_TIMECHANGE 0 0


*here are ~500MB zeros, really! Something is wrong*

900213069 142414857 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900286987 1042701844 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900317803 1943019647 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900220842 -1451726807 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900171780 -551555027 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900133317 348578290 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900124845 1248703135 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900158936 -2146105225 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
900151263 -1245953962 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
358987187 -886966775 [win] tcsh 1584 wndproc 537 unknown  7 0
 6934 -886959841 [win] tcsh 1584 wndproc 537 unknown  7 0
38318117 -848641724 [win] tcsh 1584 wndproc 537 unknown  7 0
502845669 -345796055 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0
126132235 -219663820 [win] tcsh 1584 wndproc 537 unknown  7 0
 2016 -219661804 [win] tcsh 1584 wndproc 537 unknown  7 0
774006983 554345179 [win] tcsh 1584 wndproc 30 WM_TIMECHANGE 0 0


No clue how to read this.


Also if you compile dmake with debugging (gcc -g) you might then be able to
attach with CYGWIN=notty gdb --pid=<dmake-pid>.

I still cannot attach to any of the processes involved.


Volker

--
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D

Attachment: pgp00000.pgp
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]