This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



Re: cvs cygwin1.dll


On Sun, 22 Sep 2002 12:40:54 -0400, Christopher Faylor <cgf@redhat.com>
wrote:

>>>On Fri, Sep 20, 2002 at 11:26:42AM +0000, Guy Harrison wrote:
>>>>On Wed, 18 Sep 2002 15:35:53 -0400, Christopher Faylor <cgf@redhat.com>
>>>>wrote:
>>
>>Shame us non-developers can't get it "readonly".
>>http://cygwin.com/ml/cygwin-developers/2002-09/msg00071.html
>
>Yeah.  Life's a bitch, isn't it?

Yep!

>Anyway, you are looking at the wrong message.
>
>Remember this?

Yes, sorry. I neglected to acknowledge your comment. Too many dratted
interruptions around here.

>On Fri, Sep 20, 2002 at 11:56:57AM -0400, Christopher Faylor wrote:
>>I suspect that this is actually due to a deadlock in the code init.cc
>>which was recently discussed in cygwin-developers.
>
>If you look at my message which immediately follows the one that you mention
>which actually *mentions init.cc*, you will see the cause of at least one
>deadlock-on-exit in cygwin.
>
>Robert Collins has vowed to fix this problem this weekend.  Until then,
>however, I have commented out the code in question.

Okidoki.

>>>I don't think it has anything to do with suspended threads.  You can
>>>certainly verify this by adding code to kill the threads specifically,
>>>though, and see what happens.
>>
>>I did. I declared threads[1]. All the work gets shoved onto
>>cygthread::simplestub which neither suspends nor stays resident.
>
>Not the same thing at all.

I don't follow the implication. Had the fault been in info->func() then
::simplestub would have hung in the same way. It didn't. Explicitly
terminating the thread would bypass the DLL detach race, which would
change the code's behaviour far more than my test did.

>>Hung process:
>>
>>Name---------Pid-Pri-Thd--Hnd----Mem-----User-Time---Kernel-Time---Elapsed-Time
>>sh-----------344---4---1---67---1832---0:00:00.020---0:00:00.080----0:02:29.935
>>----------------------VM------WS---WS-Pk----Priv---Faults-NonP-Page-PageFile
>>------------------351732----1832----1964----1476------492----3---21-----1476
>>-Tid-Pri----Cswtch------------State-----User-Time---Kernel-Time---Elapsed-Time
>>-548---4---------1---Wait:Suspended---0:00:00.000---0:00:00.000----0:02:29.825
>>
>>Relevant log:
>>
>>Quick Key:
>><GetCurrentProcessId/GetCurrentThreadId> 90 GetCommandLine() chars
>>[n/32] =threads[n] of NTHREADS=32
>>mti    =main_thread_id
>>nam    =ignore fixed on "mti" here
>>sdc    =SD_count (member added to cygthread class) suspend count
>>av     =threads[].avail
>>id     =threads[].id
>>h      =threads[].h
>>sus    =another suspend count
>>gle    =GetLastError() for failed "sus"
>>
>><344/509> cli(90):J:\cygwin\bin\sh.exe
>>pid=344 tid=509[0/32]{mti:509}: nam=[main] sdc=-999 av=877 id=0 h=296
>>sus=2 gle=0 
>>pid=344 tid=509[1/32]{mti:509}: nam=[main] sdc=-999 av=212 id=0 h=300
>>sus=2 gle=0 
>>pid=344 tid=509[2/32]{mti:509}: nam=[main] sdc=-999 av=894 id=0 h=304
>>sus=2 gle=0 
>>pid=344 tid=509[3/32]{mti:509}: nam=[main] sdc=-999 av=482 id=0 h=308
>>sus=2 gle=0 
>>
>>The ::SuspendThread() and ::ResumeThread() calls in cygthread.cc assign
>>their result directly to SD_count. I set it explicitly to silly negative
>>values at these points:
>>
>>-999 in cygthread::runner() after their ::CreateThread()
>>-99 in cygthread::stub just prior to init_exceptions()
>>-2 cygthread::exit_thread ::SetEvent()
>>-9999 cygthread::stub ::ExitThread()
>>
>>Nothing else touches 'SD_count'. The above output is generated by a
>>function 'SD_DumpLiving()' inserted immediately prior to ::ExitProcess()
>>within _pinfo::exit().
>>
>>Our hung process is definitely suspended.
>
>Not necessarily.  I see nothing in the above which would disprove the
>theory that this is the problem which I raised in cygwin-developers.  Of
>course, I am not 100% sure that I understand the above data.

No worries. The full log is either one big multi-megabyte file or [ahem]
16,000+ tiny ones, so I didn't think I ought to post it. The main point is
that I explicitly initialise the (non-static) member cygthread::SD_count,
yet I've often seen the later members of the threads[] array still zero.

I've looked into that aspect a bit more. It also occurs in other, perfectly
functional processes. For instance, the "sig" thread isn't always allocated
to threads[NTHREADS-1] but to an earlier slot, leaving
threads["sig"+1]..threads[NTHREADS-1] all zero. AFAICT this isn't a bug;
the current cygthread::new() code looks able to cope. I mention it on the
off-chance that being able to "new cygthread" before all of threads[] is
"up" is contrary to design (I set SD_count non-zero in cygthread::init).
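
For anyone reading the sdc= values in the log quoted earlier, here is a
minimal sketch of a decoder for the markers. The four negative values are
the ones listed in that message; the function name, the wording of the
descriptions, and the assumption that SD_count is a signed integer (so
that a failed ::SuspendThread() or ::ResumeThread(), which return
(DWORD)-1, shows up as -1) are illustrative only and not part of the
Cygwin sources.

```cpp
#include <string>

// Hypothetical helper: translate an SD_count value from the debug log
// into the place where it was last written.
std::string describe_sd_count (long sdc)
{
  switch (sdc)
    {
    case -999:  return "set in cygthread::runner() after CreateThread()";
    case -99:   return "set in cygthread::stub before init_exceptions()";
    case -2:    return "set in cygthread::exit_thread at SetEvent()";
    case -9999: return "set in cygthread::stub at ExitThread()";
    case -1:    return "SuspendThread()/ResumeThread() failed";
    default:
      break;
    }
  // Anything non-negative came from SuspendThread()/ResumeThread(),
  // which return the thread's previous suspend count.
  if (sdc >= 0)
    return "previous suspend count " + std::to_string (sdc);
  return "unexpected value " + std::to_string (sdc);
}
```

Feeding it the sdc=-999 seen on every line of the log reports the
cygthread::runner() marker.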

>However,
>I'm not going to devote too much time to studying it since there is an
>obvious problem in the cygwin DLL now and you haven't, AFAICT, addressed
>that.  This makes your analysis suspect until you generate a version of
>the DLL without the already known problem.

If you don't think threads[1] served any useful purpose then so be it.

>However, if you want to provide an actual analysis of how the thread
>could be in a suspended state with someone waiting for it, that would be
>welcome.

If I knew unix I probably could; not knowing it is preventing me from
understanding crucial aspects of the cygwin dll. Nevertheless, methinks
we're both correct. I've been thinking along these lines...

wait_sig() is sat upon its WaitForMultipleObjects(catchem) and an
unrelated cygthread::stub "X" is sat blocked. Suddenly that
WaitForMultipleObjects() is satisfied, so whatever "X" was blocked on
quits, thereby signalling "X". "X" wakes up. Meanwhile the main thread has
progressed towards its goal of quitting. If the main thread gets to that
goal before "X" comes out of its info->func() then "X" could attain main
thread status and promptly celebrate by suspending itself.

I can't write this in cygthread::stub directly...

if (!process_definitely_quitting)
  ::SuspendThread (info->h);

...because I can't find a "process_definitely_quitting" flag, so (a,b,c)
below are merely unsatisfactory attempts to do the same in a roundabout
fashion.
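
The "process is quitting" flag wished for above could look roughly like
the sketch below. It is written in portable C++ with std::atomic rather
than against the Cygwin sources, and every name in it is hypothetical:
no such flag exists in cygwin1.dll as far as this thread establishes.

```cpp
#include <atomic>

// Hypothetical flag: nothing with this name exists in the Cygwin sources.
static std::atomic<bool> process_definitely_quitting{false};

// Called once by whichever code path commits to exiting the process.
void note_process_quitting ()
{
  process_definitely_quitting.store (true, std::memory_order_release);
}

// What a pool thread would ask after info->func () returns:
//   true  -> safe to park (suspend) and wait for reuse
//   false -> the process is going away, so exit instead of suspending
bool should_park ()
{
  return !process_definitely_quitting.load (std::memory_order_acquire);
}
```

A flag on its own only narrows the window: the exiting thread can still
set it between a pool thread's check and that thread's suspend, so a real
fix would need the check and the suspend to be covered by the same
synchronisation.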

>So far, everything you've provided points to the fact that the
>process in question is stuck in a deadlock state during ExitProcess, which
>sort of confirms my theory.  That's why you can't debug it.

I hate to give up but without more intimate knowledge of unix I'm
flogging a dead horse. I'm happy to be a guinea pig for testing this.

I'll conclude by mentioning the areas where, had I been less rigorous in
testing, I'd have announced "hoorah fixed".

(a)
We've already covered threads[1], and that's the only one that works 100%.


(b)
void __stdcall
sigproc_terminate (void)
{
//~~~
hwait_sig->func = 0;
//~~~
  hwait_sig = NULL;

  if (GetCurrentThreadId () == sigtid)

...works indefinitely provided the machine isn't very badly stressed.


(c)
DWORD WINAPI
cygthread::stub (VOID *arg)
{
  DECLARE_TLS_STORAGE;
  exception_list except_entry;

  /* Initialize this thread's ability to respond to things like
     SIGSEGV or SIGFPE. */
  init_exceptions (&except_entry);

  cygthread *info = (cygthread *) arg;
  info->ev = CreateEvent (&sec_none_nih, TRUE, FALSE, NULL);
  while (1)
    {
      if (!info->func)
	ExitThread (0);

      /* Cygwin threads should not call ExitThread directly */
      info->func (info->arg == cygself ? info : info->arg);
      /* ...so the above should always return */

#ifdef DEBUGGING
      info->func = NULL;	// catch erroneous activation
#endif
      {//~~~
        info->avail = 0;
        info->h = ::CreateThread (&sec_none_nih, 0, cygthread::stub, info,
                                  CREATE_SUSPENDED, &info->id);
      }//~~~

      SetEvent (info->ev);
      info->__name = NULL;
      ::ExitThread (0);//~~~
//~~~      SuspendThread (info->h);
    }
}

...fine, until I invoked 'ssh', whose behaviour duplicated

while (keypress != CTRL-C)
  idle_with_all_threads_running ();

Oh dear, this needs a unix programmer, so I give up.


Couple of things that in no way locate the bug (I was initially looking
for a plain race) and only serve to make the problem *much* harder to
find...

(1)
cygthread::runner() fires non-suspended threads and each ::stub
initially suspends itself.

(2)
cygthread::new() loops on a ::Sleep() until both ends: threads[0] and
threads[NTHREADS-1] have handles.

Yeah, (2) is asking for a hang (i.e. if ::runner has a ::CreateThread()
failure). Might it be worth a "critical_number_of_threads" counter in
::runner for a graceful error exit? Good old resource-starved Win9x
could be falling foul of this well before NT.
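
One way to avoid the open-ended ::Sleep() loop in (2) is to bound the
wait and report failure to the caller. The sketch below shows that
variation rather than the counter itself; it is portable C++, not Cygwin
code, and the helper name and its parameters are hypothetical.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: poll `ready` at most `max_attempts` times, sleeping
// `pause` after each failed poll, and report failure instead of looping
// forever.
bool wait_until_ready (const std::function<bool ()> &ready,
                       unsigned max_attempts,
                       std::chrono::milliseconds pause)
{
  for (unsigned attempt = 0; attempt < max_attempts; ++attempt)
    {
      if (ready ())
        return true;
      std::this_thread::sleep_for (pause);
    }
  return ready ();   // one last look after the final sleep
}
```

Here `ready` would stand in for "both threads[0] and threads[NTHREADS-1]
have handles", and a false return gives the caller the chance to make a
graceful error exit.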


I'm probably the last to discover it, but www.sysinternals.com has
'pssuspend' in their PsTools download. "pssuspend -r" will resume those
pesky single-thread hangs. Saves a reboot to replace cygwin1.dll!


-- 
swamp-dog@ntlworld.com
