This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Instability with signals and threads


On Nov 20 11:00, Corinna Vinschen wrote:
> Hi Mikulas,
> 
> On Nov 19 17:42, Mikulas Patocka wrote:
> > Hi
> > 
> > I have a program that sets a repetitive timer with setitimer and spawns 
> > several threads.
> > 
> > The program is very unstable on cygwin, it locks up in few minutes.
> > 
> > The bug manifests itself in the following way: the signal thread calls 
> > cygheap->find_tls to find a thread to deliver the signal to. find_tls 
> > generates an exception when scanning the threadlist, jumps to the __except 
> > block and calls threadlist[idx]->remove(INFINITE).
> > 
> > The method threadlist[idx]->remove is called with invalid "this" pointer 
> > (sometimes it is zero, sometimes it points to unmapped memory), generates 
> > another exception on "initialized = 0" line and becomes stuck on this 
> > assignment.
> 
> Now that you mention it, it makes sense.  The exception gets triggered
> by accessing an invalid member of threadlist.  Using the very same
> member in another method call looks.... borderline, to say the least.
> 
> > I found out that when I modify the remove_tls method so that it always 
> > acquires the lock and removes the thread from the threadlist (change 
> > "tls_sentry here(wait)" to "tls_sentry here(INFINITE)"), the bug goes away 
> > and the multithreaded program is stable.
> 
> [Noted your augmenting comment preceeding the testcase in your other mail]
> 
> > Alternativelly - the crash can be fixed if we change "_my_tls.remove (0)" 
> > to "_my_tls.remove (INFINITE)" in thread_wrapper (though, there is another 
> > _my_tls.remove (0) call in dll_entry in winsup/cygwin/init.cc and it could 
> > trigger the same crash)
> 
> I don't think so.  In dll_init, the call is done inside a DLL_THREAD_DETACH
> for this very thread, so &_my_tls is still a valid pointer.

Never mind that.  I can fix your testcase by calling _my_tls.remove with
INFINITE as parameter in both places.  If I drop one of them, your
testcase will invariable fail at one point.  With both INFINITE params
in place, your testcase is now running half an hour without problems.

Thinking about it, the fact that _cygtls::remove allows to apply
a non-INFINITE wait is rather strange, isn't it?  Calling remove_tls
with a 0 wait, it allows to return the function silently, without
actually having removed the thread from the list.  This is bound to
go downhill at one point and looks like a kludge to me to circumvent
some potential hang in another situation...

I'm not exactly sure if that works as intended.  I will apply this patch
and create a new Cygwin snapshot on https://cygwin.com/snapshots/ in a
couple of minutes.  I'd appreciate if you and others would give it an
exhaustive test.  New spurious hangs or SEGVs in other situations which
so far worked fine would be good indicators for another problem in the
code.

Other than that, there's certainly some room for improvement.  Calling
threadlist[idx]->remove from the find_tls exception handler looks
extremly hairy to me.  I wonder if that should be called at all at this
point, or if there shouldn't be better some "simplified" removal
operation which doesn't require the _cygtls pointer.  If the thread
doesn't exist anymore, so does its _cygtls area.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

Attachment: pgpiTyoCs3SPF.pgp
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]