This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: How to make child of failed fork exit cleanly? (solved)
On 03/05/2011 2:49 PM, Ryan Johnson wrote:
Very strangely, when every child dies (including those automatically
respawned by Windows), the parent also seg faults when calling
gcc_deregister_frame on the same dll! If even one child survives (even
if many had previously crashed), then no error arises. Even more
strangely, if I break into a first child which has a good layout (no
previous failures, current fork will succeed) and delay it long enough
that the parent times out, the parent still suffers the seg fault!
What shared state is there that could cause this to happen?
Disabling dll finalization completely when in_forkee==1 gets rid of
the above problem, but occasionally I'll get a new error in the child:
CloseHandle(pinfo_shared_handle<0x610031BF>) failed void pinfo::release():1040, Win32 error 6
110356 [main] fork 10556 fork: child -1 - died waiting for longjmp before initialization, retry 0, exit code 0x100, errno 11
Sometimes, when the child dies as above, the parent will again seg
fault while deregistering a dll (but not always).
Eureka!
Turns out that the pinfo class constructor was empty, leaving its fields
uninitialized. In particular, pinfo::destroy and pinfo::procinfo were
highly likely to both contain non-zero garbage values. Later, a call to
pinfo::init() is supposed to initialize both. However, as the fork error
says, the child "died... before initialization," causing the parent to
jump to cleanup and run pinfo::~pinfo ()... which tries to release()
garbage. That's why the bug doesn't arise if even one child makes it
past this point -- pinfo::init would then be called and the destructor
would do the right thing.
The problem would have bitten folks off and on before, but my added
fail-fast code path makes it far more frequent: forks that were going
to fail now usually do so "before initialization."
The fix is easy, at least (pinfo.h):
- pinfo () {}
+ pinfo () : procinfo(NULL), destroy(false) {}
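For anyone who wants to see the pattern in isolation, here is a minimal standalone sketch of it. This is not the real pinfo code: pinfo_sketch, proc_record, records_freed and the two helper functions are made-up names, and malloc/free stand in for whatever pinfo::init and pinfo::release really do with the shared handle. Only the field names procinfo and destroy and the constructor's initializer list are taken from the fix above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the per-process record the real class manages.
struct proc_record { int pid; };

// Counts how many records release() actually freed (illustration only).
static int records_freed = 0;

class pinfo_sketch
{
  proc_record *procinfo;
  bool destroy;

public:
  // The fix: give both fields known values so that the destructor is
  // safe even if init() is never reached.
  pinfo_sketch () : procinfo (NULL), destroy (false) {}
  ~pinfo_sketch () { release (); }

  // Assumed behaviour: acquire the record and mark it as ours to free.
  bool init (int pid)
  {
    release ();
    procinfo = static_cast<proc_record *> (std::malloc (sizeof (proc_record)));
    if (!procinfo)
      return false;
    procinfo->pid = pid;
    destroy = true;
    return true;
  }

  // With an empty constructor, procinfo and destroy would hold garbage
  // here whenever init() was skipped, and free() would be handed junk.
  void release ()
  {
    assert (!destroy || procinfo);  // destroy implies a valid record
    if (procinfo && destroy)
      {
        std::free (procinfo);
        ++records_freed;
      }
    procinfo = NULL;
    destroy = false;
  }
};

// Child "died before initialization": constructed, init() never called.
int frees_when_child_dies_early ()
{
  int before = records_freed;
  { pinfo_sketch child; }
  return records_freed - before;
}

// Child got far enough for init() to run before cleanup.
int frees_when_child_initializes ()
{
  int before = records_freed;
  { pinfo_sketch child; child.init (1234); }
  return records_freed - before;
}
```

With the initializer list in place, the early-death path releases nothing and the normal path releases exactly one record. Replace the constructor with an empty body and the early-death path reads indeterminate fields, which is the behaviour described above.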
At this point, the only thing left -- besides cleaning up my fork
handling code changes to make a patch -- is to verify that it's ok to
not run any dll finalizers in the child if the fork fails. Empirically
it seems to do the right thing (child processes no longer fault), but I
don't know enough about the code base to say with confidence that no
corner cases exist.
Ryan