This is the mail archive of the cygwin mailing list for the Cygwin project.



ipc, sockets and windows sp2


There seem to be odd problems with Windows SP2 (and some SP1 installs with undetermined updates).

I work on the Windows version of drqueue, an open-source distributed rendering management package (for use with Maya rendering, for example). It was designed for Unix, so it uses IPC and sockets.

The port works well for the most part, except for the server itself (the master program).
The Unix version has no problem with any of this; it works on Linux, the BSDs, IRIX..


Please take a look at the main loop (in the main function) of this short file:
http://www.drqueue.org/svn/trunk/drqueue/master.c

Basically, the program does this:
-init config
-load the saved database of jobs
-set signal handlers
-get shared memory (IPC shared memory and semaphores)
-fork a consistency-checking task (it is not involved in the problem; I tested)
-bind a port (it's a server!)
-then enter the usual main loop, which forks child processes to accept connections.
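
For readers who do not want to open master.c, the last two steps (bind once, then fork children that accept on the inherited descriptor) can be sketched roughly as below. This is a minimal illustration and not drqueue's code: make_listener and prefork_roundtrip are hypothetical names, the socket is bound to the loopback address on an ephemeral port, and only one child is forked. It leaves out the shared memory and semaphore steps entirely.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from drqueue): create a TCP socket bound to
   127.0.0.1 on an ephemeral port and start listening.
   Returns the descriptor, or -1 on error. */
static int make_listener(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the system pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Bind once in the parent, fork one child that accepts on the inherited
   descriptor, then connect to it as a client.
   Returns the byte sent by the child, or -1 on failure. */
int prefork_roundtrip(void)
{
    unsigned short port = 0;
    int lfd = make_listener(&port);
    if (lfd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(lfd);
        return -1;
    }
    if (pid == 0) {
        /* Child: the listening socket was created by the parent. */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            _exit(1);
        char reply = 'k';
        ssize_t n = write(cfd, &reply, 1);
        close(cfd);
        _exit(n == 1 ? 0 : 1);
    }

    /* Parent: play the role of a client for this demonstration. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    int result = -1;
    char got = 0;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s >= 0 &&
        connect(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        read(s, &got, 1) == 1)
        result = got;
    if (s >= 0)
        close(s);
    close(lfd);

    if (result < 0)
        kill(pid, SIGKILL); /* do not leave the child blocked in accept() */
    int status = 0;
    waitpid(pid, &status, 0);
    return result;
}
```

On a system where inherited sockets behave normally, calling prefork_roundtrip() should return the character 'k'.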


On Windows SP2 (and some SP1 with updates), the master keeps printing a strange error:

*** MapViewOfFileEx (0xF10000), Win32 error 487. Terminating.

Error 487 (ERROR_INVALID_ADDRESS) means "Attempt to access invalid address."


So the listening child process dies immediately, and the error is written again and again: new child processes are launched as others exit, to keep a minimum of MASTERNCHILDREN processes ready to listen (this is to support high network load; with MASTERNCHILDREN=1 it does the same).


After debugging, I saw that the child process stops with this error (from cygwin1.dll) as soon as it is forked.

I took a look at the Cygwin sources and found that MapViewOfFileEx is used in the shared memory and mmap code.
So I tested with the shared memory code disabled, and the problem disappeared!


Moreover, I tested several combinations and found that calling the get_socket function AFTER the fork fixed the problem!
The get_socket function simply creates a socket and binds it the usual way. It is defined here if you want to take a look:
http://www.drqueue.org/svn/trunk/drqueue/communications.c


So in the current master.c you'll see a short #ifdef __CYGWIN block that calls get_socket after the forks instead of before.

The problem is that every child process then binds, which is not the correct way to manage the socket: the bind should be done once, at startup. So my Windows version of the master with this trick will accept one connection but will die (at least on SP2) with communication errors when several clients connect, even with only 2 clients.
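
To illustrate why binding in every child is fragile, here is a small sketch of what usually happens on POSIX systems when a second socket tries to bind an address that is already in use. It is only an illustration under assumptions: second_bind_errno is a hypothetical name, both sockets live in one process for simplicity, and Cygwin on Windows may not behave identically, so this is not confirmed to be the cause of the communication errors described above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demonstration: bind a first socket to an ephemeral
   loopback port, then try to bind a second socket to the same port.
   Returns 0 if the second bind unexpectedly succeeds, the errno value
   of the failed second bind otherwise, or -1 if the setup itself fails. */
int second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int result = -1;
    int a = socket(AF_INET, SOCK_STREAM, 0);
    int b = socket(AF_INET, SOCK_STREAM, 0);

    if (a >= 0 && b >= 0) {
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = 0; /* let the system pick a free port */

        if (bind(a, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            listen(a, 8) == 0 &&
            getsockname(a, (struct sockaddr *)&addr, &len) == 0) {
            /* addr now holds the port owned by the first socket. */
            if (bind(b, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                result = 0;
            else
                result = errno;
        }
    }
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return result;
}
```

On Linux the second bind fails with EADDRINUSE, which is why the usual design binds once in the parent and lets the children share the inherited descriptor.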

After googling a bit, I found reports of issues in SP2 where modified behaviour of undocumented Windows system functions caused problems in some parts of Cygwin. I think this is related (I can't say more about it).

So, there seem to be bugs in the Cygwin DLL when you combine shared memory (attached!), forks and sockets.
Moreover, the trigger is the fork: if you fork BEFORE the bind OR AFTER the socket is closed, there is no problem, but if you fork ANYWHERE during the socket's lifetime (bind, listen, accept, read on the socket), you get this error.
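
A reduced test case for this combination might look like the sketch below: attach System V shared memory, bind and listen on a socket, then fork while both are live and let the child touch each of them. This is an assumption about what a minimal reproducer could be, not code taken from drqueue, and shm_bind_fork_status is a hypothetical name. It has not been run on an affected Windows install.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical reproducer: fork with shared memory attached and a bound,
   listening socket open. Returns the child's exit status (0 means the
   child could use both the shared memory and the inherited socket),
   or -1 if setup failed or the child did not exit normally. */
int shm_bind_fork_status(void)
{
    int result = -1;

    /* Step 1: get and attach shared memory, as the master does. */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;
    char *mem = shmat(shmid, NULL, 0);
    if (mem == (char *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    mem[0] = 'p'; /* marker written by the parent */

    /* Step 2: create, bind and listen BEFORE the fork. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* ephemeral port, to avoid clashes */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd >= 0 &&
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(fd, 8) == 0) {
        /* Step 3: fork while the socket and the mapping are both live. */
        pid_t pid = fork();
        if (pid == 0) {
            struct sockaddr_in self;
            socklen_t len = sizeof(self);
            int ok = mem[0] == 'p' &&
                     getsockname(fd, (struct sockaddr *)&self, &len) == 0;
            _exit(ok ? 0 : 1);
        }
        if (pid > 0) {
            int status = 0;
            if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
                result = WEXITSTATUS(status);
        }
    }

    if (fd >= 0)
        close(fd);
    shmdt(mem);
    shmctl(shmid, IPC_RMID, NULL);
    return result;
}
```

On a system without the problem the function should return 0. If the child is terminated at fork time, as with the MapViewOfFileEx error above, a non-zero result would be expected instead.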


To summarize: I think that, with those Windows versions, using a socket file descriptor created in a parent process for any socket operation is just not possible if you use shared memory (under some circumstances).

Does anyone know this problem? Does anyone have a workaround to keep my master program running on the latest Windows versions while waiting for a Cygwin fix?

Thanks,

Kraken




