This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] winsup/cygwin: Protect fork() against dll- and exe-updates.


Hi Michael,

On Jul 28 18:40, Michael Haubenwallner wrote:
> Hi Corinna,
> 
> On 07/27/2015 09:50 AM, Corinna Vinschen wrote:
> > On Jul 24 17:43, Michael Haubenwallner wrote:
> >> When starting to port Gentoo Prefix to Cygwin, the first real problem
> >> discovered is that fork() does use the original executable's location
> 
> > Unfortunately there's some red tape to get over with, first.  We need a
> > copyright assignment from you before we can go much further.
> 
> Copyright assignment is in progress.

Cool.

> > - /proc is already available as virtual filesystem as on Linux.
> >   [blah]
> >   Also, using the Windows PID as dir name seems a bit weird, given that
> >   the virtual /proc obviously uses the Cygwin PID.  This sounds like a
> >   source for confusion.
> 
> There's no particular reason for /proc/ actually - just came to my mind
> first. I've also seen /run/ on recent Linux boxes...

Yeah, /run might be a good option, albeit there may be installations
out there already using this path for their own dubious purposes.
Reusing a path existing in a cygwin installation by default would
avoid collisions.  /var/run perhaps.

> This is the functional reason to keep these hardlinks optional:
> I don't want Cygwin itself to require NTFS, but Gentoo Prefix only - which
> IMHO is a corner use-case for Cygwin, but requires an updates-protected fork.

Some people use Cygwin from a USB stick.

> However, I've been using Interix before - and Cygwin feels faster even
> with hardlinks enabled.

FTR: Me too, and I have not the faintest idea why, given that Interix
can fork natively while Cygwin has to go to great lengths to emulate it.

> And I do prefer "slow" over "broken" setups.

...with a catch...

> >   Did you run tests to find out the cost of this additional overhead?
> 
> On a 4-core Windows Server 2012r2 VM, three times building cygwin takes:
> $ ( time for x in {1..3}; do
>            rm -rf cygwin-2.2.0-0.1.x86_64 ;
>            cygport -8 cygwin.cygport prep compile install ;
>          done ) > outfile 2>&1
> 
> vanilla cygwin-2.2.0-0.1:
>   real    31m35.530s
>   user    30m11.245s
>   sys     22m20.365s
> 
> patched cygwin-2.2.0-0.1:
>   real    52m26.049s
>   user    30m39.820s
>   sys     26m41.637s

So roughly 66% slowdown.  It's quite a lot...

> >> *) dll-redirection for LoadLibrary using "app.exe.local" file does operate on
> >>    the dll's basename only, breaking perl's Hash::Util and List::Util at least.
> >>    So creating hardlinks for dynamically loaded dlls is disabled for now.
> >>    Eventually, manifests and/or app.exe.config could help here, but I'm still
> >>    failing to really grok them...
> > 
> > Hmm.  The DLLs are loaded dynamically anyway, so they will be loaded
> > dynamically in the child as well in dll_list::load_after_fork_impl.  Why
> > not simply hardlinking them using a unique filename (e.g. using the
> > inode number), storing the unique number or name in the dll struct and
> > then calling LoadLibrary on this name?
> 
> This might be necessary in the initial dlopen() already: I've tried hardlinks
> for loaded dlls mangling the full path into the hardlink's filename, but
> encountered different load addresses in the child - most likely due to the
> now different dll's filename.

Huh?  That shouldn't happen.  The address is determined by the file's
PE/COFF header, not by the name.  However, did you reuse the name field
in the dll structure or did you create another name field for the
mangled name?  In the first case there may be some checks in dll_init.cc
not working.  That's why I said to use an extra field for the mangled
name.

> > - What if a EXE/DLL is replace more than once during the lifetime of
> >   a process?
> 
> This wouldn't make any difference: The hardlinks are created upon the first
> use of some exe/dll in parent (even if that process won't ever use fork),

So, here's a question.  What if the directory is only created on
first fork?  Given that only few processes actually call fork, shouldn't
that speed up typical usage profiles a lot?  Even with `configure' or
`make', at least half of the involved processes don't fork.

> and the forked child gets the parent's first-use versions. Still there is
> a short timeframe between process startup and hardlink creation, but that
> is not a real problem (yet).

This may be even academical, but something to keep in mind.

> > - What about reducing the overhead by implementing some kind of generic
> >   exe/dll cache used by all processes?  It would reduce the requirement
> >   to cleanup, reduce the footprint of the cache, speed up subsequent
> >   forks.
> 
> I'm all for it, but I've no idea of currently available cross-process
> mechanisms in cygwin/windows that could help here ...

Yeah, scratching my head myself, but we might want to discuss it
nevertheless.  Maybe sombody has a good idea?

> But: From the LoadLibrary docs I've got the impression that even an exe
> can be dynamically loaded. Iff it would be possible to branch to an exe's
> main() - what if there is some pool of cygwin-starter processes, that do
> nothing but wait for some cygwin-process-start event, to dynamically load
> the exe in question and branch to it's main()? And if that starter does
> create the new exe's (and its linked dll's) optional hardlinks before
> dynamically loading them, even the timeframe mentioned above would be gone.

rhis sounds hellishly complicated.  Apart from the code complexity,
who's roing to set up the process pool?  Who's managing the pool?

> >   the cache(s) can take as much memory.  This really asks for some
> >   cleanup mechanism.
> 
> Of course unused hardlinks have to be removed at some time. However,
> without performing updates (of dlls in use), only hardlinks would be
> cached, which shouldn't consume a lot of diskspace.

Performing updates was the idea, but I see your point.

> > - The heretical question of course:  Is the underlying problem really
> >   worth the additional overhead?  The patch is pretty intrusive.
> 
> The underlying problem is:
> Gentoo Prefix breaks on Cygwin with current fork implementation. OTOH,
> both - enabling the hardlink creation plus the performance overhead - is
> acceptable to me (and for now) to allow for Gentoo Prefix on Cygwin.
> The alternative - to not have some POSIX-like buildsystem on Windows (since
> Interix is gone) for our otherwise portable application - is... an issue.

Here's the catch.  What you're doing is a deviation from how Cygwin
is trying to operate.  If at all possible, Cygwin applications should
run in any environment.  Cygwin is just some "operating system", and
despite striving for POSIX compatibility, we can't manage it under all
circumstances.

This in turn usually requires porting.  Any application running under
multiple OSes has code to make sure differences in the various OSes
(and there are lots of them, even between the supposedly POSIX compatible
ones) are handled gracefully.

So you'd usually port gentoo prefix to Cygwin, not vice versa.  And
to close the loop, your change to Cygwin requires to change the users'
environment, plus a noticable slowdown of the entire installation, just
to be able to run your application.

I'd expect that gentoo prefix, if there *is* an interest to port it to
Cygwin, would try to run under Cygwin as is.  And it should preferredly
run under Cygwin in any environment, not only in the environment adding
the exe/dll hardlinks.

Do you understand what bugs me?

> >   Is there a simpler way to achieve the same or, at least, a similar
> >   result?
> 
> Hmm - most likely there is a faster way than the current patch,
> but I doubt there is a simpler way...

Your patch is rather intrusive.  It's not "simple" as I understand it.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

Attachment: pgp4B6AwmgxV2.pgp
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]