This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Fw: File name too long problem -- maybe fix coming?


On Dec 29 18:15, Dave Korn wrote:
> On 19 December 2007 10:36, Corinna Vinschen wrote:
> > Maybe that goes without saying, after all this is an Open Source
> > project, but I could really use some help here.  The progress is
> > extremely slow.  There's just too much code to keep an eye on.
> > When I started I imagined we could release 1.7.0 in 2007 but as long as
> > I have to do this conversion to unicode paths alone, it will take a lot
> > more months.  2008?  Well, maybe...
> 
>   Is there an overall TODO list?  Any notes/designs/specs/back-of-an-envelope
> sketches?  Are you following an overall strategy to do the conversion?

It would be too much to say "yes" here.  The whole plan is a (partly
diffuse) design idea in my mind.  I will try to outline it.

For a start, a few well-known facts, in order of appearance:

I.  Long path names and unicode characters only work fine with the
    Win32 fooW functions or by using NT native functions.

II. Long paths in Win32 speak start with \\?\ or \\?\UNC
    Long paths in NT speak start with \??\ or \??\UNC

III. Long path names using the above syntax are obviously always
    absolute paths.  Since all other paths are restricted to
    MAX_PATH == 260 chars in Win32, any relative path is restricted
    to 260 chars as well.

IV. Talking about relative paths, NT native functions have the
    additional advantage of allowing directory handle relative paths.
    I don't know if the 260 char restriction for relative Win32 paths
    also applies to these native NT directory handle relative paths,
    but I doubt it.  That's something I haven't tested so far, though.
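
The prefix rules in facts II and III can be sketched in portable C++.
This is only an illustration under stated assumptions: the helper name
to_long_path is hypothetical and not part of Cygwin, it does pure string
manipulation without calling any Windows API, and it does not handle
input that already carries a long-path prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Cygwin function): add the long-path prefix
// from fact II to an absolute Win32 path.
//   win32 == true   ->  \\?\  or  \\?\UNC\   (Win32 fooW functions)
//   win32 == false  ->  \??\  or  \??\UNC\   (NT native functions)
std::wstring to_long_path(const std::wstring& abs_path, bool win32)
{
    const std::wstring prefix = win32 ? L"\\\\?\\" : L"\\??\\";

    // UNC path: \\server\share\...  ->  <prefix>UNC\server\share\...
    // (one of the two leading backslashes is dropped)
    if (abs_path.size() > 2 && abs_path[0] == L'\\' && abs_path[1] == L'\\')
        return prefix + L"UNC" + abs_path.substr(1);

    // Drive path: C:\...  ->  <prefix>C:\...
    // Precondition: the long-path syntax only exists for absolute paths
    // (fact III), so a relative path is a caller error in this sketch.
    assert(abs_path.size() >= 3 && abs_path[1] == L':' && abs_path[2] == L'\\');
    return prefix + abs_path;
}
```

With this sketch, C:\foo would become \\?\C:\foo for Win32 calls and
\??\C:\foo for NT native calls.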

My idea of what should be done in Cygwin goes roughly like this:

1.  POSIX paths should be handled in the current codepage as before.
    Potentially this is a multibyte codepage like UTF-8.  Make sure
    that we handle multibyte paths correctly.
    
    TBD: Always use UTF-8?  What about existing installations with
    symlinks/mount points using arbitrary codepages?

2.  Windows paths should always be handled as wide char paths.  The
    most natural form is the OBJECT_ATTRIBUTES structure, because
    only the OBJECT_ATTRIBUTES structure allows directory relative
    paths to implement the openat family of functions correctly.

3.  The path_conv class would ideally do the Windows path handling
    in OBJECT_ATTRIBUTES structures, using native NT functions as much
    as possible.  Calling functions should request the path as
    OBJECT_ATTRIBUTES using the path_conv::get_object_attr method
    if possible.  If it's necessary to call Win32 functions, there
    is a path_conv::get_wide_win32_path method.

4.  Path-related case insensitive comparisons should always be done
    in wide char to avoid language problems.  The only exception
    should be comparisons against fixed strings containing only ASCII
    chars.

5.  Right now, mount points are stored in the registry with the POSIX
    path as the key name, using the single- or multibyte charset of the
    current codepage.  Since registry key names are limited to 255
    chars, this restricts the POSIX path length of mount points to 255
    chars.  The native path is stored as the value, which doesn't have
    any length restriction problem.

    TBD: Keep as is, thus sticking to mount points <= 255 chars,
    or inventing a mounts v3?  Stop using the registry and use
    files like /etc/fstab, ~/.fstab?

6.  As for the TODO list:  It's basically looking through the code
    and converting what can be converted.  I have no order for the
    tasks.  Which leads to the last point.
    
7.  As Chris already mentioned, part of the problem is that path_conv
    is not yet converted.  It's not an easy job and I believe there's
    still a lot to discuss, especially about the external interfaces to
    path_conv and the handling of relative paths.  One part of the
    problem is that path_conv calls methods in various fhandlers, which
    in turn have to be converted.  It's quite tricky to convert one
    without the other, and I admit that I trashed many lines of new code
    last year when I found yet another chicken-and-egg situation.  The
    interlocking of the path handling functions is not always easy to
    unlock, so I am (and Chris certainly is as well) open to new ideas
    and especially patch snippets.  There's much code left which is
    as yet untouched.
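
As an illustration of point 4, here is a minimal portable sketch of
case-insensitive path comparison in wide char, plus the exception for
fixed ASCII-only strings.  Both function names are hypothetical and not
taken from the Cygwin sources.  Folding with std::towupper is only an
approximation: its behaviour for non-ASCII characters depends on the C
library and the current locale, and inside Cygwin an NT routine such as
RtlEqualUnicodeString would presumably be the more natural choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: compare two wide char paths case-insensitively,
// one wchar_t at a time.  Assumption: per-character upper-casing is
// good enough for the illustration; it is not full Unicode case folding.
bool path_equal_nocase(const std::wstring& a, const std::wstring& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::towupper(a[i]) != std::towupper(b[i]))
            return false;
    return true;
}

// Hypothetical helper for the exception in point 4: compare a wide char
// path (or path component) against a fixed string of ASCII chars only.
// Only the ASCII letters a-z are folded, so no locale is involved.
bool path_equal_nocase_ascii(const std::wstring& path, const char* fixed)
{
    auto fold = [](wchar_t c) -> wchar_t {
        return (c >= L'a' && c <= L'z') ? static_cast<wchar_t>(c - L'a' + L'A') : c;
    };
    std::size_t i = 0;
    for (; fixed[i] != '\0'; ++i) {
        const unsigned char c = static_cast<unsigned char>(fixed[i]);
        assert(c < 0x80);               // the fixed string must be pure ASCII
        if (i >= path.size())
            return false;               // path is shorter than the fixed string
        if (fold(path[i]) != fold(static_cast<wchar_t>(c)))
            return false;
    }
    return i == path.size();            // equal only if both end together
}
```

For example, path_equal_nocase treats C:\Users\Foo and c:\users\FOO as
equal, while path_equal_nocase_ascii matches CON against "con" but not
CONSOLE against "con".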

Anything I left out?  Probably.  Just ask.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

