This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Re: Console codepage setting via chcp?


On Sep 25 12:13, Andy Koppe wrote:
> 2009/9/25 Corinna Vinschen:
> >> The important thing is that file names, user names, and env variables
> >> are represented by the same byte sequences throughout the life of a
> >> program. Determining their translation at program startup ensures
> >> that.
> >
> > But that's not the only important point.  If two applications having
> > different locales talk to each other, they have different ideas of the
> > filenames.  Even their CWD could look different, even though it's
> > actually the same.
> 
> True. However, the environment locale setting is constant throughout a
> process tree unless the user explicitly changes it.
> 
> This is different from the issue with filenames depending on whether
> setlocale is called, because that's outside the user's control.

The problem is that processes don't just open files, they inherit files
through the exec call.  So, if the file got opened in the parent process
(a shell, for instance), then the parent process forks, changes the
locale var, and execs the child process with a different locale (that's
what happens when you start nano as in your example given a few mails
back), then the filenames stored in the fhandlers in the child
process are using the wrong charset, and subsequent calls using that
filename are wrong.
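
The fork, change-the-variable, exec sequence described above can be sketched with plain POSIX calls. This is only an illustration, not Cygwin code: `spawn_with_locale` is a hypothetical helper name, and the sketch assumes a POSIX environment where `fork`, `setenv`, and `execv` are available.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv`, setting LC_CTYPE to
   `lc_ctype` in the child only.  Returns the child's exit status,
   or -1 if the child could not be started or did not exit normally. */
static int spawn_with_locale(const char *lc_ctype, const char *path,
                             char *const argv[])
{
    assert(lc_ctype != NULL && path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: descriptors opened by the parent are still open here
           and survive the exec unless they are marked close-on-exec.
           Only the locale variable differs from the parent. */
        if (setenv("LC_CTYPE", lc_ctype, 1) != 0)
            _exit(127);
        execv(path, argv);
        _exit(127);  /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent's own environment is untouched, so any state it built up under its original locale is handed to a child that now has a different idea of the charset.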

This *might* be a pathological case, given that Cygwin uses the native
NT name of the file which is stored in the fhandler as UTF-16 string.
However, I'm not sure that this is really always the case, so we would
have to expect breakage.
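
To make the charset dependence concrete: the same UTF-16 code unit becomes a different byte sequence depending on the target charset, so a byte string produced under one locale is misread under another. The following is a minimal sketch with hypothetical helpers (`encode_utf8`, `encode_latin1`); it is not Cygwin's conversion code and handles only single code units outside the surrogate range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UTF-16 code unit (not a surrogate)
   as UTF-8.  Returns the number of bytes written (1 to 3). */
static size_t encode_utf8(uint16_t cu, unsigned char out[3])
{
    assert(out != NULL);
    assert(cu < 0xD800 || cu > 0xDFFF);

    if (cu < 0x80) {
        out[0] = (unsigned char)cu;
        return 1;
    }
    if (cu < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cu >> 6));
        out[1] = (unsigned char)(0x80 | (cu & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cu >> 12));
    out[1] = (unsigned char)(0x80 | ((cu >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cu & 0x3F));
    return 3;
}

/* Hypothetical helper: encode one code unit as ISO-8859-1.
   Returns 1 on success, 0 if the character is not representable. */
static size_t encode_latin1(uint16_t cu, unsigned char out[1])
{
    assert(out != NULL);

    if (cu >= 0x100)
        return 0;
    out[0] = (unsigned char)cu;
    return 1;
}
```

For U+00E4 the first helper yields the two bytes C3 A4 and the second the single byte E4. A process that stored the name under one charset and later interprets it under the other would look up a different file name.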

> > The problem is that some of these strings are fetched only once, when
> > the first Cygwin process starts.  The user name, for instance, which is
> > inherited by child processes.  The CWD is also typically only fetched
> > once, when the first process starts up or when a process calls chdir.
> > Then it's inherited by child processes as is.
> 
> Would it be possible to fetch them at every process startup instead?

Performance, there you go...

> > it seems to me that the best approach is to stick to one
> > single representation of system object names throughout the lifetime
> > of a process tree.
> 
> This would make it futile to have the locale and charset settings in
> mintty 0.5's options dialog, which provide a simple user interface to
> Cygwin's locale system (and which took quite a bit of work to get
> right, although of course that's irrelevant).
> 
> Since mintty is itself a Cygwin process, the filename encoding would
> already be fixed before mintty gets a say about it, hence people would
> either need to invoke it through a batch script, or set the locale
> variables somewhere in the system properties, neither of which is
> intuitive.

mintty could be a native Win32 wrapper, calling mintty-cygwin via
  SetEnvironmentVariable ("LC_CTYPE", "blub");
  CreateProcess ();

Just an idea.

> Furthermore, this approach would remove other uses of program-specific
> locale settings, e.g. specifying a charset for file archiving
> operations as in the tar example I'd given, or dealing with non-UTF-8
> programs.

I thought we already were beyond this point.  Applications which
misbehave if the C locale is using the UTF-8 charset will have a hard
time anyway.  The example you gave is a nice feature, but it's not
actually necessary to support it now.

In the first place, an application should treat a filename as a byte
stream, shouldn't it?  Only for printing purposes, it would be more or less
safe to convert the filename to wchar_t or another multibyte charset.
Other than that, the application should take the filename as it is.
And as long as Cygwin always understands the filename, there should be
no harm.
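
A small sketch of that separation, assuming nothing about the active locale: the name is kept byte-for-byte for system calls, and only a display copy is transformed. `printable_copy` is a hypothetical helper, and the `\xHH` escaping is a conservative stand-in for a real charset conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a printable rendering of `name` to `out`.
   Bytes outside printable ASCII are shown as \xHH.  `name` itself is
   never modified.  Output is truncated to fit `cap` and always
   NUL-terminated.  Returns the length written, excluding the NUL. */
static size_t printable_copy(const char *name, char *out, size_t cap)
{
    assert(name != NULL && out != NULL && cap > 0);

    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        if (*p >= 0x20 && *p < 0x7F) {
            if (n + 1 >= cap)
                break;                      /* no room for char + NUL */
            out[n++] = (char)*p;
        } else {
            if (n + 4 >= cap)
                break;                      /* no room for \xHH + NUL */
            snprintf(out + n, 5, "\\x%02X", (unsigned)*p);
            n += 4;
        }
    }
    out[n] = '\0';
    return n;
}
```

The original buffer is what gets passed to `open` or `stat`; the escaped copy exists only for messages and listings.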

> Please reconsider. I'm sorry for raising these issues at such a late
> stage, and I wish I'd understood locales sooner.

I've been reconsidering every 10 minutes for the last 3 days.  However, the
longer I think about it, the more I think that using UTF-8 exclusively
for system object names is the right thing to do in the long run.  If
there's really any point in switching the charset for those objects, it
shouldn't change within a session.  At least for 1.7.1!  We can always
rework this concept later, but right now, it looks like making the
changes you're asking for could easily take a couple more months.  We
should not turn the code entirely upside down at this point in time, but
get 1.7.1 out first, and I would like to have a concept in 1.7.1 which
is at least safe enough to let me sleep at night.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

