This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Re: Console codepage setting via chcp?


2009/9/24 Corinna Vinschen:
>> > Last but not least, the alternative would be to store the Console
>> > character set as an environment variable, just as the
>> > "CYGWIN=codepage:[ansi:oem]" setting back in 1.5 times.  Sigh.
>>
>> Hmm. How about simply using the standard locale variables again?
>> Except unlike now the console charset would be set at startup rather
>> than by setlocale.
>
> Yeah, it's just... what bugs me with this approach is
>
> - If you want to switch to a certain charset, you must do this before
>   the first Cygwin process in the console starts (via batch file or
>   system-wide setting of LC_ALL/LC_CTYPE/LANG).

I'd actually assumed that LC_ALL/LC_CTYPE/LANG would be read at
program startup rather than Cygwin DLL startup. Having pondered this a
bit more, I'm now quite convinced that this is the right approach
to take, for the following reasons:

- It addresses the issue with the console encoding depending on
whether or not a program happens to call setlocale(LC_CTYPE, "").
- Yet it still allows the charset to be changed, by invoking a program
with a changed variable setting.
- It fulfils programs' assumption that the terminal charset is the
same as what's set in their initial environment.
- Users (and documentation) only have to worry about one setting.
- No non-standard tools are needed.
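
To make the startup lookup concrete, here is a rough sketch of the logic in
Python. It is only an illustration, not Cygwin code: the helper name
charset_from_env and the "ASCII" fallback are assumptions of mine. The order
LC_ALL, LC_CTYPE, LANG is the usual POSIX precedence, with empty values
treated as unset.

```python
def charset_from_env(env, fallback="ASCII"):
    """Pick the charset named by the locale variables in `env` (a dict).

    Hypothetical helper: checks LC_ALL, then LC_CTYPE, then LANG, and
    returns the codeset part of the first non-empty value.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            break
    else:
        return fallback  # none of the variables is set

    # Locale names look like language[_territory][.codeset][@modifier].
    value = value.split("@", 1)[0]
    if "." not in value:
        return fallback  # e.g. "C" or "en_US": no explicit codeset
    return value.split(".", 1)[1]
```

The idea would be to evaluate this once at program startup, for instance as
charset_from_env(os.environ), and to keep the result fixed no matter what
setlocale calls the program makes later.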

Additionally, it occurred to me that the issue you described regarding
ssh output also applies to filenames. For example, for a file archiver
like tar, the charset doesn't matter, because it can assume that
filenames are simply sequences of bytes. Therefore, it may or may not
call setlocale(LC_CTYPE, "").

With Cygwin's filename charset currently being set by setlocale,
however, this does make a big difference: if tar does call setlocale,
the filenames will be translated according to the user's preferences,
but if not, it'll use the C locale.

Therefore, I think the same approach as for the console should be
applied to filenames: the charset is set according to the environment
variable settings at program startup, and setlocale calls do not
change it.

Advantages over the current approach:
- setlocale would have no effects beyond what's expected on Linux.
- Filenames do not change across setlocale calls.
- It adheres to Linux programs' assumption that filenames are encoded
in the charset set in the initial environment.
- It reduces the importance of the C locale.
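
To illustrate why the charset in effect matters for filenames, this small
Python sketch encodes one name under three charsets. The helper
filename_bytes is hypothetical, and returning None for a name that cannot be
represented is just a placeholder: what Cygwin actually does with such
characters in the C locale is not covered here.

```python
def filename_bytes(name, charset):
    """Encode a filename in `charset`; return None if it cannot be represented."""
    try:
        return name.encode(charset)
    except UnicodeEncodeError:
        return None


name = "caf\u00e9.txt"  # "café.txt"
as_utf8 = filename_bytes(name, "utf-8")     # b'caf\xc3\xa9.txt'
as_cp1252 = filename_bytes(name, "cp1252")  # b'caf\xe9.txt'
as_ascii = filename_bytes(name, "ascii")    # None: no ASCII form for the accented e
```

The same file yields three different results, so a program that sees the
charset switch partway through its run (because it called setlocale) could
see the name of one and the same file change underneath it.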


> - If you want to switch the console to another charset you can't do that
>   on the fly in Cygwin.

You can't in xterm or rxvt either, at least not without the likes of luit.

You can in mintty, and also in gnome-terminal and KDE Konsole, but to
be honest it's a rather questionable feature, because applications
don't get to know about such an on-the-fly character set change, hence
things won't work correctly.

Reading the console charset from the environment at program startup,
on the other hand, does allow changing charset in a consistent manner.

This would also provide an easy solution for applications that aren't
yet UTF-8-ready, independent of whether they invoke setlocale. For
example:

$ LC_CTYPE=en.CP1252 nano

Andy

