This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" character set (again)


2010/1/7 Dave Korn:
> Yeah. I'm for WJM here. Takeup of i18n is years behind where it should be.
> If it's ever going to be anything but a joke, we're going to have to switch
> it on always everywhere by default and go through the pain sooner or later.
> Same goes for all the big linux distros; some of them are just starting to
> dabble their toes in the water, they're running up against similar problems.
> Might be kind of nice for Cygwin to be leading the pack amongst *[iu]x distros
> for a change instead of lagging some way behind!

Right, one more shot at this.

There's an important distinction here between the C locale and the
default locale. The C locale is what you get if you don't call
setlocale at all, whereas the default locale is what you get if you
call setlocale(LC_FOO, "") and the relevant environment variables are
all unset or empty.
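
To make the distinction concrete, here is a minimal sketch in C. The
helper names (ctype_locale_name, select_default_ctype, show_locales) are
made up for illustration; only setlocale itself is a real API. What the
second line prints depends on the system and on LC_ALL, LC_CTYPE and
LANG, so no particular output is claimed.

```c
#include <locale.h>
#include <stdio.h>

/* Query the current LC_CTYPE locale name without changing it.
   Passing NULL as the locale argument turns setlocale into a query. */
static const char *ctype_locale_name(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name ? name : "(unknown)";
}

/* Switch LC_CTYPE to the default locale, i.e. whatever LC_ALL,
   LC_CTYPE and LANG select (or the implementation's default if they
   are all unset or empty). Returns NULL if the request cannot be
   honoured, for example because the environment names a locale that
   is not installed. */
static const char *select_default_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}

/* Print the locale in effect before and after asking for the default.
   The first name is printed before setlocale is called again, because
   a later call may invalidate the earlier returned string. */
static void show_locales(void)
{
    printf("without setlocale:           %s\n", ctype_locale_name());

    const char *def = select_default_ctype();
    printf("after setlocale(LC_CTYPE, \"\"): %s\n",
           def ? def : "(setlocale failed)");
}
```

Calling show_locales() first thing in main shows the C locale on the
first line, since a program that has not called setlocale starts there.
The second line is the one the environment variables influence.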

The default locale uses UTF-8, and I most certainly agree that this
should stay as is. The charset of the filesystem and the console are
both controlled by the default locale (unless overridden in the
environment). They are independent of the C locale's charset or
whether an application calls setlocale.

No, this is about the C locale only. Lots of people and programs make
assumptions about the C locale which may not be valid according to
POSIX, but which nevertheless hold true for Linux and most (if not
all) other Unices, including Cygwin 1.5. The most important assumption
is that the C locale is 8-bit clean.
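
One rough way to probe that assumption is to ask the C library which
byte values above 0x7F it accepts as complete single-byte characters. The
sketch below does that with btowc; the helper names
(count_single_byte_chars, report_high_bytes) are hypothetical, and the
result differs between systems and library versions, so no particular
count is claimed.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Count the byte values in [lo, hi] that the current LC_CTYPE locale
   accepts as complete single-byte characters. btowc returns WEOF for
   a byte that is not a valid one-byte character in the initial shift
   state. */
static int count_single_byte_chars(int lo, int hi)
{
    int count = 0;
    for (int c = lo; c <= hi; c++) {
        if (btowc(c) != WEOF)
            count++;
    }
    return count;
}

/* Report how the locale currently in effect treats the upper half of
   the byte range. No setlocale call is made here, so in a program
   that never calls setlocale this probes the C locale. */
static void report_high_bytes(void)
{
    int accepted = count_single_byte_chars(0x80, 0xFF);
    printf("%d of 128 high bytes are valid single-byte characters\n",
           accepted);
}
```

A C locale that is 8-bit clean in this sense would report 128. A C
locale whose charset is UTF-8 would report 0, because every byte from
0x80 to 0xFF is either part of a multibyte sequence or invalid.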

Changing the C locale's charset back to ASCII/ISO-8859-1 only affects
situations where such assumptions are made. That might be a program
that doesn't call setlocale yet calls character functions with
non-portable characters. Or it might be a user setting LC_ALL to C in
the expectation that it bypasses all character set stuff. Or it's
xterm (and the rest of X), which uses ISO-8859-1 when the C locale is
selected, independent of what Cygwin thinks, resulting in a clash
between terminal and apps running in it.

Again, the filesystem, console, and locale-aware applications are all
unaffected by changing the C locale's charset. Same for
non-locale-aware applications that don't go beyond POSIX. UTF-8
remains the default. But Cygwin becomes more Linux-compatible and we
save ourselves unnecessary complaints.

Andy

