This is the mail archive of the cygwin-patches mailing list for the Cygwin project.
Hi,

I stumbled across an issue with locale initialization when the "C" locale is specified in the environment.

    $ cat test.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <locale.h>
    #include <langinfo.h>

    int main(void)
    {
      char cs[8];
      puts(nl_langinfo(CODESET));
      printf("%i\n", wctomb(cs, 0x80));
      return 0;
    }

The program doesn't call setlocale(), so it should be using the "C" locale with its ASCII charset, which means the wctomb() call with a codepoint outside the ASCII range should fail. And that's exactly what happens as long as the locale set in the environment is something other than "C", e.g.:

    $ LC_ALL=C.UTF-8 ./test
    ANSI_X3.4-1968
    -1

    $ LC_ALL=en_GB.ISO-8859-15 ./test
    ANSI_X3.4-1968
    -1

However, if the environment locale is "C", the charset is still reported as ASCII (aka ANSI_X3.4-1968), but the wctomb() call suddenly succeeds:

    $ LC_ALL=C ./test
    ANSI_X3.4-1968
    2

That's due to a combination of three things: Cygwin newlib starts with the __wctomb and __mbtowc function pointers set to the UTF-8 variants (for conversions during early Cygwin initialization), yet the LC_CTYPE locale is set to "C", and setlocale() does nothing if the requested locale is the same as the previous one.

Hence, with the locale set to "C" in the environment, both the setlocale() call from initial_setlocale(), which asks for the environment locale for filename conversion, and the setlocale() call just before main() that sets the "C" locale, end up doing nothing. Thus the conversion functions remain set to the UTF-8 variants instead of being set to the ASCII ones as intended for the "C" locale.

The attached small patch addresses this by starting with the LC_CTYPE locale set to "C.UTF-8" and lc_ctype_charset set accordingly too. This means that setting the "C" locale is recognised as a change and that the conversion function pointers are updated accordingly.
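
The interaction described above can be illustrated with a small toy model. This is only a sketch of the mechanism, not newlib code: every name prefixed with toy_ is hypothetical, the UTF-8 encoder is cut down to code points below 0x800, and the model assumes that any locale other than "C" selects the UTF-8 converter, which is a simplification (a locale such as en_GB.ISO-8859-15 would really get its own converter).

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these toy_ names exist in newlib or Cygwin. */

typedef int (*toy_wctomb_fn)(char *dst, unsigned wc);

/* ASCII converter: rejects anything above 0x7f. */
static int toy_ascii_wctomb(char *dst, unsigned wc)
{
    if (wc > 0x7f)
        return -1;
    dst[0] = (char)wc;
    return 1;
}

/* UTF-8 converter, limited to code points below 0x800 for brevity. */
static int toy_utf8_wctomb(char *dst, unsigned wc)
{
    if (wc <= 0x7f) {
        dst[0] = (char)wc;
        return 1;
    }
    if (wc <= 0x7ff) {
        dst[0] = (char)(0xc0 | (wc >> 6));
        dst[1] = (char)(0x80 | (wc & 0x3f));
        return 2;
    }
    return -1;
}

struct toy_state {
    char ctype[32];        /* name of the current LC_CTYPE locale */
    toy_wctomb_fn wctomb;  /* currently active conversion function */
};

/* Models the setlocale() behaviour described above: an unchanged
   locale name is a no-op, so the function pointer is left alone. */
static void toy_setlocale(struct toy_state *st, const char *name)
{
    if (strcmp(st->ctype, name) == 0)
        return;
    strncpy(st->ctype, name, sizeof st->ctype - 1);
    st->ctype[sizeof st->ctype - 1] = '\0';
    /* Simplifying assumption: every non-"C" locale uses UTF-8. */
    st->wctomb = strcmp(name, "C") == 0 ? toy_ascii_wctomb : toy_utf8_wctomb;
}

/* Replays the startup sequence and returns what wctomb(cs, 0x80)
   would yield in main() for the given initial LC_CTYPE name and
   environment locale. */
int toy_startup_result(const char *initial_ctype, const char *env_locale)
{
    struct toy_state st;
    char cs[8];

    strncpy(st.ctype, initial_ctype, sizeof st.ctype - 1);
    st.ctype[sizeof st.ctype - 1] = '\0';
    st.wctomb = toy_utf8_wctomb;     /* early initialization uses UTF-8 */

    toy_setlocale(&st, env_locale);  /* initial_setlocale(): environment locale */
    toy_setlocale(&st, "C");         /* just before main(): the "C" locale */
    return st.wctomb(cs, 0x80);
}
```

In this model, toy_startup_result("C", "C") returns 2, mirroring the unexpected result shown above, because both toy_setlocale() calls are no-ops. With the initial name changed to "C.UTF-8", toy_startup_result("C.UTF-8", "C") returns -1, since the first call is now recognised as a change and installs the ASCII converter.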

The patch also has the happy side effect that the setlocale() call from initial_setlocale() will be short-circuited if the default "C.UTF-8" locale has not been overridden in the environment.

Additionally, I think it's time to drop the "temporarily" #if 0'd code for making UTF-8 the charset for the "C" locale.

It's a newlib patch, but it's entirely Cygwin-specific, so it seemed more appropriate to send it here.

    * libc/locale/locale.c [__CYGWIN__] (current_categories,
    lc_ctype_charset): Start with the LC_CTYPE locale set to "C.UTF-8",
    to match initial __wctomb and __mbtowc settings.
    (lc_message_charset, loadlocale): Settle on ASCII as the "C" charset.

Andy

Attachment:
lc_ctype.patch
Description: Binary data