This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
LC_MESSAGES implementation
- From: Corinna Vinschen <corinna-cygwin at cygwin dot com>
- To: cygwin-developers at cygwin dot com
- Date: Mon, 8 Feb 2010 11:30:27 +0100
- Subject: LC_MESSAGES implementation
- Reply-to: cygwin-developers at cygwin dot com
Hi guys,
I have finally found a method to implement the locale-specific
LC_MESSAGES info.
Basically, I have two applications which generate the yesexpr, noexpr,
yesstr and nostr strings from foreign locale data. The first application
is finished and generates the data from GLIBC locale data. The second
application is almost finished and generates the data from the CLDR
project (http://cldr.unicode.org) locale data.
In both cases the data is generated as a header file containing a big
array like this:
struct lc_msg_t
{
  const char *locale;
  const wchar_t *yesexpr;
  const wchar_t *noexpr;
  const wchar_t *yesstr;
  const wchar_t *nostr;
};
static struct lc_msg_t lc_msg[] =
{
{ "aa_DJ", L"\x005e\x005b\x006f\x004f\x0079\x0059\x005d\x002e\x002a", L"\x005e\x005b\x006d\x006e\x004d\x004e\x005d\x002e\x002a", L"", L"" },
[...]
};
The subsequent code called from newlib's loadlocale() function fetches
the locale data from this array using bsearch() with the key being the
locale, and converts it into the correct charset.
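A minimal sketch of such a lookup, under stated assumptions: the table must be sorted by locale name for bsearch() to work, the helper names lookup_lc_msg() and lc_msg_yesexpr() are hypothetical, and wcstombs() stands in for whatever charset-specific conversion loadlocale() really uses. Only the aa_DJ entry comes from the array above (shown decoded from its \x escapes); the other two entries are made-up placeholders.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

struct lc_msg_t
{
  const char *locale;
  const wchar_t *yesexpr;
  const wchar_t *noexpr;
  const wchar_t *yesstr;
  const wchar_t *nostr;
};

/* Illustrative table.  It must stay sorted by locale name, otherwise
   bsearch() gives undefined results.  The aa_DJ strings are the decoded
   form of the \x escapes in the generated header; the de_DE and en_US
   entries are placeholders invented for this sketch. */
static const struct lc_msg_t lc_msg[] =
{
  { "aa_DJ", L"^[oOyY].*", L"^[mnMN].*", L"", L"" },
  { "de_DE", L"^[jJyY].*", L"^[nN].*", L"ja", L"nein" },
  { "en_US", L"^[yY].*", L"^[nN].*", L"yes", L"no" },
};

/* bsearch() comparator: the key is a locale name, the element a table row. */
static int
locale_cmp (const void *key, const void *elem)
{
  return strcmp ((const char *) key,
                 ((const struct lc_msg_t *) elem)->locale);
}

/* Return the table row for LOCALE, or NULL if it is not in the table. */
static const struct lc_msg_t *
lookup_lc_msg (const char *locale)
{
  return bsearch (locale, lc_msg, sizeof lc_msg / sizeof lc_msg[0],
                  sizeof lc_msg[0], locale_cmp);
}

/* Fetch yesexpr for LOCALE and convert it to a multibyte string in BUF.
   wcstombs() converts according to the current LC_CTYPE setting; it is
   only a stand-in for the real charset conversion.  Returns 0 on success,
   -1 if the locale is unknown or the result does not fit. */
static int
lc_msg_yesexpr (const char *locale, char *buf, size_t len)
{
  const struct lc_msg_t *msg;
  size_t n;

  assert (buf != NULL && len > 0);
  msg = lookup_lc_msg (locale);
  if (!msg)
    return -1;
  n = wcstombs (buf, msg->yesexpr, len);
  if (n == (size_t) -1 || n >= len)   /* unconvertible or no room for NUL */
    return -1;
  return 0;
}
```

With this table, lc_msg_yesexpr ("aa_DJ", buf, sizeof buf) fills buf with "^[oOyY].*", and an unknown locale name makes the function return -1 so the caller can fall back to the C/POSIX defaults.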
Here are two questions:
- First of all, I'm not sure if I should use the GLIBC or the CLDR data.
What speaks for GLIBC:
- The GLIBC data contains the more relaxed and simpler yesexpr and
noexpr strings.
- Quite often the CLDR entries are just placeholders using the default
C/POSIX strings. This almost never happens in GLIBC.
- The locale names match our locale names exactly, while CLDR uses the
RFC 4646 strings just like Windows. This simplifies generation of
the locale data while it requires conversion in CLDR.
What speaks for CLDR:
- The number of supported locales is bigger than in GLIBC, and they
match more of the locales supported by Windows.
- In GLIBC the (deprecated) yesstr and nostr strings are quite often
not available at all. This never happens in CLDR.
Which one would you prefer? Of course I could generate two arrays
and mix the data, but that's not easy to automate.
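To illustrate the name conversion the CLDR route would need, here is a rough sketch that maps an RFC 4646 style tag to a GLIBC style locale name. The function name rfc4646_to_posix() and the script table are hypothetical, and only the simple "language", "language-REGION" and "language-Script-REGION" shapes are handled. A real generator would need more cases.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping from script subtags to GLIBC-style modifiers. */
static const struct
{
  const char *script;
  const char *modifier;
} script_map[] =
{
  { "Latn", "latin" },
  { "Cyrl", "cyrillic" },
};

/* Convert "ll", "ll-CC" or "ll-Ssss-CC" into "ll", "ll_CC" or
   "ll_CC@modifier".  A two-part tag is always treated as
   language-REGION, so "ll-Ssss" is not handled correctly here.
   Returns 0 on success, -1 on an unknown script or a too-small buffer. */
static int
rfc4646_to_posix (const char *tag, char *buf, size_t len)
{
  const char *first, *second;
  const char *modifier = NULL;
  int n;

  assert (tag != NULL && buf != NULL);
  first = strchr (tag, '-');
  second = first ? strchr (first + 1, '-') : NULL;

  if (!first)                   /* language only */
    n = snprintf (buf, len, "%s", tag);
  else if (!second)             /* language-REGION */
    n = snprintf (buf, len, "%.*s_%s", (int) (first - tag), tag, first + 1);
  else                          /* language-Script-REGION */
    {
      size_t slen = (size_t) (second - first - 1);
      size_t i;

      for (i = 0; i < sizeof script_map / sizeof script_map[0]; i++)
        if (strlen (script_map[i].script) == slen
            && strncmp (script_map[i].script, first + 1, slen) == 0)
          modifier = script_map[i].modifier;
      if (!modifier)
        return -1;
      n = snprintf (buf, len, "%.*s_%s@%s", (int) (first - tag), tag,
                    second + 1, modifier);
    }
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

For example, "en-US" becomes "en_US" and "sr-Latn-RS" becomes "sr_RS@latin". With the GLIBC data no such step is needed, since the names already match.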
- Second, in my current implementation the data is stored within Cygwin,
as a big array of about 24K (in the GLIBC case, the CLDR case should
be comparable).
Since all of the other locale classes, LC_COLLATE, LC_CTYPE,
LC_MONETARY, LC_NUMERIC, and LC_TIME, are implemented internally,
mostly using data already available in Windows, I have a hard time
justifying a file-based solution just for the single LC_MESSAGES
case. It just doesn't seem right, and 24K isn't *that* big, is it?
So, here's the question:
- Do you think it's ok to keep the data internally and regenerate it
from time to time with a new Cygwin version when a new GLIBC version
or a new CLDR version has been released?
- Or, would you prefer a file-based solution using a precompiled
single file containing the data, which could be memory-mapped into
Cygwin when necessary?
- Or, would you prefer a file-based solution using single locale-specific
LC_MESSAGES files?
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat