This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: More about charsets


On Mar 27 18:24, Corinna Vinschen wrote:
> On Mar 27 16:11, Andy Koppe wrote:
> > Corinna Vinschen:
> > > while looking into the GB18030 issue once again, I found that we still
> > > may have two holes which might be important to support.
> > >
> > > - GB2312 aka EUC-CN
> > >
> > >   We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
> > >   of GBK and apparently GBK is often used while still labeled as
> > >   GB2312.  See the discussion here:
> > >   http://www.mail-archive.com/unicode@unicode.org/msg03516.html
> > >
> > >   So the question is, should we just allow GB2312 and EUC-CN as
> > >   codeset names, but use the GBK conversion functions for them?
> > 
> > Might as well. As you saw, mintty already does that. Thomas Wolff's
> > mined goes even further and handles both GB2312 and GBK with its
> > GB18030 codec, because GBK is a subset of GB18030.
> 
> I think I'll opt for GBK for now, given that GB18030 doesn't exist yet.

I also intend to make GB2312 the default name rather than GBK, since
that's the default for these languages on Linux.
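
A minimal sketch of the aliasing idea, written in Python rather than in
Cygwin's C, and not taken from the Cygwin sources: GB2312 and EUC-CN are
accepted as codeset names, but every conversion goes through the GBK
codec.  The table CODESET_ALIASES and the helpers resolve_codeset() and
decode_with_codeset() are hypothetical names used only for illustration;
the "gbk" and "gb2312" codecs are the ones shipped with Python.

```python
# Hypothetical alias table: codeset names a user may specify, mapped to
# the codec that actually performs the conversion (GBK in every case).
CODESET_ALIASES = {
    "GBK": "gbk",
    "GB2312": "gbk",
    "EUC-CN": "gbk",
}


def resolve_codeset(name):
    """Return the codec to use for a codeset name, or None if unknown."""
    return CODESET_ALIASES.get(name.upper())


def decode_with_codeset(data, codeset):
    """Decode bytes labeled with `codeset` using the aliased codec."""
    codec = resolve_codeset(codeset)
    if codec is None:
        raise LookupError("unsupported codeset: " + codeset)
    return data.decode(codec)


# Text restricted to GB2312 characters has the same bytes in both
# encodings, so decoding it with the GBK codec loses nothing.
sample = "中文"
gb2312_bytes = sample.encode("gb2312")
same_bytes = gb2312_bytes == sample.encode("gbk")

# Data that is really GBK but labeled GB2312 still decodes, because the
# label is only an alias for the GBK codec.
gbk_only = "镕".encode("gbk")
decoded = decode_with_codeset(gbk_only, "GB2312")
```

Since GB2312 is a subset of GBK, the alias does not change the result
for correctly labeled data; it only makes mislabeled GBK data readable.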

Btw., apart from EUC-TW, BIG5-HKSCS is missing as well.  I read
http://en.wikipedia.org/wiki/HKSCS, including the Windows-specific
section, but I'm still puzzled how this is supposed to work.  Does
Vista's codepage 950 contain the HKSCS elements or not?!?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

