This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: bug in mbrtowc?


  The bug is in O.P.'s code as &s is not being passed to mbrtowc.

  I'm on Ubuntu. I do not have Cygwin here.

  I should consume some calories before trying to debug anything.

On Tue, Jul 28, 2009 at 6:14 AM, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> On Jul 27 22:56, Andy Koppe wrote:
>> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
>> Here's an example:
>>
>> #include <stdio.h>
>> #include <locale.h>
>> #include <stdlib.h>
>> #include <wchar.h>
>>
>> int main(void) {
>>   wchar_t wc;
>>   size_t ret;
>>   mbstate_t s = { 0 };
>>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>>   printf("%i\n", mbrtowc(&wc, "\xe2", 1, 0));
>>   printf("%i\n", mbrtowc(&wc, "\x94", 1, 0));
>>   printf("%i\n", mbrtowc(&wc, "\x84", 1, 0));
>>   printf("%x\n", wc);
>> Â return 0;
>> }
>>
>> The sequence E2 94 84 should translate to U+2504. Instead, the second
>> and third calls to mbrtowc report encoding errors. It does work
>> correctly if the three bytes are passed to mbrtowc() in one go:
>>
>>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
>
> That's a bug in the newlib function __utf8_mbtowc.  I'm really surprised
> that this bug has never been reported before, since it has been in the
> code for years, probably ever since it was introduced in 2002.
>
> I'll follow up on the newlib list.
>
>
> Thanks for the report and especially thanks for the testcase,
> Corinna
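
For reference, here is a minimal sketch of the byte-at-a-time conversion discussed above, this time with an explicit mbstate_t. The helper names (select_utf8_locale, feed_bytewise, decode_utf8_3) are hypothetical and not from the thread or from newlib. The C standard also permits a null state pointer, in which case mbrtowc uses an internal state object, so this variant is only one way to write the test. On a conforming implementation in a UTF-8 locale, the three single-byte calls for E2 94 84 should return (size_t)-2, (size_t)-2 and then 1, leaving U+2504 in the output. The hand decoder is included only to check the expected code point independently of any locale.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names. Which names
 * exist is system-dependent (assumption). Returns 1 on success, 0 if
 * none of them could be selected. */
int select_utf8_locale(void)
{
    static const char *const names[] = { "en_GB.UTF-8", "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: feed `len` bytes to mbrtowc() one at a time,
 * carrying the partial character in an explicit mbstate_t.
 * results[i] receives the return value of the i-th call.
 * Returns 0 on success, -1 as soon as mbrtowc reports an encoding error. */
int feed_bytewise(const char *bytes, size_t len, wchar_t *out, size_t *results)
{
    mbstate_t st;

    assert(bytes != NULL && out != NULL && results != NULL);
    memset(&st, 0, sizeof st);          /* all-zero is the initial state */

    for (size_t i = 0; i < len; i++) {
        results[i] = mbrtowc(out, bytes + i, 1, &st);
        if (results[i] == (size_t)-1)
            return -1;                  /* invalid sequence (EILSEQ) */
    }
    return 0;
}

/* Hypothetical helper: decode one 3-byte UTF-8 sequence by hand, with no
 * locale involved. Returns the code point, or -1 if the bytes are not a
 * well-formed 3-byte sequence. */
long decode_utf8_3(const unsigned char *b)
{
    long cp;

    if ((b[0] & 0xF0) != 0xE0 || (b[1] & 0xC0) != 0x80 || (b[2] & 0xC0) != 0x80)
        return -1;

    /* 4 payload bits from the lead byte, 6 from each continuation byte. */
    cp = ((long)(b[0] & 0x0F) << 12) | ((long)(b[1] & 0x3F) << 6) | (long)(b[2] & 0x3F);

    if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                      /* overlong form or surrogate */
    return cp;
}
```

To reproduce the report, call select_utf8_locale() once, then feed_bytewise("\xe2\x94\x84", 3, &wc, results) and print the three results cast to long along with wc. An encoding error on the second or third call would match the behaviour described in the original message.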


