This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line


Alexey Borzenkov wrote:
> It might be safe for you, but not for other people. If you have a
> Russian default code page and ever need to work with Chinese/Japanese
> filenames, and Cygwin uses the default code page for filesystem operations
> (as 1.5 does right now), then you are really screwed. In my opinion
> UTF-8 is a silver bullet here, and I'm very glad it went that way.

I must be missing something here. Suppose you have a default Russian code page, with LANG unset (i.e. Cygwin 1.7 uses UTF-8). Now, if you use any non-Unicode, non-code-page-aware native application to create a Russian filename, isn't Windows going to convert the filename from the Russian code page into UTF-16 for storage in NTFS? If so, and you then run ls from Cygwin 1.7, aren't you going to get the wrong filename displayed? That is, interoperability with non-Unicode, non-code-page-aware native applications will be broken for you too under the current default Cygwin 1.7 behaviour.

Or is this not a case you care about, because you *only* use Cygwin applications?
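To make the encoding question concrete, here is a small Python sketch (Python is used only for illustration, and cp1251 is assumed to be the Russian ANSI code page). It shows one Russian filename as cp1251 bytes, as the UTF-16 form NTFS stores, and as UTF-8 bytes, and what a program sees if it decodes the UTF-8 bytes as though they were cp1251.

```python
# Illustration only: assumes cp1251 is the Russian ANSI code page.
name = "Привет.txt"

ansi_bytes = name.encode("cp1251")      # narrow bytes from a non-Unicode app
utf16_bytes = name.encode("utf-16-le")  # the form NTFS stores
utf8_bytes = utf16_bytes.decode("utf-16-le").encode("utf-8")  # UTF-8 view

# The same name has different byte sequences in the two narrow encodings.
print(ansi_bytes.hex(" "))
print(utf8_bytes.hex(" "))

# A consumer that assumes cp1251 but is handed UTF-8 bytes shows garbled text.
misread = utf8_bytes.decode("cp1251")
print(misread)
```

Whether a user actually sees garbled names therefore depends on which side of the conversion assumes which encoding.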

Regards,
-Edward

Alexey Borzenkov wrote:
On Sat, May 30, 2009 at 12:10 AM, Edward Lam <edward@sidefx.com> wrote:
> Thanks for explaining the UTF-8 changes in Cygwin 1.7. However, the decision
> to use UTF-8 for the C locale is questionable.

Not at all, because UTF-8, as far as I understand, is used for communication with the system in this context, and it does not force anything on the application. Most modern Unix systems use UTF-8 nowadays, which means that even if you have a C locale, your terminal outputs text in UTF-8, your input is UTF-8, and your filenames are UTF-8 (well, not really, but the rest of the system sees them that way). The same applies here, except that launching non-Cygwin processes is also communication with the system, and it needs conversion. And wherever there is conversion, data loss is always possible, one way or the other.
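A minimal Python sketch of the loss Alexey mentions, assuming the launched native program uses the Russian ANSI code page cp1251. Substituting '?' for unrepresentable characters (`errors="replace"`) is an assumption chosen for illustration; a real converter might fail or truncate instead.

```python
# Illustration only: assumes the native program's ANSI code page is cp1251.
arg = "отчёт_报告.txt"

# Convert for a non-Unicode native process, replacing each character
# cp1251 cannot represent with '?'.
ansi = arg.encode("cp1251", errors="replace")
round_trip = ansi.decode("cp1251")

print(round_trip)  # prints: отчёт_??.txt
```

The Cyrillic part survives; the Chinese characters do not, and nothing on the receiving side can recover them.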

> It seems to me that it would be much safer to use the SYSTEM DEFAULT code
> page (i.e. the return value of the system GetACP() function) for Cygwin
> instead, ensuring compatibility with the large class of native Windows
> applications that are non-Unicode and non-code-page-aware.

It might be safe for you, but not for other people. If you have a Russian default code page and ever need to work with Chinese/Japanese filenames, and Cygwin uses the default code page for filesystem operations (as 1.5 does right now), then you are really screwed. In my opinion UTF-8 is a silver bullet here, and I'm very glad it went that way.
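The representability argument can be checked directly. This Python sketch assumes cp1251 stands in for the Russian default code page and simply asks whether each encoding can hold a given filename; the filenames are made-up examples.

```python
# Illustration only: cp1251 stands in for the Russian default code page.
def representable(name: str, codec: str) -> bool:
    """Return True if every character of name can be encoded in codec."""
    try:
        name.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

filenames = ["отчёт.txt", "報告書.txt", "レポート.txt"]

for fn in filenames:
    print(f"{fn}: cp1251={representable(fn, 'cp1251')}, "
          f"utf-8={representable(fn, 'utf-8')}")
```

Only the Cyrillic name fits in cp1251, while UTF-8 can encode all three.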

> I think it's very bad that changing LANG can result in a truncated *command
> line*; that has nothing to do with printf. The printf in the code was just
> for testing. The HUGE bug is that the application gets the WRONG NUMBER OF
> ARGUMENTS.

No, the bug is not that it gets the wrong number of arguments. In fact, Windows has no concept of arguments; only the C runtime does, and it parses the command line. If the command line is truncated, the C runtime will be missing arguments when it parses it.
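Alexey's point is that a Windows process receives a single command-line string and the C runtime splits it into argv. The Python function below is a simplified model of the Microsoft C runtime's documented splitting rules (whitespace separates arguments, double quotes group them, and backslashes are special only before a double quote). It ignores the separate handling of argv[0] and of doubled quotes inside quoted text, so it is an approximation, not the CRT's actual code.

```python
def parse_cmdline(cmdline: str) -> list[str]:
    """Simplified model of MS C runtime argv splitting (not the exact CRT)."""
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    have_arg = False  # lets '""' produce an empty argument
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            # Count a run of backslashes; they are special only before '"'.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                current.append("\\" * (count // 2))
                if count % 2:  # odd count: the quote is a literal character
                    current.append('"')
                    j += 1
                # even count: leave the quote for the next loop pass to toggle
            else:
                current.append("\\" * count)  # literal backslashes
            have_arg = True
            i = j
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args

full = "prog.exe one two three"
truncated = full[:16]  # "prog.exe one two"
print(parse_cmdline(full))
print(parse_cmdline(truncated))
```

Nothing in the string itself marks it as cut short, so the parser simply reports one argument fewer.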

I mentioned wprintf because recently I was wondering why
mkpasswd/mkgroup truncated their output strangely with Russian
usernames, and it turned out that wprintf, when it can't encode some
characters, stops right there and returns an error code. But, honestly,
who ever checks return codes from printf?

Something similar might be happening here. When the command line is
constructed, some function is called that can't encode a character; it
returns an error status that is never checked, and you get a truncated
command line.
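Alexey's hypothesis can be modelled in a few lines of Python. The helper `encode_until_error` is hypothetical, not Cygwin or C runtime code: like the wprintf behaviour he describes, it converts characters until the first one the target code page (assumed here to be cp1251) cannot represent, then stops and reports an error. Plain `str.split()` stands in for real command-line parsing.

```python
# Hypothetical model, not actual Cygwin or CRT code.
def encode_until_error(text: str, codec: str) -> tuple[bytes, bool]:
    """Encode text one character at a time, stopping at the first
    character codec cannot represent. Returns (bytes so far, ok)."""
    out = bytearray()
    for ch in text:
        try:
            out.extend(ch.encode(codec))
        except UnicodeEncodeError:
            return bytes(out), False  # partial output plus an error flag
    return bytes(out), True

args = ["prog.exe", "файл.txt", "文件.txt", "--verbose"]
cmdline = " ".join(args)

# The caller does not act on 'ok', so the partial result is used as-is.
encoded, ok = encode_until_error(cmdline, "cp1251")
received = encoded.decode("cp1251").split()

print(ok, received)
```

The error is reported, but because nobody acts on it, the receiving program sees two arguments instead of four.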

And by the way, I'm not a Cygwin developer; I'm just a user speculating
right now, because I haven't looked for this problem in the code.

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

