This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: stat() lstat() not able to read long filename with cyrillic chars?
- From: Andrey Repin <anrdaemon at yandex dot ru>
- To: Corinna Vinschen <cygwin at cygwin dot com>, cygwin at cygwin dot com
- Date: Fri, 25 Dec 2015 03:04:51 +0300
- Subject: Re: stat() lstat() not able to read long filename with cyrillic chars?
- Authentication-results: sourceware.org; auth=none
- References: <20151223194440 dot 5B2A98CFEA at edrusb dot is-a-geek dot org> <20151224192448 dot GB4275 at calimero dot vinschen dot de>
- Reply-to: cygwin at cygwin dot com
Greetings, Corinna Vinschen!
>> First, I have read the FAQ and this mailing archive :)
>>
>> Here is the problem I ran into:
>>
>> Three files were placed in a directory using Windows 8's Explorer:
>> - a short Cyrillic filename "абваб.txt"
>> - a long Cyrillic filename
>> "ÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐÐ.txt"
>> - a long Latin filename
>> "ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
>>
>>
>> From a C program compiled under Cygwin, I can obtain the corresponding
>> filename strings using readdir_r()...
>>
>> "\320\260\320\261\320\262\320\260\320\261.txt"
>> "\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
>> "abababababaababababa [snipped]"
>>
>> ... but passing these strings in turn to lstat() or stat() returns 0 as
>> expected for all except for the long Cyrillic filename.
> NAME_MAX is 255. On Windows this is, unfortunately, the number of UTF-16
> code units. On POSIX systems (as on Cygwin) it is the number of bytes.
> A Cyrillic string takes twice as many bytes in UTF-8 as it has UTF-16
> code units, so NAME_MAX for Cyrillic names in UTF-8 translates into
> a maximum of 127 UTF-16 chars.
Aren't the POSIX restrictions a bit different?
Namely, 128 bytes per path element and 4096 bytes for a file name?
> If you need access to UTF-16 filenames with more characters, you can
> switch to a one-byte charset temporarily, e.g.
> $ LC_ALL=ru_RU your_app
> to switch to iso-8859-5 or
> $ LC_ALL=ru_RU.CP1251 your_app
> to switch to Windows codepage 1251. See
> https://cygwin.com/cygwin-ug-net/setup-locale.html
> HTH,
> Corinna
--
With best regards,
Andrey Repin
Friday, December 25, 2015 03:03:51
Sorry for my terrible english...