This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: [ANNOUNCEMENT] Updated: dash-0.5.8-3


On Jan 31 11:04, Corinna Vinschen wrote:
> On Jan 28 14:44, Houder wrote:
> > On Wed, 25 Jan 2017 16:14:00, Steven Penny wrote:
> > > Obviously Bash is not the problem, nor readline as Dash doesn't use readline. So
> > > it appears the issue this time is again with cygwin1.dll, or perhaps the Dash
> > > package.
> > 
> > .. uhm, it appears to me that Windows is the issue here.
> > 
> > As those in the know do not feel inclined to respond, I will provide some
> > guesses that are my own:
> > 
> >  - in terms of input buffer management, utf-8 encoded characters will not
> >    be recognized in case of bash and dash ... (they are under Fedora)
> >     - see the output of stty -a: iutf8 is not present (it is under Fedora)
> >  - readline provides bash with input buffer management for utf-8 encoded
> >    characters on Windows (that is why it 'works' in case of bash)
> >  - bash has support for utf-8 encoded characters ...
> >    (e.g. ls -l ? will include one-character filenames in case the name is
> >     made up of only one multi-byte character)
> >  - dash has no such support ... [1][2]
> > 
> > Consequently, dash is only partly useful, even more so on Windows (as it
> > would require an additional "helper" on Windows in order to obtain proper
> > line-editing). Helper? readline, libedit ...
> > 
> > However, I am only guessing ... (only Erik and Corinna can provide expert
> > details here).
> 
> I'm not quite sure yet but apparently the problem is in the handling of
> VERASE in the termios implementation.  In cooked mode it fills a char
> buffer with what has been typed.  The code doesn't know if the bytes in
> the buffer are UTF-8 chars or just random bytes.  So VERASE erases
> exactly one byte, which means, in case of UTF-8 chars it only erases the
> last byte of a multibyte character.
> 
> It seems the Linux termios implementation is different in that it
> still knows which bytes constitute a single keypress and thus knows
> how many bytes it has to erase.

Ok, here's what happens on Linux:  The termios code supports a flag
IUTF8.  This flag determines whether the termios code checks for UTF-8
characters in the input when performing an ERASE.  If IUTF8 is set, it
checks in a loop whether the byte just erased is a UTF-8 continuation
byte.  If so, it erases another byte.
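
The loop described above can be sketched roughly as follows.  This is
only an illustration, not the actual Linux or Cygwin termios code: the
names erase_last_char and is_utf8_continuation are made up for this
example, and the iutf8 parameter stands in for the IUTF8 flag.  It
assumes the cooked-mode buffer holds well-formed UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static bool is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper, not part of any real termios implementation:
 * given a line buffer of `len` bytes, return the length that remains
 * after one ERASE.  With `iutf8` false, exactly one byte is dropped.
 * With `iutf8` true, dropping continues for as long as the byte just
 * erased was a continuation byte, so a whole multibyte character goes.
 */
static size_t erase_last_char(const unsigned char *buf, size_t len, bool iutf8)
{
    assert(buf != NULL || len == 0);

    if (len == 0)
        return 0;               /* nothing to erase */

    len--;                      /* always erase one byte */

    if (iutf8) {
        /* buf[len] is the byte just erased; keep going while it is a
           continuation byte and there is still something left. */
        while (len > 0 && is_utf8_continuation(buf[len]))
            len--;
    }
    return len;
}
```

For the three bytes 'a', 0xC3, 0xA9 (an "a" followed by a two-byte
character), the sketch returns 2 without the flag, which leaves a
dangling 0xC3 lead byte in the buffer, and 1 with the flag.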


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


