This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



Re: Erroneous line endings (cat,gawk,text mount)


On Tue, 22 Apr 2003, Peter S Tillier wrote:

> Roman Belenov wrote:
> > I found that Cygwin tools can generate files with strange line
> > endings in certain situations. I have a file (call it foo.txt) with
> > DOS-style line endings in a text-mounted directory. If I do
> >     gawk {print;} <foo.txt >bar.txt
> > or
> >     cat foo.txt >bar.txt
> > I get a copy of foo.txt. But if I do
> >     cat foo.txt | gawk {print;} >bar.txt
> > I get 0xd doubled in line separators (so lines are separated with 0xd
> > 0xd 0xa in bar.txt).
> >
> > <disclaimer>
> > This is just a bug report; I don't expect a timely reaction of any
> > kind.
> > </disclaimer>
> >
> > --
> >   With regards, Roman.
>
> This is very interesting as I couldn't reproduce Roman's results at
> all, although I did get some results that I didn't expect.  Details
> follow.
>
> System: Win98SE
> Cygwin: 1.3.22
> Gawk:   3.1.2-2
>
> $ echo "CYGWIN = $CYGWIN"
> CYGWIN = tty
>
> $ mount                                      # output wrapped at col 72
> C:\Cygwin\usr\X11R6\lib\X11\fonts on /usr/X11R6/lib/X11/fonts type
> system (binmode)
> C:\Cygwin\bin on /usr/bin type system (binmode)
> C:\Cygwin\lib on /usr/lib type system (binmode)
> C:\Cygwin on / type system (binmode)
> a: on /cygdrive/a type user (textmode)
> c: on /cygdrive/c type user (binmode,noumount)
> d: on /cygdrive/d type user (binmode,noumount)
>
> $ cd /cygdrive/a
>
> The following 3 commands give the output that I expect.
>
> $ od -ba foo.txt
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ cat foo.txt | od -ba      # same as above - as it should be: UUOC ;-)
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ cat foo.txt >bar.txt;od -ba bar.txt                     # as expected
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
>
> However, this doesn't:
>
> $ awk 1 foo.txt | od -ba
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> For a text mount I'd expect "\n" -> "\r\n" translation on output, but
> it doesn't seem to be happening.
>
> Other gawk Windows ports normally translate "\r\n" -> "\n" on input and
> "\n" -> "\r\n" on output, unless the BINMODE variable is used.  This is
> so that gawk can work internally with "\n" as a line ending, but handle
> the system's line endings correctly.  [See gawk manual]
>
> For the Cygwin port and a text mount I'd expect the same behaviour,
> i.e., "\r\n" -> "\n" on input and "\n" -> "\r\n" on output, unless the
> BINMODE variable was set.
>
>
> Next I took a file on the text mount with unix line endings:
>
> $ od -ba unixle.txt
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ cat unixle.txt | od -ba                            # no surprise here
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ cat unixle.txt >bar.txt;od -ba bar.txt   # s/b "\r\n" endings surely?
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ awk 1 unixle.txt | od -ba                # s/b "\r\n" endings surely?
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> For the above 2 commands the results seem odd again to me as I would
> expect the output files to be "\r\n" terminated.
>
> I re-read the rules in the Cygwin manual about line end translation and
> tried this:
>
> $ od -ba a:foo.txt                                        # as expected
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ awk 1 a:foo.txt >bar.txt;od -ba bar.txt
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> But:
>
> $ awk 1 a:foo.txt >a:bar.txt;od -ba bar.txt
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> As the manual says, if you use a path for the file that includes a drive
> letter, then the mount for that file is text.  But shouldn't we get the
> same output without the drive letter, as /cygdrive/a is text mounted?
>
> Interestingly (still on the text mounted /cygdrive/a):
>
> $ od -ba unixle.txt
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ awk 1 a:unixle.txt >bar.txt;od -ba bar.txt
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ awk 1 a:unixle.txt >a:bar.txt;od -ba bar.txt
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ awk 1 unixle.txt >a:bar.txt;od -ba bar.txt
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> These are as I would expect, given the manual's rules.
>
>
> How about a bin mount, I thought?  So:
>
> $ cd ~
>
> $ od -ba foo.txt
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ cat foo.txt | od -ba
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ cat foo.txt >bar.txt;od -ba bar.txt       # mmm should cat translate?
> 0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
>           1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
> 0000017
>
> $ awk 1 foo.txt | od -ba                            # well awk does ...
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ awk 1 foo.txt >bar.txt;od -ba bar.txt         # ... however you do it
> 0000000 061 012 062 012 063 012 064 012 065 012
>           1  nl   2  nl   3  nl   4  nl   5  nl
> 0000012
>
> $ # yes, I know the last two should work the same.
>
>
> So it seems that with gawk on a bin mount we get line end translation
> on output, but not on a text mount, unless you force Cygwin to do it by
> using a drive letter in the file path.
>
> Or am I missing something significant in the documentation?
>
> Peter S Tillier

Peter,

FYI, pipes ("|") are controlled by the "binmode" setting in the CYGWIN
environment variable, and "binmode" is the default.  I suspect gawk's
BINMODE variable will simply be ignored.  Try adding "nobinmode" to CYGWIN
and running some of the piped experiments again.
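
A minimal sketch of that re-test, for illustration only.  It assumes the
CYGWIN variable of this Cygwin version accepts "nobinmode" and that
changing it inside a running shell is enough; it may instead need to be
set before the first Cygwin process starts.  The helper name dump_bytes
and the temporary file name are made up for the example.

```shell
# Sketch only: assumes a Cygwin 1.3.x-style CYGWIN variable that accepts
# "nobinmode"; it may need to be set before the first Cygwin process
# (for example, in the Windows environment) to take effect.
CYGWIN="tty nobinmode"
export CYGWIN

# dump_bytes (hypothetical helper): show stdin byte by byte.
dump_bytes() {
    od -An -c
}

# Build a small DOS-style test file, similar to foo.txt in the report.
demo=${TMPDIR:-/tmp}/foo.$$.txt
printf '1\r\n2\r\n3\r\n' > "$demo"

cat "$demo" | dump_bytes                  # cat through a pipe
awk 1 "$demo" | dump_bytes                # awk through a pipe
awk -v BINMODE=3 1 "$demo" | dump_bytes   # same, with gawk's BINMODE set

rm -f "$demo"
```

Comparing the three dumps with and without "nobinmode" in CYGWIN should
show whether the pipe setting or BINMODE has any effect here.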

If you specify the drive (e.g., "a:bar.txt"), the mount table is bypassed
completely, and the file is accessed directly (and the mount is assumed to
be text, IIRC).

Also, "cat" will not translate line endings or anything at all -- it's
just a character-by-character copy of stdin to stdout.  You will get line
end translation behavior from programs that read the input line-by-line
and then print lines out with a "\n" (awk, sed, grep, etc).  The only
thing the mount type controls is how "\n" is interpreted.  If your program
never prints out a "\n" (that it didn't read from the input), the mount
type and "[no]binmode" setting won't matter, I think...
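
One way to check this is to count carriage returns before and after each
command.  The following is a rough sketch, not a definitive test: the
helper count_cr and the temporary file names are invented for the
example, and it relies on od showing the raw bytes, as it did in the
dumps above.

```shell
# count_cr FILE (hypothetical helper): print the number of CR (octal 015)
# bytes in FILE, using od so the bytes are seen as stored on disk.
count_cr() {
    od -An -v -b "$1" | tr -s ' ' '\n' | grep -c '^015$'
}

tmp=${TMPDIR:-/tmp}/crdemo.$$
printf '1\r\n2\r\n3\r\n' > "$tmp.in"

cat "$tmp.in" > "$tmp.cat"           # plain copy, no line handling
cat "$tmp.in" | awk 1 > "$tmp.awk"   # lines are re-printed with "\n"

echo "input:   $(count_cr "$tmp.in")"
echo "cat:     $(count_cr "$tmp.cat")"
echo "cat|awk: $(count_cr "$tmp.awk")"

rm -f "$tmp.in" "$tmp.cat" "$tmp.awk"
```

If the "cat|awk" count comes out at twice the input count, that would
match the 0xd 0xd 0xa separators from the original report; a count of
zero would match the "awk 1" dumps above.
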
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha at cs dot nyu dot edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor at watson dot ibm dot com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

Knowledge is an unending adventure at the edge of uncertainty.
  -- Leto II



