This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



1.3.18: BUG: Piping DOS files to grep (v2.5) doesn't work properly


Mailing list search didn't find this, nor does it appear
in the FAQ... hopefully this isn't old news to all of you.

Files read from a pipe are treated differently by grep
than files read directly.  This results in some unexpected
(by me) behaviour when using grep on files which use
DOS line ends (cr/nl).  This looks like a bug to me.

I'd expect the following commands to have equivalent
results:

  grep myregex blah
  grep myregex < blah
  cat blah | grep myregex

They are equivalent when the regular file blah uses
Unix line ends, but they differ for a file blahdos which
uses DOS line ends.  It appears to me as though grep
is treating its input as binary when reading from a pipe,
but correctly using "undossify_input()" in other cases.
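
A rough sketch of how the three invocations can be compared
byte for byte.  The function name compare_invocations is made up
for illustration; it only uses grep, cat and od.

```shell
# Hypothetical helper: run the three invocations from above on one file
# and report whether their outputs are byte-for-byte identical.
compare_invocations() {
  regex=$1
  file=$2
  # od -An -c turns each output into a printable dump, so a stray cr shows up.
  by_name=$(grep -- "$regex" "$file" | od -An -c)
  by_redirect=$(grep -- "$regex" < "$file" | od -An -c)
  by_pipe=$(cat "$file" | grep -- "$regex" | od -An -c)
  if [ "$by_name" = "$by_redirect" ] && [ "$by_redirect" = "$by_pipe" ]; then
    echo same
  else
    echo differ
  fi
}
```

Something like "compare_invocations 'Test$' blah" should print "same"
for a file with Unix line ends, which is the result I'd expect for
every file.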

Here is an example.  I've created two files, blah (nl line-end)
and blahdos (cr/nl line-end).

   $ cat blah
   foobarTest
   $ od -Ax -a blah
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ od -Ax -a blahdos
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c
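
For anyone who wants to reproduce this, two equivalent files can be
made roughly as follows.  This is a sketch: the printf commands and
the scratch directory are assumptions, not necessarily how my
files were created.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# blah: Unix line end (nl only).
printf 'foobarTest\n' > blah
# blahdos: DOS line end (cr/nl).
printf 'foobarTest\r\n' > blahdos

# The byte counts should be 11 (0x0b) and 12 (0x0c), as in the od dumps.
wc -c < blah
wc -c < blahdos
```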

These files should match the regex 'Test$' in all cases,
but grep on blahdos fails for this case:

   $ cat blahdos | grep 'Test$'
   $

And here's why (note the -v to invert the match so we have
something to look at):

   $ cat blahdos | grep -v 'Test$' | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c

There's still a cr/nl on the output which wouldn't be there if
grep had interpreted its input as having DOS line ends.  Here's
what a successful grep of the UNIX line end file looks like:

   $ cat blah | grep 'Test$' | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b

In fact, if I read the blahdos file in any other way except through
a pipe, it successfully matches (note the stripped out cr on the output):

   $ grep 'Test$' blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ grep 'Test$' < blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b

Just in case you might think that this has something to do with cat
(I did), here's the output of cat for each file:

   $ cat blah | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ cat blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c

Using head instead of cat gives the same results as well, just to 
completely remove cat from the picture.
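
In the meantime, a possible workaround is to strip the cr characters
before they reach grep.  This is only a sketch: the wrapper name
grep_dos is made up, and it assumes it is acceptable to drop every
cr in the input, not just those at line ends.

```shell
# Hypothetical wrapper: delete all cr bytes from stdin, then run grep
# with whatever arguments were given.
grep_dos() {
  tr -d '\r' | grep "$@"
}

# Intended usage with the piped DOS file from the example:
#   cat blahdos | grep_dos 'Test$'
```

With the cr removed, 'Test$' can match at the end of the line even
when grep treats the piped input as binary.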

I'm currently running these versions of tools on win2k:
  cygwin     1.3.18-1
  textutils  2.0.21 (cat, od, head)
  grep       2.5
  bash       2.05b.0(8)-release

I also tried this out with cygwin 1.3.17-1 with identical results.

If you need any further information, please cc me directly since I
don't read the mailing lists very often.

Stacey.
