This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: grep treating my text files as binary!


Warren Young wrote:
>On Dec 25, 2014, at 11:41 AM, Thomas Wolff <towo@towo.net> wrote:
>
>> In any case the argument is quite artificial since the new behaviour
>> hits many files that are in fact text files.
>
>Please define the term "text file" in a way that allows a C programmer
>to write a program that automatically does the correct thing for all
>members of the class "text file" without involving locales, or an
>equivalent mechanism.
...
>If grep runs into a byte sequence that makes it think it is not legal
>for your current locale, it must treat the file as raw bytes, unless you
>give it -a.
>
>If you don't like this behavior, put "alias grep='grep -a'" in your
>~/.bashrc, and forget the change ever happened.  It'll be on you when
>some non-text file gets treated as text and grep spams your terminal
>with binary garbage, though.

It's better to use the "alias grep='LC_ALL=C grep'" method. It keeps the
old way of detecting binaries (for example, it still detects an .EXE as
binary) while allowing you to match mostly-ASCII files that contain some
characters from a different locale. The definition you ask for is
already in grep's code. For us non-English speakers, treating "mostly
ASCII" files as text is usually the right call, at least interactively.

I ran into this, actually. I keep a list of my directories in CP1252 so
that it works with CMD.EXE. Suddenly grep couldn't match anything in it.
But I figured something was up, set my locale to CP1252, and then it
worked.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

