This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



RE: Why text=binary mounts


I understand your logic; however, it overlooks the fundamental problem, namely
that no one is writing their tools with such well-constructed text
libraries.  And more practically speaking, there are simpler solutions once
we recognize that most of us use the unix tools on files of specific types.
 That is, i use tools like sed and awk only on text files.  I cannot think
of a specific example where i have mixed the file type (text vs. binary)
usage of any of the unix tools, except possibly cmp.  

While this may be a distinction which you don't feel i should have to draw,
practically speaking most of us do and it works well for us.  That is, when
i use tar, i use it on a file of a specific format which tar generates.
While i could use sed or awk on a tar file, i would never expect it to work
properly, simply because i don't expect the authors of these tools to have
given any thought to having sed or awk used on anything except a file which
has a notion of LINE delimited by some End-Of-Line character, or
characters.  This perspective, which you have pointed out should not be
necessary, in practice is necessary and practically speaking useful.  And
it is this perspective i am attempting to address when i discuss the notion
of native text mode handling.  

That is, if we can agree that certain tools are always applied solely to
text, then automatic conversion of the native EOL to the single '\n'
character is the solution which will make these tools work in any
environment.  If we don't agree, then we must always construct 'text' files
according to the UNIX description of a text file when using the gnuwin32
tools, and so it is not easy to use tools of mixed origin on a non-UNIX
system (for example, emacs under NT will rewrite files with CRLF while bash
will need these files in LF format).  I prefer to use the native format so
that i don't have to remember which tools produce which results and when i
have to convert between the two formats.
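
To make the automatic conversion described above concrete, here is a minimal
sketch in Python. It is only an illustration, not how the gnuwin32 tools or
any C library actually implement text mode. The names to_lf and to_crlf are
made up for this example, and CRLF is assumed to be the native EOL.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to a single LF, as a text-mode read would."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF, as a text-mode write would."""
    # Normalise first so an existing CRLF does not turn into CR CR LF.
    return to_lf(data).replace(b"\n", b"\r\n")
```

A tool that only ever sees text could call to_lf on input and to_crlf on
output, and would then behave the same on either kind of file.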

... jeff

At 11:39 AM 1/14/98 +0000, Richard Thomas wrote:
>>I guess the point I was trying to make is that it doesn't seem to me that 
>>there is a good argument for there to be text processing functionality in 
>>the fopen() family of functions (I know, it's a little late now!).  The 
>>difference I see between 'modes' and 'formats' is this:  we don't have a 
>>JPG mode or a WAV mode in fopen(), so why do we have a text mode?  When 
>>somebody wants to open up and manipulate a JPEG file, they use a JPEG 
>>library that gives them access to methods that are meaningful only on JPEG 
>>files.  I see text files in the same way.  If you want to read a line of 
>>text, it seems to me that the most logical thing to do would be to use a 
>>library which gave you access to functions such as fscanf() etc. which have 
>>no meaning for generic (binary) files.  This library then would be the 
>>place to do things like making all text files look the same to the 
>>programmer whether they're DOS/UNIX/Mac/whatever, in the same way that a 
>>PCX library might 'gloss over' the differences between the different PCX 
>>versions.
>
>Good point. It's also important to remember that not all text is ASCII or
>ANSI; there's EBCDIC and a whole bunch of others too. Maybe a decent
>text library could even handle Unicode files or something (I know nothing
>about Unicode so don't flame me please) as well. Personally, when I open a
>file, I expect to get what's there. That *should* be the default. A file is
>just a bunch of bytes and that's the way it should be treated. If you want
>some kind of filter or interpretation, get a library.
>
>A well written text processing program should recognise any combination of
><cr> and <lf> as an end-of-line marker and should write either the
>operating system default (but the OS should have no concept of "text"
>files) or ANSI standard (if there is such a beast) or maybe even a format
>selected by the user.
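
One possible reading of the paragraph above, sketched in Python under some
assumptions: the file is handled as raw bytes, CRLF, lone CR and lone LF each
count as one end-of-line marker, and the caller picks the terminator used on
output. The names split_lines and join_lines are hypothetical. An LF followed
by a CR is treated here as two markers, which may or may not be what was
intended by "any combination".

```python
import re

# CRLF is listed first so it matches as one terminator rather than two.
_EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(data: bytes) -> list:
    """Split on CRLF, lone CR, or lone LF; the terminators are dropped."""
    lines = _EOL.split(data)
    # A trailing terminator leaves an empty final piece; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def join_lines(lines: list, eol: bytes = b"\n") -> bytes:
    """Write every line followed by the caller's chosen terminator."""
    return b"".join(line + eol for line in lines)
```

Reading with split_lines and writing with join_lines(lines, b"\r\n") would
convert a file of mixed origin to CRLF throughout.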
>
>Even better would be that your program could register a callback function
>with the text processing library allowing complete control. For example, I
>define a text file format with each line being a field of 81 characters,
>the first byte representing the length of text on the line, subsequent
>characters being represented by 2*the alphabet position, +1 if uppercase
>(a=2, A=3, b=4, etc.). How does fopen (fname, "rt") handle this? It is
>a text file. It doesn't use ANSI characters, but it could and it still
>wouldn't be handled correctly. So how is this "text mode"? It's not, it's
>"let's kludge the end-of-line" mode. Text mode should imply that there's no
>post-processing to be done on the input: you open the file with the proper
>format filter and treat it as text from there on in.
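
The callback idea and the 81-character format above could be sketched roughly
as follows in Python. Everything named here (register_format, read_text,
decode_records, the format name "rich81") is hypothetical; no such library is
being described. Two details are assumptions because the message does not
state them: unused bytes in a record are ignored, and any code outside the
letter range is rejected.

```python
RECORD_SIZE = 81  # one length byte followed by 80 character codes

_decoders = {}  # format name -> callback turning raw bytes into lines


def register_format(name, decoder):
    """Register a callback that decodes one file format into text lines."""
    _decoders[name] = decoder


def read_text(data, fmt):
    """Decode raw bytes with whichever callback was registered for fmt."""
    return _decoders[fmt](data)


def decode_letter(code):
    """Map 2*position (+1 if uppercase) back to a letter: 2->a, 3->A, 4->b."""
    if not 2 <= code <= 53:
        raise ValueError(f"code {code} is outside the letter range")
    letter = chr(ord("a") + code // 2 - 1)
    return letter.upper() if code % 2 else letter


def decode_records(data):
    """Decode consecutive 81-byte records into a list of strings."""
    if len(data) % RECORD_SIZE:
        raise ValueError("data is not a whole number of 81-byte records")
    lines = []
    for start in range(0, len(data), RECORD_SIZE):
        record = data[start:start + RECORD_SIZE]
        length = record[0]  # first byte: how many codes are in use
        if length > RECORD_SIZE - 1:
            raise ValueError("length byte exceeds the record capacity")
        lines.append("".join(decode_letter(c) for c in record[1:1 + length]))
    return lines


register_format("rich81", decode_records)
```

For instance, a record starting with the bytes 2, 17, 18 decodes to "Hi":
H is the 8th letter (2*8+1 = 17) and i is the 9th (2*9 = 18). The program
never sees the encoding, only the lines handed back by read_text.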
>
>Other advantages: Opening binary files with a hex-mode filter or even
>executables with a disassembly/assembly filter and, of course, using
>whichever editor you prefer as long as it's compiled with the text-library.
>
>Of course, it could be done by patching fopen, but the behaviour for that is
>already standardised and it would be a kludge. What's needed is a properly
>designed library.
>
>Rich
>
>-
>For help on using this list (especially unsubscribing), send a message to
>"gnu-win32-request@cygnus.com" with one line of text: "help".
>
>
--
Jeffrey C. Fried      Jeff@Fried.net

   Because a liar tells the truth does not mean the truth is a lie.

NOTICE: I charge $500.00 for each unsolicited advertisement i receive as email
to cover the cost of my time to review and possibly respond to your
advertisement.