This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Bogus assumption prevents d2u/u2d/conv/etal working on mixed files.


You guys are missing the point. Charles Wilson mentioned a side effect of the code at issue in the original post and suggested that it was valuable.

Personally, I don't care if they attempt to detect binary files or not. My point was (and is): *if detection of binary files is desirable*, then why not implement it in a more robust manner and inform the user, rather than silently skipping "binary" files?


Hannu E K Nevalainen wrote:


From: David Fritz
Sent: Sunday, April 04, 2004 6:46 AM


Charles Wilson wrote:
[...]

(2) it's an attempt to prevent users from permanently scrogging binary
files.  See: d2u, on a binary file, is an irreversible operation.  So,
if you do "d2u *" you'll probably kill something deep inside some binary
file, and you can't fix it -- unless some minimal safeguards are in place.


 u2d MAY be reversible -- IF there were no pre-existing \r\n
combinations in the file to begin with -- so when (OMG-fixit-)d2u is
run, obviously the first '\n' is preceded by a (newly-added) '\r', so
the prog merrily replaces ALL '\r\n' with '\n'...which MAY fix your
oops, but maybe not.
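The reversibility argument above can be checked with a small sketch. This is not the actual d2u/u2d source: the function bodies are stand-ins, and it assumes u2d leaves an already-existing '\r\n' pair alone (the thread does not say how the real tool treats that case).

```python
import re

def d2u(data: bytes) -> bytes:
    # Stand-in for d2u: replace every '\r\n' with '\n'.
    return data.replace(b"\r\n", b"\n")

def u2d(data: bytes) -> bytes:
    # Stand-in for u2d (assumption): add '\r' only before a '\n'
    # that is not already preceded by one.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

clean = b"\x00\x01\n\x02\n"   # no '\r\n' pairs to begin with
mixed = b"\x00\r\n\x01\n"     # already contains one '\r\n' pair

# d2u is many-to-one: two different inputs give the same output,
# so the original cannot be recovered from the result.
same_output = d2u(b"a\r\nb\n") == d2u(b"a\nb\n")

# u2d followed by d2u restores the input only when it had no '\r\n'.
clean_restored = d2u(u2d(clean)) == clean
mixed_restored = d2u(u2d(mixed)) == mixed
```

With these stand-ins, same_output and clean_restored come out True and mixed_restored comes out False, which matches the "MAY fix your oops, but maybe not" caveat.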


So, with the current code, if you snarf the first "line" -- all chars until the first '\n' -- if it's a binary file the odds are pretty low that the immediately preceding character is a '\r' -- so d2u as currently coded will bail out, and no harm is done.

It doesn't work so well in the other direction -- by the same logic
above, you'll almost never bail out early if you run 'u2d' on a binary
file -- but if you immediately do a 'd2u' you MIGHT be able to recover.)
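A minimal sketch of the first-line check as it is described above, not the actual d2u code. The function name is hypothetical, and the rule (look only at the byte before the first '\n') is taken from the description in this message.

```python
def first_line_is_dos(data: bytes) -> bool:
    """True if the first '\n' is immediately preceded by '\r'."""
    nl = data.find(b"\n")
    # No newline at all, or a newline as the very first byte: not DOS.
    return nl > 0 and data[nl - 1:nl] == b"\r"

dos_text   = b"hello\r\nworld\r\n"
binary     = b"\x7fELF\x02\n\r\n\x00"
mixed_text = b"unix line\ndos line\r\n"

dos_ok    = first_line_is_dos(dos_text)    # converted
bin_ok    = first_line_is_dos(binary)      # skipped, as intended
mixed_ok  = first_line_is_dos(mixed_text)  # also skipped
```

The last case is the one from the subject line: a mixed file whose first line happens to end in a bare '\n' is treated exactly like a binary file and silently left alone.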


[...]


If detection of binary files is desirable, why not use an explicit test
with a more robust methodology?  GNU grep detects binary files by looking
for a '\0' byte.  Such a test could be used by both d2u and u2d; they
could bail out with a message like "skipping binary file".
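A rough sketch of what such an explicit test might look like. The names is_binary and d2u_checked are made up for illustration, and scanning the whole buffer for a NUL is an assumption; a real tool might only probe the start of the file.

```python
import sys

def is_binary(data: bytes) -> bool:
    """Treat any NUL byte as a sign of binary content."""
    return b"\0" in data

def d2u_checked(name: str, data: bytes):
    """Return the converted bytes, or None after reporting a skip."""
    if is_binary(data):
        # Tell the user instead of skipping silently.
        print(f"{name}: skipping binary file", file=sys.stderr)
        return None
    return data.replace(b"\r\n", b"\n")

skipped   = d2u_checked("prog.exe", b"MZ\x90\x00\r\n")
converted = d2u_checked("mixed.txt", b"unix line\ndos line\r\n")
```

Here skipped is None and a message goes to stderr, while the mixed text file is converted no matter how its first line ends.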

Cheers


A more "foolproof" (? does such a thing exist) test would be to disallow
using d2u/u2d on anything in directories found in $PATH. But then that one
has its disadvantages too, but less so IMO.

 I find all this "safety"-related stuff to be a PITA at times. Any kind of
test is prone to fail in some instances; in others it is just a cause for
confusion -> a lot of bug-hunting, for so little gain.

 How about running d2u/u2d on, say, a regedit 5 file (i.e., mostly ASCII, but
due to the encoding every other byte is a NUL)?
 Would that be considered "legal"? IMO it should: a fast and easy way to
strip the garbage, to create a file that can be used with normal tools.


Huh? u2d/d2u will not strip the "garbage". For that use iconv; as in,


$ iconv -f UTF-16LE -t UTF-8 < in > out
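To illustrate why a byte-level d2u leaves such a file untouched, here is a small sketch. The d2u function is a stand-in rather than the real tool, and the sample content is made up and has no byte-order mark.

```python
def d2u(data: bytes) -> bytes:
    # Stand-in for d2u: replace every '\r\n' byte pair with '\n'.
    return data.replace(b"\r\n", b"\n")

# In UTF-16LE, '\r' and '\n' are separated by a NUL byte, so the
# two-byte sequence '\r\n' never appears in the raw data.
raw = "[HKEY_EXAMPLE]\r\n".encode("utf-16-le")
untouched = d2u(raw) == raw

# Equivalent of the iconv step: decode UTF-16LE, re-encode as UTF-8.
utf8 = raw.decode("utf-16-le").encode("utf-8")

# Only now does d2u see real '\r\n' pairs to convert.
unix_text = d2u(utf8)
```

So the re-encoding removes the NULs, and d2u afterwards only has the line endings left to deal with.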


 IMO: stay away from all of these safety thingies; at _LEAST_ allow them to
be bypassed, e.g. with --force. I will be using that switch all the time.

 There are a lot of these foolhardy "traps" one can fall into; e.g:
$ cd /;rm -rf *
are you gonna find a "safety" hatch for that too?


Noo... Please, remove all of these safety checks. There must be some kind of user sanity presupposition. Or else the tools soon will be crippled to a state where they are unusable for normal work.

Make Backups, Not War! -> MBNW! ;-P


OLOCA?


[...]

Cheers



