This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: tee piping to head gives error message


Buchbinder, Barry (NIH/NIAID) <BBuchbinder <at> niaid.nih.gov> writes:
> > 
> >   Given that the purpose of head is to print the first few lines of a
> > file, it kind of makes sense to me that it would close the file after
> > it's read them rather than keeping the input file open and manually
> > reading-and-discarding the entire rest of it for no good reason.
> 
> Agreed.
> 
> >   So I reckon this is as-expected and by-design behaviour.
> 
> I might put it as "as-designed" rather than "by-design".  And for me, it
> certainly was unexpected.  tee and head are both part of coreutils.  I would
> expect that all coreutils would behave the same for head closing the pipe,
> but they don't.  And I would also expect that all utilities in a package
> that includes a utility that breaks pipes as a normal course of its
> operation would be silent when the two utilities are used together.  I would
> expect that tee pipes to head more often than something nasty happens and a
> pipe just breaks.

Coreutils is following POSIX, and this pipe behavior is by design within
POSIX.  POSIX requires that a write into a pipe with no remaining reader raise
SIGPIPE, and that the default action on SIGPIPE be to terminate the process.
Note that such termination bypasses exit handlers registered with atexit().
POSIX also allows an application to ignore SIGPIPE, in which case the write
fails with EPIPE instead; the application can then detect the failure to write
to the broken pipe and continue in normal operation.  Furthermore, all of the
coreutils are designed to check, using atexit() handlers, for any failed write
to stdout.
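To see both behaviors side by side, here is a minimal sketch in Python (chosen for brevity; coreutils itself is written in C).  It forks a child that writes into a pipe whose read end is already closed, once with the default SIGPIPE action and once with SIGPIPE ignored.  The function name write_to_broken_pipe is made up for this illustration, and the sketch assumes a POSIX system with os.fork().

```python
import os
import signal

def write_to_broken_pipe(ignore_sigpipe):
    """Fork a child that writes into a pipe whose read end is already closed.

    Returns ("signal", N) if the child was killed by signal N, or
    ("exit", code) if it exited normally.
    """
    r, w = os.pipe()
    os.close(r)  # no process has the pipe open for reading any more
    pid = os.fork()
    if pid == 0:  # child
        try:
            # Set the disposition explicitly: CPython ignores SIGPIPE at
            # startup, and the child would otherwise inherit that.
            action = signal.SIG_IGN if ignore_sigpipe else signal.SIG_DFL
            signal.signal(signal.SIGPIPE, action)
            os.write(w, b"data\n")  # default action: SIGPIPE kills us here
            os._exit(0)             # not reached: this write cannot succeed
        except BrokenPipeError:
            os._exit(1)             # SIGPIPE ignored: the write failed with EPIPE
        finally:
            os._exit(2)             # reached only if some other exception escaped
    os.close(w)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    return ("exit", os.WEXITSTATUS(status))
```

On common systems SIGPIPE is signal 13, so write_to_broken_pipe(False) should report ("signal", 13), which a shell reports as 128 + 13 = 141, the status that shows up in the transcript below.  write_to_broken_pipe(True) reports an ordinary exit instead, because the child saw the EPIPE failure and chose how to exit.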

Normally, tee and most other coreutils do nothing special with SIGPIPE, which
means they ignore SIGPIPE only if their parent process was ignoring it (an
ignored signal stays ignored across fork() and exec()).  So my guess is that
somewhere in your shell you are setting up your environment to ignore SIGPIPE,
with the result that applications spawned by your shell see write failures on
broken pipes rather than the default of early termination.
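Here is a minimal sketch of that inheritance, again in Python, assuming GNU coreutils' printf is on PATH (any coreutils program that writes to stdout would do).  CPython itself ignores SIGPIPE at startup, and the restore_signals argument of subprocess.Popen decides whether the child has SIGPIPE reset to the default action before exec, or simply inherits the ignored disposition.  The function name printf_into_broken_pipe is hypothetical.

```python
import os
import signal
import subprocess

def printf_into_broken_pipe(inherit_ignored_sigpipe):
    """Run printf with its stdout on a pipe that has no reader.

    Returns Popen.returncode: a negative value -N means the child was
    killed by signal N; a positive value is an ordinary failure exit.
    """
    # CPython already ignores SIGPIPE at startup; make sure of it here.
    if signal.getsignal(signal.SIGPIPE) != signal.SIG_IGN:
        signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    r, w = os.pipe()
    os.close(r)  # no reader: the pipe is broken before printf even starts
    try:
        proc = subprocess.Popen(
            ["printf", "hello"],
            stdout=w,
            stderr=subprocess.DEVNULL,  # hide printf's "write error" message
            # True: reset SIGPIPE to the default action before exec.
            # False: the child inherits this process's ignored SIGPIPE.
            restore_signals=not inherit_ignored_sigpipe,
        )
    finally:
        os.close(w)
    return proc.wait()
```

With the default action restored, printf is killed by SIGPIPE (a return code of -13 on common systems); with the ignored disposition inherited, it sees the write error, reports it, and exits with a failure status, which is the same thing tee is doing in your setup.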

Study this example, in bash, for more insight into child behavior when SIGPIPE 
is ignored or not:

$ trap - PIPE   # restore default handling to SIGPIPE
$ yes | tee /dev/null | head > /dev/null
$ echo ${PIPESTATUS[*]}
141 141 0       # yes and tee had SIGPIPE, head was successful
$ seq 10000 | tee foo | head > /dev/null
$ echo ${PIPESTATUS[*]}
141 141 0       # seq and tee had SIGPIPE, head was successful
$ wc foo
 2474  2475 11264 foo   # foo did not get the complete output of seq
$ trap '' PIPE  # now ignore SIGPIPE
$ seq 1000 | tee /dev/null | head > /dev/null
$ echo ${PIPESTATUS[*]}
0 0 0           # all 3 programs were successful
$ seq 10000 | tee foo | head > /dev/null
tee: standard output: Broken pipe
tee: write error
$ echo ${PIPESTATUS[*]}
0 1 0           # seq and head were successful, tee noticed the broken pipe
$ wc foo
10000 10000 48894 foo   # foo got the complete output of seq
$ yes | tee /dev/null | head > /dev/null
tee: standard output: Broken pipe

# At this point, yes and tee are in an infinite loop; hit ctrl-c
$ echo ${PIPESTATUS[*]}
130 130 0       # yes and tee had SIGINT from ctrl-c, head was successful
$ yes | tee -i /dev/null | head > /dev/null
tee: standard output: Broken pipe

# Again, an infinite loop; hit ctrl-c
$ echo ${PIPESTATUS[*]}
130 1 0         # yes had SIGINT; tee exited with a regular failure from the broken pipe
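The PIPESTATUS numbers above follow bash's convention of reporting a child killed by signal N as 128 + N, so 141 is SIGPIPE (signal 13) and 130 is SIGINT (signal 2).  A small Python helper for decoding them (describe_status is a hypothetical name, for illustration only):

```python
import signal

def describe_status(status):
    """Interpret one bash PIPESTATUS entry using the 128 + N convention.

    Simplification: a program may also exit with a status above 128 on
    purpose, which this helper cannot tell apart from a signal.
    """
    if status > 128:
        try:
            return "killed by " + signal.Signals(status - 128).name
        except ValueError:
            pass  # not a known signal number on this system
    return "exited with status %d" % status

# Some PIPESTATUS lines from the transcript above.
for line in ("141 141 0", "0 1 0", "130 1 0"):
    print(line, "->", [describe_status(int(s)) for s in line.split()])
```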

> 
> This seems like something the coreutils maintainer might want to address
> with the upstream maintainers, or to patch himself.  (I won't complain if he
> doesn't patch it.

Nope.  As the cygwin coreutils maintainer, I won't patch coreutils, because the
problem of an error message from writing to a broken pipe is not unique to
cygwin (I ran the same tests on coreutils 5.3.0 on Solaris and saw similar
behavior).  Note, however, that tee's -i option now has its POSIX-mandated
meaning of ignoring SIGINT, whereas prior versions of coreutils treated -i as
ignoring all signals.  This change in 5.3.0, which lets tee terminate on
SIGPIPE, was intentional and was made around April 2004 (see
/usr/share/doc/coreutils-5.3.0/NEWS, or
http://lists.gnu.org/archive/html/bug-coreutils/2004-04/msg00126.html).  You
may have success proposing upstream, on the coreutils mailing list, a new
option to ignore SIGPIPE, which would restore the prior behavior while still
complying with POSIX.  You may also want to ask the POSIX folks for an
interpretation of whether write errors to stdout must force tee to fail, or
whether the current wording, that tee returns 0 only if "The standard input
was successfully copied to all output files", allows success even when writes
to stdout failed.  You could base your argument on the fact that stdout is not
one of the output files named on the command line.
http://www.opengroup.org/onlinepubs/009695399/utilities/tee.html

If, as Dave Korn's follow-up pointed out, cygwin hangs in some cases where pipe
handling and process termination interact, then that is a cygwin bug rather
than a coreutils bug, and I wouldn't know how to go about patching it.

>  Taking on coreutils was quite a commitment -- well
> deserving of the two gold stars -- and I know that fixing this may be a low
> priority.)  Unfortunately, though PTC, I'm not capable of providing a patch.
> In any case, tee seems to save its input as desired, so while the error
> message is annoying and misleading, I suppose that one can live with it.

You can make the error messages about the broken pipe go away consistently,
but only by restoring the default SIGPIPE handling and thus risking early
termination of tee.  Or, continue to ignore SIGPIPE and redirect tee's stderr
to /dev/null; then tee will always run to completion, but you will miss any
other error messages from tee:
$ cat foo | tee -i bar 2> /dev/null | head

> >   It's just that tee notices when a write to stdout fails, whereas
> > most applications are more loosely coded and don't check.

Actually, as explained above, all of the coreutils that write to stdout check
whether those writes failed, provided they weren't terminated by a signal.
That way, even something like `ls --help' will fail if its stdout has been
opened read-only.
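A rough Python sketch of such an exit-time check, assuming a POSIX system.  finish_output and install_stdout_check are hypothetical names, not the gnulib close_stdout function that coreutils actually uses.  What it illustrates is that buffered output is only written when the stream is flushed, so a write error may first appear at exit, which is why the check belongs in an atexit() handler.

```python
import atexit
import os
import sys

def finish_output(stream):
    """Close stream and return 0 if all buffered output was written, else 1."""
    try:
        stream.close()  # close() flushes first, so late write errors surface here
    except OSError as exc:
        print("write error:", exc.strerror, file=sys.stderr)
        return 1
    return 0

def install_stdout_check():
    """Register an exit-time check of stdout (hypothetical helper)."""
    def check_stdout():
        if finish_output(sys.stdout) != 0:
            os._exit(1)  # turn a failed final write into a failing exit status
    atexit.register(check_stdout)
```

A script that calls install_stdout_check() before printing, run with stdout redirected to /dev/full (a Linux device on which every write fails with ENOSPC), should then exit with status 1 even though its print() calls themselves appeared to succeed.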

> > 
> >> But the number of lines/bytes at which the error disappears
> >> does not seem to be constant.
> > 
> >   Umm, no, .... it's equal to the number of lines in the source file.
> 
> No.  It should be equal to the numbers of lines in the source file but is
> not.  The error message went away around 126 or 130 lines, while the source
> file had 556.
> 
> (I would speculate that the disappearance of the error messages when enough
> lines are provided might have something to do with buffering, but I'm not a
> programmer and speculation by us mere "users" is sometimes discouraged.  As
> for why it is not consistent ...)

Bingo: there is buffering going on.  Note that POSIX disallows line buffering
in tee, but does allow character buffering; the tee implementation reads a
block at a time (probably 1024 or 2048 characters) before writing.  Likewise,
head reads a block before parsing it into lines, as that is faster than reading
a line at a time.  So, if head reads the entire block that tee wrote, even if
only the first part of the block needed to be printed, and tee has nothing left
to write by the time head exits, then tee never sees a write failure.
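A deterministic single-process model of that timing, in Python, under simplifying assumptions: a fixed 1024-byte block (one of the sizes guessed above), a "head" that does a single read and then closes its end, and SIGPIPE ignored so that the failure shows up as BrokenPipeError.  simulate_tee_into_head is a hypothetical name; real tee and head are separate processes, so the actual threshold also depends on scheduling and real buffer sizes.

```python
import os
import signal

BLOCK = 1024  # assumed tee block size, for illustration only

# CPython already ignores SIGPIPE at startup; make that explicit so a write
# to a broken pipe raises BrokenPipeError instead of killing the process.
if signal.getsignal(signal.SIGPIPE) != signal.SIG_IGN:
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

def simulate_tee_into_head(total_bytes):
    """Return True if 'tee' would hit a broken pipe for this input size."""
    if total_bytes <= 0:
        return False  # nothing to copy, nothing to fail
    data = b"x" * total_bytes
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    r, w = os.pipe()
    try:
        os.write(w, blocks[0])  # 'tee' writes its first block into the pipe
        os.read(r, 64 * 1024)   # 'head' reads everything available at once...
        os.close(r)             # ...prints its lines and exits
        r = None
        for block in blocks[1:]:
            os.write(w, block)  # nobody is left to read this block
        return False
    except BrokenPipeError:
        return True
    finally:
        if r is not None:
            os.close(r)
        os.close(w)
```

Input that fits in the first block never produces an error, because "tee" has nothing left to write once "head" is gone; anything larger does, which matches the pattern you saw with short versus long source files.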

--
Eric Blake




