Default buffers on pipes

jh

2006-02-08, 9:31 am

I'm a bit confused about this... hoping somebody can help.

So I have one program writing to stdout and another reading on stdin
(using fwrite() and fread()); I run them from the command line piping
the output of one to the input of the other. Experimentation reveals
that if the first program writes faster than the second reads, then
there is a fairly small buffer which, after it fills up, the first
program blocks on writing until the second has a chance to catch up.
Likewise, if the reader is faster, it blocks while the pipe is empty.
That's all logical enough. Question: How big is this buffer by default?
Can I change it?

Also (this is the part I'm really interested in), what happens if the
first program writes a little bit at a time, but the second one reads a
large chunk at a time, and it tries to read chunks significantly larger
than the size of that buffer (say, 2-3 times larger)? Will the first
program block when it fills the buffer, and then the second program will
block forever, because the amount of data it wants is never available?
Conversely, what if the first program writes in large chunks... can it
write in chunks bigger than the buffer size?

Thanks,
Josh

PS: I hope I don't offend anyone by not posting a real email address...
I get enough spam as it is. What's the etiquette on that? Most people
seem to include their email... is it considered impolite not to?

Paul Pluzhnikov

2006-02-08, 9:31 am

jh <no@thanks.com> writes:

> Question: How big is this buffer by default?


$ grep PIPE_BUF /usr/include/*/*.h
/usr/include/bits/posix1_lim.h:#define _POSIX_PIPE_BUF 512
/usr/include/linux/limits.h:#define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */

> Can I change it?


No.

> Also (this is the part I'm really interested in), what happens if the
> first program writes a little bit at a time, but the second one reads a
> large chunk at a time, and it tries to read chunks significantly larger
> than the size of that buffer (say, 2-3 times larger)?


Reads from pipe can be "short" (return less data than the program
requested).

> PS: I hope I don't offend anyone by not posting a real email address...
> I get enough spam as it is. What's the etiquette on that? Most people
> seem to include their email... is it considered impolite not to?


Most people nowadays obfuscate their e-mail (as I have done), and
provide instructions that are easy for humans but hard for e-mail
bots to decode.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Alex Fraser

2006-02-08, 9:31 am


"jh" <no@thanks.com> wrote in message
news:no-7CBD72.01533604022006@wonka.hampshire.edu...
> So I have one program writing to stdout and another reading on stdin
> (using fwrite() and fread()); I run them from the command line piping
> the output of one to the input of the other.

[snip]
> Question: How big is this buffer by default? Can I change it?


Which buffer? There are three: the output stdio buffer in the process with
the write end of the pipe, the pipe buffer itself, and the input stdio
buffer in the process with the read end of the pipe.

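The stdio buffer sizes are implementation defined, and can be changed with
setvbuf(). The pipe buffer size is also implementation defined; PIPE_BUF,
which is at least 512 bytes, is the largest write that is guaranteed to be
atomic, not necessarily the capacity of the pipe.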

> Also (this is the part I'm really interested in), what happens if the
> first program writes a little bit at a time, but the second one reads a
> large chunk at a time, and it tries to read chunks significantly larger
> than the size of that buffer (say, 2-3 times larger)? Will the first
> program block when it fills the buffer, and then the second program will
> block forever, because the amount of data it wants is never available?


fread() and fwrite() do not return until they have, respectively, read or
written the number of bytes implied by their arguments, unless there is an
error or EOF is reached (the latter for fread() only).

fread() and fwrite() internally call read() and write() respectively.
Ignoring signals, write() behaves like fwrite(). But read() has different
semantics: it returns as soon as data is available.

If you write a little bit at a time, fwrite() will probably copy the data in
each call to the stdio buffer, calling write() to drain that buffer when it
is full.

At the other end of the pipe, fread() will loop calling read(), which will
block if no data is available at the time of the call, until enough has been
read (assuming EOF is not reached).

> Conversely, what if the first program writes in large chunks... can it
> write in chunks bigger than the buffer size?


Yes; fwrite() will block in write() if necessary.

Alex


Jordan Abel

2006-02-08, 9:31 am

On 2006-02-04, jh <no@thanks.com> wrote:
> I'm a bit confused about this... hoping somebody can help.
>
> So I have one program writing to stdout and another reading on stdin
> (using fwrite() and fread()); I run them from the command line piping
> the output of one to the input of the other. Experimentation reveals
> that if the first program writes faster than the second reads, then
> there is a fairly small buffer which, after it fills up, the first
> program blocks on writing until the second has a chance to catch up.
> Likewise, if the reader is faster, it blocks while the pipe is empty.
> That's all logical enough. Question: How big is this buffer by default?
> Can I change it?
>
> Also (this is the part I'm really interested in), what happens if the
> first program writes a little bit at a time, but the second one reads a
> large chunk at a time, and it tries to read chunks significantly larger
> than the size of that buffer (say, 2-3 times larger)? Will the first
> program block when it fills the buffer, and then the second program will
> block forever, because the amount of data it wants is never available?


No, it will read what it can. It may or may not then block on a second
attempt to read, but by then the buffer is empty and the writer can
write more.

> Conversely, what if the first program writes in large chunks... can it
> write in chunks bigger than the buffer size?


No. The write will (I believe) succeed in writing what it can. It may
then block on its next attempt to write, until the buffer is drained.
> PS: I hope I don't offend anyone by not posting a real email address...
> I get enough spam as it is. What's the etiquette on that? Most people
> seem to include their email... is it considered impolite not to?


You should really use *.invalid for that purpose - and while it's
technically against the rules, no-one will really care unless you're
a troll using it to hide your identity.

Simon Elliott

2006-02-08, 9:31 am

On 04/02/2006, Alex Fraser wrote:

> Which buffer? There are three: the output stdio buffer in the process
> with the write end of the pipe, the pipe buffer itself, and the input
> stdio buffer in the process with the read end of the pipe.


I've seen some apps (eg the ALSA aplay utility) which use read() and
write() instead of fread() and fwrite(), to talk to stdin and stdout,
getting the file descriptor from fileno(stdin) or fileno(stdout).

I can see the rationale for this: it gives more fine-grained control
over the I/O, and presumably the file descriptors can be made non
blocking.

Does anyone think this is a particularly good/bad idea?

--
Simon Elliott http://www.ctsn.co.uk

Alex Fraser

2006-02-08, 9:31 am

"Simon Elliott" <Simon at ctsn.co.uk> wrote in message
news:43e4ee36$0$1170$bed64819@news.gradwell.net...
> I've seen some apps (eg the ALSA aplay utility) which use read() and
> write() instead of fread() and fwrite(), to talk to stdin and stdout,
> getting the file descriptor from fileno(stdin) or fileno(stdout).
>
> I can see the rationale for this: it gives more fine-grained control
> over the I/O, and presumably the file descriptors can be made non
> blocking.
>
> Does anyone think this is a particularly good/bad idea?


IMO: if you use POSIX I/O functions because the standard C I/O ("stdio")
functions can't do the job, then it is obviously a good idea. Otherwise it
is a bad idea.

There are definitely cases where the stdio functions can't do the job. If
you want to multiplex I/O using select() or poll(), including stdin/out/err,
then the underlying descriptors must be non-blocking for robustness, and the
stdio functions require blocking descriptors. If you are handling signals,
the interaction with stdio functions is unspecified, whereas the interaction
with POSIX functions is specified.

You might want to use SIGALRM or select()/poll() to implement I/O with
timeouts. This rules out stdio.

Alex


Alex Fraser

2006-02-08, 9:31 am

"Jordan Abel" <random832@gmail.com> wrote in message
news:slrndu9qgl.lc9.random832@random.yi.org...
> On 2006-02-04, jh <no@thanks.com> wrote:
[snip]
>
> No, it will read what it can.


This is basically true for read(), but not fread().

> It may or may not then block on a second attempt to read, but by then the
> buffer is empty and the writer can write more.


For read() on a blocking descriptor, the "may or may not" is determined by
whether or not there is (more) data available. That is, ignoring EOF and
signals, read() blocks if no bytes are available, else it returns whatever
data it can (normally the lesser of the number of bytes available and the
specified size, but theoretically anything from one byte up to that amount).

>
> No. The write will (I believe) succeed in writing what it can.


On a blocking descriptor, write() will write as many bytes as requested
(blocking if necessary), unless there is an error or a signal causes it to
return early.

Alex


Ian Zimmerman

2006-02-08, 9:31 am


Simon> I've seen some apps (eg the ALSA aplay utility) which use read()
Simon> and write() instead of fread() and fwrite(), to talk to stdin and
Simon> stdout, getting the file descriptor from fileno(stdin) or
Simon> fileno(stdout).

Isn't fileno(stdin) --- resp. fileno(stdout) --- just a fancy way to
write 0, resp. 1?

Simon> I can see the rationale for this: it gives more fine-grained
Simon> control over the I/O, and presumably the file descriptors can be
Simon> made non blocking. Does anyone think this is a particularly
Simon> good/bad idea?

The bad idea is not using read or write alone, but _mixing_ read and
write with fread, fwrite and the rest of buffered I/O on the same
descriptor.

--
A true pessimist won't be discouraged by a little success.

Jordan Abel

2006-02-08, 9:31 am

On 2006-02-05, Alex Fraser <me@privacy.net> wrote:
> "Jordan Abel" <random832@gmail.com> wrote in message
> news:slrndu9qgl.lc9.random832@random.yi.org...
> [snip]
>
> This is basically true for read(), but not fread().


There is no such thing as fread(). I was talking in terms of the actual
system calls inevitably made by the program, since for these purposes it
doesn't matter what language they're actually in. In the case of
fread(), the "second attempt to read" is made in a loop within fread.

>
> For read() on a blocking descriptor, the "may or may not" is determined by
> whether or not there is (more) data available.


I was referring to after it drains a full buffer. [therefore, there's no
data left until the writer puts in more]

>
> On a blocking descriptor, write() will write as many bytes as requested
> (blocking if necessary), unless there is an error or a signal causes it to
> return early.


Are you sure? It can't time out?

Now, if write is blocking, at least the ones that fit in the buffer will
then be available to the reader, right?

Alex Fraser

2006-02-08, 9:31 am

"Jordan Abel" <random832@gmail.com> wrote in message
news:slrnducnp0.qjm.random832@random.yi.org...
> On 2006-02-05, Alex Fraser <me@privacy.net> wrote:
>
> There is no such thing as fread().


Are you sure?

> I was talking in terms of the actual system calls inevitably made by the
> program,


But given that the OP only mentioned fread() and fwrite(), you didn't think
that fact was worth mentioning? (This was really my point.)

[snip]
>
> Are you sure? It can't time out?


Not usually, but a timeout would constitute an error.

Alex


Geoff Clare

2006-02-08, 9:32 am

Ian Zimmerman <nobrowser@gmail.com> wrote, on Sun, 05 Feb 2006:

> Isn't fileno(stdin) --- resp. fileno(stdout) --- just a fancy way to
> write 0, resp. 1?


They start out with those values, but there are ways they can get
changed. E.g.:

close(0);
freopen(somefile, "w", stdout);

Now fileno(stdout) is 0.

--
Geoff Clare <netnews@gclare.org.uk>

pixelbeat

2006-04-18, 10:12 am

I was confused by this myself, so
I wrote up some notes on the subject:
http://www.pixelbeat.org/programming/stdio_buffering/

Any comments appreciated.