For Programmers: Free Programming Magazines  


Home > Archive > Tcl > December 2006 > TIP #287: Add a Commands for Determining Size of Buffered Data









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author TIP #287: Add a Commands for Determining Size of Buffered Data
Fredderic

2006-12-14, 4:17 am

I remember a little while back there was a discussion on the DoSability
of [gets], and ways to plug the hole. This TIP being one of them...
I've looked through the TIP index, but didn't see anything matching my
query here;

As I understand it, TIP #287 allows a script to check on the state of
the incoming buffer, and if it appears to be accumulating too much data
(1MB is a rather large URL)

My question is whether there's any plans for a method of [gets] DoS
prevention that doesn't require polling the channels periodically.
Something like an alert threshold, or upper limit on the line length
allowed for a channel? It's difficult to figure out a good poll
interval to set, at which to check all the channels to see if their
input buffers are being flooded. It's also something that a lot of
people wouldn't implement until after they discover that they needed
to, which is not really optimal.

Having a buffer/line controlling option there, say one that triggers an
overflow fileevent once the buffer reaches a certain limit, is much
easier to implement, far less obscure, and hence more likely to
actually be put into practise BEFORE the problem occurs.

At least, that's what I think. ;)


Fredderic
Stephan Kuhagen

2006-12-14, 4:17 am

Fredderic wrote:

> My question is whether there's any plans for a method of [gets] DoS
> prevention that doesn't require polling the channels periodically.
> Something like an alert threshold, or upper limit on the line length
> allowed for a channel?


How about a switch for [gets] that sets the maximum line length that is
allowed to be returned? That would be similar to [read numChars], but with
the convenient behaviour of [gets] in normal cases. Example:

catch {
gets -maxLength 1024 theLine
} error

If [gets] reads more than 1024 chars without finding a newline, it throws an
error. This would not break old code and there is no need to poll.

Regards
Stephan


Andreas Leitgeb

2006-12-14, 9:04 am

Stephan Kuhagen <nospam@domain.tld> wrote:
> Fredderic wrote:
> How about a switch for [gets] that sets the maximum line length that is
> allowed to be returned? That would be similar to [read numChars], but with
> the convenient behaviour of [gets] in normal cases. Example:


> catch {
> gets -maxLength 1024 theLine
> } error


I like the idea of throwing an error (when the limit is exceeded)
very much: afterall it has to be distinguishable from a normal
return where the eol appeared right in time.
In C's fgets(), this case is distinguishable because the
(eventual) eol-char is part of the returned string.
In Tcl we need something else.

However, to make it robust, one would still have to
check for the reason of [gets] failing (it could also
have been due to an array-variable given, or for
an invalid channel, or whatever other programming error)

we could also specify it as passing the maxlength as
an optional third argument. This would mean that giving
a maxlen forces a variable-name to also be given. I don't
think this is a problem, because even if gets runs into
a too long line, it's good to still know what has been
read so far, and this information can only be made available
through the variable (not through return-value, in the case
of error-throwing!)

Alternatively, we could change errror-handling inside gets such,
that if an error is thrown, the data read so far is marked as
unread, so it could be retrieved with a following [read ...].

Currently, this at least doesn't happen in the case where the
variable specified is an array. (data is read but never assigned)
As of now this isn't a problem, because it only happens for
programming errors, but once it could happen for "foreign errors"
it instantly becomes a problem.

Stephan Kuhagen

2006-12-14, 9:04 am

Andreas Leitgeb wrote:

> However, to make it robust, one would still have to
> check for the reason of [gets] failing (it could also
> have been due to an array-variable given, or for
> an invalid channel, or whatever other programming error)


Of course, but this can be checked by looking at the contents of the
error-variable from [catch].

> we could also specify it as passing the maxlength as
> an optional third argument. This would mean that giving
> a maxlen forces a variable-name to also be given. I don't
> think this is a problem, because even if gets runs into
> a too long line, it's good to still know what has been
> read so far, and this information can only be made available
> through the variable (not through return-value, in the case
> of error-throwing!)


Good point, and makes it even more elegant and easy to write. So it should
be

catch {
gets theLine 1024
} error

> Alternatively, we could change errror-handling inside gets such,
> that if an error is thrown, the data read so far is marked as
> unread, so it could be retrieved with a following [read ...].


I would prefer the first solution for two reasons: 1. I think, it's faster,
because it is not necessary to put the data back (if this is possible at
all because of buffering, reading from pipe and such), 2. the
implementation of [gets] can be thought as something like (untested...)

proc gets {channel {vname ""} {maxLen -1}} {
set count 0
if {$vname != ""} {
upvar $vname data
}
set data {}
while { ($count<$maxLen) || ($maxLen<0) } {
set char [read $channel 1]
if {$char=="\n"} break
append data $char
incr count
}
if {$vname != ""} {
return $count
}
return $data
}

So I would expect, that if there is some error inside that code, I have at
least the chars read so far.

Regards
Stephan

Andreas Leitgeb

2006-12-14, 7:10 pm

Stephan Kuhagen <nospam@domain.tld> wrote:
> Of course, but this can be checked by looking at the contents of the
> error-variable from [catch].


Yes, sure there is a way, but I see again the tendency
of programmers to avoid the real check, and just assume
that a thrown error implies a too long input line...

Still, this bug is still the harmlesser one compared
to DoS-prone current [gets].

> Good point, and makes it even more elegant and easy to write.
> So it should be
> catch {

gets theLine 1024
> } error

The channel is of course non-optional (gets stdin theLine 1024).
And one also should not ignore catch's result in this context,
or one might run into surprises later :-)

> I would prefer the first solution for two reasons: 1. I think, it's faster,
> because it is not necessary to put the data back (if this is possible at
> all because of buffering, reading from pipe and such),


I think that unlike your procedural description, gets really
doesn't "read byte by byte till it finds a eol". I rather think
it reads a block, scans it for eol's and leaves the rest in some
input buffer to be used at next gets/read. If it's that way,
then "unreading" should be all the easier implementable.
If it's not that way (and if putting back into buffer is
really impossible), then the read bytes should at least be
written to the variable (except for invalid varnames), to
prevent their loss.

Darren New

2006-12-14, 10:07 pm

Stephan Kuhagen wrote:
> Of course, but this can be checked by looking at the contents of the
> error-variable from [catch].


I would like to suggest that any time people want to distinguish causes
of errors, $erroCode be defined and used for that? Trying to parse
human-readable error message strings is fraught with error, and Tcl
provides a very nice alternative to that.

--
Darren New / San Diego, CA, USA (PST)
Scruffitarianism - Where T-shirt, jeans,
and a three-day beard are "Sunday Best."
Michael A. Cleverly

2006-12-15, 4:14 am

On Thu, 14 Dec 2006, Fredderic wrote:

> My question is whether there's any plans for a method of [gets] DoS
> prevention that doesn't require polling the channels periodically.


I'm not sure what you mean by "doesn't require polling the channels
periodically"...

There are two different cases:

1) the channel is in blocking mode (in which case [gets] doesn't
return until it has a complete line period);

or

2) the channel is in non-blocking mode and the event loop is active
(and presumably you've defined a [fileevent readable] callback).

TIP #287 doesn't aim to do anything with blocking input at all. That
would have already been solveable at the script level by a proc that did
something like sit in a while loop doing [read $chan 1] and appending a
character at a time until a newline or some limit was reached.

What TIP #287 was intended to help with was the non-blocking fileevent
driven case. And for that you don't need to do any kind of new polling
that you weren't already doing before.

It's easy to miss the fact that a fileevent readable callback is triggered
anytime there is unread data on the channel *even* when the channel is in
line buffering mode and there isn't a complete line available yet. (This
is precisely the reason we have [fblocked]; to after-the-fact distinguish
between [gets] reading a line of length 0 and [gets] returning (or
assigning to our variable) the empty string because there was an
incomplete line in the buffer.)

So, in your existing fileevent callback you can use [chan pending input
$chan] to p and see how much unread data Tcl has buffered for you. If
it is above some threshold you take appropriate action. What is an
appropriate action will, of course, depend on the nature of your
application. Some possibilities might include:

1) abort the connection (close the socket);

2) use [read $chan $n_bytes] instead of [gets] to partially drain and
process the input buffer

For example here is a simple "echo" server that doesn't like incomplete
lines > 40 characters. (Naturally it requires tcl 8.5 that has [chan
pending]...)

### Beginning of server.tcl

proc accept {sock peer port} {
fconfigure $sock -buffering line -blocking 0 -buffersize 40
fileevent $sock readable [list echo $sock]
puts "Accepted connection ($sock) from $peer"
}

proc echo {sock} {
# Have [gets] store what it reads in $line and return the
# length of the line read (-1 has to be distinguished by
# calling [eof] or [fblocked], etc.)
if {[gets $sock line] > -1} then {
puts "They wrote: $line"
return [puts $sock "You wrote: $line"]
}

# Either we've got an EOF on this channel or
# there was some data but not a full line
if {[eof $sock]} then {
puts "EOF detected"
return [close $sock]
}

# An incomplete line is waiting in the buffer; check
# and see how much unread data there is
if {[set qty [chan pending input $sock]] > 40} then {
puts "Wow, they talk too much! ($qty > 40 !)"
puts $sock "Sorry, you talk too much! ($qty > 40 !)"
return [close $sock]
}

# There is an incomplete line, but the total amount of data
# buffered is still <= 40 bytes. The readable fileevent won't
# fire again until there is either more data or EOF
puts "There wasn't a full line, but they aren't too chatty yet ($qty)"
return
}

socket -server accept 1234
vwait forever

### End of server.tcl / Beginning of client.tcl

set sock [socket 127.0.0.1 1234]
fconfigure $sock -buffering none

puts $sock "Hello World"
flush $sock

puts -nonewline $sock [string repeat ! 30]
flush $sock

# with the data flushed the server will have an incomplete line (len=30)
puts $sock " really."

# pause then burst progressively large amounts of data
foreach {qty} {1 5 10 9876 32767 987654321} {
puts "Trying $qty"
if {[eof $sock] || [catch {
puts -nonewline $sock [string repeat . $qty]
}]} then {
puts "They shut us down before we could finish writing $qty"
break
} else {
flush $sock
}
}

### end of client.tcl

The output I get (on stdout of the server) when I run the above demos is:

powerbook:~ michael$ ~/sandbox/bin/tclsh8.5 server.tcl
Accepted connection (sock6) from 127.0.0.1
There wasn't a full line, but they aren't too chatty yet (11)
They wrote: Hello World
There wasn't a full line, but they aren't too chatty yet (30)
There wasn't a full line, but they aren't too chatty yet (38)
They wrote: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! really.
There wasn't a full line, but they aren't too chatty yet (1)
There wasn't a full line, but they aren't too chatty yet (6)
There wasn't a full line, but they aren't too chatty yet (16)
Wow, they talk too much! (4112 > 40 !)

and on the client:

powerbook:~ michael$ tclsh client.tcl
Trying 1
Trying 5
Trying 10
Trying 9876
They shut us down before we could finish writing 9876

Michael
Stephan Kuhagen

2006-12-15, 4:14 am

Darren New wrote:

> I would like to suggest that any time people want to distinguish causes
> of errors, $erroCode be defined and used for that? Trying to parse
> human-readable error message strings is fraught with error, and Tcl
> provides a very nice alternative to that.


Of course you are right. But the problem with errorCode often seems to be
(for me) that it does not add any information to the error-checking. For
example with channels (and so related to this thread):

% gets noSuchChannel theLine
can not find channel named "noSuchChannel"
% puts $errorCode
NONE

Not very helpfull... So we must parse $errorInfo anyway, which of course is
error prone and not a really good solution.

Regards
Stephan


Stephan Kuhagen

2006-12-15, 4:14 am

Andreas Leitgeb wrote:

>
> Yes, sure there is a way, but I see again the tendency
> of programmers to avoid the real check, and just assume
> that a thrown error implies a too long input line...
>
> Still, this bug is still the harmlesser one compared
> to DoS-prone current [gets].


I think Darren News suggestion of using (a more expressive) $errorCode would
be a good solution for that. If $errorCode would be extended to have more,
and maybe command specific, codes, it would give us much better error
handling options and simple error checking.

> gets theLine 1024
> The channel is of course non-optional (gets stdin theLine 1024).
> And one also should not ignore catch's result in this context,
> or one might run into surprises later :-)


Ah, I used the ReadProgrammerMind-Extended Tclsh, where the channel is
optional, but I should have skipped the [catch] then also, because the
errorInfo automatically materialises in your mind with that... ;-)

>
> I think that unlike your procedural description, gets really
> doesn't "read byte by byte till it finds a eol". I rather think
> it reads a block, scans it for eol's and leaves the rest in some
> input buffer to be used at next gets/read. If it's that way,
> then "unreading" should be all the easier implementable.


I don't know the implementation of [gets]. It was just to demonstrate, what
I think would be an intuitive behaviour in case of an error.

Regards
Stephan

Andreas Leitgeb

2006-12-15, 8:10 am

Stephan Kuhagen <nospam@domain.tld> wrote:
> Ah, I used the ReadProgrammerMind-Extended Tclsh, where the channel is
> optional, but I should have skipped the [catch] then also, because the
> errorInfo automatically materialises in your mind with that... ;-)


Oh, then *think* about submitting the necessary patches to sf.
(you don't need to actually do it: let your tclsh do it for you :-)
[color=darkred]

After rethinking that, it probably wouldn't work that way, still,
because for sufficiently large maxSize's, it might read some
(eol-free) blocks completely, and only later notice the lack of
eol.

So, having gets place up to maxSize already-read bytes into the
variable even for the "no eol in sight"-case, seems to be the
sanest way.

Andreas Leitgeb

2006-12-15, 8:10 am

Michael A. Cleverly <michael@cleverly.com> wrote:
> For example here is a simple "echo" server that doesn't like incomplete
> lines > 40 characters. (Naturally it requires tcl 8.5 that has [chan
> pending]...)
>
> ### Beginning of server.tcl
> proc echo {sock} {
> if {[gets $sock line] > -1} then { ... }
> if {[eof $sock]} then { ... }
> if {[set qty [chan pending input $sock]] > 40} then { ... }
> }


A big ouch! It doesn't take into account,
that between [gets] and [chan pending] some moderate block of
bytes might become available, perhaps even having an eol right
at start (to finish off the previously incomplete short line).

So the script would "think" that a long-line is waiting,
although the other side sends perfectly valid data.

That isn't even just theoretic, but with a 4096byte buffered
stream, it's not unlikely for a "line" to end just at the start
of the next 4096 bytes.

It's not a solution, it's a very race-condition-prone bad hack.

Michael A. Cleverly

2006-12-15, 7:10 pm

On Fri, 15 Dec 2006, Andreas Leitgeb wrote:

> Michael A. Cleverly <michael@cleverly.com> wrote:
>
> A big ouch! It doesn't take into account,
> that between [gets] and [chan pending] some moderate block of
> bytes might become available, perhaps even having an eol right
> at start (to finish off the previously incomplete short line).


Actually no.

The implementation calls the (long existing in the core but never exposed
to the script level) Tcl_InputBuffered() function.

Without having returned to the event loop (presuming there isn't an
[update] between your [gets] and [chan pending]) and without another
attempted input operation (another call to [gets] or [read]) then the data
will be buffered at the OS level, but not yet read/seen/known to Tcl.

So you tried to read a line. The *entire* contents of the data available
to Tcl at the time of the [gets] did not contain an end-of-line. So
gets returns a -1 and leaves all the data in its own (Tcl) buffers,
unread.

Then without attempting to read even more from the OS (by asking Tcl to
attempt another input operation on our behalf) and without re-entering the
event loop or dropping back to the event loop we ask Tcl how much data it
had available to it--that it looked through in search of the non-existant
newline--and then take some application specific action based on the size
it tells us.

> It's not a solution, it's a very race-condition-prone bad hack.


If that were the case then [fblocked] would have always been a very
race-condition-prone bad hack too. But it isn't.

Michael
Darren New

2006-12-15, 7:10 pm

Stephan Kuhagen wrote:
> Not very helpfull... So we must parse $errorInfo anyway, which of course is
> error prone and not a really good solution.


But that's my point. That's a bad way to go about it. What happens when
you localize Tcl and errorCode is now in French?

The right solution is to fix every error to return an errorCode.

In my S3 library, for example, every errorCode is a list that starts
with S3, then the type of error (local, remote, usage, etc), then the
argument that caused the error (so a usage error would have perhaps
-blocking as the third element if you provided "-blocking hello" instead
of a true/false value), and for local errors the fourth element is the
original errorCode, and for remote errors the fourth element is the HTTP
status result, and so on.

Of course, the errorInfo says something useful to the reader. But if you
want to *handle* the errors, having good errorCodes are important. The
answer "errorCode is currently broken, so we might as well leave it
broken and work around it forever" is a suboptimal solution, I think. If
you're defining a new error, you should specify the errorCode that goes
with it, especially if you know you want to handle it automatically.

--
Darren New / San Diego, CA, USA (PST)
Scruffitarianism - Where T-shirt, jeans,
and a three-day beard are "Sunday Best."
Andreas Leitgeb

2006-12-15, 7:10 pm

Michael A. Cleverly <michael@cleverly.com> wrote:
> Actually no.
> Without having returned to the event loop (presuming there isn't an
> [update] between your [gets] and [chan pending]) and without another
> attempted input operation (another call to [gets] or [read]) then the data
> will be buffered at the OS level, but not yet read/seen/known to Tcl.


Oh, that of course makes my comment moot.

> If that were the case then [fblocked] would have always been a very
> race-condition-prone bad hack too. But it isn't.


Actually, I never used [fblocked], but then I never
queried non-blocking channels for more than gets' return
value (being >=0 or not) and [eof]. I never came into
the situation of having a use for the information provided
by [fblocked], yet.

Nevertheless, I'd find a maxSize feature for [gets] much more
programmer-friendly than the solution you offered, despite
that your solution works right now, whereas the maxSize
feature would require a TIP to be written, accepted and
implemented. :-)

Michael A. Cleverly

2006-12-16, 4:12 am

On Fri, 15 Dec 2006, Andreas Leitgeb wrote:

> Nevertheless, I'd find a maxSize feature for [gets] much more
> programmer-friendly than the solution you offered, despite
> that your solution works right now, whereas the maxSize
> feature would require a TIP to be written, accepted and
> implemented. :-)


I wanted to see a solution to the problem in 8.5 and went with the
simplest thing that I knew I'd be able to provide an implementation for in
time. I do think adding some kind of -max switch to [gets] could be
useful for a lot of folks, but I'll happily defer to someone else slightly
more ambitious than myself. :-)

Michael
Fredderic

2006-12-16, 10:07 pm

On Thu, 14 Dec 2006 22:14:37 -0700,
"Michael A. Cleverly" <michael@cleverly.com> wrote:

> What TIP #287 was intended to help with was the non-blocking fileevent
> driven case. And for that you don't need to do any kind of new
> polling that you weren't already doing before.
>
> It's easy to miss the fact that a fileevent readable callback is
> triggered anytime there is unread data on the channel *even* when the
> channel is in line buffering mode and there isn't a complete line
> available yet. (This is precisely the reason we have [fblocked]; to
> after-the-fact distinguish between [gets] reading a line of length 0
> and [gets] returning (or assigning to our variable) the empty string
> because there was an incomplete line in the buffer.)


It is indeed... I hope an example similar to your 40-byte echo server
will make it into the documentation somewhere nice and obvious... :)


Fredderic
Stephan Kuhagen

2006-12-18, 4:16 am

Darren New wrote:

> Of course, the errorInfo says something useful to the reader. But if you
> want to *handle* the errors, having good errorCodes are important. The
> answer "errorCode is currently broken, so we might as well leave it
> broken and work around it forever" is a suboptimal solution, I think.


Okay, maybe I misunderstood you there before. Of course would it be the best
solution, to have useful errorCodes and use them, and I would appreciate,
if there was some strict style guide in the core and for extensions, how
the errorCode _must_ be implemented. What I wrote was just a reaction of
how errorInfo/errorCode currently work.

One more to the errorCode/errorInfo style guide: for example what you do
with S3 may be a good solution for this, but I have the feeling, there has
never been any real thoughts of a working solution, which can be used a
general rule (and requirement) for generating errorInfo/errorCode. But
there should be something like this for a good error handling. If I know S3
and if I know, I'm working just with it, I can build a nice error handling
scheme. But then I add some more libraries and extensions and have to add
individual error handling for each of them, because everybody either finds
his/her own system of useful and well designed error generation rules or
simply doesn't care.

So you are right: if [gets] or anything else fails, it should possible (and
considered good style) to check for the errorCode and handle the error. But
for that to become good practice, errorCode must be a reliable feature
throughout the whole language.

Regards
Stephan

Darren New

2006-12-18, 4:16 am

Stephan Kuhagen wrote:
> if there was some strict style guide in the core and for extensions, how
> the errorCode _must_ be implemented.


I think the best approach is probably to always use a list, always have
the first element indicate the subsystem returning the error, and have
things near the start of the list more general than things near the end
of the list. I think that's how the core does errorCode now in the
places it does, with (for example) ARITH and POSIX error codes.


--
Darren New / San Diego, CA, USA (PST)
Scruffitarianism - Where T-shirt, jeans,
and a three-day beard are "Sunday Best."
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com