
Questions on SEEK
Tom Conner

2006-09-01, 7:02 pm

Two questions regarding SEEK.

1. Why does seek crash if the file size is smaller than the number of bytes
being seeked? For example:

seek ./testFile -200 end

crashes if the file is less than 200 bytes in size. Expected behavior would
be to just return whatever bytes are in the file.

2. Is there a performance hit when seeking large files? In other words,
does the OS know where the end of the file is, or does it need to go through
each line until it finds the end.


Michael Schlenker

2006-09-01, 7:02 pm

Tom Conner schrieb:
> Two questions regarding SEEK.
>
> 1. Why does seek crash if the file size is smaller than the number of bytes
> being seeked? For example:
>
> seek ./testFile -200 end


seek takes a channelId, not a file name, so either you get an error
message or your example is wrong. If it crashes, please cut and copy the
exact error message/crash message and provide the results of 'info
patchlevel'.
>
> crashes if the file is less than 200 bytes in size. Expected behavior would
> be to just return whatever bytes are in the file.


seek in Tcl changes the file pointer; it returns no bytes.

See:
http://tmml.sourceforge.net/doc/tcl/seek.html

>
> 2. Is there a performance hit when seeking large files? In other words,
> does the OS know where the end of the file is, or does it need to go through
> each line until it finds the end.
>

Most OSes do not have a concept of 'lines'. Otherwise it's implementation
dependent. It's trivial to measure for yourself: just write some large
file of some hundred megabytes to your disk, try seeking, and do some
timings with Tcl's time command.
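
[Editor's sketch] The measurement suggested above might look roughly like the following. The scratch path and the 100 MB size are assumptions for illustration; the file is created by seeking past EOF and writing one byte, which on many filesystems produces a sparse file.

```tcl
# Hypothetical scratch file; adjust the path for your system.
set path /tmp/seek-timing.bin

# Create a file of about 100 MB by seeking past EOF and writing a single byte.
set out [open $path w]
fconfigure $out -translation binary
seek $out [expr {100 * 1024 * 1024 - 1}] start
puts -nonewline $out "x"
close $out

set fp [open $path r]
fconfigure $fp -translation binary
# [time] runs the script the given number of times and reports
# the average number of microseconds per iteration.
puts "seek to end:   [time {seek $fp 0 end} 1000]"
puts "seek to start: [time {seek $fp 0 start} 1000]"
close $fp
file delete $path
```

Comparing the two timings, and repeating the run with a much smaller file, should show whether the file size matters on a given system.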

Michael

Robert Heller

2006-09-01, 7:02 pm

At Fri, 01 Sep 2006 17:12:17 GMT "Tom Conner" <tconner@olopha.net> wrote:

>
> Two questions regarding SEEK.
>
> 1. Why does seek crash if the file size is smaller than the number of bytes
> being seeked? For example:
>
> seek ./testFile -200 end
>
> crashes if the file is less than 200 bytes in size. Expected behavior would
> be to just return whatever bytes are in the file.


I'm presuming seek eventually calls fseek (3), which in turn calls
lseek (2):

From man 2 lseek:

ERRORS
EBADF fildes is not an open file descriptor.

ESPIPE fildes is associated with a pipe, socket, or FIFO.

EINVAL whence is not one of SEEK_SET, SEEK_CUR, SEEK_END, or the
resulting file offset would be negative.
note------------------------------------------^

EOVERFLOW
The resulting file offset cannot be represented in an off_t.

% exec touch /tmp/empty.file
% set fp [open /tmp/empty.file r]
file3
% seek $fp -200 end
error during seek on "file3": invalid argument
% seek $fp 0 end
% seek $fp -1 end
error during seek on "file3": invalid argument

"invalid argument" == EINVAL

I would expect that 'catch' could be your friend:

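proc safeseek {fp offset {origin start}} {
    global errorInfo errorCode
    # Brace the script so the arguments are substituted once, inside catch.
    if {[catch {seek $fp $offset $origin} message]} {
        if {[regexp {: invalid argument$} $message]} {
            # EINVAL: the target offset would be negative, so go to the start.
            seek $fp 0 start
        } else {
            # Re-raise any other failure with its original error info.
            error $message $errorInfo $errorCode
        }
    }
}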

>
> 2. Is there a performance hit when seeking large files? In other words,
> does the OS know where the end of the file is, or does it need to go through
> each line until it finds the end.
>
>
>


--
Robert Heller -- 978-544-6933
Deepwoods Software -- Linux Installation and Administration
http://www.deepsoft.com/ -- Web Hosting, with CGI and Database
heller@deepsoft.com -- Contract Programming: C/C++, Tcl/Tk

Tom Conner

2006-09-01, 7:02 pm


"Robert Heller" <heller@deepsoft.com> wrote in message
news:bd14c$44f877ba$404a99a1$23499@news.news-service.com...
> At Fri, 01 Sep 2006 17:12:17 GMT "Tom Conner" <tconner@olopha.net> wrote:
>

In response to another answer, yes my example is wrong. In the actual code
I am using a channelId.
>
> I'm presuming seek eventually calls fseek (3), which in turn calls
> lseek (2):
>
> From man 2 lseek:
>
> ERRORS
> EBADF fildes is not an open file descriptor.
>
> ESPIPE fildes is associated with a pipe, socket, or FIFO.
>
> EINVAL whence is not one of SEEK_SET, SEEK_CUR, SEEK_END, or the
> resulting file offset would be negative.
> note------------------------------------------^


I wonder why returning an error was chosen instead of just returning a
pointer to the beginning of the file if the reverse offset is larger than
the file size.


>
> EOVERFLOW
> The resulting file offset cannot be represented in an off_t.
>
> % exec touch /tmp/empty.file
> % set fp [open /tmp/empty.file r]
> file3
> % seek $fp -200 end
> error during seek on "file3": invalid argument
> % seek $fp 0 end
> % seek $fp -1 end
> error during seek on "file3": invalid argument
>
> "invalid argument" == EINVAL
>
> I would expect that 'catch' could be your friend:
>


I used [file size] instead to determine whether to read from the beginning,
or the end, of the file.
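
[Editor's sketch] The [file size] approach described above could look something like this. The proc name readTail is hypothetical and not from the thread; it assumes the file does not change size between the check and the read.

```tcl
# Hypothetical helper: return up to the last $count bytes of the file $path.
proc readTail {path count} {
    set fp [open $path r]
    fconfigure $fp -translation binary
    # Only seek backwards when the file is larger than the requested tail;
    # a freshly opened channel already sits at offset 0 otherwise.
    if {[file size $path] > $count} {
        seek $fp [expr {-$count}] end
    }
    set data [read $fp]
    close $fp
    return $data
}
```

Called as [readTail ./testFile 200], this returns the whole file when it holds fewer than 200 bytes, which is the behavior the original question expected from seek itself.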

Thanks for the answers.


Robert Heller

2006-09-01, 7:02 pm

At Fri, 01 Sep 2006 18:38:56 GMT "Tom Conner" <tconner@olopha.net> wrote:

>
>
> "Robert Heller" <heller@deepsoft.com> wrote in message
> news:bd14c$44f877ba$404a99a1$23499@news.news-service.com...
>
> In response to another answer, yes my example is wrong. In the actual code
> I am using a channelId.
>
> I wonder why returning an error was chosen instead of just returning a
> pointer to the beginning of the file if the reverse offset is larger than
> the file size.


I'm guessing that the Tcl seek command has *exactly* the same semantics
as the C functions fseek() & lseek(), including error conditions. If it
had *different* semantics, things would be more confusing.

>
>
>
> I used [file size] instead to determine whether to read from the beginning,
> or the end, of the file.
>
> Thanks for the answers.
>
>
>


--
Robert Heller -- 978-544-6933
Deepwoods Software -- Linux Installation and Administration
http://www.deepsoft.com/ -- Web Hosting, with CGI and Database
heller@deepsoft.com -- Contract Programming: C/C++, Tcl/Tk

Gerald W. Lester

2006-09-01, 7:02 pm

Tom Conner wrote:
> Two questions regarding SEEK.
>
> 1. Why does seek crash if the file size is smaller than the number of bytes
> being seeked? For example:
>
> seek ./testFile -200 end
>
> crashes if the file is less than 200 bytes in size. Expected behavior would
> be to just return whatever bytes are in the file.
>
> 2. Is there a performance hit when seeking large files? In other words,
> does the OS know where the end of the file is, or does it need to go through
> each line until it finds the end.


What OS are you on -- I've just tried seeking past the EOF, before the SOF
and have no errors. I've also attempted tells and reads of one character
when past with no crashes.

You may have a bad build -- are you using an ActiveState build or did you
build it yourself or did someone build it for you?

--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+

Robert Heller

2006-09-01, 7:02 pm

At Fri, 01 Sep 2006 14:51:05 -0500 "Gerald W. Lester" <Gerald.Lester@cox.net> wrote:

>
> Tom Conner wrote:
>
> What OS are you on -- I've just tried seeking past the EOF, before the SOF
> and have no errors. I've also attempted tells and reads of one character
> when past with no crashes.


Seeking past EOF is allowed. Seeking to a negative offset is not (at
least under Linux and probably UNIX in general).
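
[Editor's sketch] A small script to check both cases on your own system. The scratch path is an assumption, and the results may differ between platforms and builds, as the conflicting reports in this thread suggest.

```tcl
# Hypothetical scratch file; adjust the path for your system.
set path /tmp/seek-demo.tmp
close [open $path w]          ;# create an empty file
set fp [open $path r]

# Seeking beyond the end of the file.
seek $fp 1000 start
set pastEof [tell $fp]
puts "offset after seeking past EOF: $pastEof"
puts "bytes read there: [string length [read $fp]]"

# Seeking to a negative absolute offset; catch the error if there is one.
if {[catch {seek $fp -1 start} msg]} {
    puts "negative offset rejected: $msg"
} else {
    puts "negative offset accepted, now at [tell $fp]"
}
close $fp
file delete $path
```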

>
> You may have a bad build -- are you using an ActiveState build or did you
> build it yourself or did someone build it for you?
>


--
Robert Heller -- 978-544-6933
Deepwoods Software -- Linux Installation and Administration
http://www.deepsoft.com/ -- Web Hosting, with CGI and Database
heller@deepsoft.com -- Contract Programming: C/C++, Tcl/Tk

Donal K. Fellows

2006-09-01, 7:02 pm

Tom Conner wrote:
> 1. Why does seek crash if the file size is smaller than the number of bytes
> being seeked?


When you say "crash" do you mean throw an error? This is because the
[seek] command is supposed to throw an error when the underlying
syscall returns some error condition (or if the arguments are
malformatted, but that doesn't apply here). If that's what's happening,
that's just how things are; write your code to cope. :-)

If the whole tclsh executable is crashing, that's bad.

> 2. Is there a performance hit when seeking large files? In other words,
> does the OS know where the end of the file is, or does it need to go through
> each line until it finds the end.


I've never heard of an OS that didn't implement the lseek() syscall
efficiently (on non-serial media; tapes are a whole 'nother thing).
Internally, disks are a random access[*] collection of sectors, each
notionally holding 512 bytes, and seeking is easy and simple. Indeed,
it can be much quicker than spooling through, since if you've got a few
tens of gigs of data, going straight to the bit you really want is a
great short cut.

The only disadvantage of [seek] is that it works entirely at the byte
level, and so is far more suited to record-oriented data than to text.
Combining seeking with line-oriented input is not simple, especially if
you allow unbounded line lengths...

Donal.
[* Actually things are much more complex than this. But it's a good
model. ]
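
[Editor's sketch] To illustrate the difficulty of mixing seeking with line-oriented input, here is one rough way to get the last few lines of a file. The proc name lastLines and the 4096-byte default window are assumptions. The sketch assumes a single-byte encoding and newline-terminated lines, and it simply returns fewer lines (possibly none) when a line is longer than the window, which is exactly the unbounded-length problem mentioned above.

```tcl
# Hypothetical helper: return the complete lines found in the
# last $window bytes of the file $path.
proc lastLines {path {window 4096}} {
    set fp [open $path r]
    set partial 0
    if {[file size $path] > $window} {
        # The seek lands at an arbitrary byte, most likely mid-line.
        seek $fp [expr {-$window}] end
        set partial 1
    }
    set data [read $fp]
    close $fp
    # Drop one trailing newline so split does not yield an empty last element.
    if {[string index $data end] eq "\n"} {
        set data [string range $data 0 end-1]
    }
    set lines [split $data "\n"]
    if {$partial} {
        # Conservatively discard the first element: it is probably the
        # tail of a line that began before the window.
        set lines [lrange $lines 1 end]
    }
    return $lines
}
```

If the window happens to start exactly on a line boundary, the sketch throws away one complete line; detecting that case would require looking at the byte just before the window.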

Darren New

2006-09-01, 7:02 pm

Donal K. Fellows wrote:
> I've never heard of an OS that didn't implement the lseek() syscall
> efficiently (on non-serial media; tapes are a whole 'nother thing).


The Atari 800 OS springs to mind. Each sector was chained to the next in
that sector's data, like a singly linked list, so seeking essentially
required reading thru the file. Appending worked by writing a new chain
of sectors, and then when you closed the file it read thru the first
file to the end and then adjusted the pointer. And since each read was
accompanied by a "beep" to let you know, it could get pretty annoying.

--
Darren New / San Diego, CA, USA (PST)
This octopus isn't tasty. Too many
tentacles, not enough chops.

Tom Conner

2006-09-01, 7:02 pm


"Donal K. Fellows" <donal.k.fellows@man.ac.uk> wrote in message
news:1157143609.444436.162680@b28g2000cwb.googlegroups.com...
> Tom Conner wrote:
>
> When you say "crash" do you mean throw an error?


Yes. Poor semantics on my part. Sorry.

>
> I've never heard of an OS that didn't implement the lseek() syscall
> efficiently (on non-serial media; tapes are a whole 'nother thing).
> Internally, disks are a random access[*] collection of sectors, each
> notionally holding 512 bytes, and seeking is easy and simple. Indeed,
> it can be much quicker than spooling through, since if you've got a
> few tens of gigs of data, going straight to the bit you really want
> is a great short cut.
>


I wasn't sure how an OS (Linux in this case) dealt with files, but, thanks
to everyone, I now have a good understanding. Testing seek on a "large"
(500K) file didn't register any CPU increase (as seen by top), or time
increase as compared to a "normal" (in this environment) size file of around
100 bytes.

