Author Large binary data manipulation
Derek

2004-08-25, 8:57 pm

hi All


I have a binary data file that has interleaved data. I need to extract
the left and right data into separate variables, keeping the data binary.

Input file "L|R|L|R|L|R|L|R|L|R...."
Left output L|L|L|L|L|L|L|....
Right output R|R|R|R|R|R|R|....

I tried allocating it as follows

for {set i 0} {$i < $num_samples} {incr i} {
set data_left $data_left[read $file_handle 2]
set data_right $data_right[read $file_handle 2]
}




However, this took a long time, I presume because it had to reallocate
memory each time it assigned the next data point.

Is there a better way, using just core Tcl, i.e. no extensions?

P.S. I found it quicker to write to files then read them back in!

Thanks in advance

Derek
Wellcode

2004-08-25, 8:57 pm

Hi Derek

You have two problems in your example:

The first problem is that you read only 2 bytes per step, which causes a lot
of I/O operations. The second problem is that you don't use the append
command, so your method reallocates the buffer every time you set a new
variable value.
Just read larger buffers and the routines get a lot faster.

set data [read $file_handle 10240]
set len [string length $data]
set left_data ""
set right_data ""
for {set i 0} {$i < $len} {incr i 4} {
append left_data [string range $data $i [expr $i+1]]
append right_data [string range $data [expr $i+2] [expr $i+3]]
}
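
A fuller version of this idea might look like the sketch below. It assumes 16-bit samples, a file small enough to read in one go, and a channel switched to binary translation so that no bytes are altered on the way in. The proc name split_stereo is made up for illustration.

```tcl
# Hypothetical helper: split interleaved 16-bit L/R data into two
# binary strings. Assumes the whole file fits in memory.
proc split_stereo {path} {
    set fh [open $path r]
    fconfigure $fh -translation binary   ;# no end-of-line or encoding conversion
    set data [read $fh]                  ;# one large read instead of many 2-byte reads
    close $fh

    set left_data ""
    set right_data ""
    set len [string length $data]
    for {set i 0} {$i < $len} {incr i 4} {
        # bytes i..i+1 are the left sample, i+2..i+3 the right sample
        append left_data  [string range $data $i [expr {$i + 1}]]
        append right_data [string range $data [expr {$i + 2}] [expr {$i + 3}]]
    }
    return [list $left_data $right_data]
}
```

The two results come back as a list, so a caller could unpack them with something like "foreach {left right} [split_stereo $path] break".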

regards
--
Try Code-Navigator on http://www.codenav.com
a source code navigating, analysis and developing tool. It supports
almost all languages on the scope.


--
Dipl.-Informatiker
Khamis Abuelkomboz
Rosenweg 124
58239 Schwerte
+49 2304 898560 (Telefon)
+49 2304 898561 (Fax)
http://www.wellcode.com
Bryan Oakley

2004-08-25, 8:57 pm

... and it will go even faster if you put {} around the expressions:

...
append left_data [string range $data $i [expr {$i+1}]]
...

Whether it's noticeable in your situation or not depends on a lot of
factors, but it's a good habit to get into in any case.

USCode

2004-08-25, 8:57 pm

"Bryan Oakley" <oakley@bardo.clearlight.com> wrote
>
> ... and it will go even faster if you put {} around the expressions:
>
> ...
> append left_data [string range $data $i [expr {$i+1}]]
> ...
>
> Whether it's noticible in your situation or not depends on a lot of
> factors but it's a good habit to get into in any case.
>


That's interesting ... for us newbies, why is using {} in expr faster, Bryan?
I should get into that habit as well!
Thanks!


Khamis

2004-08-25, 8:57 pm

This sounds like a good solution to me, but thinking of my programs in C
source code, I would never use i++ on a variable that has not been set.
So "set i 0" must be placed somewhere in your source code before calling
"incr i", and then you don't need to check that it exists.

It sounds like Tcl is safer than bad C source code :-)

regards
Khamis Abuelkomboz

2004-08-25, 8:57 pm

Sorry, I replied to the wrong thread; this should have gone to "[incr] and counting occurrences".



Bruce Stephens

2004-08-25, 8:57 pm

"USCode" <uscode@dontspam.me> writes:

[...]

> That's interesting ... for us newbies, why is using {} in expr
> faster Bryan?


Because expr does its own expansion of its argument, so putting the
argument in braces makes everything clearer to the bytecode compiler.

> I should get into that habit as well!


For stylistic reasons as much as anything. "expr $a*$b" seems simple
until you notice the "set a {7*$c+2}" a few lines earlier.
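
To make that concrete, here is a small sketch of the double substitution. The values of b and c are assumptions added for illustration; only the "set a" line comes from the remark above.

```tcl
set c 3
set a {7*$c+2}   ;# a holds the literal text 7*$c+2, not a number
set b 2

# Unbraced: Tcl substitutes $a and $b first, then expr parses the
# resulting text 7*$c+2*2 and substitutes $c itself.
set unbraced [expr $a*$b]

# Braced: expr receives the text $a*$b, substitutes once, and rejects
# the non-numeric value of a.
set failed [catch {expr {$a*$b}} msg]

puts "unbraced result: $unbraced"
puts "braced call failed: $failed ($msg)"
```

The unbraced call ends up evaluating 7*3+2*2, which is 25 rather than anything resembling "a times b", while the braced call reports an error instead of silently computing something else.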
R. T. Wurth

2004-08-26, 3:57 am

In article <412CF9E2.4070705@wellcode.com>, Wellcode <info@wellcode.com>
wrote:
> Hi Derek
>
> you have two problems in your example:
>
> your first problem is the reading 2 bytes every step. this causes alot
> of io operations. The second problem, that you don't use the append
> command, your method reallocates the buffer every time you set a new
> variable value.
> Just read larger buffers and the routines get alot faster.
>
> set data [read $file_handle 10240]
> set len [string length $data]
> set left_data ""
> set right_data ""
> for {set i 0} {$i < $len} {incr i 4} {
> append left_data [string range $data $i [expr $i+1]]
> append right_data [string range $data [expr $i+2] [expr $i+3]]
> }
>
> regards


Someone else already commented that you will get better performance
if you brace ({ ... }) your expressions. Someone asked why. The
reason is that if the expression is braced, the byte code compiler
recognizes the braced expression as a constant (to the parse step),
and so hard-codes that into the call to the expr command, which then
substitutes the variables. If left unbraced, the byte-code compiler
has to compile in the evaluation of each of the parts, but the expr
command cannot tell it is receiving pre-parsed data through its
arguments, so it still has to re-parse them for substitutions.
Plus, there are certain degenerate cases where this double
substitution could break an expression. (Although there was also a
degenerate case where I deliberately left the braces out just to get
double substitution, such tricks should be reserved for advanced
users and should be very well documented in a comment.)
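
If you want to see the difference on your own machine, a rough sketch using the built-in time command follows. The proc names and the iteration count are arbitrary choices for illustration, and the numbers will vary with Tcl version and hardware.

```tcl
# Same loop twice; only the bracing of the expr argument differs.
proc unbraced_sum {n} {
    set total 0
    for {set i 0} {$i < $n} {incr i} {
        set total [expr $total+$i]      ;# expression text rebuilt and parsed on every pass
    }
    return $total
}

proc braced_sum {n} {
    set total 0
    for {set i 0} {$i < $n} {incr i} {
        set total [expr {$total+$i}]    ;# expression compiled once with the proc body
    }
    return $total
}

puts "unbraced: [time {unbraced_sum 100000}]"
puts "braced:   [time {braced_sum 100000}]"
```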

If you really want to squeeze every last bit of performance out of
your application, compare and benchmark the above to these:

set data [read $file_handle 10240]
set len [string length $data]
set left_data ""
set right_data ""
set i 0
while {$i < $len} {
append left_data [string range $data $i [incr i]]
append right_data [string range $data [incr i] [incr i]]
incr i
}
or,

set data [read $file_handle 10240]
set len [string length $data]
set left_data ""
set right_data ""
set i -1
while {$i < $len - 1} {
append left_data [string range $data [incr i] [incr i]]
append right_data [string range $data [incr i] [incr i]]
}

I'm not sure either one is any faster, but I have a hunch one of
them might be. I'll leave it to you to benchmark and compare them.

This might be faster or slower (or it might run you out of memory),
and since you are dealing with binary data, not character strings,
it might fail altogether:

set data [read $file_handle 10240]
set left_data ""
set right_data ""
set ldat [split $data {} ]
foreach {0 1 2 3} $ldat {
# append concatenates all of its value arguments
append left_data $0 $1
append right_data $2 $3
}

Note: 0, 1, 2, and 3 are just variable names, nothing magic.
--
Rich Wurth / rwurth@att.net / Rumson, NJ USA
Bob Techentin

2004-08-26, 8:57 am

"Bruce Stephens" <bruce+usenet@cenderis.demon.co.uk> wrote
> "USCode" <uscode@dontspam.me> writes:
>
> Because expr does its own expansion of its argument, so putting the
> argument in braces makes everything clearer to the bytecode
> compiler.

If speeding up your application code is important, also see the Tcl
Performance page at http://wiki.tcl.tk/348

Bob
--
Bob Techentin techentin.robert@NOSPAMmayo.edu
Mayo Foundation (507) 538-5495
200 First St. SW FAX (507) 284-9171
Rochester MN, 55901 USA http://www.mayo.edu/sppdg/



Derek

2004-08-27, 8:57 pm

Hi All

I replied to wellcode directly but should have copied the list.

We were in a first phase of writing code, i.e. seeing whether it could be
done with Tcl.
Once it was working we found the code that strips out the L and R
data to be a Tcl coding-style bottleneck. We have others, but these are
more related to the target processor.

We knew there must have been a better way to do what we were doing but
were focusing on the binary command rather than moving outside our
self-constructed box.
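
For comparison, a sketch of how the same split could be done with the binary command. It assumes 16-bit samples; the proc name split_stereo_binary is invented for this example, and it has not been benchmarked against the string range version.

```tcl
# Hypothetical helper: split interleaved 16-bit samples using
# binary scan / binary format instead of string range.
proc split_stereo_binary {data} {
    set samples {}
    # s* = as many 16-bit little-endian integers as the data holds
    binary scan $data s* samples

    set left {}
    set right {}
    foreach {l r} $samples {
        lappend left $l
        if {$r ne ""} {          ;# an odd sample count leaves r empty on the last pass
            lappend right $r
        }
    }
    # Re-encode each list with the same format so the bytes are unchanged
    return [list [binary format s* $left] [binary format s* $right]]
}
```

Because scan and format use the same specifier, the byte order inside each sample is preserved whatever the file's actual endianness is.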

From Wellcode's suggestion we only used the append, which improved the
performance of this section of code from 40+ seconds to... well, we
didn't bother measuring it, but it must be less than a second. This now
puts it off the radar as far as improving the performance of the code.
There is other software involved that may be reviewed if we need further
performance, but the upper-level Tcl is now not an immediate concern.
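
A rough way to reproduce this kind of difference is sketched below. The proc names, chunk contents and iteration count are assumptions chosen for illustration, and the timings will depend on the Tcl version.

```tcl
# Build the same buffer two ways and time each one.
proc build_with_set {n chunk} {
    set buf ""
    for {set i 0} {$i < $n} {incr i} {
        set buf $buf$chunk      ;# builds a new value from the old one on each pass
    }
    return $buf
}

proc build_with_append {n chunk} {
    set buf ""
    for {set i 0} {$i < $n} {incr i} {
        append buf $chunk       ;# lets Tcl grow the existing buffer
    }
    return $buf
}

set chunk [binary format H* 0102]   ;# a 2-byte sample
puts "set:    [time {build_with_set 50000 $chunk}]"
puts "append: [time {build_with_append 50000 $chunk}]"
```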

I have been using Tcl/Expect for test automation for some time, mainly
dealing with regexps and small strings; this was the first time I had
looked at binary data.

It was a revelation that the string command could be used to handle
binary data as well.
It now puts other coding exercises at work more within the grasp of
Tcl than I had first expected.

Thanks to all that contributed, we may use more of the ideas if we
need to improve performance, or after our code review.

Again thanks

Derek Philip


