| Author |
performance string comparison
|
|
| Jeannot 2005-04-20, 4:00 pm |
| Hi all,
I have a question on performance. I would like to know whether anyone has
already checked which is the fastest, and also the correct, way to test
these expressions (considering string and/or numerical values):
1:
if {$a == $b} {
...
}
2:
if {[string equal $a $b]} {
...
}
3:
and maybe a last one with expr?
Thanks in advance
Jeannot
| |
| Robert Seeger 2005-04-20, 4:00 pm |
| If it's a string, I think what you want is:
if { $a eq $b } { .... }
If it's numerical, then:
if { $a == $b } { .... }
The second is needed for numbers, because { 1.0 == 1 }, but it's not
true that { 1.0 eq 1 }, since "eq" is a pure string comparison. The
first should be faster for strings, and the second for numbers (even
numbers of the same string rep?).
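A minimal sketch of that difference, assuming Tcl 8.4 or later (where the eq operator exists); the values are only illustrative:

```tcl
# == converts both operands to numbers when it can; eq compares the strings.
set a 1.0
set b 1
puts [expr {$a == $b}]          ;# 1: numerically equal
puts [expr {$a eq $b}]          ;# 0: "1.0" and "1" are different strings
puts [expr {"0xff" == "255"}]   ;# 1: both parse as the integer 255
puts [expr {"0xff" eq "255"}]   ;# 0: the strings differ
```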
Robert Seeger
| |
| Khaled 2005-04-21, 4:00 am |
| Just test it :)
time {if {$a == $b} { ... }} 100
time {if {[string match $a $b]} { ... }} 100
time {if {$a eq $b} { ... }} 100
Rgrds,
Khaled
| |
| Jeannot 2005-04-21, 4:00 am |
| :-) Thank you. I tested it and I see no difference between any of them for
a simple comparison,
with a and b = 10
and with a and b = "toto":
> time {if {$a == $b} { puts t }} 10000
> time {if {[string match $a $b]} { puts t }} 10000
> time {if {[string equal $a $b]} { puts t }} 10000
> time {if {$a eq $b} { puts t }} 10000
thanks again
| |
| Kaitzschu 2005-04-21, 8:58 am |
| On Thu, 21 Apr 2005, Jeannot wrote:
> :-) thank you I tested it and I see no differences between all for simple
> comparison
> with a and b = 10
> and a and b = "toto"
Yes you do see, you just have way too high-end equipment :P
You'll need to adjust the test a bit, try the following:
code:
proc iu {a b} {
    # r1..r4 record whether each comparison succeeded (1) or not (0)
    set r1 0
    set r2 0
    set r3 0
    set r4 0
    # the outer time runs once, so t1..t4 hold the total microseconds
    # taken by 10000 evaluations
    set t1 [lindex [split [time {
        time {if {$a == $b} {set r1 1}} 10000}] { }] 0]
    set t2 [lindex [split [time {
        time {if {[string match $a $b]} {set r2 1}} 10000}] { }] 0]
    set t3 [lindex [split [time {
        time {if {[string equal $a $b]} {set r3 1}} 10000}] { }] 0]
    set t4 [lindex [split [time {
        time {if {$a eq $b} {set r4 1}} 10000}] { }] 0]
    puts "$a $b ->\t==: $t1 /$r1\tsm: $t2 /$r2\tse: $t3 /$r3\teq: $t4 /$r4"
}
iu 10 10
iu toto toto
iu {a[bc]d} {abd}
iu 011 9
iu 0xff 255
Don't forget to take a closer look at the r? variables, too. And this one
doesn't even include failing patterns. And if your high-end equipment
still gives you no-difference, then pump that iteration counter (10000) up
to a few million and retry.
--
-Kaitzschu
s="TCL ";while true;do echo -en "\r$s";s=${s:1:${#s}}${s:0:1};sleep .1;done
| |
| Jeannot 2005-04-21, 4:00 pm |
| Ok thanks, I will try
| |
| MartinLemburg@UGS 2005-04-22, 8:58 am |
| Hmm - using "==" does not really slow things down when comparing strings, but
it can be dangerous if the strings look like numbers yet must be compared
as strings, not as numbers, because these strings will be converted to
numbers, and then those numbers will be compared.
What about the internal object types?
Should the "==" operator be faster if e.g. the strings "0xff" and "255"
have the internal type "integer" instead of "string"?
I changed the code above to (added "string compare"):
code:
proc iu {a b} {
    global repeats;
    # to prevent shimmering
    #
    set a1 [set a2 [set a3 [set a4 [set a5 $a]]]];
    set b1 [set b2 [set b3 [set b4 [set b5 $b]]]];
    set r1 [set r2 [set r3 [set r4 [set r5 0]]]];
    # each t? is the total time in microseconds for $repeats evaluations
    set t1 [lindex [split [time {time \
        {if {$a1 == $b1} {set r1 1}} $repeats}] { }] 0]
    set t2 [lindex [split [time {time \
        {if {[string match $a1 $b1] == 1} {set r2 1}} $repeats}] { }] 0]
    set t3 [lindex [split [time {time \
        {if {[string equal $a1 $b1] == 1} {set r3 1}} $repeats}] { }] 0]
    set t4 [lindex [split [time {time \
        {if {[string compare $a1 $b1] == 0} {set r4 1}} $repeats}] { }] 0]
    set t5 [lindex [split [time {time \
        {if {$a1 eq $b1} {set r5 1}} $repeats}] { }] 0]
    puts "$a $b -> ==: ${t1}/$r1 sm: ${t2}/$r2 se: ${t3}/$r3 sc: ${t4}/$r4 eq: ${t4}/$r4"
}
set repeats 50000;
And I ran the following tests:
code:
# "simple" examples comparing always "strings"
#
iu 10 10
iu toto toto
iu {a[bc]d} {abd}
iu 011 9
iu 0xff 255
[result]
10 10 -> ==: 64364/1 sm: 75869/1 se: 70262/1 sc: 72300/1 eq: 72300/1
toto toto -> ==: 99066/1 sm: 79240/1 se: 72271/1 sc: 72965/1 eq: 72965/1
a[bc]d abd -> ==: 86288/0 sm: 77563/1 se: 58838/0 sc: 62603/0 eq: 62603/0
011 9 -> ==: 61042/1 sm: 59206/0 se: 58202/0 sc: 59140/0 eq: 59140/0
0xff 255 -> ==: 61511/1 sm: 59427/0 se: 58665/0 sc: 59254/0 eq: 59254/0
[/result]
code:
# examples with "typed" "strings"
#
set a 10;
set b 10;
set c [expr {$a * $b}];
iu $a $b
iu $a $c
iu $c $c
[result]
10 10 -> ==: 61903/1 sm: 73881/1 se: 70868/1 sc: 72999/1 eq: 72999/1
10 100 -> ==: 52354/0 sm: 62127/0 se: 60392/0 sc: 61416/0 eq: 61416/0
100 100 -> ==: 60781/1 sm: 77318/1 se: 69184/1 sc: 70191/1 eq: 70191/1
[/result]
code:
set a [string range $a 0 end];
set b [string range $b 0 end];
set c [string range $c 0 end];
iu $a $b
iu $a $c
iu $c $c
[result]
10 10 -> ==: 61861/1 sm: 73981/1 se: 71953/1 sc: 72072/1 eq: 72072/1
10 100 -> ==: 50234/0 sm: 70064/0 se: 58949/0 sc: 60662/0 eq: 60662/0
100 100 -> ==: 59595/1 sm: 76337/1 se: 69011/1 sc: 69256/1 eq: 69256/1
[/result]
code:
iu [expr {int(0xff)}] [expr {int(255)}];
iu [expr {double(0xff)}] [expr {int(255)}];
[result]
255 255 -> ==: 60616/1 sm: 76223/1 se: 71211/1 sc: 73050/1 eq: 73050/1
255.0 255 -> ==: 62439/1 sm: 68898/0 se: 58965/0 sc: 60713/0 eq: 60713/0
[/result]
So I'm not really able to recommend either "==" or "eq" using speed
alone as the criterion!
But for clarity I would always suggest using "==" only for number
comparisons and "eq" only for string comparisons - no matter which is
quicker!
What surprised me a bit was that "eq" is as fast as "string compare".
Why is that?
Best regards,
Martin Lemburg
UGS The PLM Company
| |
| Don Porter 2005-04-22, 4:01 pm |
| MartinLemburg@UGS wrote:
> # to prevent shimmering
> #
> set a1 [set a2 [set a3 [set a4 [set a5 $a]]]];
> set b1 [set b2 [set b3 [set b4 [set b5 $b]]]];
Why do you imagine this will prevent shimmering?
--
| Don Porter Mathematical and Computational Sciences Division |
| donald.porter@nist.gov Information Technology Laboratory |
| http://math.nist.gov/~DPorter/ NIST |
|______________________________________________________________________|
| |
| lvirden@gmail.com 2005-04-22, 4:01 pm |
|
According to MartinLemburg@UGS <martin.lemburg.ugs@gmx.net>:
:Hhm - using "==" is not really a slow down, when comparing strings, but
:it can be dangerous if the strings are numbers, that must be compared
:as strings not as numbers, because these string will be converted
:numbers, and than those numbers will be compared.
Also, most performance testing should be done, not as plain commands,
but invoking a proc containing the commands.
This is because otherwise, it is my understanding that you don't get
the byte compilation.
--
<URL: http://wiki.tcl.tk/ > MP3 ID tag repair < http://www.fixtunes.com/?C=17038 >
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
<URL: mailto:lvirden@gmail.com > <URL: http://www.purl.org/NET/lvirden/ >
| |
| MartinLemburg@UGS 2005-04-22, 4:01 pm |
| I thought that if a1 is used in a context where shimmering happens, then a,
a2, a3, a4 and a5 are not affected, because only a1 changes its internal
type!
That's all.
Martin Lemburg
UGS The PLM Company
| |
| Don Porter 2005-04-22, 4:01 pm |
| MartinLemburg@UGS wrote:
> ... a1 is used in a context, where shimmering will happen, a or a2, a3,
> a4, a5 are not influenced, because only a1 will change the internal
> type!
set a1 [set a2 value];
The variables a1 and a2 share one Tcl_Obj. If it shimmers, both
variables will see the change in internal rep.
If one of the variables changed value, then copy-on-write would
protect one variable from changes in the other. However, shimmering
doesn't change the value; it changes only the internal details of how
that value is represented.
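A rough way to watch this happen, assuming Tcl 8.6 or later, where the debugging command ::tcl::unsupported::representation is available (it is explicitly unsupported and its output format may change). The helper procs objPointer and objType are hypothetical names used only for this sketch:

```tcl
# Hypothetical helpers that pick fields out of the description string
# returned by ::tcl::unsupported::representation (Tcl 8.6+ assumed).
proc objPointer {value} {
    # Address of the Tcl_Obj that holds the value.
    regexp {object pointer at ([^,]+),} \
        [::tcl::unsupported::representation $value] -> ptr
    return $ptr
}
proc objType {value} {
    # Name of the current internal representation.
    regexp {^value is a (.+) with a refcount} \
        [::tcl::unsupported::representation $value] -> type
    return $type
}

set a2 [string range "255 " 0 2]   ;# a fresh value "255", not yet used as a number
set a1 $a2                          ;# a1 and a2 now share one Tcl_Obj
puts "before: a1=[objType $a1] a2=[objType $a2]"
expr {$a1 + 0}                      ;# numeric use of a1 shimmers the shared object
puts "after:  a1=[objType $a1] a2=[objType $a2]"
puts "same object: [expr {[objPointer $a1] eq [objPointer $a2]}]"
```

Both variables report the same type before and after the expr call, because there is only one object behind them; copying the variable did not copy the value.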
--
| Don Porter Mathematical and Computational Sciences Division |
| donald.porter@nist.gov Information Technology Laboratory |
| http://math.nist.gov/~DPorter/ NIST |
|______________________________________________________________________|
| |
| MartinLemburg@UGS 2005-04-22, 4:01 pm |
| Ok, I changed the code to:
code:
proc t1 {a b} {
    if {$a == $b} { return 1; }
    return 0;
}
proc t2 {a b} {
    if {[string match $a $b] == 1} { return 1; }
    return 0;
}
proc t3 {a b} {
    if {[string equal $a $b] == 1} { return 1; }
    return 0;
}
proc t4 {a b} {
    if {[string compare $a $b] == 0} { return 1; }
    return 0;
}
proc t5 {a b} {
    if {$a eq $b} { return 1; }
    return 0;
}
proc iu {a b} {
    global repeats;
    # r1..r5: result of each comparison; t1..t5: microseconds per call
    set r1 [t1 $a $b]; set r2 [t2 $a $b];
    set r3 [t3 $a $b]; set r4 [t4 $a $b];
    set r5 [t5 $a $b];
    set t1 [lindex [time {t1 $a $b} $repeats] 0]
    set t2 [lindex [time {t2 $a $b} $repeats] 0]
    set t3 [lindex [time {t3 $a $b} $repeats] 0]
    set t4 [lindex [time {t4 $a $b} $repeats] 0]
    set t5 [lindex [time {t5 $a $b} $repeats] 0]
    puts "$a $b: == ${t1}/$r1 sm ${t2}/$r2 se ${t3}/$r3 sc ${t4}/$r4 eq ${t4}/$r4"
}
I tried the following:
code:
iu 10 10
iu toto toto
iu {a[bc]d} {abd}
iu 011 9
iu 0xff 255
set a 10;
set b 10;
set c [expr {$a * $b}];
iu $a $b
iu $a $c
iu $c $c
set a [string range $a 0 end];
set b [string range $b 0 end];
set c [string range $c 0 end];
iu $a $b
iu $a $c
iu $c $c
iu [expr {int(0xff)}] [expr {int(255)}];
iu [expr {double(0xff)}] [expr {int(255)}];
And I got the following results:
10 10: == 3/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
toto toto: == 4/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
a[bc]d abd: == 3/0 sm 3/1 se 3/0 sc 3/0 eq 3/0
011 9: == 2/1 sm 3/0 se 3/0 sc 3/0 eq 3/0
0xff 255: == 3/1 sm 3/0 se 3/0 sc 3/0 eq 3/0
10 10: == 3/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
10 100: == 3/0 sm 3/0 se 3/0 sc 3/0 eq 3/0
100 100: == 2/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
10 10: == 3/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
10 100: == 3/0 sm 3/0 se 3/0 sc 3/0 eq 3/0
100 100: == 2/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
255 255: == 3/1 sm 3/1 se 3/1 sc 3/1 eq 3/1
255.0 255: == 3/1 sm 3/0 se 3/0 sc 3/0 eq 3/0
Now every test is a procedure and should be byte compiled.
And I changed the evaluation time measurement from:
set t1 [lindex [split [time {time {if {$a1 == $b1} {set r1 1}} $repeats}] { }] 0];
to
set t1 [lindex [time {t1 $a $b} $repeats] 0];
The differences in the time consumed per evaluation are so small that
it doesn't really make sense to discuss the performance.
IMHO the question is still what I want to compare! If it's a
numerical comparison, then the "==" operator has to be used! If it's
a string comparison, then the "eq" operator! Right?!
Best Regards,
Martin Lemburg
UGS The PLM Company
| |
| MartinLemburg@UGS 2005-04-22, 4:01 pm |
| Ok, I thought every change of "state" would "disconnect" the
"references" to the one Tcl_Obj and replace the references with copies of
this Tcl_Obj.
Good to know that I was wrong!
And ... I never committed to the rule "everything is a string", because
when a double value loses its internal representation, the
rebuilt internal representation is not always the same! It depends on the
tcl_precision setting.
I strictly try to prevent shimmering in our applications, because e.g.
the cross multiplication of transformation matrices in Tcl with
shimmering can cause a "huge" "precision" problem!
A question - why do even functions that only compare the string
representations cause a "double" to be converted to a "string"?
And if a "string" is used in an expression, will the "string" shimmer
to a numerical value? But will it lose this internal numerical
representation if I use "string length"?
Sometimes I really don't know whether that is the right way of doing
implicit type conversions.
Best regards and have a nice weekend!
Martin Lemburg
UGS The PLM Company
| |
| Don Porter 2005-04-22, 4:01 pm |
| MartinLemburg@UGS wrote:
> And ... I never committed to the rule "everything is a string", because
> not everytime a double value looses its internal representation, the
> rebuild internal representation is the same! It depends on the
> tcl_precision setting.
Yes, and that's why the whole concept of tcl_precision is best
considered a bug.
The changes of TIP 132 go a long way toward fixing that bug in Tcl 8.5.
After it's completed, any tcl_precision problems a program will have
are those it asks for.
--
| Don Porter Mathematical and Computational Sciences Division |
| donald.porter@nist.gov Information Technology Laboratory |
| http://math.nist.gov/~DPorter/ NIST |
|______________________________________________________________________|
| |
| MartinLemburg@UGS 2005-04-22, 4:01 pm |
| Wonderful!
We are waiting for this day!
And we wait for this day, because of the locale thing, that breaks tcl
expressions, after someone changed the locale from "C" to "German".
Happy days are coming!
Best Regards,
Martin Lemburg
UGS The PLM Company
| |
| lvirden@gmail.com 2005-04-22, 4:01 pm |
|
According to MartinLemburg@UGS <martin.lemburg.ugs@gmx.net>:
:Ok, I changed the code to:
:
:code:
:proc iu {a b} {
: global repeats;
:
: set r1 [t1 $a $b]; set r2 [t2 $a $b];
: set r3 [t3 $a $b]; set r4 [t4 $a $b];
: set r5 [t5 $a $b];
:
: set t1 [lindex [time {t1 $a $b} $repeats] 0]
: set t2 [lindex [time {t2 $a $b} $repeats] 0]
: set t3 [lindex [time {t3 $a $b} $repeats] 0]
: set t4 [lindex [time {t4 $a $b} $repeats] 0]
: set t5 [lindex [time {t5 $a $b} $repeats] 0]
:
: puts "$a $b: == ${t1}/$r1 sm ${t2}/$r2 se ${t3}/$r3 sc ${t4}/$r4 eq
:${t4}/$r4"
:}
:
You didn't show the setting of the global variable $::repeats.
What was its value?
:The differences in the consumed time per evaluation are so small, that
:it doesn't really make sense to discuss about the performance.
Surely you wouldn't expect the comparison of a few bytes of data
to be significant, right? If the arguments were strings of, say,
100,000 or perhaps 1,000,000 characters of data, then it would be
reasonable to expect to see some performance hits.
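A minimal sketch of such a test, assuming Tcl 8.4 or later. The proc names cmpEq and cmpCompare and the string sizes are made up for illustration, and the timings will depend on the machine and the Tcl version:

```tcl
# Two 1,000,000-character strings that differ only in the last character,
# plus a short string, so equal-length and different-length cases are covered.
set n 1000000
set s1 [string repeat x $n]
set s2 [string repeat x [expr {$n - 1}]]y
set short [string repeat x 10]

# Wrap the comparisons in procs so that they are byte compiled.
proc cmpEq {a b} {expr {$a eq $b}}
proc cmpCompare {a b} {expr {[string compare $a $b] == 0}}

foreach {label a b} [list "long vs long" $s1 $s2 "long vs short" $s1 $short] {
    puts "$label: eq [time {cmpEq $a $b} 100], string compare [time {cmpCompare $a $b} 100]"
}
```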
--
<URL: http://wiki.tcl.tk/ > MP3 ID tag repair < http://www.fixtunes.com/?C=17038 >
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
<URL: mailto:lvirden@gmail.com > <URL: http://www.purl.org/NET/lvirden/ >
| |
| MartinLemburg@UGS 2005-04-22, 4:01 pm |
| The value of "::repeats" was 50000.
Have a nice weekend!
Martin Lemburg
UGS The PLM Company
| |
| gustaf.neumann@wu-wien.ac.at 2005-04-23, 3:58 pm |
| MartinLemburg@UGS wrote:
> proc iu {a b} {
....
>
> puts "$a $b: == ${t1}/$r1 sm ${t2}/$r2 se ${t3}/$r3 sc ${t4}/$r4 eq
> ${t4}/$r4"
> }
....
> The differences in the consumed time per evaluation are so small, that
> it doesn't really make sense to discuss about the performance.
You can't measure the speed difference of the comparison operators in
general with a test like this, at least not the difference between
string compare and eq (hint: look at the variable names in the puts
line, where the "eq" column prints ${t4}/$r4 again).
The count is too low; use double nested times and subtract the invocation
overhead. The example below tries to be a little more clever about
these things.
There is a famous German proverb: "Wer misst, misst Mist"
(who measures, measures junk; the gag in German is that
the word for "measures" is pronounced like the word for junk).
The overall message is certainly true: for most applications, the semantic
differences between the comparison operators are much more important than
the difference in speed. string match (or even regexp) is not a good
idea for string comparisons, since they are not commutative and are slow.
In most situations, eq is the best choice for string comparisons.
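A small sketch of what "not commutative" means here, reusing the a[bc]d / abd pair from earlier in the thread (Tcl 8.4 or later assumed for eq):

```tcl
set pattern {a[bc]d}
set text abd
puts [string match $pattern $text]   ;# 1: abd matches the glob pattern a[bc]d
puts [string match $text $pattern]   ;# 0: the string "a[bc]d" does not match the pattern abd
puts [regexp $pattern $text]         ;# 1
puts [regexp $text $pattern]         ;# 0
puts [expr {$pattern eq $text}]      ;# 0
puts [expr {$text eq $pattern}]      ;# 0: eq gives the same answer in both directions
```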
The test below shows an overall difference, on my PowerBook G4, of a factor
of 18 between the various ways of comparing (disregarding semantic
differences).
-gustaf
####################
set repeats 1000000
# t0 is an empty proc, used to estimate the proc invocation overhead
proc t0 {a b} {;}
proc t1 {a b} {if {$a == $b} {return 1} {return 0}}
proc t1a {a b} {expr {$a == $b}}
proc t2 {a b} {string match $a $b}
proc t3 {a b} {string equal $a $b}
proc t4 {a b} {string compare $a $b}
proc t5 {a b} {expr {$a eq $b}}
proc t5a {a b} {if {$a eq $b} {return 1} {return 0}}
proc t6 {a b} {regexp $a $b}
proc iu {a b} {
    global repeats tests sum t0;
    set r1 [t1 $a $b]; set r2 [t2 $a $b];
    set r1a [t1a $a $b];
    set r3 [t3 $a $b]; set r4 [t4 $a $b];
    set r5 [t5 $a $b]; set r6 [t6 $a $b];
    set r5a [t5a $a $b];
    foreach test $tests {
        # total microseconds for $repeats calls; t0 is negative, so the
        # incr subtracts the invocation overhead
        set $test [lindex [time [list time [list $test $a $b] $repeats]] 0]
        incr $test $t0
        incr sum($test) [set $test]
    }
    puts -nonewline [format %-11s "$a $b"]
    puts "== $t1/$r1 $t1a/$r1a sm $t2/$r2 se $t3/$r3 sc $t4/$r4 eq $t5/$r5 $t5a/$r5a re $t6/$r6 "
}
proc invocation_overhead {} {
    global repeats t0
    # estimate invocation overhead
    set t0 [lindex [time [list time [list t0 1 2] $repeats]] 0]
    puts t0=$t0
    set t0 [expr {-1*$t0}]
}
invocation_overhead
invocation_overhead
invocation_overhead
set tests {t1 t1a t2 t3 t4 t5 t5a t6}
foreach test $tests {set sum($test) 0}
# ... your test cases
puts -nonewline [format %-11s ""]
puts "== $sum(t1) $sum(t1a) $sum(t2) $sum(t3) $sum(t4) $sum(t5) $sum(t5a) $sum(t6)"
| |
| Jeff Hobbs 2005-04-26, 4:02 am |
| MartinLemburg@UGS wrote:
> IMHO it is still the question, what I like to compare! If its about a
> numerical comparison, than the "==" operator has to be used! Is it
> about a string comparison, than it's about the "eq" operator! Right?!
You should just look at the tclbench code and see the results
at http://wiki.tcl.tk/1611 to save time writing your own
benchmarks. I am very careful with how I set things up to
test what I want to test, getting good numbers. Note that
I'm having to find older machines now to give me slower
results, which gives better granularity.
In any case, since I wrote a large chunk of the optimization
around the string comparisons, I'll answer the general question.
The answer is "yes" to the above. If you use ==, eq, or string
equal|compare without extra option args, then it will be bytecompiled
with low-level instructions.
For the in-depth, look in tcl/generic/tclExecute.c
* == is for numeric comparisons. INST_EQ. It will check to
see if operands are numeric first, and try to convert to
numeric, before it falls back to string comparison.
* eq/ne INST_STR_(N)EQ, as well as simple 'string equal'.
This one can be fast because it first checks to see if the
strings are of equal size before it bothers to do the
comparison. This doesn't handle lots of different obj
types, whereas string compare does ... (could change that).
* string compare INST_STR_CMP. This one checks to see if
both objs are ByteArray or String (unicode) and does special
case checks for those. It's slower in general because
there is always a cmp func involved, in order to get the
-1/0/1 result.
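The observable side of these three cases can be sketched as follows, assuming Tcl 8.4 or later. The proc names numEq and strEq are hypothetical; wrapping the expressions in procs only makes sure they are byte compiled:

```tcl
proc numEq {a b} {expr {$a == $b}}
proc strEq {a b} {expr {$a eq $b}}

puts [numEq 0xff 255]   ;# 1: both operands convert to the number 255
puts [numEq 1e2 100]    ;# 1: 1e2 is the double 100.0
puts [numEq abc abc]    ;# 1: not numeric, so == falls back to string comparison
puts [strEq 1e2 100]    ;# 0: eq never converts, the strings differ

# string compare does the extra work of ordering the operands
puts [string compare abc abd]   ;# -1
puts [string compare abc abc]   ;# 0
puts [string compare abd abc]   ;# 1
```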
--
Jeff Hobbs, The Tcl Guy
http://www.ActiveState.com/, a division of Sophos