
trying to generate integer from string
bpatton

2007-04-25, 8:01 am

I'm trying to generate a unique integer from a string. It must
generate the same integer each time it has the same string. I'm
trying to use unpack to do this.
Here is a small sample. My real version now has about 2000 strings, and
this is only going up.
my $s1 = '-Dfull_drc=true -Dgds_file=gds.VIA4T -Dgds_layer=VIA4T -Dstore_layer=VIA4T';
my $s2 = '-Dfull_drc=true -Dgds_file=gds.VIA1T -Dgds_layer=VIA1T -Dstore_layer=VIA1T';
my ($u1,$u2);
($u1) = unpack("%J*",$s1);
($u2) = unpack("%J*",$s2);
print "u1 = $u1\n";
print "u2 = $u2\n";

If I change the J to an A this example works OK, but hundreds of other
ones fail.
I'm checking these by creating a Perl hash where the generated integer
($u1, $u2, ...) is the key and the string is the value.
I check for the existence of a key; if it exists, then I compare
the values. If they are not equal, then it is an error.


Here is my actual code (minus genRppPermutations, which is too large). $s1 and
$s2 are examples from genRppPermutations.
my @switchList;
my ($rpp,%hash,$key,$string);
foreach $rpp ( qw ( COMBINE GEN_STORE L2G.gatet L2G.met L2G.primary
                    L2G.umc MASTER_RPP PG_PASS1 PG_PASS2 PG_PASS2.SPLITPOL PG_PASS2.met
                    PG_PASS3) ) {
  @switchList = genRppPermutations($rpp);
  foreach $string (@switchList) {
    ($key) = unpack("%A*",$string);
    if (exists $hash{$key}) {
      unless ($hash{$key} eq $string) {
        print "collision between strings, both generated '$key'\n";
        print " s1 : $string\n";
        print " s2 : $hash{$key}\n";
      }
    } else {
      $hash{$key} = $string;
    }
  }
}
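
A note on why this kind of key collides: a "%" prefix in an unpack template returns a checksum, that is, the sum of the unpacked items truncated to the given number of bits (16 by default). A plain sum does not depend on the order of the items, so different strings can easily share a value. A minimal sketch using the "%32C*" byte-sum idiom; the helper name byte_sum and the two sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Sum of all byte values of the string, truncated to 32 bits.
sub byte_sum {
    my ($string) = @_;
    return unpack("%32C*", $string);
}

# Hypothetical sample strings: same characters, two of them swapped.
my $first  = '-Dgds_layer=VIA14T';
my $second = '-Dgds_layer=VIA41T';

printf "%-20s => %d\n", $_, byte_sum($_) for $first, $second;
print "different strings, same checksum\n"
    if $first ne $second and byte_sum($first) == byte_sum($second);
```

The two strings differ, yet the sums agree because addition ignores position.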

Mirco Wahab

2007-04-25, 8:01 am

bpatton wrote:
> I'm trying to generate a unique integer from a string. It must
> generate the same integer each time it has the same string. I'm
> trying to use unpack to do this.
> Here is a small sample. My real version now has about 2000 strings, and
> this is only going up.
> my $s1 = '-Dfull_drc=true -Dgds_file=gds.VIA4T -Dgds_layer=VIA4T -Dstore_layer=VIA4T';
> my $s2 = '-Dfull_drc=true -Dgds_file=gds.VIA1T -Dgds_layer=VIA1T -Dstore_layer=VIA1T';
> my ($u1,$u2);
> ($u1) = unpack("%J*",$s1);
> ($u2) = unpack("%J*",$s2);
> print "u1 = $u1\n";
> print "u2 = $u2\n";
>
> If I change the J to an A this example works OK, but hundreds of other
> ones fail.
> I'm checking these by creating a Perl hash where the generated integer
> ($u1, $u2, ...) is the key and the string is the value.
> I check for the existence of a key; if it exists, then I compare
> the values. If they are not equal, then it is an error.


Depending on the length of the strings, compute a 16- to 20-byte 'fingerprint'
of each one, for example with the MD5 (16 bytes) or SHA-1 (20 bytes) algorithm.
There are modules for this purpose; you may use one of the Digest:: modules
(http://search.cpan.org/~gaas/Digest-1.15/Digest.pm), e.g. SHA1.

>
> Here is my actual code (minus genRppPermutations, which is too large). $s1 and
> $s2 are examples from genRppPermutations.


Example:
==>

use strict;
use warnings;
# SHA-1 digest: 20 bytes, printed here as 40 hex characters
use Digest::SHA1 qw(sha1_hex);

my @strings = qw'
   COMBINE GEN_STORE L2G.gatet L2G.met L2G.primary L2G.umc MASTER_RPP PG_PASS1
   PG_PASS2 PG_PASS2.SPLITPOL PG_PASS2.met PG_PASS3';

my ($rpp, %hash, $key, $string);
my $collision = 0;

foreach $rpp (@strings) {
   # genRppPermutations() is your own function (not shown here)
   foreach $string ( genRppPermutations($rpp) ) {
      $key = sha1_hex( $string );
      if( exists $hash{$key} ) {
         # same digest seen before: only a problem if the strings differ
         if( $hash{$key} ne $string ) {
            print "collision " . ++$collision . " between generated '$key'\n";
            print " s1 : $string\n";
            print " s2 : $hash{$key}\n"
         }
      }
      else {
         $hash{$key} = $string;
         print "$key, "
      }
   }
}
print "all ok!\n" unless $collision;

<==
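
Since the original question asks for an integer rather than a hex string, here is one possible way to get one from a digest. This is only a sketch under my own assumptions: the helper name string_to_int is made up, and it uses the core Digest::MD5 module instead of Digest::SHA1. It keeps the first 8 hex digits (4 bytes) of the digest, so the result fits in 32 bits, at the price of a higher collision risk than the full digest:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: reduce a string to a 32-bit unsigned integer by
# keeping the first 8 hex digits (4 bytes) of its MD5 digest.
sub string_to_int {
    my ($string) = @_;
    return hex( substr( md5_hex($string), 0, 8 ) );
}

my $s1 = '-Dfull_drc=true -Dgds_file=gds.VIA4T -Dgds_layer=VIA4T -Dstore_layer=VIA4T';
my $s2 = '-Dfull_drc=true -Dgds_file=gds.VIA1T -Dgds_layer=VIA1T -Dstore_layer=VIA1T';

# The same string always yields the same integer, in this run and the next.
printf "%10u  %s\n", string_to_int($_), $_ for $s1, $s2;
```

The collision check from the example above is still needed, because truncating the digest makes collisions far more likely than with all 16 or 20 bytes.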

Regards

M.
anno4000@radom.zrz.tu-berlin.de

2007-04-25, 8:01 am

bpatton <bpatton@ti.com> wrote in comp.lang.perl.misc:
> I'm trying to generate a unique integer from a string. It must
> generate the same integer each time it has the same string. I'm
> trying to use unpack to do this.
> Here is a small sample. My real version now has @ 2000 strings, but
> his is only going up.


But two thousand is nothing. Just use the strings as hash keys.

If it were two million or more, using a digest could be meaningful.
If so, use a module that generates a tried-and-(mathematically-)proven
digest instead of an ad-hoc solution.
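
A minimal sketch of what using the strings as hash keys could look like when an integer per string is still wanted. The names %id_of and id_for are made up for illustration, and the numbers are only stable within one run of the program, which is an assumption about what is actually needed:

```perl
use strict;
use warnings;

my %id_of;          # string => integer id
my $next_id = 0;    # last id handed out

# Return the id for a string, assigning a fresh one on first sight.
# Two different strings can never share an id, so no collision check
# is required.
sub id_for {
    my ($string) = @_;
    $id_of{$string} = ++$next_id unless exists $id_of{$string};
    return $id_of{$string};
}

# Hypothetical switch strings standing in for genRppPermutations() output.
my @switch_list = (
    '-Dfull_drc=true -Dgds_layer=VIA4T',
    '-Dfull_drc=true -Dgds_layer=VIA1T',
    '-Dfull_drc=true -Dgds_layer=VIA4T',    # repeat of the first string
);
printf "%3d  %s\n", id_for($_), $_ for @switch_list;
```

If the integers have to be identical across runs, this does not help and a digest is the better fit.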

Anno
Mirco Wahab

2007-04-25, 8:01 am

Mirco Wahab wrote:
> Depending on the length of the strings, compute a 16- to 20-byte 'fingerprint'
> of each one, for example with the MD5 (16 bytes) or SHA-1 (20 bytes) algorithm.
> There are modules for this purpose; you may use one of the Digest:: modules
> (http://search.cpan.org/~gaas/Digest-1.15/Digest.pm), e.g. SHA1.


If you need "normal integers (4 byte)" as keys,
you could look at the CRC32 algorithm, for which a
module is also available. The following would
use "regular" integers as keys
(only modified parts shown):
==>
...
use Digest::CRC qw'crc32';
...

...
   foreach $string ( genRppPermutations($rpp) ) {
      my $key = crc32($string);
      if( exists $hash{$key} ) {
         if( $hash{$key} ne $string ) {
            print "collision " . ++$collision . " between generated '$key'\n";
            print " s1 : $string\n s2 : $hash{$key}\n";
         }
      }
      else {
         $hash{$key} = $string;
         printf "0x%08X, ", $key
      }
   }
...


<==
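
With only 32 bits per key, the collision check above matters more than with a full digest. A rough estimate of the risk can be sketched with the usual birthday approximation, under the idealized assumption that the keys behave like uniformly random 32-bit values (real CRC32 output on similar strings may not); the helper name collision_probability is made up for illustration:

```perl
use strict;
use warnings;

# Birthday approximation: probability that at least two of $n keys
# collide when each key is a uniformly random $bits-bit value.
sub collision_probability {
    my ($n, $bits) = @_;
    return 1 - exp( -$n * ($n - 1) / ( 2 * 2**$bits ) );
}

# Risk for the current 2000 strings and for larger sets.
printf "%8d strings, 32-bit keys: %.6f\n", $_, collision_probability($_, 32)
    for 2_000, 20_000, 200_000;
```

For about 2000 strings this comes out at roughly 0.05%, and it grows quickly as the number of strings goes up.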

Regards

M.