Hashes of hashes or just one hash ?
| Perl Learner 2005-06-08, 3:59 pm |
| I am storing all the data from a HUGE file into 1 hash with long key
names.
for ex. my key would be something like
NAMEK__PROPERTYA__RELATIONB__SET3__CHARACTER4__CONDITION__TABLE
I can also make 7 hashes, one inside the other:
HASH{NAMES}{PROPERTIES}{RELATIONS}{SETS}{CHARACTERS}{CONDITIONS}{TABLES}
which one would be more efficient ?
I was using 1 big hash instead of cascaded hashes as it would be a lot
simpler. And also, sometimes, my data would stop at some random
point..
for example, for some NAMEs, I might only have
NAMEK__PROPERTYA__TABLE or even NAME__PROPERTYJ, basically it might or
might not have all the possible "sections"
That's why I am using a single hash as it would take care of all
conditions.
I just wanted to ask you guys which one would be more efficient.
thanks.
| |
| Gunnar Hjalmarsson 2005-06-08, 3:59 pm |
| Perl Learner wrote:
> I am storing all the data from a HUGE file into 1 hash with long key
> names.
> for ex. my key would be something like
>
> NAMEK__PROPERTYA__RELATIONB__SET3__CHARACTER4__CONDITION__TABLE
>
> I can also make 7 hashes, one inside the other:
> HASH{NAMES}{PROPERTIES}{RELATIONS}{SETS}{CHARACTERS}{CONDITIONS}{TABLES}
<snip>
> I just wanted to ask you guys which one would be more efficient.
Creating one hash consumes fewer resources than creating seven hashes, of
course. Which data structure is the most suitable in this case depends
on how you are going to make use of the hash data.
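To make the trade-off concrete, here is a minimal sketch of the two layouts side by side. The `__` separator comes from the original post; the sample records, the variable names and the reserved `_value` slot are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical records: a list of path components plus a value.
# Note that the paths have different depths.
my @records = (
    [ [qw(NAMEK PROPERTYA)],           1.5 ],
    [ [qw(NAMEK PROPERTYA TABLE)],     2.5 ],
    [ [qw(NAMEK PROPERTYJ RELATIONB)], 3.5 ],
);

# Layout 1: one flat hash, key = components joined with '__'.
my %flat;
for my $rec (@records) {
    my ($path, $value) = @$rec;
    $flat{ join '__', @$path } = $value;
}

# Layout 2: nested hashes, one level per component. The value is kept
# under a reserved '_value' slot so that a node can hold a value AND
# have children (NAMEK/PROPERTYA does both here).
my %nested;
for my $rec (@records) {
    my ($path, $value) = @$rec;
    my $node = \%nested;
    for my $part (@$path) {
        $node = $node->{$part} ||= {};   # create the level on first use
    }
    $node->{_value} = $value;
}

print "flat:   $flat{'NAMEK__PROPERTYA__TABLE'}\n";
print "nested: $nested{NAMEK}{PROPERTYA}{TABLE}{_value}\n";
```

The flat layout copes with paths of any depth for free. The nested layout needs something like the `_value` slot (which would clash with a real component of that name), but in return it can list everything below a given name without scanning every key.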
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
| |
| Perl Learner 2005-06-08, 3:59 pm |
| thanks for the quick response. using a single hash takes less
resources ? that makes me happy.
oh by the way, i am using it to compare two HUGE files.
so
NAMEK__PROPERTYA of file A against NAMEK__PROPERTYA of file B
and...
NAMEK__PROPERTYA__TABLE of fileA against the same in file B
and so on..
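A comparison like the one described could be sketched as below, assuming each file has already been parsed into its own flat hash. The hash names and sample values are hypothetical, and the values are assumed to be plain numbers.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the two parsed files.
my %file_a = (
    'NAMEK__PROPERTYA'        => 1.0,
    'NAMEK__PROPERTYA__TABLE' => 2.0,
    'NAMEK__PROPERTYJ'        => 3.0,
);
my %file_b = (
    'NAMEK__PROPERTYA'        => 1.0,
    'NAMEK__PROPERTYA__TABLE' => 2.5,
    'NAMEL__PROPERTYA'        => 4.0,
);

# Walk the keys of A: a key is either missing from B, or present
# with an equal or a different value.
my (@only_a, @differ);
for my $key (sort keys %file_a) {
    if (!exists $file_b{$key}) {
        push @only_a, $key;
    }
    elsif ($file_a{$key} != $file_b{$key}) {
        push @differ, $key;
    }
}

# Whatever is left in B but not in A.
my @only_b = grep { !exists $file_a{$_} } sort keys %file_b;

print "only in A: @only_a\n";
print "only in B: @only_b\n";
print "differ:    @differ\n";
```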
| |
| Anno Siegel 2005-06-08, 3:59 pm |
| Perl Learner <perl707@gmail.com> wrote in comp.lang.perl.misc:
> I am storing all the data from a HUGE file into 1 hash with long key
> names.
> for ex. my key would be something like
>
> NAMEK__PROPERTYA__RELATIONB__SET3__CHARACTER4__CONDITION__TABLE
>
> I can also make 7 hashes, one inside the other:
> HASH{NAMES}{PROPERTIES}{RELATIONS}{SETS}{CHARACTERS}{CONDITIONS}{TABLES}
>
> which one would be more efficient ?
>
> I was using 1 big hash instead of cascaded hashes as it would be a lot
> simpler. And also, sometimes, my data would stop at some random
> point..
> for example, for some NAMEs, I might only have
>
> NAMEK__PROPERTYA__TABLE or even NAME__PROPERTYJ, basically it might or
> might not have all the possible "sections"
>
> That's why I am using a single hash as it would take care of all
> conditions.
Look up "multidimensional array emulation" in perlvar, it may be
what you are looking for.
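For reference, a minimal sketch of that feature: a comma-separated list inside a hash subscript is joined with `$;` (the subscript separator, "\034" by default), which gives the same single flat hash without building the key by hand. The hash name and the sample values are made up.

```perl
use strict;
use warnings;

my %data;

# A comma-separated subscript list is joined with $; behind the scenes.
$data{ 'NAMEK', 'PROPERTYA' }          = 1.5;
$data{ 'NAMEK', 'PROPERTYA', 'TABLE' } = 2.5;

# The same element, with the key joined by hand:
my $same = $data{ join $;, 'NAMEK', 'PROPERTYA', 'TABLE' };

# Keys can be split back into their components.
my $sep = $;;    # copy of the separator, easier to read in a regex
for my $key (sort keys %data) {
    my @parts = split /\Q$sep\E/, $key;
    print join(' / ', @parts), " => $data{$key}\n";
}
```

This only works as long as the separator character never occurs in the data itself.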
Anno
| |
| Arne Ruhnau 2005-06-08, 3:59 pm |
| Perl Learner wrote:
> I am storing all the data from a HUGE file into 1 hash with long key
> names.
> for ex. my key would be something like
>
> NAMEK__PROPERTYA__RELATIONB__SET3__CHARACTER4__CONDITION__TABLE
>
> I can also make 7 hashes, one inside the other:
> HASH{NAMES}{PROPERTIES}{RELATIONS}{SETS}{CHARACTERS}{CONDITIONS}{TABLES}
<snip>
> I was using 1 big hash instead of cascaded hashes as it would be a lot
> simpler. And also, sometimes, my data would stop at some random
> point..
> for example, for some NAMEs, I might only have
>
> NAMEK__PROPERTYA__TABLE or even NAME__PROPERTYJ, basically it might or
> might not have all the possible "sections"
Although it depends on the way you will use your data, as Gunnar already
pointed out, you could alternatively use a hash of arrays and bind your
former hash-keys to array-indices. Thereby, you can overcome the mentioned
"gaps" in your data, but have to be prepared to get undef back. You take as
many keys as hash-keys as you can guarantee (seems as if NAME is always
present) and then simply take a list of lists (LoL), like so:
$hash->{$name} = [
    [$property, $relation, $set, $character, $condition, $table],
    [$property, $relation, $set, $character, $condition, $table],
];
Of course, to get something that has name A and Relation C, you need
grep { $_->[1] } @{ $hash->{A} }
To make it more readable, you could <use constant> and sort of name your
array-indices.
But, again, it depends on the way you want to use your data. And I cannot
tell you if this would be more efficient...
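A minimal sketch of the idea with named indices, assuming the column order given above. The sample rows are made up, and the lookup compares the relation against 'C' while skipping rows where it is undef.

```perl
use strict;
use warnings;

# Name the array indices, in the column order used above.
use constant {
    PROPERTY  => 0,
    RELATION  => 1,
    SET       => 2,
    CHARACTER => 3,
    CONDITION => 4,
    TABLE     => 5,
};

# Hypothetical data: missing "sections" are simply undef.
my $hash = {
    A => [
        [ 'P1', 'C',   'S3',  undef, undef, 'T1' ],
        [ 'P2', undef, undef, undef, undef, 'T2' ],
        [ 'P3', 'C',   undef, undef, undef, undef ],
    ],
};

# All rows with name A and relation C; the defined() check avoids
# "uninitialized value" warnings on rows that have no relation.
my @rows = grep {
    defined $_->[RELATION] && $_->[RELATION] eq 'C'
} @{ $hash->{A} };

print "$_->[PROPERTY]\n" for @rows;    # prints P1 and P3
```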
Arne Ruhnau
| |
| Arne Ruhnau 2005-06-08, 3:59 pm |
| Arne Ruhnau wrote:
> you could alternatively use a hash of arrays and bind your
> former hash-keys to array-indices. Thereby, you can overcome the mentioned
> "gaps" in your data, but have to be prepared to get undef back. You take as
> many keys as hash-keys as you can guarantee (seems as if NAME is always
> present) and then simply take LOL, like so:
>
> $hash->{$name} = [
>     [$property, $relation, $set, $character, $condition, $table],
>     [$property, $relation, $set, $character, $condition, $table],
> ];
>
> Of course, to get something that has name A and Relation C, you need
>
> grep { $_->[1] } @{ $hash->{A} }
grep { $_->[1] eq 'C' } @{ $hash->{A} }
*grmbl*
Arne Ruhnau
| |
| xhoster@gmail.com 2005-06-08, 3:59 pm |
| "Perl Learner" <perl707@gmail.com> wrote:
> I am storing all the data from a HUGE file into 1 hash with long key
> names.
> for ex. my key would be something like
>
> NAMEK__PROPERTYA__RELATIONB__SET3__CHARACTER4__CONDITION__TABLE
>
> I can also make 7 hashes, one inside the other:
> HASH{NAMES}{PROPERTIES}{RELATIONS}{SETS}{CHARACTERS}{CONDITIONS}{TABLES}
>
> which one would be more efficient ?
You didn't tell us what you are using these hashes for. If you don't
actually use the data, then it would be more efficient to simply forgo
both methods and not have any hashes at all.
Xho
| |
| xhoster@gmail.com 2005-06-08, 3:59 pm |
| "Perl Learner" <perl707@gmail.com> wrote:
> thanks for the quick response. using a single hash takes less
> resources ? that makes me happy.
>
> oh by the way, i am using it to compare two HUGE files.
How HUGE are they? To me, huge files are at least the size of
main memory, if not more. Which means that even the more efficient
of your hash methods won't work.
I'd use system tools to sort each file into a canonical order, and then
use Perl (or even other system tools) to do the comparison on the
canonicalized files in a memory efficient way.
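The comparison step on pre-sorted input could look roughly like this. It is only a sketch under assumptions: one "key value" record per line, both inputs sorted by key in byte order (for example with the system sort under LC_ALL=C), and in-memory filehandles standing in for the real files.

```perl
use strict;
use warnings;

# Hypothetical input, already sorted by key.
my $text_a = "a 1\nb 2\nd 4\n";
my $text_b = "a 1\nb 3\nc 5\n";
open my $fh_a, '<', \$text_a or die "open A: $!";
open my $fh_b, '<', \$text_b or die "open B: $!";

# Read the next (key, value) record; returns an empty list at end of file.
sub next_record {
    my ($fh) = @_;
    defined(my $line = <$fh>) or return;
    chomp $line;
    return split ' ', $line, 2;
}

my @report;
my ($ka, $va) = next_record($fh_a);
my ($kb, $vb) = next_record($fh_b);

# Merge walk: only one record per file is held in memory at a time.
while (defined $ka or defined $kb) {
    my $cmp = !defined($kb) ? -1
            : !defined($ka) ?  1
            :                  ($ka cmp $kb);
    if ($cmp < 0) {
        push @report, "only in A: $ka";
        ($ka, $va) = next_record($fh_a);
    }
    elsif ($cmp > 0) {
        push @report, "only in B: $kb";
        ($kb, $vb) = next_record($fh_b);
    }
    else {
        push @report, "differs: $ka ($va vs $vb)" if $va ne $vb;
        ($ka, $va) = next_record($fh_a);
        ($kb, $vb) = next_record($fh_b);
    }
}

print "$_\n" for @report;
```

Memory use stays constant no matter how large the files are, which is the point of sorting first.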
Xho
| |
|
| Perl Learner wrote:
> I am storing all the data from a HUGE file into 1 hash with long key
Sounds to me like you need to load this _HUGE_ data into a database.
This is much better, and much quicker. You could use Perl's DBI interface
to massage the data, clean it up, and get it in, and then use the DB.
Perl is not really ideal for what you are describing, and those 'keys'
that you are generating sound rather shaky to me. You could do so much
more from the database, and just use Perl to issue SQL statements held
in scalars.
There are plenty of 'free' databases, and MS have just released a
'free' version of SQL Server 2005 called SQL Express. You can create
databases of up to 4 GB for nothing on a win32 machine.
| |
| Perl Learner 2005-06-08, 8:57 pm |
| Thanks for the detailed replies folks.
Here're some of my responses to the questions some of you had for me:
1. what will i be using these hashes for?
a. a quick answer is .. to easily compare corresponding values in two
different "databases" as they have the same "key" (or address, if you
will)
the file i am reading in will be something like
Cell (CELLNAME)
{
    area :value
    capacitance: value
    pin(PIN)
    {
        capacitance:
        power
        {
            blah blah
        }
        timing
        {
            blah blah
        }
        blah blah
    }
note that in these "blah blah"s i have skipped over a lot of things
that i need. i have a lot of conditional information, 2D, 3D tables,
or just as simple values.
these tables have values at certain "whens" and at certain "paths" (and
there are a few more other details)
but, basically, that's the basic structure of the file..
now i have 2 files like this that i want to compare. although both the
files are "pretty much" of the same format, there are some minute
differences.
by "compare" i am talking about comparing the values (numbers) at the
same "when"s and "paths" for the same "pins" and the same "cells" etc
etc
i have to extrapolate values from one file and project them to the same
conditions as the other file, and then compare. basically, a lot of
math involved.
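The post does not say how the projection is done. Purely as an illustration, here is a sketch that assumes plain linear interpolation between the two nearest table points, extending the end segments for extrapolation. The sub name and the sample table are made up.

```perl
use strict;
use warnings;

# Estimate y at $x from a table of (x, y) points sorted by x, using the
# segment that contains $x. Outside the table range the first or last
# segment is extended, which is linear extrapolation.
sub interpolate {
    my ($xs, $ys, $x) = @_;
    die "need at least two points\n"   if @$xs < 2;
    die "xs and ys differ in length\n" if @$xs != @$ys;

    # Find the segment [i, i+1]; stop at the last segment at the latest.
    my $i = 0;
    $i++ while $i < $#$xs - 1 && $x > $xs->[ $i + 1 ];

    my ($x0, $x1) = @$xs[ $i, $i + 1 ];
    my ($y0, $y1) = @$ys[ $i, $i + 1 ];
    return $y0 + ($y1 - $y0) * ($x - $x0) / ($x1 - $x0);
}

# Hypothetical table: values characterised at x = 1, 2 and 4.
my @load  = (1, 2, 4);
my @delay = (10, 20, 40);

print interpolate(\@load, \@delay, 3), "\n";    # between two points
print interpolate(\@load, \@delay, 5), "\n";    # beyond the last point
```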
and then i want to graph certain things, etc etc.
(i wanted to save you the long story. but in the process, i might have
given too little info. sorry about that).
2. how huge are these files?
a. each file is about 20 MB. i was saying _HUGE_ because i haven't
edited files this big . also the structure of the data in these files
is v. complicated which was too overwhelming for me and i said HUGE.
:-)
i managed to get the parser done and working (took about a w ). i am
using a single hash with a long key name.. something like
CELL:ADDER__PIN:A__RELATEDPIN:CI__TIMING__TIMINGTYPE:POSITIVE_UNATE__TIMINGSENSE:RISING__WHEN:!CI__RISETRANSITION
...err.. something like that.
now if i use that key, i get a table back from the hash.
if i just use
CELL:ADDER__CAPACITANCE
i get a single value (of capacitance) back from the hash
since i have a lot of these things to deal with, i figured single hash
would be the simplest.
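A minimal sketch of that arrangement: one flat hash whose values are either a plain number or a reference to a table, with `ref` telling the two apart on lookup. The helper names, the shortened keys and the numbers are made up for illustration.

```perl
use strict;
use warnings;

# Build a flat key from ordered "NAME:VALUE" (or bare "NAME") parts.
sub make_key { return join '__', @_ }

my %lib;

# A single value ...
$lib{ make_key('CELL:ADDER', 'CAPACITANCE') } = 0.0021;

# ... and a 2D table (array of rows) under a deeper key.
$lib{ make_key('CELL:ADDER', 'PIN:A', 'TIMING', 'RISETRANSITION') } = [
    [ 0.10, 0.15 ],
    [ 0.20, 0.30 ],
];

# Describe whatever is stored under a key.
sub describe {
    my ($key) = @_;
    return 'missing' unless exists $lib{$key};
    my $value = $lib{$key};
    return ref($value) eq 'ARRAY'
        ? 'table with ' . scalar(@$value) . ' rows'
        : "value $value";
}

print describe('CELL:ADDER__CAPACITANCE'), "\n";
print describe('CELL:ADDER__PIN:A__TIMING__RISETRANSITION'), "\n";
print describe('CELL:ADDER__AREA'), "\n";
```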
i am able to get it do its job in about 3 mins using a 64bit linux
machine... and about 6-7 mins using a 32bit linux machine.
although this is not a big deal... i was thinking it could maybe be
done a little faster :-)
3. Perl is not very ideal for what you are describing ........
a. you may be right. i am not that big of a programmer (i learned
perl in 21 days :-) sam's way). i havent done any SQL, database stuff
before. back in the day, i remember fiddling with DBase 3 plus.. but
that was it.
file parsing seemed to be a little easy in perl and it can do my
extrapolation math (basic + - * /, modulus, etc) so i figured perl was
the deal. i mean.. it is working fine now and doing its job.
my question was just subjective and i just wanted to know if this
could be done a bit more efficiently in perl.
i mean, i can just forget optimizing this. its only a 5 min wait for
the results right ;-)
thanks for all your comments folks.
| |
| davidfilmer@gmail.com 2005-06-09, 3:57 am |
| What's that smell? I know that smell from somewhere... Oh, I remember
- it's the smell of an application that is just begging for a database on
the back-end.
Just a thought. The hash looks very database-ish. NULLs don't bother
databases, and you have the power of SQL queries to retrieve
information.
| |
|
| Perl Learner wrote:
> perl in 21 days :-) sam's way). i havent done any SQL, database stuff
You could learn enough SQL Express in two days :-) - Basic SELECT
stuff. Valuable for the rest of your programming life. Try:
http://www.w3schools.com/sql/default.asp
> before. back in the day, i remember fiddling with DBase 3 plus.. but
> that was it.
Yeah me too. And Clipper, and Foxpro. They are pants in comparison to
what a real database could do for you.
Databases are complex, and there is a lot behind them, but that does
not mean that you should avoid them. No! Don't delay, start today! Not
only that, but databases complement Perl nicely, as I mentioned in the
posting above.