Code Comments


Hashtable or Btree?
I've got 100MB of URLs organized by domain and then by document. I
thought that a hashtable of hashtables or a btree of btrees would be a
good way to look up a specific URL quickly by first finding the domain
and then finding the matching document. What do you think would be
better? And do you have any implementation you recommend?

Thanks, and merry Christmas folks.

-Dan

Eloff
12-23-04 09:10 AM


Re: Hashtable or Btree?
"Eloff" <dan.eloff@gmail.com> wrote in message
news:4817b6fc.0412222244.334397d6@posting.google.com...
> I've got 100MB of URLs organized by domain and then by document. I
> thought that a hashtable of hashtables or a btree of btrees would be a
> good way to look up a specific URL quickly by first finding the domain
> and then finding the matching document. What do you think would be
> better?
Probably a hash table.
Note that in this case it may be unnecessary to make a
hash table of hash tables: hashing the full URL once might do as well.

> And do you have any implementation you recommend?
The pre-standard hash_map or unordered_map that your standard library
implementation is very likely to include should work well.
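
As a rough illustration of hashing the full URL once, here is a minimal sketch using std::unordered_map (standardized in C++11; at the time of this thread it was only available as a TR1 or vendor extension). The DocInfo record, the type alias, and the helper names are hypothetical, since the thread does not say what is stored per URL.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-URL record; the thread does not specify the payload.
struct DocInfo {
    int id;
};

// One flat table keyed by the complete URL string.
using UrlIndex = std::unordered_map<std::string, DocInfo>;

// Builds a tiny sample index (example URLs only).
UrlIndex build_sample_index() {
    UrlIndex index;
    index.reserve(3);  // in practice, reserve for the expected URL count
    index["http://example.com/a.html"] = DocInfo{1};
    index["http://example.com/b.html"] = DocInfo{2};
    index["http://example.org/index.html"] = DocInfo{3};
    return index;
}

// Returns a pointer to the record, or nullptr if the URL is not indexed.
// The URL is hashed once; lookup is O(1) on average.
const DocInfo* find_url(const UrlIndex& index, const std::string& url) {
    auto it = index.find(url);
    return it == index.end() ? nullptr : &it->second;
}
```

A caller would build the index once and then call find_url(index, url) for each query, checking the result against nullptr.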

If anything were to be optimized, it would probably be the string
storage... eventually.


hth -Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form



Ivan Vecerina
12-23-04 09:10 AM


Re: Hashtable or Btree?
Eloff wrote:
> I've got 100MB of URLs organized by domain and then by document.
> I thought that a hashtable of hashtables or a btree of btrees
> would be a good way to look up a specific URL quickly by first
> finding the domain and then finding the matching document. What
> do you think would be better? And do you have any implementation
> you recommend?

I don't see why you would need containers of containers. Your key could
be the (domain, document) tuple or just the URL, for either hashtables
or btrees.
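
A single container keyed by the (domain, document) pair might look like the sketch below. It uses std::map as an in-memory ordered container standing in for a tree index; it is not a B-tree and does not work on a file. The aliases, the int payload, and the function names are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using UrlKey = std::pair<std::string, std::string>;  // (domain, document)
using UrlTree = std::map<UrlKey, int>;               // value: hypothetical document id

// Exact lookup of one (domain, document) key, O(log n).
bool contains_url(const UrlTree& tree, const std::string& domain,
                  const std::string& document) {
    return tree.find(UrlKey(domain, document)) != tree.end();
}

// Pairs compare by domain first, so all documents of one domain are
// adjacent in the ordered map and can be scanned as a range.
std::vector<std::string> documents_of(const UrlTree& tree,
                                      const std::string& domain) {
    std::vector<std::string> docs;
    // (domain, "") is the smallest possible key for this domain.
    for (auto it = tree.lower_bound(UrlKey(domain, std::string()));
         it != tree.end() && it->first.first == domain; ++it) {
        docs.push_back(it->first.second);
    }
    return docs;
}
```

The ordered key gives the "find the domain, then the document" behaviour without nesting containers; a hash table with the same pair key would support the exact lookup but not the per-domain range scan.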

None of the C++ standard containers sound suitable for this. hash_map
is not standard, and while std::map is, and probably does use a tree,
it is not going to be a btree. Also, those sorts of containers would
build data structures in memory, rather than operating on the file
directly. I don't know of any popular libraries that would do what
you need either. Most people would just use a database; you could try
searching Google for "embedded database". I found some in C, but not C++.
--
Dave O'Hearn


Dave O'Hearn
12-23-04 09:07 PM


Re: Hashtable or Btree?
"Eloff" <dan.eloff@gmail.com> wrote in message
news:4817b6fc.0412222244.334397d6@posting.google.com...
> I've got 100MB of URLs organized by domain and then by document. I
> thought that a hashtable of hashtables or a btree of btrees would be a
> good way to look up a specific URL quickly by first finding the domain
> and then finding the matching document. What do you think would be
> better? And do you have any implementation you recommend?
>
> Thanks, and merry Christmas folks.
>
> -Dan

I recommend you put your data in a database first. Then you can use any
programming language to search as you please using SQL queries. A good free
database is MySQL:

http://www.mysql.com/

--
Cy
http://home.rochester.rr.com/cyhome/



Cy Edmunds
12-24-04 02:03 AM


Re: Hashtable or Btree?
Dave O'Hearn wrote:

> ...
> I don't see why you would need containers of containers. Your key could
> be the (domain, document) tuple or just the URL, for either hashtables
> or btrees.
If you have many documents per domain, it could make sense to organize
the data this way in order to reduce memory consumption.
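
To make the memory argument concrete, here is a small sketch of the nested layout, in which each domain string is stored once and only the document part is stored per entry. The aliases and function names are hypothetical, and key_bytes_nested counts only the characters of the keys; real memory use also depends on per-node and per-string overhead in the library implementation.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Outer table: domain -> set of documents under that domain.
using DomainIndex =
    std::unordered_map<std::string, std::unordered_set<std::string>>;

void add_url(DomainIndex& index, const std::string& domain,
             const std::string& document) {
    index[domain].insert(document);  // creates the inner set on first use
}

// Two hash lookups: first the domain, then the document within it.
bool has_url(const DomainIndex& index, const std::string& domain,
             const std::string& document) {
    auto d = index.find(domain);
    return d != index.end() && d->second.count(document) != 0;
}

// Characters of key text held: each domain once, plus every document.
std::size_t key_bytes_nested(const DomainIndex& index) {
    std::size_t total = 0;
    for (const auto& entry : index) {
        total += entry.first.size();
        for (const auto& doc : entry.second) {
            total += doc.size();
        }
    }
    return total;
}
```

With many documents per domain, the domain text is no longer repeated in every key, which is where any saving would come from; with only one or two documents per domain, the extra inner containers could easily cost more than they save.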


> ...
> you need either. Most people would just use a database; you could try
> Google for "embedded database". I found some in C, but not C++.
SQLite (www.sqlite.org) may be a good choice ...

greetings
Martin

Martin Stettner
12-27-04 02:01 AM


C++ archive

Copyrights CodeComments.com 2004 - 2006
