Code Comments
I've got 100MB of URLs organized by domain and then by document. I thought that a hash table of hash tables or a btree of btrees would be a good way to look up a specific URL quickly by first finding the domain and then finding the matching document. What do you think would be better? And do you have any implementation you recommend?

Thanks, and merry Christmas folks.

-Dan
"Eloff" <dan.eloff@gmail.com> wrote in message news:4817b6fc.0412222244.334397d6@posting.google.com...
> I've got 100MB of urls organized by domain and then by document. I
> thought that a hash table of hash tables or a btree of btrees would be a
> good way to lookup a specific url quickly by first finding the domain
> and then finding the matching document. What do you think would be
> better?

Probably a hash table. Note that in this case it may be unnecessary to make a hash table of hash tables: hashing the full URL once might do as well.

> And do you have any implementation you recommend?

The pre-standard hash_map or unordered_map that your standard library implementation is very likely to include should work well. If anything were to be optimized, it would probably be on the side of the string storage... eventually.

hth -Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
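Ivan's suggestion of hashing the full URL once can be sketched roughly as follows. This is only a minimal illustration, assuming a standard library that ships std::unordered_map; the names UrlIndex and DocId, and the idea that each URL maps to an integer id, are hypothetical and not from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical payload: whatever is stored per URL (here just an integer id).
using DocId = int;

// One flat hash table keyed by the complete URL, so each lookup hashes
// a single string instead of hashing the domain and the document separately.
class UrlIndex {
public:
    void add(const std::string& url, DocId id) {
        assert(!url.empty());  // assumption: empty URLs are never indexed
        table_[url] = id;
    }

    // Returns a pointer to the stored id, or nullptr if the URL is unknown.
    const DocId* find(const std::string& url) const {
        auto it = table_.find(url);
        return it == table_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<std::string, DocId> table_;
};
```

Each URL string is stored in full here, which is where the string storage Ivan mentions would start to matter for 100MB of input.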
Eloff wrote:
> I've got 100MB of urls organized by domain and then by document.
> I thought that a hash table of hash tables or a btree of btrees
> would be a good way to lookup a specific url quickly by first
> finding the domain and then finding the matching document. What
> do you think would be better? And do you have any implementation
> you recommend?

I don't see why you would need containers of containers. Your key could be the (domain, document) tuple or just the URL, for either hashtables or btrees.

None of the C++ standard containers sound suitable for this. hash_map is not standard, and while std::map is, and probably does use a tree, it is not going to be a btree. Also, those sorts of containers would build data structures in memory, rather than operating on the file directly.

I don't know of any popular libraries that would do what you need either. Most people would just use a database; you could try Google for "embedded database". I found some in C, but not C++.

--
Dave O'Hearn
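The (domain, document) tuple key Dave describes could look something like the sketch below. It uses std::map purely to show the key shape; as he points out, std::map lives in memory and is not a btree, so this is not the on-disk structure he has in mind. The names UrlKey, UrlTree, DocId, add_url, find_url and documents_of are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using DocId = int;                                   // hypothetical payload
using UrlKey = std::pair<std::string, std::string>;  // (domain, document)
using UrlTree = std::map<UrlKey, DocId>;             // one ordered container, no nesting

void add_url(UrlTree& tree, const std::string& domain,
             const std::string& document, DocId id) {
    assert(!domain.empty());  // assumption: every URL has a domain part
    tree[UrlKey(domain, document)] = id;
}

// Exact lookup by the composite key; nullptr if the URL is unknown.
const DocId* find_url(const UrlTree& tree, const std::string& domain,
                      const std::string& document) {
    auto it = tree.find(UrlKey(domain, document));
    return it == tree.end() ? nullptr : &it->second;
}

// Pairs compare by domain first, so all documents of one domain are
// adjacent in the ordering and can be walked starting at lower_bound.
std::vector<std::string> documents_of(const UrlTree& tree,
                                      const std::string& domain) {
    std::vector<std::string> out;
    for (auto it = tree.lower_bound(UrlKey(domain, ""));
         it != tree.end() && it->first.first == domain; ++it) {
        out.push_back(it->first.second);
    }
    return out;
}
```

An ordered key keeps the "find the domain, then the document" access pattern without a container of containers, at the cost of repeating the domain string in every key.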
"Eloff" <dan.eloff@gmail.com> wrote in message news:4817b6fc.0412222244.334397d6@posting.google.com...
> I've got 100MB of urls organized by domain and then by document. I
> thought that a hash table of hash tables or a btree of btrees would be a
> good way to lookup a specific url quickly by first finding the domain
> and then finding the matching document. What do you think would be
> better? And do you have any implementation you recommend?
>
> Thanks, and merry Christmas folks.
>
> -Dan

I recommend you put your data in a database first. Then you can search it from any programming language using SQL queries. A good free database is MySQL: http://www.mysql.com/

--
Cy
http://home.rochester.rr.com/cyhome/
Dave O'Hearn wrote:
> ...
> I don't see why you would need containers of containers. Your key could
> be the (domain, document) tuple or just the URL, for either hashtables
> or btrees.

If you have many documents per domain, it could make sense to organize the data this way in order to reduce memory consumption, since each domain string is then stored only once.

> ...
> you need either. Most people would just use a database; you could try
> Google for "embedded database". I found some in C, but not C++.

SQLite (www.sqlite.org) may be a good choice ...

greetings
Martin
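Martin's point about memory can be illustrated with a two-level table in which each domain string appears once as an outer key and only the document paths are stored per entry. This is a rough in-memory sketch, not a measured comparison; DomainIndex and DocId are hypothetical names, and std::unordered_map is assumed to be available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

using DocId = int;  // hypothetical payload stored per document

// Outer table: domain -> inner table. Inner table: document path -> id.
// The domain text is kept once per domain rather than once per URL.
class DomainIndex {
public:
    void add(const std::string& domain, const std::string& document, DocId id) {
        assert(!domain.empty());  // assumption: every URL has a domain part
        domains_[domain][document] = id;
    }

    // First find the domain, then the matching document; nullptr if either is missing.
    const DocId* find(const std::string& domain, const std::string& document) const {
        auto d = domains_.find(domain);
        if (d == domains_.end()) return nullptr;
        auto doc = d->second.find(document);
        return doc == d->second.end() ? nullptr : &doc->second;
    }

    std::size_t domain_count() const { return domains_.size(); }

private:
    std::unordered_map<std::string, std::unordered_map<std::string, DocId>> domains_;
};
```

Whether this actually saves memory depends on how many documents share a domain, because every inner table carries its own overhead.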