Home > Archive > VC STL > January 2006 > Which container I should choose for better performance?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Which container I should choose for better performance?
|
|
| davihigh@gmail.com 2006-01-09, 11:10 pm |
| Hi Friends:
I am managing a large number of objects which has unique id <ULONG64>.
Which container i should choose to perform best find() performance?
Can choose from STL in vc2003 or Boost 1_33. Currently I am using
std::map<>. Your suggestion?
Rgds, David Xiao
| |
| tom_usenet@hotmail.com 2006-01-09, 11:10 pm |
|
davihigh@gmail.com wrote:
> Hi Friends:
>
> I am managing a large number of objects which has unique id <ULONG64>.
> Which container i should choose to perform best find() performance?
>
> Can choose from STL in vc2003 or Boost 1_33. Currently I am using
> std::map<>. Your suggestion?
VC2003 has a hash_map class. If your ULONG64 values are fairly random,
that would probably be best. If they are not random, you'll need to
write a good hash function to make sure you capture as much variation
as possible. If you can't find a good hash function for the values,
then you should probably stick with std::map, which is nice in that is
offers *worst case* O(log n) finds (as opposed to hash_map, the
performance of which depends heavily on quality of the hash function).
Remember that log(n) is pretty good - a search of billions of values
only take, say, 20 times longer than a search of a handful of values.
Finally, if the values are decided up front, and you aren't performing
insertions and deletions, then a sorted vector (and binary searches
with lower_bound) is faster than std::map by a reasonable constant
factor (certainly 2x as fast I think).
I don't think boost has anything to help you here.
Tom
| |
| davihigh@gmail.com 2006-01-09, 11:10 pm |
| Hi Tom:
Thanks for reply. The key is not random, they seems increase regularly,
[1,2,3.....n].
Also there are heavy insert/remove/find operations. So it seems that
std::map<> will be my best choice?
Rgds, David
| |
| Stephen Howe 2006-01-09, 11:10 pm |
|
> I am managing a large number of objects which has unique id <ULONG64>.
> Which container i should choose to perform best find() performance?
map gives O(log N) find performance. But the number of objects would have to
be large to beat vector O(N) because of map's overheap (poor locality of
reference).
Better still would be sorted vector if the objects stay static for a long
while.
You would use lower_bound() or equal_range().
You get O(log N)
And the hash containers can give a theoretical O(1) with a perfect hash
function.
Stephen Howe
| |
| Stephen Howe 2006-01-09, 11:10 pm |
| > Thanks for reply. The key is not random, they seems increase regularly,
> [1,2,3.....n].
>
> Also there are heavy insert/remove/find operations. So it seems that
> std::map<> will be my best choice?
Most likely. But I would measure things if uncertain.
For map every insert or remove corresponds to memory allocation/deallocation
which is not insignicant.
vector will settle down after a while. Despite the fact that find() on a
vector is O(N), it might give good performance.
Stephen Howe
| |
| Tom Widmer 2006-01-09, 11:10 pm |
| davihigh@gmail.com wrote:
> Hi Tom:
>
> Thanks for reply. The key is not random, they seems increase regularly,
> [1,2,3.....n].
If n isn't too large (e.g. it's less than 2^32), then this might be a
reasonable hash function:
class myhash_compare : public hash_compare<ULONG64> {
Pr comp;
public:
//increase this if you find the map getting too big
const size_t bucket_size = 3;
//make minimum size suitably big, to minimize rehashing
const size_t min_buckets = 100000;
size_t operator()(ULONG64 key) const
{
return static_cast<size_t>(key);
}
};
> Also there are heavy insert/remove/find operations. So it seems that
> std::map<> will be my best choice?
I'd measure it against hash_map and see for myself. std::vector is out
of the running, anyway.
A good way of increasing performance for node based containers like
std::map is to use a custom allocator, which will improve insert/remove
performance and possibly find performance, if your allocator increases
locality of reference. Take a look at boost's pool allocator - remember
to use a specific pool just for your map.
Tom
| |
| Rasmus Dyhr Frederiksen 2006-01-09, 11:10 pm |
| davihigh@gmail.com wrote:
> Hi Friends:
>
> I am managing a large number of objects which has unique id <ULONG64>.
> Which container i should choose to perform best find() performance?
>
> Can choose from STL in vc2003 or Boost 1_33. Currently I am using
> std::map<>. Your suggestion?
>
> Rgds, David Xiao
>
Hi David
Your choice of container depends on whether your set of objects is
static, or is constantly changing.
In simple cases, a sorted vector + equal_range (binary search) is a
compact and very fast solution.
For more dynamic scenarios, a map is useful, but it has a quite high
memory overhead pr. object.
regards,
Rasmus
| |
| Jason Winnebeck 2006-01-17, 8:03 am |
| davihigh@gmail.com wrote:
> Hi Tom:
>
> Thanks for reply. The key is not random, they seems increase regularly,
> [1,2,3.....n].
>
> Also there are heavy insert/remove/find operations. So it seems that
> std::map<> will be my best choice?
>
> Rgds, David
>
If the IDs are "packed" then why not just use an array or vector, since
a vector is a mapping of int of an index to some piece of data. If the
data is almost completely packed (or can be because you can reuse IDs)
this might be OK. If it is a little less packed but still very packed,
you could consider an array of pointers. If a vector-based solution is
possible then you are guaranteed a constant look-up time and good insert
time if the IDs are reused or always increasing -- the only problem here
is with vector reallocations. If it not very packed, then you'll
probably want to use the solutions presented in this thread.
Jason
|
|
|
|
|