When selecting a data structure to hold a list of items, evaluate the best container to use by determining how the data is going to be used. No one solution will fit all scenarios, so it’s important to understand what the tradeoffs are. There are many attributes to evaluate, the following summarizes some of them:
Depending on the container you select, there may be more overhead than you realize. For instance, a heap used for memory allocations typically takes up twice as much space than is used by the actual application. This of course, is at the cost of efficiently allocating new chunks of memory and freeing them. When memory requirements are tight, as they often are, consider the data structures that you are selecting.
As with a regular array, a dynamic array such as a C# ArrayList or a C++ vector has no memory overhead other than what is stored. But a linked list incurs an extra pointer (4 bytes) for each node. (An STL <list> is doubly-linked, so it incurs an extra 8 bytes.)
An STL deque is sometimes preferred over a vector because it is as efficient but has the advantage of being able to insert items at the beginning of the list, as well as the end. But unlike a vector, it has a 12 byte overhead per item.
A map is a very handy data structure for many reasons, but one needs to keep in mind that an extra 16 bytes is incurred for each element in the map!
In all cases, keep in mind the memory constraint of your data structure if that is important. Consider how the data will be used. What you may think might be a data structure that holds a few dozen entries, may end up with thousands if not millions of items in it depending on how the software might get used in the future.
Will the container be searched for elements? Given an index to an element in an the vector, one can retrieve the element in constant time because the location of the element can be calculated using the index and the size of the element that’s stored in the vector. This can be done because each element in a vector, like an array, is the same size.
Searching for an element of a given value may require scanning the entire list, when using a vector or a linked-list! A SortedList in C# or map or set in C++, on the other hand, is very fast when locating an entry: containing 1000 elements, it will take at most 10 iterations to locate an entry. With 4 billion entries, it will take at most 32 iterations (log2 n).
But remember that there’s some overhead in using a more complex data structure such as a map or a set. When the list is to contain about a dozen or so items, using a simpler container and doing a linear scan will actually be faster in many cases.
If no new items will be added to the list after the container is initially populated, it may be faster to simply sort the list once and use a binary search.
A technique that is faster than a map is to use a hash_map (Hashtable in C#.) Using a hash_map, however, is as good as the hashing function that is supplied for it. For a regular map, the only requirement is that the stored items can be comparable against each other, for sorting. With a hash_map a hashing function is supplied. The more that the hashing function distributes the items evenly, the faster the look ups will be. In fact, if you know every item that will be in the list at compile time (such as for a list of symbols for an interpreter), you can use a utility to generate a perfect-hashing function with no collisions.
Inserting and Removing EntriesEdit
How fast should inserts occur and how important is it where the new element is inserted into the container? Will you be removing entries from the list after it is populated?
A linked list has a nice property that an element may be inserted into any part of the list in constant time. Inserting into a map or a set is also very fast (log n). A vector however can only insert elements at the end in constant time, but inserting in the front or the middle of a vector requires that all the elements after it be moved, which can be time consuming.
Is it important that the entries remain in some specific order? You may want the list to remain in the order you inserted the items in, such as for a vector, or you may want the list to always be sorted, in which case a map or a set is appropriate. But although a hash_map is sometimes faster for retrieving items than a regular map, it does not retain any type of a useful order.
Do not underestimate the requirement for selecting the right data structure, and always ensure that you fully understand the pros and cons to selecting the right container for your items.