1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
2 xml:id="manual.ext.containers.pbds" xreflabel="pbds">
4 <title>Policy-Based Data Structures</title>
38 <?dbhtml filename="policy_data_structures.html"?>
40 <!-- 2006-04-01 Ami Tavory -->
41 <!-- 2011-05-25 Benjamin Kosnik -->
44 <section xml:id="pbds.intro">
45 <info><title>Intro</title></info>
48 This is a library of policy-based elementary data structures:
49 associative containers and priority queues. It is designed for
50 high performance, flexibility, semantic safety, and conformance to
51 the corresponding containers in <literal>std</literal> and
52 <literal>std::tr1</literal> (except for some points where it differs
58 <section xml:id="pbds.intro.issues">
59 <info><title>Performance Issues</title></info>
64 An attempt is made to categorize the wide variety of possible
65 container designs in terms of performance-impacting factors. These
66 performance factors are translated into design policies and
67 incorporated into container design.
71 There is tension in unravelling factors into a coherent set of
72 policies. Every attempt is made to keep the set of factors
73 minimal. However, in many cases multiple factors make for long
74 template names. Every attempt is made to alias and use typedefs in
75 the source files, but the generated names for external symbols can
76 be large for binary files or debuggers.
80 In many cases, the longer names allow capabilities and behaviours
81 controlled by macros to also be unambiguously emitted as distinct
86 Specific issues found while unraveling performance factors in the
87 design of associative containers and priority queues follow.
90 <section xml:id="pbds.intro.issues.associative">
91 <info><title>Associative</title></info>
94 Associative containers depend on their composite policies to a very
95 large extent. Implicitly hard-wiring policies can hamper their
96 performance and limit their functionality. An efficient hash-based
97 container, for example, requires policies for testing key
98 equivalence, hashing keys, translating hash values into positions
99 within the hash table, and determining when and how to resize the
100 table internally. A tree-based container can efficiently support
101 order statistics, i.e., the ability to query the order of
102 each key within the sequence of keys in the container, but only if
103 the container is supplied with a policy to internally update
104 meta-data. There are many other such examples.
108 Ideally, all associative containers would share the same
109 interface. Unfortunately, underlying data structures and mapping
110 semantics differentiate between different containers. For example,
111 suppose one writes a generic function manipulating an associative
116 template<typename Cntnr>
118 some_op_sequence(Cntnr& r_cnt)
125 Given this, what can one assume about the instantiating
126 container? The answer varies according to its underlying data
127 structure. If the underlying data structure of
128 <literal>Cntnr</literal> is based on a tree or trie, then the order
129 of elements is well defined; otherwise, it is not, in general. If
130 the underlying data structure of <literal>Cntnr</literal> is based
131 on a collision-chaining hash table, then modifying
132 <varname>r_cnt</varname> will not invalidate its iterators (although it may change their order);
133 if the underlying data structure is a probing hash table, then this
134 is not the case. If the underlying data structure is based on a tree
135 or trie, then a reference to the container can efficiently be split;
136 otherwise, it cannot, in general. If the underlying data structure
137 is a red-black tree, then splitting a reference to the container is
138 exception-free; if it is an ordered-vector tree, exceptions can be
144 <section xml:id="pbds.intro.issues.priority_queue">
145 <info><title>Priority Queue</title></info>
148 Priority queues are useful when one needs to efficiently access a
149 minimum (or maximum) value as the set of values changes.
153 Most useful data structures for priority queues have a relatively
154 simple structure, as they are geared toward relatively simple
155 requirements. Unfortunately, these structures do not support access
156 to an arbitrary value, which turns out to be necessary in many
157 algorithms (for example, to decrease an arbitrary value in a graph
158 algorithm). Therefore, some extra mechanism is necessary for
159 accessing arbitrary values. There are at least two
160 alternatives: embedding an associative container in a priority
161 queue, or allowing cross-referencing through iterators. The first
162 solution adds significant overhead; the second solution requires a
163 precise definition of iterator invalidation. Which is the next
168 Priority queues, like hash-based containers, store values in an
169 order that is meaningless and undefined externally. For example, a
170 <code>push</code> operation can internally reorganize the
171 values. Because of this characteristic, describing a priority
172 queue's iterator is difficult: on one hand, the values to which
173 iterators point can remain valid, but on the other, the logical
174 order of iterators can change unpredictably.
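<para>A minimal sketch of this effect, using an array-based binary heap built with the standard <function>std::push_heap</function> (a stand-in chosen for illustration, not this library's implementation). The helpers <function>heap_push</function> and <function>index_of</function> are hypothetical names. Pushing a new maximum necessarily moves the previous maximum away from the front of the array, so a stored array position does not keep identifying the same value.</para>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pushes value onto a max-heap stored in a std::vector (an array-based binary heap).
void heap_push(std::vector<int>& heap, int value) {
  heap.push_back(value);
  std::push_heap(heap.begin(), heap.end());
}

// Returns the array index currently holding value, or heap.size() if absent.
// This index is an implementation detail: any push may move stored values.
std::size_t index_of(const std::vector<int>& heap, int value) {
  return static_cast<std::size_t>(
      std::find(heap.begin(), heap.end(), value) - heap.begin());
}
```

<para>After pushing 5, 3 and 4, the value 5 must sit at index 0; after pushing 10 it cannot, because the heap property puts the new maximum there.</para>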
178 Roughly speaking, any element that is both inserted into a priority
179 queue (e.g., through <code>push</code>) and removed
180 from it (e.g., through <code>pop</code>) incurs a
181 logarithmic overhead (in the amortized sense). Different underlying
182 data structures place the actual cost differently: some are
183 optimized for amortized complexity, whereas others guarantee that
184 specific operations only have a constant cost. One underlying data
185 structure might be chosen if modifying a value is frequent
186 (Dijkstra's shortest-path algorithm), whereas a different one might
187 be chosen otherwise. Unfortunately, an array-based binary heap, an
188 underlying data structure that optimizes (in the amortized sense)
189 <code>push</code> and <code>pop</code> operations, differs from the
190 others in terms of its invalidation guarantees. Other design
191 decisions also impact the cost and placement of the overhead, at the
192 expense of greater differences in the kinds of operations that the
193 underlying data structure can support. These differences pose a
194 challenge when creating a uniform interface for priority queues.
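<para>One way to keep a single interface is to select the underlying data structure through a tag template parameter. A hedged sketch, assuming GCC's libstdc++ with the <filename>ext/pb_ds/priority_queue.hpp</filename> header and its heap tags; <function>drain</function> is a hypothetical helper. The same code compiles for each tag; only performance and invalidation guarantees are expected to differ.</para>

```cpp
#include <cassert>
#include <functional>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

namespace pb = __gnu_pbds;

// Pushes all values into a priority queue whose underlying data structure is
// chosen by Tag, then pops them, returning the values in pop order
// (largest first, since the comparison functor is std::less).
template<typename Tag>
std::vector<int> drain(const std::vector<int>& values) {
  pb::priority_queue<int, std::less<int>, Tag> pq;
  for (int v : values) pq.push(v);
  std::vector<int> out;
  while (!pq.empty()) {
    out.push_back(pq.top());
    pq.pop();
  }
  return out;
}
```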
199 <section xml:id="pbds.intro.motivation">
200 <info><title>Goals</title></info>
203 Many fine associative-container libraries have already been written,
204 most notably the C++ standard's associative containers. Why
205 then write another library? This section shows some possible
206 advantages of this library when considering the challenges in
207 the introduction. Many of these points stem from the fact that
208 the ISO C++ process introduced associative containers in a
209 two-step process (first standardizing tree-based containers,
210 only then adding hash-based containers, which are fundamentally
211 different), did not standardize priority queues as containers,
212 and (in our opinion) overloaded the iterator concept.
215 <section xml:id="pbds.intro.motivation.associative">
216 <info><title>Associative</title></info>
220 <section xml:id="motivation.associative.policy">
221 <info><title>Policy Choices</title></info>
223 Associative containers require a relatively large number of
224 policies to function efficiently in various settings. In some
225 cases this is needed for making their common operations more
226 efficient, and in other cases this allows them to support a
227 larger set of operations.
233 Hash-based containers, for example, support look-up and
234 insertion methods (<function>find</function> and
235 <function>insert</function>). In order to locate elements
236 quickly, they are supplied a hash functor, which instructs them
237 how to transform a key object into some size type; a hash
238 functor might transform <constant>"hello"</constant>
239 into <constant>1123002298</constant>. A hash table, though,
240 requires transforming each key object into some size type
241 within some specific domain; a hash table with a 128-long
242 table might transform <constant>"hello"</constant> into
243 position <constant>63</constant>. The policy by which the
244 hash value is transformed into a position within the table
245 can dramatically affect performance. Hash-based containers
246 also do not resize naturally (as opposed to tree-based
247 containers, for example). The appropriate resize policy is
248 unfortunately intertwined with the policy that transforms
249 the hash value into a position within the table.
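<para>The separation between a hash functor and the policy that maps a hash value into a table position can be sketched with two stand-alone functions. These are illustrative assumptions, not this library's policy classes (which include <classname>direct_mask_range_hashing</classname> and <classname>direct_mod_range_hashing</classname>). The mask version is only correct for power-of-two table sizes, which is one concrete way the ranging policy constrains the resize policy.</para>

```cpp
#include <cassert>
#include <cstddef>

// Mask-based ranging: a single AND, but only a valid mapping onto
// [0, table_size) when table_size is a power of two.
std::size_t mask_position(std::size_t hash, std::size_t table_size) {
  return hash & (table_size - 1);
}

// Modulo-based ranging: works for any table_size (e.g., primes),
// at the cost of an integer division.
std::size_t mod_position(std::size_t hash, std::size_t table_size) {
  return hash % table_size;
}
```

<para>With a table size of 100, the mask 99 (binary 1100011) can never produce position 4, so a mask-based policy pairs naturally with a resize policy that keeps sizes at powers of two.</para>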
255 Tree-based containers, for example, also support look-up and
256 insertion methods, and are primarily useful when maintaining
257 order between elements is important. In some cases, though,
258 one can utilize their balancing algorithms for completely
263 Figure A shows a tree in which each node contains two entries:
264 a floating-point key, and some size-type
265 <emphasis>metadata</emphasis> (in bold beneath it) that is
266 the number of nodes in the sub-tree. (The root has key 0.99,
267 and has 5 nodes (including itself) in its sub-tree.) A
268 container based on this data structure can obviously answer
269 efficiently whether 0.3 is in the container object, but it
270 can also answer what the order of 0.3 is among all those in
271 the container object: see <xref linkend="biblio.clrs2001"/>.
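<para>This library supplies such a policy for order statistics. A hedged sketch, assuming GCC's libstdc++ and the <classname>tree_order_statistics_node_update</classname> policy from <filename>ext/pb_ds/tree_policy.hpp</filename>, which maintains sub-tree sizes like the metadata of Figure A. The typedef <classname>ordered_set</classname> and the helpers <function>order_of</function> and <function>kth</function> are hypothetical names, and the keys below are invented, not those of the figure.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

namespace pb = __gnu_pbds;

// Red-black tree set whose nodes also store sub-tree sizes; the node-update
// policy keeps them correct as the tree is modified and rebalanced.
typedef pb::tree<double, pb::null_type, std::less<double>, pb::rb_tree_tag,
                 pb::tree_order_statistics_node_update>
    ordered_set;

// Order of key: the number of stored keys strictly smaller than it.
std::size_t order_of(const ordered_set& s, double key) {
  return s.order_of_key(key);
}

// The k-th smallest key (0-based); assumes k < s.size().
double kth(const ordered_set& s, std::size_t k) {
  return *s.find_by_order(k);
}
```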
276 As another example, Figure B shows a tree in which each node
277 contains two entries: a half-open geometric line interval,
278 and a number <emphasis>metadata</emphasis> (in bold beneath
279 it) that is the largest endpoint of all intervals in its
280 sub-tree. (The root describes the interval <constant>[20,
281 36)</constant>, and the largest endpoint in its sub-tree is
282 99.) A container based on this data structure can obviously
283 answer efficiently whether <constant>[3, 41)</constant> is
284 in the container object, but it can also answer efficiently
285 whether the container object has intervals that intersect
286 <constant>[3, 41)</constant>. These types of queries are
287 very useful in geometric algorithms and lease-management
292 It is important to note, however, that as the trees are
293 modified, their internal structure changes. To maintain
294 these invariants, one must supply some policy that is aware
295 of these changes. Without this, it would be better to use a
296 linked list (in itself very efficient for these purposes).
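<para>A minimal sketch of Figure B's invariant, written in standard C++ rather than with this library's node-update mechanism. All names (<classname>Interval</classname>, <classname>IvNode</classname>, <function>insert_interval</function>, <function>any_intersection</function>) are hypothetical. To keep it short, the tree is not balanced, so no rotations occur and <varname>max_hi</varname> only needs updating along the insertion path; a balanced tree would also have to recompute it after every rotation, which is the kind of structural change such a policy must observe.</para>

```cpp
#include <algorithm>
#include <cassert>
#include <memory>

// Half-open interval [lo, hi).
struct Interval { int lo, hi; };

// Node of an unbalanced BST keyed on interval start. max_hi is the node
// invariant of Figure B: the largest endpoint in this node's sub-tree.
struct IvNode {
  Interval iv;
  int max_hi;
  std::unique_ptr<IvNode> left, right;
  explicit IvNode(Interval i) : iv(i), max_hi(i.hi) {}
};

// Inserts iv. Every node on the search path gains iv in its sub-tree, so its
// max_hi is updated on the way down; no other node's invariant changes.
void insert_interval(std::unique_ptr<IvNode>& root, Interval iv) {
  std::unique_ptr<IvNode>* cur = &root;
  while (*cur) {
    (*cur)->max_hi = std::max((*cur)->max_hi, iv.hi);
    cur = (iv.lo < (*cur)->iv.lo) ? &(*cur)->left : &(*cur)->right;
  }
  cur->reset(new IvNode(iv));
}

bool intersects(Interval a, Interval b) { return a.lo < b.hi && b.lo < a.hi; }

// Returns true if some stored interval intersects q, visiting one node per
// level. Descending left is safe whenever the left sub-tree's max_hi exceeds
// q.lo: if nothing there intersects q, nothing in the right sub-tree can.
bool any_intersection(const IvNode* n, Interval q) {
  while (n) {
    if (intersects(n->iv, q)) return true;
    n = (n->left && n->left->max_hi > q.lo) ? n->left.get() : n->right.get();
  }
  return false;
}
```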
303 <title>Node Invariants</title>
306 <imagedata align="center" format="PNG" scale="100"
307 fileref="../images/pbds_node_invariants.png"/>
310 <phrase>Node Invariants</phrase>
317 <section xml:id="motivation.associative.underlying">
318 <info><title>Underlying Data Structures</title></info>
320 The standard C++ library contains associative containers based on
321 red-black trees and collision-chaining hash tables. These are
322 very useful, but they are not ideal for all types of
327 The figure below shows the different underlying data structures
328 currently supported in this library.
332 <title>Underlying Associative Data Structures</title>
335 <imagedata align="center" format="PNG" scale="100"
336 fileref="../images/pbds_different_underlying_dss_1.png"/>
339 <phrase>Underlying Associative Data Structures</phrase>
345 A shows a collision-chaining hash-table, B shows a probing
346 hash-table, C shows a red-black tree, D shows a splay tree, E shows
347 a tree based on an ordered vector (implicit in the order of the
348 elements), F shows a PATRICIA trie, and G shows a list-based
349 container with update policies.
353 Each of these data structures has some performance benefits, in
354 terms of speed, size or both. For now, note that vector-based trees
355 and probing hash tables manipulate memory more efficiently than
356 red-black trees and collision-chaining hash tables, and that
357 list-based associative containers are very useful for constructing
362 Now consider a function manipulating a generic associative
366 template<class Cntnr>
368 some_op_sequence(Cntnr &r_cnt)
375 Ideally, the underlying data structure
376 of <classname>Cntnr</classname> would not affect what can be
377 done with <varname>r_cnt</varname>. Unfortunately, this is not
382 For example, if <classname>Cntnr</classname>
383 is <classname>std::map</classname>, then the function can
387 std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar);
390 in order to apply <classname>foobar</classname> to all
391 elements between <classname>foo</classname> and
392 <classname>bar</classname>. If
393 <classname>Cntnr</classname> is a hash-based container,
394 then this call's results are undefined.
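<para>For an order-preserving container the call is meaningful. A minimal sketch with <classname>std::map</classname>, assuming both keys are present and <varname>foo</varname> does not follow <varname>bar</varname>; <function>sum_between</function> is a hypothetical helper. With <classname>std::unordered_map</classname> in its place, the same call would walk an unspecified bucket order, and would have undefined behavior whenever <varname>bar</varname> happened to precede <varname>foo</varname> in that order.</para>

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>

// Applies the find-based range pattern from the text to a std::map,
// summing the mapped values of all keys in [foo, bar).
// Assumes both keys are present and foo <= bar.
int sum_between(const std::map<int, int>& m, int foo, int bar) {
  int total = 0;
  std::for_each(m.find(foo), m.find(bar),
                [&total](const std::pair<const int, int>& kv) { total += kv.second; });
  return total;
}
```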
398 Also, if <classname>Cntnr</classname> is tree-based, the type
399 and object of the comparison functor can be
400 accessed. If <classname>Cntnr</classname> is hash based, these
401 queries are nonsensical.
405 There are various other differences based on the container's
406 underlying data structure. For one, they can be constructed by,
407 and queried for, different policies. Furthermore:
413 Containers based on C, D, E and F store elements in a
414 meaningful order; the others store elements in a meaningless
415 (and probably time-varying) order. By implication, only
416 containers based on C, D, E and F can
417 support <function>erase</function> operations taking an
418 iterator and returning an iterator to the following element
419 without performance loss.
425 Containers based on C, D, E, and F can be split and joined
426 efficiently, while the others cannot. Containers based on C
427 and D, furthermore, can guarantee that this is exception-free;
428 containers based on E cannot guarantee this.
434 Containers based on all but E can guarantee that
435 erasing an element is exception free; containers based on E
436 cannot guarantee this. Containers based on all but B and E
437 can guarantee that modifying an object of their type does
438 not invalidate iterators or references to their elements,
439 while containers based on B and E cannot. Containers based
440 on C, D, and E can furthermore make a stronger guarantee,
441 namely that modifying an object of their type does not
442 affect the order of iterators.
448 A unified tag and traits system (as used for the C++ standard
449 library iterators, for example) can ease generic manipulation of
450 associative containers based on different underlying data
456 <section xml:id="motivation.associative.iterators">
457 <info><title>Iterators</title></info>
459 Iterators are central to the design of the standard library
460 containers, because of the container/algorithm/iterator
461 decomposition that allows an algorithm to operate on a range
462 through iterators of some sequence. Iterators, then, are useful
463 because they allow going over a
464 specific <emphasis>sequence</emphasis>. The standard library
465 also uses iterators for accessing a
466 specific <emphasis>element</emphasis>: when an associative
467 container returns one through <function>find</function>. The
468 standard library consistently uses the same types of iterators
469 for both purposes: going over a range, and accessing a specific
470 found element. Before the introduction of hash-based containers
471 to the standard library, this made sense (with the exception of
472 priority queues, which are discussed later).
476 Using the standard associative containers together with
477 non-order-preserving associative containers (and also because of
478 priority-queue containers), there is a possible need for
479 different types of iterators for self-organizing containers:
480 the iterator concept seems overloaded to mean two different
481 things (in some cases). <remark> XXX
482 "ds_gen.html#find_range">Design::Associative
483 Containers::Data-Structure Genericity::Point-Type and Range-Type
487 <section xml:id="associative.iterators.using">
489 <title>Using Point Iterators for Range Operations</title>
492 Suppose <classname>cntnr</classname> is some associative
493 container, and say <varname>c</varname> is an object of
494 type <classname>cntnr</classname>. Then what will be the outcome
499 std::for_each(c.find(1), c.find(5), foo);
503 If <classname>cntnr</classname> is a tree-based container,
504 then an in-order walk will
505 apply <classname>foo</classname> to the relevant elements,
506 as in the graphic below, label A. If <varname>c</varname> is
507 a hash-based container, then the order of elements between any
508 two elements is undefined (and probably time-varying); there is
509 no guarantee that the elements traversed will coincide with the
510 <emphasis>logical</emphasis> elements between 1 and 5, as in
515 <title>Range Iteration in Different Data Structures</title>
518 <imagedata align="center" format="PNG" scale="100"
519 fileref="../images/pbds_point_iterators_range_ops_1.png"/>
522 <phrase>Range Iteration in Different Data Structures</phrase>
528 In our opinion, this problem is not caused just because
529 red-black trees are order preserving while
530 collision-chaining hash tables are (generally) not; it
531 is more fundamental. Most of the standard's containers
532 order sequences in a well-defined manner that is
533 determined by their <emphasis>interface</emphasis>:
534 calling <function>insert</function> on a tree-based
535 container modifies its sequence in a predictable way, as
536 does calling <function>push_back</function> on a list or
537 a vector. Conversely, collision-chaining hash tables,
538 probing hash tables, priority queues, and list-based
539 containers (which are very useful for "multimaps") are
540 self-organizing data structures; each
541 operation modifies their sequences in a manner that is
542 (practically) determined by their
543 <emphasis>implementation</emphasis>.
547 Consequently, applying an algorithm to a sequence obtained from most
548 containers may or may not make sense, but applying it to a
549 sub-sequence of a self-organizing container does not.
553 <section xml:id="associative.iterators.cost">
555 <title>Cost to Point Iterators to Enable Range Operations</title>
558 Suppose <varname>c</varname> is some collision-chaining
559 hash-based container object, and one calls
561 <programlisting>c.find(3)</programlisting>
563 Then what composes the returned iterator?
567 In the graphic below, label A shows the simplest (and
568 most efficient) implementation of a collision-chaining
569 hash table. The little box marked
570 <classname>point_iterator</classname> shows an object
571 that contains a pointer to the element's node. Note that
572 this "iterator" has no way to move to the next element (
574 <function>operator++</function>). Conversely, the little
575 box marked <classname>iterator</classname> stores both a
576 pointer to the element, as well as some other
577 information (the bucket number of the element). The
578 second iterator, then, is "heavier" than the first one:
579 it requires more time and space. If we were to use a
580 different container to cross-reference into this
581 hash-table using these iterators, it would take much
582 more space. As noted above, nothing much can be done by
583 incrementing these iterators, so why is this extra
588 Alternatively, one might create a collision-chaining hash-table
589 where the lists might be linked, forming a monolithic total-element
590 list, as in the graphic below, label B. Here the iterators are as
591 light as can be, but the hash-table's operations are more
596 <title>Point Iteration in Hash Data Structures</title>
599 <imagedata align="center" format="PNG" scale="100"
600 fileref="../images/pbds_point_iterators_range_ops_2.png"/>
603 <phrase>Point Iteration in Hash Data Structures</phrase>
609 It should be noted that containers based on collision-chaining
610 hash-tables are not the only ones with this type of behavior;
611 many other self-organizing data structures display it as well.
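<para>The difference in weight between the two iterator kinds can be made concrete with a toy collision-chaining hash set. This is a hypothetical sketch (<classname>ChainedSet</classname> and its members are invented for illustration, and keys are assumed non-negative), not this library's implementation: <classname>point_iterator</classname> holds only a node pointer, while <classname>iterator</classname> must also carry the table and the bucket index in order to implement <function>operator++</function>.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ChainNode { int key; ChainNode* next; };

class ChainedSet {
public:
  explicit ChainedSet(std::size_t n_buckets) : buckets_(n_buckets, nullptr) {}
  ~ChainedSet() {
    for (ChainNode* p : buckets_)
      while (p) { ChainNode* nx = p->next; delete p; p = nx; }
  }
  ChainedSet(const ChainedSet&) = delete;
  ChainedSet& operator=(const ChainedSet&) = delete;

  // Point iterator (label A): just the node pointer. It can be
  // dereferenced but has no way to reach the next chain.
  struct point_iterator { ChainNode* node; };

  // Range iterator: also remembers the table and bucket index so that
  // operator++ can move past the end of a chain.
  struct iterator {
    const ChainedSet* set;
    std::size_t bucket;
    ChainNode* node;
    int operator*() const { return node->key; }
    iterator& operator++() {
      node = node->next;
      while (!node && ++bucket < set->buckets_.size()) node = set->buckets_[bucket];
      return *this;
    }
    bool operator!=(const iterator& o) const { return node != o.node; }
  };

  // Prepends key to its bucket's chain (no duplicate check).
  void insert(int key) {
    ChainNode*& head = buckets_[bucket_of(key)];
    head = new ChainNode{key, head};
  }

  point_iterator find(int key) const {
    for (ChainNode* p = buckets_[bucket_of(key)]; p; p = p->next)
      if (p->key == key) return point_iterator{p};
    return point_iterator{nullptr};
  }

  iterator begin() const {
    for (std::size_t b = 0; b < buckets_.size(); ++b)
      if (buckets_[b]) return iterator{this, b, buckets_[b]};
    return end();
  }
  iterator end() const { return iterator{this, buckets_.size(), nullptr}; }

private:
  std::size_t bucket_of(int key) const {
    return static_cast<std::size_t>(key) % buckets_.size();
  }
  std::vector<ChainNode*> buckets_;
};
```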
615 <section xml:id="associative.iterators.invalidation">
616 <info><title>Invalidation Guarantees</title></info>
617 <para>Consider the following snippet:</para>
624 Following the call to <function>erase</function>, what is the
625 validity of <varname>it</varname>: can it be de-referenced?
626 Can it be incremented?
630 The answer depends on the underlying data structure of the
631 container. The graphic below shows three cases: A1 and A2 show
632 a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
633 show a collision-chaining hash table.
637 <title>Effect of erase in different underlying data structures</title>
640 <imagedata align="center" format="PNG" scale="100"
641 fileref="../images/pbds_invalidation_guarantee_erase.png"/>
644 <phrase>Effect of erase in different underlying data structures</phrase>
652 Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
653 be de-referenced and incremented. The sequence of iterators
654 changed, but in a way that is well-defined by the interface.
660 Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
661 not valid at all - it cannot be de-referenced or
662 incremented; the order of iterators changed in a way that is
663 (practically) determined by the implementation and not by
670 Erasing 5 from C1 yields C2. Here the situation is more
671 complicated. On the one hand, there is no problem in
672 de-referencing <varname>it</varname>. On the other hand,
673 the order of iterators changed in a way that is
674 (practically) determined by the implementation and not by
681 So in the standard library containers, it is not always possible
682 to express whether <varname>it</varname> is valid or not. This
683 is true also for <function>insert</function>. Again, the
684 iterator concept seems overloaded.
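<para>This library instead expresses such differences as compile-time traits. A hedged sketch, assuming GCC's libstdc++, where <classname>container_traits</classname> exposes an <type>invalidation_guarantee</type> type (one of <classname>basic_invalidation_guarantee</classname>, <classname>point_invalidation_guarantee</classname>, or <classname>range_invalidation_guarantee</classname>, each deriving from the previous) and an <constant>erase_can_throw</constant> flag. The typedefs and helper functions are hypothetical names; the first three set types correspond to cases A, B and C above, and <classname>ov_set</classname> to the ordered-vector tree.</para>

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

namespace pb = __gnu_pbds;

typedef pb::tree<int, pb::null_type, std::less<int>, pb::rb_tree_tag> rb_set;  // A
typedef pb::gp_hash_table<int, pb::null_type> probe_set;                       // B
typedef pb::cc_hash_table<int, pb::null_type> chain_set;                       // C
typedef pb::tree<int, pb::null_type, std::less<int>, pb::ov_tree_tag> ov_set;  // ordered vector

// True if an iterator to an element stays dereferenceable while that
// element is not erased.
template<typename Cntnr>
constexpr bool keeps_point_validity() {
  return std::is_base_of<pb::point_invalidation_guarantee,
      typename pb::container_traits<Cntnr>::invalidation_guarantee>::value;
}

// True if, in addition, the order of iterators is unaffected.
template<typename Cntnr>
constexpr bool keeps_range_order() {
  return std::is_base_of<pb::range_invalidation_guarantee,
      typename pb::container_traits<Cntnr>::invalidation_guarantee>::value;
}

// True if erasing an element may throw.
template<typename Cntnr>
constexpr bool erase_may_throw() {
  return pb::container_traits<Cntnr>::erase_can_throw;
}
```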
687 </section> <!--iterators-->
690 <section xml:id="motivation.associative.functions">
691 <info><title>Functional</title></info>
696 The design of the functional overlay to the underlying data
697 structures differs slightly from some of the conventions used in
698 the C++ standard. A strict public interface should comprise only
699 methods for operations that depend on the class's internal
700 structure; other operations are best designed as external
701 functions. (See <xref linkend="biblio.meyers02both"/>.) With this
702 rubric, the standard associative containers lack some useful
703 methods, and provide other methods which would be better
707 <section xml:id="motivation.associative.functions.erase">
708 <info><title><function>erase</function></title></info>
713 Order-preserving standard associative containers provide the
722 which takes an iterator, erases the corresponding
723 element, and returns an iterator to the following
724 element. The standard hash-based associative
725 containers also provide this method. This seemingly
726 increases genericity between associative containers,
727 since it is possible to use
730 typename C::iterator it = c.begin();
731 typename C::iterator e_it = c.end();
734 it = pred(*it)? c.erase(it) : ++it;
738 in order to erase from a container object
739 <varname>c</varname> all elements that match a
740 predicate <classname>pred</classname>. However, in a
741 different sense this actually decreases genericity: an
742 integral implication of this method is that tree-based
743 associative containers' memory use is linear in the total
744 number of elements they store, while hash-based
745 containers' memory use is unbounded in the total number of
746 elements they store. Assume a hash-based container is
747 allowed to decrease its size when an element is
748 erased. Then the elements might be rehashed, which means
749 that there is no "next" element - it is simply
750 undefined. Consequently, it is possible to infer from the
751 fact that the standard library's hash-based containers
752 provide this method that they cannot downsize when
753 elements are erased. As a consequence, different code is
754 needed to manipulate different containers, assuming that
755 memory should be conserved. Therefore, this library's
756 non-order preserving associative containers omit this
763 All associative containers include a conditional-erase method
773 which erases all elements matching a predicate. This is probably the
774 only way to ensure linear-time multiple-item erase which can
775 actually downsize a container.
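<para>A hedged sketch of the conditional-erase method, assuming GCC's libstdc++, where containers such as <classname>cc_hash_table</classname> provide a member <function>erase_if</function> that takes a predicate on the stored value type and returns the number of erased elements. <classname>table_t</classname> and <function>erase_odd_values</function> are hypothetical names.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>

namespace pb = __gnu_pbds;

// Collision-chaining hash map from int to int.
typedef pb::cc_hash_table<int, int> table_t;

// Erases every entry whose mapped value is odd through the container's own
// erase_if, rather than an external iterator loop; returns the erased count.
std::size_t erase_odd_values(table_t& t) {
  return t.erase_if([](const std::pair<const int, int>& kv) { return kv.second % 2 != 0; });
}
```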
781 The standard associative containers provide methods for
782 multiple-item erase of the form
789 erasing a range of elements given by a pair of
790 iterators. For tree-based or trie-based containers, this can
791 be implemented more efficiently as a (small) sequence of split
792 and join operations. For other, unordered, containers, this
793 method isn't much better than an external loop. Moreover,
794 if <varname>c</varname> is a hash-based container,
798 c.erase(c.find(2), c.find(5))
801 is almost certain to do something
802 different from erasing all elements whose keys are between 2
803 and 5, and is likely to produce other undefined behavior.
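<para>For a tree-based container, the range erase can be expressed with split and join. A hedged sketch, assuming GCC's libstdc++ and integer keys, where <code>split(key, other)</code> moves every element greater than <varname>key</varname> into <varname>other</varname>, and <code>join(other)</code> moves all of <varname>other</varname>'s elements in, provided the two key ranges do not overlap. <classname>int_set</classname> and <function>erase_closed_range</function> are hypothetical names, and the helper assumes <code>lo - 1</code> does not overflow.</para>

```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>

namespace pb = __gnu_pbds;

typedef pb::tree<int, pb::null_type, std::less<int>, pb::rb_tree_tag> int_set;

// Erases all keys in [lo, hi] (assumes lo <= hi) with two splits and one
// join instead of erasing element by element.
void erase_closed_range(int_set& s, int lo, int hi) {
  int_set above;   // receives keys > hi
  s.split(hi, above);
  int_set middle;  // receives keys in [lo, hi]; discarded on return
  s.split(lo - 1, middle);
  s.join(above);   // s now holds the keys < lo and the keys > hi
}
```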
807 </section> <!-- erase -->
809 <section xml:id="motivation.associative.functions.split">
812 <function>split</function> and <function>join</function>
816 It is well-known that tree-based and trie-based container
817 objects can be efficiently split or joined (See
818 <xref linkend="biblio.clrs2001"/>). Externally splitting or
819 joining trees is super-linear, and, furthermore, can throw
820 exceptions. Split and join methods, consequently, seem good
821 choices for tree-based container methods, especially since, as
822 noted just before, they are efficient replacements for erasing
826 </section> <!-- split -->
828 <section xml:id="motivation.associative.functions.insert">
831 <function>insert</function>
835 The standard associative containers provide methods of the form
838 template<class It>
844 for inserting a range of elements given by a pair of
845 iterators. This can be implemented as an external loop or,
846 more efficiently for tree-based or trie-based containers, as a
847 join operation. Moreover, these methods seem
848 similar to constructors taking a range given by a pair of
849 iterators; the constructors, however, are transactional, whereas
850 the insert methods are not; this is possibly confusing.
853 </section> <!-- insert -->
855 <section xml:id="motivation.associative.functions.compare">
858 <function>operator==</function> and <function>operator<=</function>
863 Associative containers are parametrized by policies that allow
864 testing key equivalence: a hash-based container can do this through
865 its equivalence functor, and a tree-based container can do this
866 through its comparison functor. In addition, some standard
867 associative containers have global function operators, like
868 <function>operator==</function> and <function>operator<=</function>,
869 that allow comparing entire associative containers.
873 In our opinion, these functions are better left out. To begin
874 with, they do not significantly improve over an external
875 loop. More importantly, however, they are possibly misleading -
876 <function>operator==</function>, for example, usually checks for
877 equivalence, or interchangeability, but the associative
878 container cannot check for values' equivalence, only keys'
879 equivalence; also, are two containers considered equivalent if
880 they store the same values in a different order? This is an
883 </section> <!-- compare -->
885 </section> <!-- functional -->
887 </section> <!--associative-->
889 <section xml:id="pbds.intro.motivation.priority_queue">
890 <info><title>Priority Queues</title></info>
892 <section xml:id="motivation.priority_queue.policy">
893 <info><title>Policy Choices</title></info>
896 Priority queues are containers that allow efficiently inserting
897 values and accessing the maximal value (in the sense of the
898 container's comparison functor). Their interface
899 supports <function>push</function>
900 and <function>pop</function>. The standard
901 container <classname>std::priority_queue</classname> indeed supports
902 these methods, but little else. For algorithmic and
903 software-engineering purposes, other methods are needed:
909 Many graph algorithms (see
910 <xref linkend="biblio.clrs2001"/>) require increasing a
911 value in a priority queue (again, in the sense of the
912 container's comparison functor), or joining two
913 priority-queue objects.
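<para>A hedged sketch of both operations with this library's <classname>priority_queue</classname>, assuming GCC's libstdc++ and the pairing-heap tag: <function>modify</function> takes the point iterator returned by <function>push</function>, and <function>join</function> empties its argument. The typedef <classname>max_pq</classname> and the wrappers <function>increase_value</function> and <function>absorb</function> are hypothetical.</para>

```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

namespace pb = __gnu_pbds;

// Max-priority queue of ints backed by a pairing heap.
typedef pb::priority_queue<int, std::less<int>, pb::pairing_heap_tag> max_pq;

// Raises the value referenced by it (which must belong to pq) to new_value.
void increase_value(max_pq& pq, max_pq::point_iterator it, int new_value) {
  pq.modify(it, new_value);
}

// Moves all of other's values into pq; other is left empty.
void absorb(max_pq& pq, max_pq& other) {
  pq.join(other);
}
```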
918 <para>The return type of <classname>priority_queue</classname>'s
919 <function>push</function> method is a point-type iterator, which can
920 be used for modifying or erasing arbitrary values. For
923 priority_queue<int> p;
924 priority_queue<int>::point_iterator it = p.push(3);
928 <para>These types of cross-referencing operations are necessary
929 for making priority queues useful for different applications,
930 especially graph applications.</para>
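<para>As an example of such cross-referencing, here is a sketch of Dijkstra's shortest-path algorithm that keeps, for each queued vertex, the point iterator returned by <function>push</function>, so that a shorter tentative distance is applied in place with <function>modify</function> (decrease-key). It assumes GCC's libstdc++, a pairing heap, non-negative edge weights, and an adjacency list of (neighbor, weight) pairs; <function>dijkstra</function> and its typedefs are hypothetical names.</para>

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <map>
#include <utility>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

namespace pb = __gnu_pbds;

// (distance, vertex) pairs; std::greater makes top() the smallest distance.
typedef std::pair<long, int> entry;
typedef pb::priority_queue<entry, std::greater<entry>, pb::pairing_heap_tag> min_pq;

std::vector<long> dijkstra(const std::vector<std::vector<std::pair<int, long>>>& adj,
                           int src) {
  std::vector<long> dist(adj.size(), std::numeric_limits<long>::max());
  std::map<int, min_pq::point_iterator> handle;  // vertices currently queued
  min_pq pq;
  dist[src] = 0;
  handle.insert(std::make_pair(src, pq.push(entry(0, src))));
  while (!pq.empty()) {
    const int u = pq.top().second;  // closest unfinished vertex
    pq.pop();
    handle.erase(u);
    for (const auto& e : adj[u]) {
      const int v = e.first;
      const long nd = dist[u] + e.second;
      if (nd < dist[v]) {
        dist[v] = nd;
        auto h = handle.find(v);
        if (h != handle.end())
          pq.modify(h->second, entry(nd, v));  // decrease-key in place
        else
          handle.insert(std::make_pair(v, pq.push(entry(nd, v))));
      }
    }
  }
  return dist;
}
```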
935 It is sometimes necessary to erase an arbitrary value in a
936 priority queue. For example, consider
937 the <function>select</function> function for monitoring
943 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
944 struct timeval *timeout);
947 then, as the select documentation states:
951 The nfds argument specifies the range of file
952 descriptors to be tested. The select() function tests file
953 descriptors in the range of 0 to nfds-1.</quote>
957 It stands to reason, therefore, that we might wish to
958 maintain a minimal value for <varname>nfds</varname>, and
959 priority queues immediately come to mind. Note, though, that
960 when a socket is closed, the largest open file descriptor, and
961 hence the minimal valid <varname>nfds</varname>, might change; in the absence of an efficient means to erase an
962 arbitrary value from a priority queue, we might as well
963 avoid its use altogether.
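<para>A hedged sketch of that use case, assuming GCC's libstdc++: open descriptors are kept in this library's default <classname>priority_queue</classname> (a max-queue under <classname>std::less</classname>), the point iterator returned by <function>push</function> is remembered per descriptor, and <function>erase</function> removes a closed one. The class <classname>nfds_tracker</classname> is hypothetical and assumes each descriptor is opened at most once while it is open.</para>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <ext/pb_ds/priority_queue.hpp>

namespace pb = __gnu_pbds;

// Tracks the smallest valid nfds argument for select(): one more than the
// largest open descriptor.
class nfds_tracker {
  typedef pb::priority_queue<int> fd_queue;  // std::less, pairing heap by default

public:
  void open_fd(int fd) { handles_.insert(std::make_pair(fd, fds_.push(fd))); }

  void close_fd(int fd) {
    std::map<int, fd_queue::point_iterator>::iterator it = handles_.find(fd);
    if (it == handles_.end()) return;
    fds_.erase(it->second);  // erase an arbitrary value via its point iterator
    handles_.erase(it);
  }

  int nfds() const { return fds_.empty() ? 0 : fds_.top() + 1; }

private:
  fd_queue fds_;
  std::map<int, fd_queue::point_iterator> handles_;
};
```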
967 The standard containers typically support iterators. It is
969 for <classname>std::priority_queue</classname> to omit them
970 (See <xref linkend="biblio.meyers01stl"/>). One might
971 ask why priority queues need to support iterators, since
972 they are self-organizing containers with a different purpose
973 than abstracting sequences. There are several reasons:
978 Iterators (even in self-organizing containers) are
979 useful for many purposes: cross-referencing
980 containers, serialization, and debugging code that uses
987 The standard library's hash-based containers support
988 iterators, even though they too are self-organizing
989 containers with a different purpose than abstracting
996 In standard-library-like containers, it is natural to specify the
997 interface of operations for modifying a value or erasing
998 a value (discussed previously) in terms of iterators.
999 It should be noted that the standard
1000 containers also use iterators for accessing and
1001 manipulating a specific value. In hash-based
1002 containers, one checks the existence of a key by
1003 comparing the iterator returned by <function>find</function> to the
1004 iterator returned by <function>end</function>, and not by comparing a
1005 pointer returned by <function>find</function> to <type>NULL</type>.
1014 <section xml:id="motivation.priority_queue.underlying">
1015 <info><title>Underlying Data Structures</title></info>
1018 There are three main implementations of priority queues: the
1019 first employs a binary heap, typically one which uses a
1020 sequence; the second uses a tree (or forest of trees), which is
1021 typically less structured than an associative container's tree;
1022 the third simply uses an associative container. These are
1023 shown in the figure below with labels A1 and A2, B, and C.
1027 <title>Underlying Priority Queue Data Structures</title>
1030 <imagedata align="center" format="PNG" scale="100"
1031 fileref="../images/pbds_different_underlying_dss_2.png"/>
1034 <phrase>Underlying Priority Queue Data Structures</phrase>
1040 No single implementation can completely replace any of the
1041 others. Some have better <function>push</function>
1042 and <function>pop</function> amortized performance, some have
1043 better bounded (worst case) response time than others, some
1044 optimize a single method at the expense of others, etc. In
1045 general the "best" implementation is dictated by the specific
1050 As with associative containers, the more implementations
1051 co-exist, the more necessary a traits mechanism is for handling
1052 generic containers safely and efficiently. This is especially
1053 important for priority queues, since the invalidation guarantees
of one of the most useful data structures, binary heaps, are
markedly different from those of most of the others.
1060 <section xml:id="motivation.priority_queue.binary_heap">
1061 <info><title>Binary Heaps</title></info>
1065 Binary heaps are one of the most useful underlying
1066 data structures for priority queues. They are very efficient in
1067 terms of memory (since they don't require per-value structure
1068 metadata), and have the best amortized <function>push</function> and
1069 <function>pop</function> performance for primitive types like
1074 The standard library's <classname>priority_queue</classname>
1075 implements this data structure as an adapter over a sequence,
1077 <classname>std::vector</classname>
1078 or <classname>std::deque</classname>, which correspond to labels
1079 A1 and A2 respectively in the graphic above.
1083 This is indeed an elegant example of the adapter concept and
1084 the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1085 several reasons why a binary-heap priority queue
1086 may be better implemented as a container instead of a
1093 <classname>std::priority_queue</classname> cannot erase values
1094 from its adapted sequence (irrespective of the sequence
1095 type). This means that the memory use of
1096 an <classname>std::priority_queue</classname> object is always
1097 proportional to the maximal number of values it ever contained,
1098 and not to the number of values that it currently
1099 contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
This implementation of binary heaps behaves very differently from
other underlying data structures (see also pairing heaps).
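<para>
To make both points concrete, here is a sketch assuming the default
<classname>std::vector</classname> sequence. The class
<classname>inspectable_pq</classname> is hypothetical; it relies only on
the standard guarantee that <classname>std::priority_queue</classname>
keeps its adapted sequence and comparator in the protected members
<varname>c</varname> and <varname>comp</varname>. Popping every value
leaves the vector's capacity untouched, and an arbitrary value can only be
erased by editing the sequence directly and rebuilding the heap in linear time.
</para>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical wrapper for illustration only. std::priority_queue has no
// erase(), but its adapted sequence `c` and comparator `comp` are protected
// members, so a derived class can reach them.
class inspectable_pq : public std::priority_queue<int>
{
public:
  // Capacity of the adapted std::vector; pop() never shrinks it.
  std::size_t capacity() const { return c.capacity(); }

  // Linear-time erase of one occurrence of `v`: remove it from the
  // sequence, then rebuild the heap with std::make_heap (O(n)).
  bool erase(int v)
  {
    std::vector<int>::iterator it = std::find(c.begin(), c.end(), v);
    if (it == c.end())
      return false;
    c.erase(it);
    std::make_heap(c.begin(), c.end(), comp);
    return true;
  }
};
```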
1107 Some combinations of adapted sequences and value types
1108 are very inefficient or just don't make sense. If one uses
<classname>std::priority_queue<std::string, std::vector<std::string> ></classname>,
for example, then not only will each
operation perform a logarithmic number of
<classname>std::string</classname> assignments, but any
operation (including <function>pop</function>) can also render the container
useless due to exceptions. Conversely, if one uses
<classname>std::priority_queue<int, std::deque<int> ></classname>,
then each operation incurs a logarithmic
number of unnecessary indirect accesses (through pointers).
It might be better to let the container conservatively deduce
whether to use the structure labelled A1 or A2 in the graphic above.
1125 There does not seem to be a systematic way to determine
1126 what exactly can be done with the priority queue.
If <varname>p</varname> is a priority queue adapting an
1132 <classname>std::vector</classname>, then it is possible to iterate over
1133 all values by using <function>&p.top()</function> and
1134 <function>&p.top() + p.size()</function>, but this will not work
1135 if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
case, one cannot use <function>p.begin()</function> and
<function>p.end()</function>. If a different sequence is adapted, it
1138 is even more difficult to determine what can be
1145 If <varname>p</varname> is a priority queue adapting an
<classname>std::deque</classname>, then the reference returned by
1152 will remain valid until it is popped,
1153 but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1154 next <function>push</function> will invalidate it. If a different
1155 sequence is adapted, it is even more difficult to
1156 determine what can be done.
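<para>
The <function>&p.top()</function> iteration mentioned above can be
sketched as follows for a queue adapting <classname>std::vector</classname>;
the function <function>sum_all</function> is hypothetical. It works only
because <function>top</function> refers to the first element of a
contiguous vector, so it must not be used with a
<classname>std::deque</classname>-adapted queue.
</para>

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical helper: sums every value in a vector-adapted priority queue by
// walking the contiguous range [&p.top(), &p.top() + p.size()).
// top() returns the front of the adapted vector, so this range covers all
// stored values (in heap order, not sorted order).
int sum_all(const std::priority_queue<int, std::vector<int> >& p)
{
  if (p.empty())
    return 0;                 // top() must not be called on an empty queue
  int total = 0;
  for (const int* it = &p.top(); it != &p.top() + p.size(); ++it)
    total += *it;
  return total;
}
```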
1164 Sequence-based binary heaps can still implement
1165 linear-time <function>erase</function> and <function>modify</function> operations.
1166 This means that if one needs to erase a small
1167 (say logarithmic) number of values, then one might still
1168 choose this underlying data structure. Using
1169 <classname>std::priority_queue</classname>, however, this will generally
1170 change the order of growth of the entire sequence of
1178 </section> <!-- goals/motivation -->
1179 </section> <!-- intro -->
1182 <section xml:id="containers.pbds.using">
1183 <info><title>Using</title></info>
1184 <?dbhtml filename="policy_data_structures_using.html"?>
1186 <section xml:id="pbds.using.prereq">
1187 <info><title>Prerequisites</title></info>
<para>The library contains only header files and does not require any
other libraries except the standard C++ library. All classes are
defined in namespace <code>__gnu_pbds</code>. The library internally
uses macros beginning with <code>PB_DS</code>, but
<code>#undef</code>s anything it <code>#define</code>s (except for
header guards). Compiling the library in an environment where macros
beginning with <code>PB_DS</code> are defined may yield unpredictable
results in compilation, execution, or both.</para>
1199 Further dependencies are necessary to create the visual output
1200 for the performance tests. To create these graphs, an
1201 additional package is needed: <command>pychart</command>.
1205 <section xml:id="pbds.using.organization">
1206 <info><title>Organization</title></info>
1209 The various data structures are organized as follows.
1221 <classname>basic_branch</classname>
is an abstract base class for branch-based
associative containers
1229 <classname>tree</classname>
is a concrete base class for tree-based
associative containers
1237 <classname>trie</classname>
is a concrete base class for trie-based
associative containers
1252 <classname>basic_hash_table</classname>
is an abstract base class for hash-based
associative containers
1260 <classname>cc_hash_table</classname>
is a concrete collision-chaining hash-based
associative container
1268 <classname>gp_hash_table</classname>
is a concrete (general) probing hash-based
associative container
1283 <classname>list_update</classname>
is a list-based update-policy associative container
1296 <classname>priority_queue</classname>
1305 The hierarchy is composed naturally so that commonality is
1306 captured by base classes. Thus <function>operator[]</function>
1307 is defined at the base of any hierarchy, since all derived
containers support it. Conversely, <function>split</function> is
1309 defined in <classname>basic_branch</classname>, since only
1310 tree-like containers support it.
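<para>
A short sketch of <function>split</function> and <function>join</function>
on a tree-based set, assuming GCC's libstdc++, where the header is
<filename>ext/pb_ds/assoc_container.hpp</filename>. The typedef
<type>int_set</type> and the helper <function>insert_keys</function> are
hypothetical. Here <function>split</function> moves every key greater than
the given key into the other container, and <function>join</function>
expects the two containers' key ranges not to overlap.
</para>

```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

// A tree-based "set" of ints: the mapped type is null_type.
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type> int_set;

// Hypothetical helper: inserts the keys first..last (inclusive) into `s`.
void insert_keys(int_set& s, int first, int last)
{
  for (int k = first; k <= last; ++k)
    s.insert(k);
}
```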
1314 In addition, there are the following diagnostics classes,
1315 used to report errors specific to this library's data
1320 <title>Exception Hierarchy</title>
1323 <imagedata align="center" format="PDF" scale="75"
1324 fileref="../images/pbds_exception_hierarchy.pdf"/>
1327 <imagedata align="center" format="PNG" scale="100"
1328 fileref="../images/pbds_exception_hierarchy.png"/>
1331 <phrase>Exception Hierarchy</phrase>
1338 <section xml:id="pbds.using.tutorial">
1339 <info><title>Tutorial</title></info>
1341 <section xml:id="pbds.using.tutorial.basic">
1342 <info><title>Basic Use</title></info>
For the most part, the policy-based containers in
1346 namespace <literal>__gnu_pbds</literal> have the same interface as
1347 the equivalent containers in the standard C++ library, except for
1348 the names used for the container classes themselves. For example,
1349 this shows basic operations on a collision-chaining hash-based
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>
1357 __gnu_pbds::cc_hash_table<int, char> c;
1359 assert(c.find(1) == c.end());
1364 The container is called
1365 <classname>__gnu_pbds::cc_hash_table</classname> instead of
1366 <classname>std::unordered_map</classname>, since <quote>unordered
1367 map</quote> does not necessarily mean a hash-based map as implied by
1368 the C++ library (C++11 or TR1). For example, list-based associative
1369 containers, which are very useful for the construction of
1370 "multimaps," are also unordered.
1373 <para>This snippet shows a red-black tree based container:</para>
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>
1380 __gnu_pbds::tree<int, char> c;
1382 assert(c.find(2) != c.end());
1386 <para>The container is called <classname>tree</classname> instead of
1387 <classname>map</classname> since the underlying data structures are
1388 being named with specificity.
Member functions are named, wherever possible, after
the equivalent member functions in other C++ standard library
containers. The familiar methods are unchanged:
1395 <function>begin</function>, <function>end</function>,
1396 <function>size</function>, <function>empty</function>, and
1397 <function>clear</function>.
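<para>
Because the familiar methods keep their standard names, generic code
written only against them accepts both a tree-based and a hash-based
container. A minimal sketch, assuming GCC's libstdc++; the function
<function>count_then_clear</function> is hypothetical.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

// Hypothetical generic helper that uses only begin, end, and clear, so it
// works unchanged for tree and cc_hash_table alike. Returns the number of
// elements visited before the container is emptied.
template<typename Container>
std::size_t count_then_clear(Container& c)
{
  std::size_t n = 0;
  for (typename Container::iterator it = c.begin(); it != c.end(); ++it)
    ++n;
  c.clear();
  return n;
}
```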
1401 This isn't to say that things are exactly as one would expect, given
the container requirements and interfaces in the C++ standard.
1406 The names of containers' policies and policy accessors are
different from the usual. For example, if <type>hash_type</type> is
1408 some type of hash-based container, then</para>
1415 gives the type of its hash functor, and if <varname>obj</varname> is
1416 some hash-based container object, then
1423 <para>will return a reference to its hash-functor object.</para>
1427 Similarly, if <type>tree_type</type> is some type of tree-based
1436 gives the type of its comparison functor, and if
1437 <varname>obj</varname> is some tree-based container object,
1445 <para>will return a reference to its comparison-functor object.</para>
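<para>
A sketch of these policy typedefs and accessors, assuming GCC's
libstdc++. The functor <classname>identity_hash</classname> and the
helpers <function>hash_of</function> and <function>orders_before</function>
are hypothetical. The nested types <type>hash_fn</type> and
<type>cmp_fn</type> name the corresponding template arguments.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>
#include <ext/pb_ds/assoc_container.hpp>

// Hypothetical hash functor: maps each key to itself.
struct identity_hash
{
  std::size_t operator()(int k) const { return static_cast<std::size_t>(k); }
};

typedef __gnu_pbds::cc_hash_table<int, char, identity_hash> hash_type;
typedef __gnu_pbds::tree<int, char, std::greater<int> > tree_type;

// The policy types are available as nested typedefs...
static_assert(std::is_same<hash_type::hash_fn, identity_hash>::value,
              "hash_fn names the Hash_Fn argument");
static_assert(std::is_same<tree_type::cmp_fn, std::greater<int> >::value,
              "cmp_fn names the Cmp_Fn argument");

// ...and the policy objects are reached through accessors.
std::size_t hash_of(const hash_type& obj, int key)
{
  return obj.get_hash_fn()(key);
}

bool orders_before(const tree_type& obj, int a, int b)
{
  return obj.get_cmp_fn()(a, b);
}
```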
1448 It would be nice to give names consistent with those in the existing
1449 C++ standard (inclusive of TR1). Unfortunately, these standard
1450 containers don't consistently name types and methods. For example,
1451 <classname>std::tr1::unordered_map</classname> uses
1452 <type>hasher</type> for the hash functor, but
1453 <classname>std::map</classname> uses <type>key_compare</type> for
the comparison functor. Likewise, <classname>std::tr1::unordered_map</classname>
provides <function>hash_function</function> for accessing its hash functor, but
<classname>std::map</classname> uses <function>key_comp</function>
for accessing the comparison functor.
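<para>
For comparison, the naming inconsistency in the standard containers can be
seen in a few lines. This sketch uses the C++11
<classname>std::unordered_map</classname>, whose names for these members
match the TR1 container's; the helper names are hypothetical.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <type_traits>
#include <unordered_map>

typedef std::unordered_map<int, char> hash_map_t;
typedef std::map<int, char> tree_map_t;

// The functor types: `hasher` for the hash table, `key_compare` for the tree.
static_assert(std::is_same<hash_map_t::hasher, std::hash<int> >::value,
              "unordered_map calls its hash functor type hasher");
static_assert(std::is_same<tree_map_t::key_compare, std::less<int> >::value,
              "map calls its comparator type key_compare");

// The accessors: hash_function() versus key_comp().
std::size_t hash_via_accessor(const hash_map_t& m, int k)
{
  return m.hash_function()(k);
}

bool less_via_accessor(const tree_map_t& m, int a, int b)
{
  return m.key_comp()(a, b);
}
```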
1461 Instead, <literal>__gnu_pbds</literal> attempts to be internally
1462 consistent, and uses standard-derived terminology if possible.
1466 Another source of difference is in scope:
1467 <literal>__gnu_pbds</literal> contains more types of associative
1468 containers than the standard C++ library, and more opportunities
1469 to configure these new containers, since different types of
1470 associative containers are useful in different settings.
1474 Namespace <literal>__gnu_pbds</literal> contains different classes for
1475 hash-based containers, tree-based containers, trie-based containers,
1476 and list-based containers.
1480 Since associative containers share parts of their interface, they
1481 are organized as a class hierarchy.
1484 <para>Each type or method is defined in the most-common ancestor
1485 in which it makes sense.
1488 <para>For example, all associative containers support iteration
1489 expressed in the following form:
1507 But not all containers contain or use hash functors. Yet, both
1508 collision-chaining and (general) probing hash-based associative
1509 containers have a hash functor, so
1510 <classname>basic_hash_table</classname> contains the interface:
1515 get_hash_fn() const;
1522 so all hash-based associative containers inherit the same
1523 hash-functor accessor methods.
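<para>
Since both hash-based containers inherit <function>get_hash_fn</function>
from <classname>basic_hash_table</classname>, one function template can
query the hash functor of either. A sketch assuming GCC's libstdc++;
<classname>identity_hash</classname> and
<function>hash_with_container_functor</function> are hypothetical.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

// Hypothetical hash functor: maps each key to itself.
struct identity_hash
{
  std::size_t operator()(int k) const { return static_cast<std::size_t>(k); }
};

// Relies only on the accessor shared by all hash-based containers.
template<typename Hash_Table>
std::size_t hash_with_container_functor(const Hash_Table& table, int key)
{
  return table.get_hash_fn()(key);
}

typedef __gnu_pbds::cc_hash_table<int, char, identity_hash> chaining_table;
typedef __gnu_pbds::gp_hash_table<int, char, identity_hash> probing_table;
```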
1526 </section> <!--basic use -->
1528 <section xml:id="pbds.using.tutorial.configuring">
1531 Configuring via Template Parameters
1536 In general, each of this library's containers is
1537 parametrized by more policies than those of the standard library. For
1538 example, the standard hash-based container is parametrized as
1542 template<typename Key, typename Mapped, typename Hash,
typename Pred, typename Allocator, bool Cache_Hash_Code>
1544 class unordered_map;
1548 and so can be configured by key type, mapped type, a functor
1549 that translates keys to unsigned integral types, an equivalence
1550 predicate, an allocator, and an indicator whether to store hash
values with each entry. This library's collision-chaining
1552 hash-based container is parametrized as
1555 template<typename Key, typename Mapped, typename Hash_Fn,
1556 typename Eq_Fn, typename Comb_Hash_Fn,
typename Resize_Policy, bool Store_Hash,
1558 typename Allocator>
1559 class cc_hash_table;
1563 and so can be configured by the first four types of
1564 <classname>std::tr1::unordered_map</classname>, then a
1565 policy for translating the key-hash result into a position
1566 within the table, then a policy by which the table resizes,
1567 an indicator whether to store hash values with each entry,
1568 and an allocator (which is typically the last template
1569 parameter in standard containers).
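<para>
A sketch of one non-default configuration, assuming the policy classes
declared in <filename>ext/pb_ds/hash_policy.hpp</filename> in GCC's
libstdc++. Modulo range-hashing with prime-sized tables is chosen here
for illustration only, not as a recommendation, and the typedef name
<type>prime_mod_table</type> is hypothetical.
</para>

```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

namespace pb = __gnu_pbds;

// Key, Mapped, Hash_Fn, Eq_Fn as in unordered_map, then:
//  - Comb_Hash_Fn: hash value modulo the table size
//  - Resize_Policy: prime table sizes, grown by a load-check trigger
//  - Store_Hash: true, so each entry caches its hash value
typedef pb::cc_hash_table<
  int, int,
  std::hash<int>,
  std::equal_to<int>,
  pb::direct_mod_range_hashing<>,
  pb::hash_standard_resize_policy<pb::hash_prime_size_policy,
                                  pb::hash_load_check_resize_trigger<> >,
  true> prime_mod_table;
```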
1573 Nearly all policy parameters have default values, so this
1574 need not be considered for casual use. It is important to
1575 note, however, that hash-based containers' policies can
1576 dramatically alter their performance in different settings,
1577 and that tree-based containers' policies can make them
1578 useful for other purposes than just look-up.
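<para>
As an example of a tree policy serving a purpose other than look-up, the
sketch below plugs <classname>tree_order_statistics_node_update</classname>
(declared in <filename>ext/pb_ds/tree_policy.hpp</filename>, assuming GCC's
libstdc++) into <classname>tree</classname>, which adds
<function>find_by_order</function> and <function>order_of_key</function>.
The typedef name <type>ordered_set</type> is hypothetical.
</para>

```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// A red-black tree "set" whose nodes also maintain subtree sizes, so that
// find_by_order(k) returns the k-th smallest key (0-based) and
// order_of_key(x) counts the keys strictly smaller than x.
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::rb_tree_tag,
                         __gnu_pbds::tree_order_statistics_node_update>
  ordered_set;
```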
1582 <para>As opposed to associative containers, priority queues have
1583 relatively few configuration options. The priority queue is
1584 parametrized as follows:</para>
template<typename Value_Type, typename Cmp_Fn, typename Tag,
1587 typename Allocator>
1588 class priority_queue;
1591 <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1592 <classname>Allocator</classname> parameters are the container's value type,
1593 comparison-functor type, and allocator type, respectively;
1594 these are very similar to the standard's priority queue. The
1595 <classname>Tag</classname> parameter is different: there are a number of
1596 pre-defined tag types corresponding to binary heaps, binomial
1597 heaps, etc., and <classname>Tag</classname> should be instantiated
1598 by one of them.</para>
1600 <para>Note that as opposed to the
1601 <classname>std::priority_queue</classname>,
1602 <classname>__gnu_pbds::priority_queue</classname> is not a
1603 sequence-adapter; it is a regular container.</para>
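<para>
A sketch of instantiating <classname>__gnu_pbds::priority_queue</classname>
with two of the predefined tags, assuming GCC's libstdc++ and the header
<filename>ext/pb_ds/priority_queue.hpp</filename>. The typedef names and
<function>raise_and_top</function> are hypothetical. Because the queue is a
container rather than an adapter, <function>push</function> returns a
point iterator that <function>modify</function> can use later. The sketch
holds on to that iterator only for the pairing heap, whose point iterators
stay valid across other operations; a binary heap's may not.
</para>

```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

namespace pb = __gnu_pbds;

// Same value type and comparator; the Tag selects the underlying structure.
typedef pb::priority_queue<int, std::less<int>, pb::binary_heap_tag> binary_pq;
typedef pb::priority_queue<int, std::less<int>, pb::pairing_heap_tag> pairing_pq;

// push() returns a point iterator. With a pairing heap it remains valid
// across later pushes, so it can be handed to modify() afterwards.
int raise_and_top()
{
  pairing_pq q;
  q.push(3);
  pairing_pq::point_iterator it = q.push(1);
  q.push(5);
  q.modify(it, 10);  // the value 1 becomes 10; heap order is restored
  return q.top();
}
```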
1607 <section xml:id="pbds.using.tutorial.traits">
1610 Querying Container Attributes
<para>A container's underlying data structure
affects its performance; unfortunately, it can also affect
its interface. When manipulating associative containers
generically, it is often useful to be able to determine
statically what they can and cannot support.
<para>Happily, the standard provides a good solution to a similar
problem: the differing behavior of iterators. If
1624 <classname>It</classname> is an iterator, then
1627 typename std::iterator_traits<It>::iterator_category
1630 <para>is one of a small number of pre-defined tag classes, and
1633 typename std::iterator_traits<It>::value_type
1636 <para>is the value type to which the iterator "points".</para>
1639 Similarly, in this library, if <type>C</type> is a
1640 container, then <classname>container_traits</classname> is a
1641 trait class that stores information about the kind of
1642 container that is implemented.
1645 typename container_traits<C>::container_category
1648 is one of a small number of predefined tag structures that
1649 uniquely identifies the type of underlying data structure.
1652 <para>In most cases, however, the exact underlying data
1653 structure is not really important, but what is important is
1654 one of its other attributes: whether it guarantees storing
1655 elements by key order, for example. For this one can
container_traits<C>::order_preserving
1664 typename container_traits<C>::invalidation_guarantee
1667 <para>is the container's invalidation guarantee. Invalidation
1668 guarantees are especially important regarding priority queues,
1669 since in this library's design, iterators are practically the
1670 only way to manipulate them.</para>
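<para>
A sketch of querying <classname>container_traits</classname>, assuming
GCC's libstdc++ (<filename>ext/pb_ds/tag_and_trait.hpp</filename>). The
typedef and function names are hypothetical. Note that
<varname>order_preserving</varname> is a compile-time constant rather than
a type, and that the invalidation-guarantee tags form an inheritance chain
(the range guarantee derives from the point guarantee, which derives from
the basic one), so ordinary overload resolution can dispatch on them.
</para>

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/priority_queue.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

namespace pb = __gnu_pbds;

typedef pb::tree<int, char> tree_map;  // default Tag: rb_tree_tag
typedef pb::cc_hash_table<int, char> hash_map;
typedef pb::priority_queue<int, std::less<int>, pb::binary_heap_tag> binary_heap;
typedef pb::priority_queue<int, std::less<int>, pb::pairing_heap_tag> pairing_heap;

// container_category identifies the underlying data structure.
static_assert(std::is_same<pb::container_traits<tree_map>::container_category,
                           pb::rb_tree_tag>::value, "red-black tree");
static_assert(std::is_same<pb::container_traits<hash_map>::container_category,
                           pb::cc_hash_tag>::value, "collision-chaining table");

// order_preserving is a compile-time constant, not a type.
template<typename C>
bool is_order_preserving()
{
  return pb::container_traits<C>::order_preserving;
}

// Tag dispatch: the most derived matching guarantee selects the overload.
inline bool point_iterators_survive(pb::basic_invalidation_guarantee) { return false; }
inline bool point_iterators_survive(pb::point_invalidation_guarantee) { return true; }

template<typename C>
bool keeps_point_iterators()
{
  typedef typename pb::container_traits<C>::invalidation_guarantee guarantee;
  return point_iterators_survive(guarantee());
}
```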
1673 <section xml:id="pbds.using.tutorial.point_range_iteration">
1676 Point and Range Iteration
1681 <para>This library differentiates between two types of methods
and iterators: point-type and range-type. For example,
1683 <function>find</function> and <function>insert</function> are point-type methods, since
1684 they each deal with a specific element; their returned
1685 iterators are point-type iterators. <function>begin</function> and
1686 <function>end</function> are range-type methods, since they are not used to
1687 find a specific element, but rather to go over all elements in
1688 a container object; their returned iterators are range-type
1692 <para>Most containers store elements in an order that is
1693 determined by their interface. Correspondingly, it is fine that
1694 their point-type iterators are synonymous with their range-type
1695 iterators. For example, in the following snippet
1698 std::for_each(c.find(1), c.find(5), foo);
1701 two point-type iterators (returned by <function>find</function>) are used
1702 for a range-type purpose - going over all elements whose key is
1707 Conversely, the above snippet makes no sense for
1708 self-organizing containers - ones that order (and reorder)
1709 their elements by implementation. It would be nice to have a
1710 uniform iterator system that would allow the above snippet to
1711 compile only if it made sense.
1715 This could trivially be done by specializing
1716 <function>std::for_each</function> for the case of iterators returned by
1717 <classname>std::tr1::unordered_map</classname>, but this would only solve the
1718 problem for one algorithm and one container. Fundamentally, the
1719 problem is that one can loop using a self-organizing
1720 container's point-type iterators.
1724 This library's containers define two families of
1725 iterators: <type>point_const_iterator</type> and
1726 <type>point_iterator</type> are the iterator types returned by
1727 point-type methods; <type>const_iterator</type> and
1728 <type>iterator</type> are the iterator types returned by range-type
1732 class <- some container ->
1737 typedef <- something -> const_iterator;
1739 typedef <- something -> iterator;
1741 typedef <- something -> point_const_iterator;
1743 typedef <- something -> point_iterator;
1750 const_iterator begin () const;
1754 point_const_iterator find(...) const;
1756 point_iterator find(...);
containers whose interface defines sequence order, it
1762 is very simple: point-type and range-type iterators are exactly
1763 the same, which means that the above snippet will compile if it
1764 is used for an order-preserving associative container.
1768 For self-organizing containers, however, (hash-based
1769 containers as a special example), the preceding snippet will
1770 not compile, because their point-type iterators do not support
1771 <function>operator++</function>.
1774 <para>In any case, both for order-preserving and self-organizing
1775 containers, the following snippet will compile:
1778 typename Cntnr::point_iterator it = c.find(2);
1782 because a range-type iterator can always be converted to a
1783 point-type iterator.
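<para>
A sketch of the two cases, assuming GCC's libstdc++; the typedef and
function names are hypothetical. For the tree, two <function>find</function>
results delimit a range handed to <function>std::for_each</function>. For
the hash table, the <type>point_iterator</type> is only dereferenced and
compared with <function>end</function>; applying the same
<function>std::for_each</function> pattern to its <function>find</function>
results would not compile, as stated above.
</para>

```cpp
#include <algorithm>
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::tree<int, __gnu_pbds::null_type> ordered_int_set;
typedef __gnu_pbds::cc_hash_table<int, char> hashed_map;

// Order-preserving: point-type and range-type iterators coincide, so two
// find() results may delimit a range. Both keys must be present.
int sum_between(const ordered_int_set& s, int first, int last)
{
  int total = 0;
  std::for_each(s.find(first), s.find(last), [&total](int k) { total += k; });
  return total;
}

// Self-organizing: the point_iterator from find() is dereferenced and
// compared with end(), but never incremented.
char lookup(hashed_map& m, int key, char fallback)
{
  hashed_map::point_iterator it = m.find(key);
  return it == m.end() ? fallback : it->second;
}
```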
<para>Distinguishing between iterator types also
1787 raises the point that a container's iterators might have
1788 different invalidation rules concerning their de-referencing
1789 abilities and movement abilities. This now corresponds exactly
1790 to the question of whether point-type and range-type iterators
1791 are valid. As explained above, <classname>container_traits</classname> allows
1792 querying a container for its data structure attributes. The
1793 iterator-invalidation guarantees are certainly a property of
1794 the underlying data structure, and so
1797 container_traits<C>::invalidation_guarantee
1801 gives one of three pre-determined types that answer this
1806 </section> <!-- tutorial -->
1808 <section xml:id="pbds.using.examples">
1809 <info><title>Examples</title></info>
1811 Additional code examples are provided in the source
1812 distribution, as part of the regression and performance
1816 <section xml:id="pbds.using.examples.basic">
1817 <info><title>Intermediate Use</title></info>
1823 <filename>basic_map.cc</filename>
1830 <filename>basic_set.cc</filename>
1836 Conditionally erasing values from an associative container object:
1837 <filename>erase_if.cc</filename>
1843 Basic use of multimaps:
1844 <filename>basic_multimap.cc</filename>
1850 Basic use of multisets:
1851 <filename>basic_multiset.cc</filename>
1857 Basic use of priority queues:
1858 <filename>basic_priority_queue.cc</filename>
1864 Splitting and joining priority queues:
1865 <filename>priority_queue_split_join.cc</filename>
1871 Conditionally erasing values from a priority queue:
1872 <filename>priority_queue_erase_if.cc</filename>
1879 <section xml:id="pbds.using.examples.query">
1880 <info><title>Querying with <classname>container_traits</classname> </title></info>
1884 Using <classname>container_traits</classname> to query
1885 about underlying data structure behavior:
1886 <filename>assoc_container_traits.cc</filename>
1892 A non-compiling example showing wrong use of finding keys in
1893 hash-based containers: <filename>hash_find_neg.cc</filename>
1898 Using <classname>container_traits</classname>
1899 to query about underlying data structure behavior:
1900 <filename>priority_queue_container_traits.cc</filename>
1908 <section xml:id="pbds.using.examples.container">
1909 <info><title>By Container Method</title></info>
1912 <section xml:id="pbds.using.examples.container.hash">
1913 <info><title>Hash-Based</title></info>
1915 <section xml:id="pbds.using.examples.container.hash.resize">
1916 <info><title>size Related</title></info>
1921 Setting the initial size of a hash-based container
1923 <filename>hash_initial_size.cc</filename>
1929 A non-compiling example showing how not to resize a
1930 hash-based container object:
1931 <filename>hash_resize_neg.cc</filename>
Resizing a hash-based container object:
1938 <filename>hash_resize.cc</filename>
1944 Showing an illegal resize of a hash-based container
1946 <filename>hash_illegal_resize.cc</filename>
1952 Changing the load factors of a hash-based container
1953 object: <filename>hash_load_set_change.cc</filename>
1959 <section xml:id="pbds.using.examples.container.hash.hashor">
1960 <info><title>Hashing Function Related</title></info>
1966 Using a modulo range-hashing function for the case of an
1967 unknown skewed key distribution:
1968 <filename>hash_mod.cc</filename>
1974 Writing a range-hashing functor for the case of a known
1975 skewed key distribution:
1976 <filename>shift_mask.cc</filename>
1982 Storing the hash value along with each key:
1983 <filename>store_hash.cc</filename>
1989 Writing a ranged-hash functor:
1990 <filename>ranged_hash.cc</filename>
1999 <section xml:id="pbds.using.examples.container.branch">
2000 <info><title>Branch-Based</title></info>
2003 <section xml:id="pbds.using.examples.container.branch.split">
2004 <info><title>split or join Related</title></info>
2009 Joining two tree-based container objects:
2010 <filename>tree_join.cc</filename>
2016 Splitting a PATRICIA trie container object:
2017 <filename>trie_split.cc</filename>
2023 Order statistics while joining two tree-based container
2025 <filename>tree_order_statistics_join.cc</filename>
2032 <section xml:id="pbds.using.examples.container.branch.invariants">
2033 <info><title>Node Invariants</title></info>
2038 Using trees for order statistics:
2039 <filename>tree_order_statistics.cc</filename>
2045 Augmenting trees to support operations on line
2047 <filename>tree_intervals.cc</filename>
2054 <section xml:id="pbds.using.examples.container.branch.trie">
2055 <info><title>trie</title></info>
2059 Using a PATRICIA trie for DNA strings:
2060 <filename>trie_dna.cc</filename>
2067 trie for finding all entries whose key matches a given prefix:
2068 <filename>trie_prefix_search.cc</filename>
2077 <section xml:id="pbds.using.examples.container.priority_queue">
2078 <info><title>Priority Queues</title></info>
2082 Cross referencing an associative container and a priority
2083 queue: <filename>priority_queue_xref.cc</filename>
2089 Cross referencing a vector and a priority queue using a
2090 very simple version of Dijkstra's shortest path
2092 <filename>priority_queue_dijkstra.cc</filename>
2104 </section> <!-- using -->
2106 <!-- S03: Design -->
2109 <section xml:id="containers.pbds.design">
2110 <info><title>Design</title></info>
2111 <?dbhtml filename="policy_data_structures_design.html"?>
2114 <section xml:id="pbds.design.concepts">
2115 <info><title>Concepts</title></info>
2117 <section xml:id="pbds.design.concepts.null_type">
2118 <info><title>Null Policy Classes</title></info>
2121 Associative containers are typically parametrized by various
2122 policies. For example, a hash-based associative container is
parametrized by a hash functor, transforming each key into a
non-negative numerical value. Each such value is then further mapped
2125 into a position within the table. The mapping of a key into a
2126 position within the table is therefore a two-step process.
2130 In some cases, instantiations are redundant. For example, when the
2131 keys are integers, it is possible to use a redundant hash policy,
2132 which transforms each key into its value.
2136 In some other cases, these policies are irrelevant. For example, a
2137 hash-based associative container might transform keys into positions
2138 within a table by a different method than the two-step method
2139 described above. In such a case, the hash functor is simply
2144 When a policy is either redundant or irrelevant, it can be replaced
2145 by <classname>null_type</classname>.
2149 For example, a <emphasis>set</emphasis> is an associative
2150 container with one of its template parameters (the one for the
mapped type) replaced with <classname>null_type</classname>. Other
places where this technique makes simplifications possible
include node updates in tree and trie data structures, and hash
and probe functions for hash data structures.
2158 <section xml:id="pbds.design.concepts.associative_semantics">
2159 <info><title>Map and Set Semantics</title></info>
2161 <section xml:id="concepts.associative_semantics.set_vs_map">
2164 Distinguishing Between Maps and Sets
2169 Anyone familiar with the standard knows that there are four kinds
2170 of associative containers: maps, sets, multimaps, and
2171 multisets. The map datatype associates each key to
2176 Sets are associative containers that simply store keys -
2177 they do not map them to anything. In the standard, each map class
2178 has a corresponding set class. E.g.,
2179 <classname>std::map<int, char></classname> maps each
2180 <classname>int</classname> to a <classname>char</classname>, but
<classname>std::set<int></classname> simply stores
2182 <classname>int</classname>s. In this library, however, there are no
2183 distinct classes for maps and sets. Instead, an associative
2184 container's <classname>Mapped</classname> template parameter is a policy: if
2185 it is instantiated by <classname>null_type</classname>, then it
2186 is a "set"; otherwise, it is a "map". E.g.,
2189 cc_hash_table<int, char>
2192 is a "map" mapping each <type>int</type> value to a <type>
2196 cc_hash_table<int, null_type>
2199 is a type that uniquely stores <type>int</type> values.
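<para>
A minimal sketch of such a "set", assuming GCC's libstdc++; the typedef
<type>int_hash_set</type> and the helper <function>insert_unique</function>
are hypothetical. As with the standard's sets, inserting a key that is
already present leaves the container unchanged.
</para>

```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

// Instantiating the Mapped parameter with null_type yields a "set".
typedef __gnu_pbds::cc_hash_table<int, __gnu_pbds::null_type> int_hash_set;

// Hypothetical helper: inserts `key`, returning true only if it was new.
// insert() returns a pair whose second member reports whether it inserted.
bool insert_unique(int_hash_set& s, int key)
{
  return s.insert(key).second;
}
```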
2201 <para>Once the <classname>Mapped</classname> template parameter is instantiated
2202 by <classname>null_type</classname>, then
2203 the "set" acts very similarly to the standard's sets - it does not
map each key to a distinct <classname>null_type</classname> object. Also,
the container's <type>value_type</type> is essentially
2206 its <type>key_type</type> - just as with the standard's sets
2210 The standard's multimaps and multisets allow, respectively,
2211 non-uniquely mapping keys and non-uniquely storing keys. As
2213 reasons why this might be necessary are 1) that a key might be
2214 decomposed into a primary key and a secondary key, 2) that a
2215 key might appear more than once, or 3) any arbitrary
2216 combination of 1)s and 2)s. Correspondingly,
2217 one should use 1) "maps" mapping primary keys to secondary
2218 keys, 2) "maps" mapping keys to size types, or 3) any arbitrary
2219 combination of 1)s and 2)s. Thus, for example, an
2220 <classname>std::multiset<int></classname> might be used to store
2221 multiple instances of integers, but using this library's
2222 containers, one might use
2225 tree<int, size_t>
2229 i.e., a <classname>map</classname> of <type>int</type>s to
2230 <type>size_t</type>s.
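<para>
The <type>tree<int, size_t></type> approach can be sketched as follows,
assuming GCC's libstdc++; the typedef and helper names are hypothetical.
Each key maps to its multiplicity, and a key is erased once its count
drops to zero.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

// A "multiset" of ints modelled as a map from each key to its count.
typedef __gnu_pbds::tree<int, std::size_t> counted_multiset;

// Hypothetical helpers for illustration.
void add_occurrence(counted_multiset& m, int key)
{
  ++m[key];  // operator[] value-initializes a missing count to 0
}

// Removes one occurrence; erases the key when its count reaches zero.
void remove_occurrence(counted_multiset& m, int key)
{
  counted_multiset::point_iterator it = m.find(key);
  if (it == m.end())
    return;
  if (--it->second == 0)
    m.erase(key);
}

std::size_t occurrences(const counted_multiset& m, int key)
{
  counted_multiset::point_const_iterator it = m.find(key);
  return it == m.end() ? 0 : it->second;
}
```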
2233 These "multimaps" and "multisets" might be confusing to
2234 anyone familiar with the standard's <classname>std::multimap</classname> and
2235 <classname>std::multiset</classname>, because there is no clear
2236 correspondence between the two. For example, in some cases
2237 where one uses <classname>std::multiset</classname> in the standard, one might use
2238 in this library a "multimap" of "multisets" - i.e., a
2239 container that maps primary keys each to an associative
2240 container that maps each secondary key to the number of times
2245 When one uses a "multimap," one should choose with care the
2246 type of container used for secondary keys.
2248 </section> <!-- map vs set -->
2251 <section xml:id="concepts.associative_semantics.multi">
2252 <info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
Brace oneself: this library does not contain containers like
2256 <classname>std::multimap</classname> or
2257 <classname>std::multiset</classname>. Instead, these data
2258 structures can be synthesized via manipulation of the
2259 <classname>Mapped</classname> template parameter.
2262 One maps the unique part of a key - the primary key, into an
2263 associative-container of the (originally) non-unique parts of
2264 the key - the secondary key. A primary associative-container
2265 is an associative container of primary keys; a secondary
2266 associative-container is an associative container of
Let us step back a bit and start from the beginning.
2276 Maps (or sets) allow mapping (or storing) unique-key values.
2277 The standard library also supplies associative containers which
2278 map (or store) multiple values with equivalent keys:
2279 <classname>std::multimap</classname>, <classname>std::multiset</classname>,
2280 <classname>std::tr1::unordered_multimap</classname>, and
<classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
2282 be used, then why we think it is best to avoid them.
2286 Suppose one builds a simple bank-account application that
2287 records for each client (identified by an <classname>std::string</classname>)
2288 and account-id (marked by an <type>unsigned long</type>) -
2289 the balance in the account (described by a
2290 <type>float</type>). Suppose further that ordering this
2291 information is not useful, so a hash-based container is
2292 preferable to a tree based container. Then one can use
2296 std::tr1::unordered_map<std::pair<std::string, unsigned long>, float, ...>
2300 which hashes every combination of client and account-id. This
2301 might work well, except for the fact that it is now impossible
2302 to efficiently list all of the accounts of a specific client
2303 (this would practically require iterating over all
2304 entries). Instead, one can use
2308 std::tr1::unordered_multimap<std::pair<std::string, unsigned long>, float, ...>
2312 which hashes every client, and decides equivalence based on
2313 client only. This will ensure that all accounts belonging to a
2314 specific client are stored consecutively.
2318 Also, suppose one wants a priority queue of integers
2319 (a container that supports <function>push</function>,
2320 <function>pop</function>, and <function>top</function> operations, the last of which
2321 returns the largest <type>int</type>) that also supports
2322 operations such as <function>find</function> and <function>lower_bound</function>. A
2323 reasonable solution is to build an adapter over
2324 <classname>std::set<int></classname>. In this adapter,
2325 <function>push</function> will just call the tree-based
2326 associative container's <function>insert</function> method; <function>top</function>
2327 and <function>pop</function> will call its <function>end</function> method and, respectively, return or erase the
2328 preceding element (which must be the largest). This might
2329 work well, except that the container object cannot hold
2330 multiple instances of the same integer (<function>push(4)</function>
2331 will be a no-op if <constant>4</constant> is already in the
2332 container object). If multiple keys are necessary, then one
2333 might build the adapter over an
2334 <classname>std::multiset<int></classname>.
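A minimal sketch of that adapter, assuming C++11, follows; <classname>multiset_pq</classname> and its member names are hypothetical, and <function>top</function> and <function>pop</function> are kept separate as in <classname>std::priority_queue</classname>.
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical priority-queue adapter over std::multiset<int>.
// Duplicates are kept, and tree-based lookups remain available.
class multiset_pq
{
public:
  // push: insert into the underlying tree (duplicates allowed).
  void push(int v) { m_set.insert(v); }

  // top: the largest element is the one preceding end().
  // Precondition: the queue is not empty.
  int top() const { return *std::prev(m_set.end()); }

  // pop: erase exactly one copy of the largest element.
  // Precondition: the queue is not empty.
  void pop() { m_set.erase(std::prev(m_set.end())); }

  // Operations that a heap-based priority queue does not offer.
  bool contains(int v) const { return m_set.find(v) != m_set.end(); }

  std::multiset<int>::const_iterator lower_bound(int v) const
  { return m_set.lower_bound(v); }

  std::size_t size() const { return m_set.size(); }
  bool empty() const { return m_set.empty(); }

private:
  std::multiset<int> m_set;
};
]]></programlisting>
Under the standard's complexity requirements, <function>push</function> is logarithmic, while <function>top</function> and <function>pop</function> need only a decrement from <function>end</function> and an erase by iterator, both amortized constant time.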
2338 The standard library's non-unique-mapping containers are useful
2339 when (1) a key can be decomposed into a primary key and a
2340 secondary key, (2) a key is needed multiple times, or (3) any
2341 combination of (1) and (2).
2345 The graphic below shows how the standard library's container
2346 design works internally; in this figure nodes shaded equally
2347 represent equivalent-key values. Equivalent keys are stored
2348 consecutively using the properties of the underlying data
2349 structure: binary search trees (label A) store equivalent-key
2350 values consecutively (in the sense of an in-order walk)
2351 naturally; collision-chaining hash tables (label B) store
2352 equivalent-key values in the same bucket, and the bucket can be
2353 arranged so that equivalent-key values are consecutive.
2357 <title>Non-unique Mapping Standard Containers</title>
2360 <imagedata align="center" format="PNG" scale="100"
2361 fileref="../images/pbds_embedded_lists_1.png"/>
2364 <phrase>Non-unique Mapping Standard Containers</phrase>
2370 Put differently, the standard library's non-unique mapping
2371 associative-containers are associative containers that map
2372 primary keys to linked lists that are embedded into the
2373 container. The graphic below shows again the two
2374 containers from the first graphic above, this time with
2375 the embedded linked lists of the grayed nodes marked
2379 <figure xml:id="fig.pbds_embedded_lists_2">
2381 Effect of embedded lists in
2382 <classname>std::multimap</classname>
2386 <imagedata align="center" format="PNG" scale="100"
2387 fileref="../images/pbds_embedded_lists_2.png"/>
2391 Effect of embedded lists in
2392 <classname>std::multimap</classname>
2399 These embedded linked lists have several disadvantages.
2405 The underlying data structure embeds the linked lists
2406 according to its own considerations, which means that the
2407 search path for a value might include several different
2408 equivalent-key values. For example, the search path for the
2409 black node in either of the first graphic's labels, A or B,
2410 includes more than a single gray node.
2416 The links of the linked lists are the underlying data
2417 structures' nodes, which typically are quite structured. In
2418 the case of tree-based containers (the graphic above, label
2419 B), each "link" is actually a node with three pointers (one
2420 to a parent and two to children), and a
2421 relatively-complicated iteration algorithm. The linked
2422 lists, therefore, can take up quite a lot of memory, and
2423 iterating over all values equal to a given key (through the
2424 return value of the standard
2425 library's <function>equal_range</function>) can be
2432 The primary key is stored multiple times; this uses more memory.
2438 Finally, the interface of this design excludes several
2439 useful underlying data structures. Of all the unordered
2440 self-organizing data structures, practically only
2441 collision-chaining hash tables can (efficiently) guarantee
2442 that equivalent-key values are stored consecutively.
2448 The above reasons hold even when the ratio of secondary keys to
2449 primary keys (or average number of identical keys) is small, but
2450 when it is large, there are more severe problems:
2456 The underlying data structures order the links inside each
2457 embedded linked list according to their internal
2458 considerations, which effectively means that each of the
2459 links is unordered. Irrespective of the underlying data
2460 structure, searching for a specific value can degrade to
2467 Similarly to the above point, it is impossible to apply
2468 to the secondary keys considerations that apply to primary
2469 keys. For example, it is not possible to maintain secondary
2470 keys in sorted order.
2476 While the interface "understands" that all equivalent-key
2477 values constitute a distinct list (through
2478 <function>equal_range</function>), the underlying data
2479 structure typically does not. This means that operations such
2480 as erasing from a tree-based container all values whose keys
2481 are equivalent to a given key can be super-linear in the
2482 size of the tree; this is also true for several other
2483 operations that target a specific list.
2490 In this library, all associative containers map
2491 (or store) unique-key values. One can (1) map primary keys to
2492 secondary associative-containers (containers of
2493 secondary keys) or non-associative containers, (2) map identical
2494 keys to a size-type representing the number of times they
2495 occur, or (3) any combination of (1) and (2). Instead of
2496 allowing multiple equivalent-key values, this library
2497 supplies associative containers based on underlying
2498 data structures that are suitable as secondary
2499 associative-containers.
2503 In the figure below, labels A and B show the equivalent
2504 underlying data structures in this library, as mapped to the
2505 first graphic above. Labels A and B, respectively. Each shaded
2506 box represents some size-type or secondary
2507 associative-container.
2511 <title>Non-unique Mapping Containers</title>
2514 <imagedata align="center" format="PNG" scale="100"
2515 fileref="../images/pbds_embedded_lists_3.png"/>
2518 <phrase>Non-unique Mapping Containers</phrase>
2524 In the first example above, then, one would use an associative
2525 container mapping each client to an associative container which
2526 maps each account id to a balance (see
2527 <filename>example/basic_multimap.cc</filename>); in the second
2528 example, one would use an associative container mapping
2529 each <type>int</type> to some size-type indicating the
2530 number of times it logically occurs
2531 (see <filename>example/basic_multiset.cc</filename>).
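The same decomposition can be sketched with standard containers standing in for this library's (a substitution made only to keep the sketch self-contained, assuming C++11): a primary hash container maps each client to a secondary tree container keyed by account id, and a counted set maps each <type>int</type> to its number of occurrences. All type and function names are hypothetical.
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Secondary associative container: account id -> balance (sorted by id).
typedef std::map<unsigned long, float> accounts_t;
// Primary associative container: client -> that client's accounts.
typedef std::unordered_map<std::string, accounts_t> bank_t;

void set_balance(bank_t& bank, const std::string& client,
                 unsigned long account, float balance)
{ bank[client][account] = balance; }

// All accounts of one client are reached through a single primary lookup.
std::size_t num_accounts(const bank_t& bank, const std::string& client)
{
  bank_t::const_iterator it = bank.find(client);
  return it == bank.end() ? 0 : it->second.size();
}

// A "multiset" of ints as a map from each value to its occurrence count.
typedef std::map<int, std::size_t> counted_set_t;

void insert_one(counted_set_t& s, int v) { ++s[v]; }

// Erases one occurrence; drops the key when its count reaches zero.
void erase_one(counted_set_t& s, int v)
{
  counted_set_t::iterator it = s.find(v);
  if (it != s.end() && --it->second == 0)
    s.erase(it);
}

std::size_t count_of(const counted_set_t& s, int v)
{
  counted_set_t::const_iterator it = s.find(v);
  return it == s.end() ? 0 : it->second;
}
]]></programlisting>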
2535 See the discussion in list-based container types for containers
2536 especially suited as secondary associative-containers.
2540 </section> <!-- map and set semantics -->
2542 <section xml:id="pbds.design.concepts.iterator_semantics">
2543 <info><title>Iterator Semantics</title></info>
2545 <section xml:id="concepts.iterator_semantics.point_and_range">
2546 <info><title>Point and Range Iterators</title></info>
2549 Iterator concepts are bifurcated in this design into
2550 point-type and range-type iteration.
2554 A point-type iterator is an iterator that refers to a specific
2555 element as returned through an
2556 associative-container's <function>find</function> method.
2560 A range-type iterator is an iterator that is used to go over a
2561 sequence of elements, as returned by a container's
2562 <function>begin</function> method.
2566 A point-type method is a method that
2567 returns a point-type iterator; a range-type method is a method
2568 that returns a range-type iterator.
2571 <para>For most containers, these types are synonymous; for
2572 self-organizing containers, such as hash-based containers or
2573 priority queues, these are inherently different (in any
2574 implementation, including that of C++ standard library
2575 components), but in this design, it is made explicit. They are
2581 <section xml:id="concepts.iterator_semantics.both">
2582 <info><title>Distinguishing Point and Range Iterators</title></info>
2584 <para>When using this library, it is necessary to differentiate
2585 between two types of methods and iterators: point-type methods and
2586 iterators, and range-type methods and iterators. Each associative
2587 container's interface includes the methods:</para>
2589 point_const_iterator
2590 find(const_key_reference r_key) const;
2593 find(const_key_reference r_key);
2595 std::pair<point_iterator,bool>
2596 insert(const_reference r_val);
2599 <para>The relationship between these iterator types varies between
2600 container types. The figure below
2601 shows the most general invariant between point-type and
2602 range-type iterators: in <emphasis>A</emphasis>, an <literal>iterator</literal> can
2603 always be converted to a <literal>point_iterator</literal>. <emphasis>B</emphasis>
2604 shows invariants for order-preserving containers: point-type
2605 iterators are synonymous with range-type iterators.
2606 Orthogonally, <emphasis>C</emphasis> shows invariants for "set"
2607 containers: iterators are synonymous with const iterators.</para>
2610 <title>Point Iterator Hierarchy</title>
2613 <imagedata align="center" format="PNG" scale="100"
2614 fileref="../images/pbds_point_iterator_hierarchy.png"/>
2617 <phrase>Point Iterator Hierarchy</phrase>
2623 <para>Note that point-type iterators in self-organizing containers
2624 (hash-based associative containers) lack movement
2625 operators, such as <literal>operator++</literal> - in fact, this
2626 is the reason why this library's design departs from that of the standard C++ library
2627 on this point.</para>
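<para>As an illustration, the following sketch (assuming GCC's libstdc++, which ships this library as <filename>ext/pb_ds/assoc_container.hpp</filename> in namespace <literal>__gnu_pbds</literal>) uses the point-type iterator returned by <function>find</function> only for dereferencing and comparison, and range-type iterators from <function>begin</function> and <function>end</function> for traversal. The helper names <function>value_or</function> and <function>sum_keys</function> are hypothetical.</para>
<programlisting><![CDATA[
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

// A collision-chaining hash table from this library (GCC extension).
typedef __gnu_pbds::cc_hash_table<int, char> table_t;

// Point-type use: find() returns a point-type iterator that can be
// dereferenced or compared with end(), but that has no operator++.
char value_or(const table_t& t, int key, char dflt)
{
  table_t::point_const_iterator it = t.find(key);
  return it == t.end() ? dflt : it->second;
}

// Range-type use: begin()/end() iterators traverse every element.
int sum_keys(const table_t& t)
{
  int sum = 0;
  for (table_t::const_iterator it = t.begin(); it != t.end(); ++it)
    sum += it->first;
  return sum;
}
]]></programlisting>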
2629 <para>Typically, one can determine an iterator's movement
2631 <literal>std::iterator_traits<It>::iterator_category</literal>,
2632 which is a <literal>struct</literal> indicating the iterator's
2633 movement capabilities. Unfortunately, none of the standard predefined
2634 categories reflects an iterator's <emphasis>not</emphasis> having any
2635 movement capabilities whatsoever. Consequently,
2636 <literal>pb_ds</literal> adds a type
2637 <literal>trivial_iterator_tag</literal> (whose name is taken from
2638 a concept in C++ standardese: the category of iterators
2639 with no movement capabilities). All other standard C++ library
2640 tags, such as <literal>forward_iterator_tag</literal>, retain their
2645 <section xml:id="pbds.design.concepts.invalidation">
2646 <info><title>Invalidation Guarantees</title></info>
2648 If one manipulates a container object, then iterators previously
2649 obtained from it can be invalidated. In some cases a
2650 previously-obtained iterator cannot be de-referenced; in other cases,
2651 the iterator's next or previous element might have changed
2652 unpredictably. This corresponds exactly to the question whether a
2653 point-type or range-type iterator (see previous concept) is valid or
2654 not. In this design, one can query a container (at compile time) about
2655 its invalidation guarantees.
2660 Given three different types of associative containers, a modifying
2661 operation (in that example, <function>erase</function>) invalidated
2662 iterators in three different ways: the iterator of one container
2663 remained completely valid - it could be de-referenced and
2664 incremented; the iterator of a different container could not even be
2665 de-referenced; the iterator of the third container could be
2666 de-referenced, but its "next" iterator changed unpredictably.
2670 Distinguishing between point and range types allows fine-grained
2671 invalidation guarantees, because these questions correspond exactly
2672 to the question of whether point-type iterators and range-type
2673 iterators are valid. The graphic below shows tags corresponding to
2674 different types of invalidation guarantees.
2678 <title>Invalidation Guarantee Tags Hierarchy</title>
2681 <imagedata align="center" format="PDF" scale="75"
2682 fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2685 <imagedata align="center" format="PNG" scale="100"
2686 fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
2689 <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2697 <classname>basic_invalidation_guarantee</classname>
2698 corresponds to a basic guarantee that a point-type iterator,
2699 a found pointer, or a found reference, remains valid as long
2700 as the container object is not modified.
2706 <classname>point_invalidation_guarantee</classname>
2707 corresponds to a guarantee that a point-type iterator, a
2708 found pointer, or a found reference, remains valid even if
2709 the container object is modified.
2715 <classname>range_invalidation_guarantee</classname>
2716 corresponds to a guarantee that a range-type iterator remains
2717 valid even if the container object is modified.
2722 <para>To find the invalidation guarantee of a
2723 container, one can use</para>
2725 typename container_traits<Cntnr>::invalidation_guarantee
2728 <para>Note that this hierarchy corresponds to the logic it
2729 represents: if a container has range-invalidation guarantees,
2730 then it must also have point invalidation guarantees;
2731 correspondingly, its invalidation guarantee (in this case
2732 <classname>range_invalidation_guarantee</classname>)
2733 can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
2734 This means that this hierarchy can easily be used with
2735 standard metaprogramming techniques, by specializing on the
2736 type of <literal>invalidation_guarantee</literal>.</para>
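<para>For example, a function can dispatch on the guarantee tag through ordinary overloading, since a <classname>range_invalidation_guarantee</classname> argument binds to its closest base. The sketch below assumes GCC's <filename>ext/pb_ds</filename> headers (namespace <literal>__gnu_pbds</literal>); the function names are hypothetical.</para>
<programlisting><![CDATA[
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

namespace pb = __gnu_pbds;

// Overloads keyed on the guarantee tags. range_invalidation_guarantee
// derives from point_invalidation_guarantee, so it selects the second.
inline bool points_survive(pb::basic_invalidation_guarantee) { return false; }
inline bool points_survive(pb::point_invalidation_guarantee) { return true; }

// True if point-type iterators obtained from a Cntnr object stay valid
// after that object is modified.
template<typename Cntnr>
bool point_iterators_survive()
{
  typedef typename pb::container_traits<Cntnr>::invalidation_guarantee tag;
  return points_survive(tag());
}
]]></programlisting>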
2739 These types of problems were addressed, in a more general
2740 setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2741 our opinion, an invalidation-guarantee hierarchy would solve
2742 these problems in all container types - not just associative
2747 </section> <!-- iterator semantics -->
2749 <section xml:id="pbds.design.concepts.genericity">
2750 <info><title>Genericity</title></info>
2753 The design attempts to address the following problem of
2754 data-structure genericity. When writing a function manipulating
2755 a generic container object, what is the behavior of the object?
2759 template<typename Cntnr>
2761 some_op_sequence(Cntnr &r_container)
2768 then one needs to address the following questions in the body
2769 of <function>some_op_sequence</function>:
2775 Which types and methods does <literal>Cntnr</literal> support?
2776 Containers based on hash tables can be queried for the
2777 hash-functor type and object; this is meaningless for tree-based
2778 containers. Containers based on trees can be split, joined, or
2779 can erase iterators and return the following iterator; this
2780 cannot be done by hash-based containers.
2786 What are the exception and invalidation guarantees
2787 of <literal>Cntnr</literal>? A container based on a probing
2788 hash-table invalidates all iterators when it is modified; this
2789 is not the case for containers based on node-based
2790 trees. Containers based on a node-based tree can be split or
2791 joined without exceptions; this is not the case for containers
2792 based on vector-based trees.
2798 How does the container maintain its elements? Tree-based and
2799 trie-based containers store elements by key order; others,
2800 typically, do not. Containers based on splay trees, or on lists
2801 with update policies, "cache" "frequently accessed" elements;
2802 containers based on most other underlying data structures do
2808 How does one query a container about characteristics and
2809 capabilities? What is the relationship between two different
2810 data structures, if anything?
2815 <para>The remainder of this section explains these issues in
2819 <section xml:id="concepts.genericity.tag">
2820 <info><title>Tag</title></info>
2822 Tags are very useful for manipulating generic types. For example, if
2823 <literal>It</literal> is an iterator class, then <literal>typename
2824 It::iterator_category</literal> or <literal>typename
2825 std::iterator_traits<It>::iterator_category</literal> will
2826 yield its category, and <literal>typename
2827 std::iterator_traits<It>::value_type</literal> will yield its
2832 This library contains a container tag hierarchy corresponding to the
2837 <title>Container Tag Hierarchy</title>
2840 <imagedata align="center" format="PDF" scale="75"
2841 fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2844 <imagedata align="center" format="PNG" scale="100"
2845 fileref="../images/pbds_container_tag_hierarchy.png"/>
2848 <phrase>Container Tag Hierarchy</phrase>
2854 Given any container <type>Cntnr</type>, the tag of
2855 the underlying data structure can be found via <literal>typename
2856 Cntnr::container_category</literal>.
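Overloading on the tag hierarchy then lets generic code select behavior by data-structure family. In the following sketch (assuming GCC's <filename>ext/pb_ds</filename> headers and hypothetical function names), the most derived matching tag wins overload resolution.
<programlisting><![CDATA[
#include <cassert>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

namespace pb = __gnu_pbds;

// Overloads on the container tag hierarchy; a derived tag picks the
// overload for its nearest base.
inline std::string family(pb::basic_hash_tag)  { return "hash"; }
inline std::string family(pb::tree_tag)        { return "tree"; }
inline std::string family(pb::associative_tag) { return "other"; }

template<typename Cntnr>
std::string family_of()
{
  typedef typename Cntnr::container_category category;
  return family(category());
}
]]></programlisting>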
2859 </section> <!-- tag -->
2861 <section xml:id="concepts.genericity.traits">
2862 <info><title>Traits</title></info>
2865 <para>Additionally, a traits mechanism can be used to query a
2866 container type for its attributes. Given any container
2867 <literal>Cntnr</literal>, <literal>container_traits<Cntnr></literal>
2868 is a traits class identifying the properties of the
2871 <para>To find if a container can throw when a key is erased (which
2872 is true for vector-based trees, for example), one can
2875 <programlisting>container_traits<Cntnr>::erase_can_throw</programlisting>
2878 Some of the definitions in <classname>container_traits</classname>
2879 are dependent on other
2880 definitions. If <classname>container_traits<Cntnr>::order_preserving</classname>
2881 is <constant>true</constant> (which is the case for containers
2882 based on trees and tries), then the container can be split or
2884 case, <classname>container_traits<Cntnr>::split_join_can_throw</classname>
2885 indicates whether splits or joins can throw exceptions (which is
2886 true for vector-based trees);
2887 otherwise <classname>container_traits<Cntnr>::split_join_can_throw</classname>
2888 will yield a compilation error. (This is somewhat similar to a
2889 compile-time version of the COM model).
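A minimal sketch of such compile-time queries follows, assuming GCC's <filename>ext/pb_ds</filename> headers and C++11; the typedef and helper names are hypothetical, and the asserted values follow the text's statements about node-based and vector-based trees.
<programlisting><![CDATA[
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

namespace pb = __gnu_pbds;

typedef pb::tree<int, int, std::less<int>, pb::rb_tree_tag> rb_t; // node-based
typedef pb::tree<int, int, std::less<int>, pb::ov_tree_tag> ov_t; // vector-based

// The traits are enum constants, usable in constant expressions.
static_assert(pb::container_traits<rb_t>::order_preserving, "trees keep key order");
static_assert(!pb::container_traits<rb_t>::erase_can_throw, "node-based erase");
static_assert(pb::container_traits<ov_t>::erase_can_throw, "vector-based erase");

// Per the text, split_join_can_throw is meant to be queried only for
// order-preserving containers such as the two trees above.
template<typename Cntnr>
bool split_join_may_throw()
{ return pb::container_traits<Cntnr>::split_join_can_throw; }
]]></programlisting>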
2892 </section> <!-- traits -->
2894 </section> <!-- genericity -->
2895 </section> <!-- concepts -->
2897 <section xml:id="pbds.design.container">
2898 <info><title>By Container</title></info>
2901 <section xml:id="pbds.design.container.hash">
2902 <info><title>hash</title></info>
2907 /// general terms / background
2908 /// range hashing policies
2909 /// ranged-hash policies
2915 /// trigger policies
2918 // policy interactions
2919 /// probe/size/trigger
2921 /// eq/hash/storing hash values
2922 /// size/load-check trigger
2924 <section xml:id="container.hash.interface">
2925 <info><title>Interface</title></info>
2930 The collision-chaining hash-based container has the
2931 following declaration.</para>
2936 typename Hash_Fn = std::hash<Key>,
2937 typename Eq_Fn = std::equal_to<Key>,
2938 typename Comb_Hash_Fn = direct_mask_range_hashing<>,
2939 typename Resize_Policy = /* default explained below */,
2940 bool Store_Hash = false,
2941 typename Allocator = std::allocator<char> >
2942 class cc_hash_table;
2945 <para>The parameters have the following meaning:</para>
2948 <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2950 <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2952 <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2954 <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2956 <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
2957 it describes how to translate hash values into positions
2958 within the table. </para></listitem>
2960 <listitem><para><classname>Resize_Policy</classname> describes how a container object
2961 should change its internal size. </para></listitem>
2963 <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2964 should be stored with each entry. </para></listitem>
2966 <listitem><para><classname>Allocator</classname> is an allocator
2967 type.</para></listitem>
2970 <para>The probing hash-based container has the following
2976 typename Hash_Fn = std::hash<Key>,
2977 typename Eq_Fn = std::equal_to<Key>,
2978 typename Comb_Probe_Fn = direct_mask_range_hashing<>,
2979 typename Probe_Fn = /* default explained below */,
2980 typename Resize_Policy = /* default explained below */,
2981 bool Store_Hash = false,
2982 typename Allocator = std::allocator<char> >
2983 class gp_hash_table;
2986 <para>The parameters are identical to those of the
2987 collision-chaining container, except for the following.</para>
2990 <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2991 sequence into a sequence of positions within the table.</para></listitem>
2993 <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2996 <para>Some of the default template values depend on the values of
2997 other parameters, and are explained below.</para>
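<para>A sketch of both declarations in use follows, assuming GCC's <filename>ext/pb_ds</filename> headers. It overrides <classname>Comb_Hash_Fn</classname> for the collision-chaining table, and <classname>Comb_Probe_Fn</classname> and <classname>Probe_Fn</classname> for the probing table, leaving the remaining parameters at their defaults; the typedef and function names are hypothetical.</para>
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

namespace pb = __gnu_pbds;

// Collision-chaining table: modulo-based range hashing instead of the
// default bit mask; Resize_Policy, Store_Hash, Allocator are defaulted.
typedef pb::cc_hash_table<std::string, int,
                          std::hash<std::string>,
                          std::equal_to<std::string>,
                          pb::direct_mod_range_hashing<> > cc_counts_t;

// Probing table: bit-mask range hashing with an explicit linear probe.
typedef pb::gp_hash_table<std::string, int,
                          std::hash<std::string>,
                          std::equal_to<std::string>,
                          pb::direct_mask_range_hashing<>,
                          pb::linear_probe_fn<> > gp_counts_t;

// Counts word occurrences; works with either table type.
template<typename Table>
Table count_words(const std::vector<std::string>& words)
{
  Table counts;
  for (std::size_t i = 0; i < words.size(); ++i)
    ++counts[words[i]];   // operator[] inserts a value-initialized int
  return counts;
}
]]></programlisting>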
3000 <section xml:id="container.hash.details">
3001 <info><title>Details</title></info>
3003 <section xml:id="container.hash.details.hash_policies">
3004 <info><title>Hash Policies</title></info>
3006 <section xml:id="details.hash_policies.general">
3007 <info><title>General</title></info>
3009 <para>The following explains some of the functions involved in
3010 hashing. The graphic below illustrates the discussion.</para>
3013 <title>Hash functions, ranged-hash functions, and
3014 range-hashing functions</title>
3017 <imagedata align="center" format="PNG" scale="100"
3018 fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
3021 <phrase>Hash functions, ranged-hash functions, and
3022 range-hashing functions</phrase>
3027 <para>Let U be a domain (e.g., the integers, or the
3028 strings of 3 characters). A hash-table algorithm needs to map
3029 elements of U "uniformly" into the range [0,..., m -
3030 1] (where m is a positive integral value, and
3031 is, in general, time-varying). I.e., the algorithm needs
3032 a ranged-hash function</para>
3035 f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3038 <para>such that for any u in U ,</para>
3040 <para>0 ≤ f(u, m) ≤ m - 1</para>
3042 <para>and which has "good uniformity" properties (see
3043 <xref linkend="biblio.knuth98sorting"/>).
3045 common solution is to use the composition of the hash
3048 <para>h : U → Z<subscript>+</subscript> ,</para>
3050 <para>which maps elements of U into the non-negative
3051 integrals, and</para>
3053 <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
3054 Z<subscript>+</subscript>,</para>
3056 <para>which maps a non-negative hash value, and a non-negative
3057 range upper-bound into a non-negative integral in the range
3058 between 0 (inclusive) and the range upper bound (exclusive),
3059 i.e., for any r in Z<subscript>+</subscript>,</para>
3061 <para>0 ≤ g(r, m) ≤ m - 1</para>
3064 <para>The resulting ranged-hash function is</para>
3066 <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3068 <title>Ranged Hash Function</title>
3070 f(u , m) = g(h(u), m)
3074 <para>From the above, it is obvious that given g and
3075 h, f can always be composed (however the converse
3076 is not true). The standard's hash-based containers allow specifying
3077 a hash function, and use a hard-wired range-hashing function;
3078 the ranged-hash function is implicitly composed.</para>
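<para>The composition f(u, m) = g(h(u), m) can be sketched in C++ as follows; <classname>std::hash</classname> plays the role of h, and a modulo-based division method plays the role of g. The helper names are hypothetical and not part of this library.</para>
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// g: range-hashing function mapping a hash value r into [0, m - 1]
// (division method via modulo; requires m > 0).
inline std::size_t range_hash_mod(std::size_t r, std::size_t m) { return r % m; }

// f(u, m) = g(h(u), m): a ranged-hash function composed from a hash
// function h and a range-hashing function g (hypothetical helper).
template<typename Key, typename H, typename G>
std::size_t ranged_hash(const Key& u, std::size_t m, H h, G g)
{ return g(h(u), m); }
]]></programlisting>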
3080 <para>The above describes the case where a key is to be mapped
3081 into a single position within a hash table, e.g.,
3082 in a collision-chaining table. In other cases, a key is to be
3083 mapped into a sequence of positions within a table,
3084 e.g., in a probing table. Similar terms apply in this
3085 case: the table requires a ranged probe function,
3086 mapping a key into a sequence of positions within the table.
3087 This is typically achieved by composing a hash function
3088 mapping the key into a non-negative integral type, a
3089 probe function transforming the hash value into a
3090 sequence of hash values, and a range-hashing function
3091 transforming the sequence of hash values into a sequence of
3096 <section xml:id="details.hash_policies.range">
3097 <info><title>Range Hashing</title></info>
3099 <para>Some common choices for range-hashing functions are the
3100 division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3104 <title>Range-Hashing, Division Method</title>
3112 <para>g(r, m) = ⌊ u/v ( a r mod v ) ⌋</para>
3116 <para>g(r, m) = ⌊ u/v ( r<superscript>2</superscript> mod v ) ⌋</para>
3118 <para>respectively, for some positive integrals u and
3119 v (typically powers of 2), and some a. Each of
3120 these range-hashing functions works best for some different
3123 <para>The division method (see above) is a
3124 very common choice. However, even this single method can be
3125 implemented in two very different ways. It is possible to
3126 implement using the low
3127 level % (modulo) operation (for any m), or the
3128 low level & (bit-mask) operation (for the case where
3129 m is a power of 2), i.e.,</para>
3132 <title>Division via Prime Modulo</title>
3141 <title>Division via Bit Mask</title>
3143 g(r, m) = r & m - 1, (with m =
3144 2<superscript>k</superscript> for some k)
3149 <para>respectively.</para>
3151 <para>The % (modulo) implementation has the advantage that for
3152 m a prime far from a power of 2, g(r, m) is
3153 affected by all the bits of r (minimizing the chance of
3154 collision). It has the disadvantage of using the costly modulo
3155 operation. This method is hard-wired into SGI's implementation
3158 <para>The & (bit-mask) implementation has the advantage of
3159 relying on the fast bit-wise and operation. Its
3160 disadvantage is that g(r, m) is affected only by the
3161 low-order bits of r. This method is hard-wired into
3162 Dinkumware's implementation.</para>
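<para>The two implementations of the division method can be sketched as follows (hypothetical helper names). When m is a power of 2 they agree, but the mask version ignores every bit of r above the low-order k bits.</para>
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>

// Division via modulo: defined for any m > 0; with m a prime far from a
// power of 2, high-order bits of r also influence the result.
inline std::size_t mod_range_hash(std::size_t r, std::size_t m) { return r % m; }

// Division via bit mask: valid only when m == 2^k; keeps the k low bits.
inline std::size_t mask_range_hash(std::size_t r, std::size_t m)
{ return r & (m - 1); }

inline bool is_power_of_2(std::size_t m) { return m != 0 && (m & (m - 1)) == 0; }
]]></programlisting>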
3167 <section xml:id="details.hash_policies.ranged">
3168 <info><title>Ranged Hash</title></info>
3170 <para>In some cases, it is beneficial to allow the
3171 client to specify a ranged-hash function directly. It is
3172 true that the writer of the ranged-hash function cannot rely
3173 on the values of m having specific numerical properties
3174 suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
3175 the values of m are determined by a resize policy with
3176 possibly orthogonal considerations.</para>
3178 <para>There are two cases where a ranged-hash function can be
3179 superior. The first is when using perfect hashing; the
3180 second is when the values of m can be used to estimate
3181 the "general" number of distinct values required. This is
3182 described in the following.</para>
3187 s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3190 <para>be a string of t characters, each of which is from
3191 domain S. Consider the following ranged-hash
3195 A Standard String Hash Function
3198 f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
3199 0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3204 <para>where a is some non-negative integral value. This is
3205 the standard string-hashing function used in SGI's
3206 implementation (with a = 5). Its advantage is that
3207 it takes into account all of the characters of the string.</para>
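<para>A direct sketch of f<subscript>1</subscript> follows, evaluating the polynomial with Horner's rule so that every intermediate value is reduced modulo m. Treating characters as their <type>unsigned char</type> codes, and the function name, are assumptions.</para>
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <string>

// f1(s, m) = (sum_{i=0}^{t-1} s_i * a^i) mod m, evaluated by Horner's
// rule from the last character down; h stays below m between steps.
// Requires m > 0; a = 5 as in the text.
inline std::size_t f1(const std::string& s, std::size_t m, std::size_t a = 5)
{
  std::size_t h = 0;
  for (std::size_t i = s.size(); i-- > 0; )
    h = (h * a + static_cast<unsigned char>(s[i])) % m;
  return h;
}
]]></programlisting>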
3209 <para>Now assume that s is the string representation of a
3210 long DNA sequence (and so S = {'A', 'C', 'G',
3211 'T'}). In this case, scanning the entire string might be
3212 prohibitively expensive. A possible alternative might be to use
3213 only the first k characters of the string, where</para>
3215 <para>|S|<superscript>k</superscript> ≥ m ,</para>
3217 <para>i.e., using the hash function</para>
3221 Only k String DNA Hash
3224 f<subscript>2</subscript>(s, m) = ∑ <subscript>i
3225 = 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3229 <para>requiring scanning over only</para>
3231 <para>k = ⌈ log<subscript>4</subscript> m ⌉</para>
3233 <para>characters.</para>
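<para>The following sketch computes k as the smallest value with 4<superscript>k</superscript> ≥ m and hashes only the first k characters, as f<subscript>2</subscript> does. The encoding of 'A', 'C', 'G', 'T' as 0 to 3 and the function names are assumptions for illustration.</para>
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <string>

// Smallest k with 4^k >= m, i.e. k = ceil(log4(m)).
// Assumes m is far below SIZE_MAX so that p does not overflow.
inline std::size_t dna_prefix_length(std::size_t m)
{
  std::size_t k = 0;
  for (std::size_t p = 1; p < m; p *= 4)
    ++k;
  return k;
}

// Assumed encoding of nucleotides; any other character maps to 0.
inline std::size_t nucleotide(char c)
{
  switch (c)
    {
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return 0;
    }
}

// f2(s, m) = (sum_{i=0}^{k-1} s_i * a^i) mod m over the first k chars only.
inline std::size_t f2(const std::string& s, std::size_t m, std::size_t a = 5)
{
  std::size_t k = dna_prefix_length(m);
  if (k > s.size())
    k = s.size();
  std::size_t h = 0;
  for (std::size_t i = k; i-- > 0; )
    h = (h * a + nucleotide(s[i])) % m;
  return h;
}
]]></programlisting>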
3235 <para>Other more elaborate hash-functions might scan k
3236 characters starting at a random position (determined at each
3237 resize), or scan k random positions (determined at
3238 each resize), i.e., using</para>
3240 <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
3241 r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
3242 a<superscript>i</superscript> mod m ,</para>
3246 <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
3247 1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
3250 <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3251 each in the (inclusive) range [0,...,t-1].</para>
3253 <para>It should be noted that the above functions cannot be
3254 decomposed as per a ranged hash composed of hash and range hashing.</para>
3259 <section xml:id="details.hash_policies.implementation">
3260 <info><title>Implementation</title></info>
3262 <para>This sub-subsection describes the implementation of
3263 the above in this library. It first explains range-hashing
3264 functions in collision-chaining tables, then ranged-hash
3265 functions in collision-chaining tables, then probing-based
3266 tables, and finally lists the relevant classes in this
3269 <section xml:id="hash_policies.implementation.collision-chaining">
3271 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3275 <para><classname>cc_hash_table</classname> is
3276 parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3277 hash functor and a combining hash functor, respectively.</para>
3279 <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3280 range-hashing functor. <classname>cc_hash_table</classname>
3281 synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
3282 <classname>Comb_Hash_Fn</classname>. The figure below shows an <function>insert</function> sequence
3283 diagram for this case. The user inserts an element (point A),
3284 the container transforms the key into a non-negative integral
3285 using the hash functor (points B and C), and transforms the
3286 result into a position using the combining functor (points D
3290 <title>Insert hash sequence diagram</title>
3293 <imagedata align="center" format="PNG" scale="100"
3294 fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
3297 <phrase>Insert hash sequence diagram</phrase>
3302 <para>If <classname>cc_hash_table</classname>'s
3303 hash functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
3304 a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3305 diagram. The user inserts an element (point A), the container
3306 transforms the key into a position using the combining functor
3307 (points B and C).</para>
3310 <title>Insert hash sequence diagram with a null policy</title>
3313 <imagedata align="center" format="PNG" scale="100"
3314 fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
3317 <phrase>Insert hash sequence diagram with a null policy</phrase>
3324 <section xml:id="hash_policies.implementation.probe">
3328 <para><classname>gp_hash_table</classname> is parametrized by
3329 <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3330 and <classname>Comb_Probe_Fn</classname>. As before, if
3331 <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3332 are both <classname>null_type</classname>, then
3333 <classname>Comb_Probe_Fn</classname> is a ranged-probe
3334 functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3335 functor, <classname>Probe_Fn</classname> is a functor for offsets
3336 from a hash value, and <classname>Comb_Probe_Fn</classname>
3337 transforms a probe sequence into a sequence of positions within
3342 <section xml:id="hash_policies.implementation.predefined">
3344 Pre-Defined Policies
3347 <para>This library contains some pre-defined classes
3348 implementing range-hashing and probing functions:</para>
3351 <listitem><para><classname>direct_mask_range_hashing</classname>
3352 and <classname>direct_mod_range_hashing</classname>
3353 are range-hashing functions based on a bit-mask and a modulo
3354 operation, respectively.</para></listitem>
<listitem><para><classname>linear_probe_fn</classname> and
<classname>quadratic_probe_fn</classname> are
linear and quadratic probe functions,
respectively.</para></listitem>
3363 The graphic below shows the relationships.
3366 <title>Hash policy class diagram</title>
3369 <imagedata align="center" format="PNG" scale="100"
3370 fileref="../images/pbds_hash_policy_cd.png"/>
3373 <phrase>Hash policy class diagram</phrase>
3381 </section> <!-- impl -->
3385 <section xml:id="container.hash.details.resize_policies">
3386 <info><title>Resize Policies</title></info>
3388 <section xml:id="resize_policies.general">
3389 <info><title>General</title></info>
3391 <para>Hash-tables, as opposed to trees, do not naturally grow or
3392 shrink. It is necessary to specify policies to determine how
3393 and when a hash table should change its size. Usually, resize
3394 policies can be decomposed into orthogonal policies:</para>
<listitem><para>A size policy indicating how a hash table
should grow (e.g., by repeatedly doubling its
size).</para></listitem>
3401 <listitem><para>A trigger policy indicating when a hash
3402 table should grow (e.g., a load factor is
3403 exceeded).</para></listitem>
3408 <section xml:id="resize_policies.size">
3409 <info><title>Size Policies</title></info>
3412 <para>Size policies determine how a hash table changes size. These
3413 policies are simple, and there are relatively few sensible
3414 options. An exponential-size policy (with the initial size and
3415 growth factors both powers of 2) works well with a mask-based
3416 range-hashing function, and is the
3417 hard-wired policy used by Dinkumware. A
3418 prime-list based policy works well with a modulo-prime range
3419 hashing function and is the hard-wired policy used by SGI's
3420 implementation.</para>
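<para>The pairing of exponential sizes with mask-based range hashing rests on a simple identity: when the table size m is a power of 2, <literal>h &amp; (m - 1)</literal> equals <literal>h % m</literal>, so a mask can replace the division. For other sizes the two differ, which is why prime sizes go with modulo range hashing. A minimal check, using hypothetical helper functions:</para>
<programlisting language="c++"><![CDATA[
#include <cstddef>

// Hypothetical helper: true if masking with (m - 1) gives the same
// position as h % m for every h in [0, limit).
bool mask_agrees_with_mod(std::size_t m, std::size_t limit)
{
  for (std::size_t h = 0; h < limit; ++h)
    if ((h & (m - 1)) != h % m)
      return false;
  return true;
}

// Hypothetical helper: true if m is a non-zero power of 2.
bool is_power_of_two(std::size_t m)
{
  return m != 0 && (m & (m - 1)) == 0;
}
]]></programlisting>
<para>With a prime such as m = 13, for instance, the mask 12 (binary 1100) can only produce positions 0, 4, 8 and 12, leaving most of the table unused.</para>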
3424 <section xml:id="resize_policies.trigger">
3425 <info><title>Trigger Policies</title></info>
3427 <para>Trigger policies determine when a hash table changes size.
3428 Following is a description of two policies: load-check
3429 policies, and collision-check policies.</para>
<para>Load-check policies are straightforward. The user specifies
two factors, α<subscript>min</subscript> and
α<subscript>max</subscript>, and the hash table maintains the
invariant that</para>
<para>α<subscript>min</subscript> ≤ (number of
stored elements) / (hash-table size) ≤
α<subscript>max</subscript><remark>load factor min max</remark></para>
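<para>A minimal sketch of this rule follows. The <classname>load_check</classname> type is hypothetical and is not the library's <classname>hash_load_check_resize_trigger</classname>; it simply tests the invariant for n stored elements in a table of size m.</para>
<programlisting language="c++"><![CDATA[
#include <cstddef>

// Hypothetical load-check rule: keep alpha_min <= n / m <= alpha_max.
struct load_check
{
  double alpha_min;
  double alpha_max;

  // Growing is needed once the load exceeds alpha_max.
  bool grow_needed(std::size_t n, std::size_t m) const
  { return static_cast<double>(n) > alpha_max * static_cast<double>(m); }

  // Shrinking is needed once the load falls below alpha_min.
  bool shrink_needed(std::size_t n, std::size_t m) const
  { return static_cast<double>(n) < alpha_min * static_cast<double>(m); }
};
]]></programlisting>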
3440 <para>Collision-check policies work in the opposite direction of
3441 load-check policies. They focus on keeping the number of
3442 collisions moderate and hoping that the size of the table will
3443 not grow very large, instead of keeping a moderate load-factor
3444 and hoping that the number of collisions will be small. A
3445 maximal collision-check policy resizes when the longest
3446 probe-sequence grows too large.</para>
<para>Consider the graphic below. Let the size of the hash table
be denoted by m, the length of a probe sequence be denoted by k,
and some load factor be denoted by α. We would like to
calculate the minimal value of k such that, if there were α
m elements in the hash table, a probe sequence of length k would
be found with probability at most 1/m.</para>
3456 <title>Balls and bins</title>
3459 <imagedata align="center" format="PNG" scale="100"
3460 fileref="../images/pbds_balls_and_bins.png"/>
3463 <phrase>Balls and bins</phrase>
3468 <para>Denote the probability that a probe sequence of length
3469 k appears in bin i by p<subscript>i</subscript>, the
3470 length of the probe sequence of bin i by
3471 l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3477 Probability of Probe Sequence of Length k
3480 p<subscript>1</subscript> =
3484 <para>P(l<subscript>1</subscript> ≥ k) =</para>
P(l<subscript>1</subscript> ≥ α ( 1 + ( k / α - 1 ) ) ) ≤ (a)
3491 e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) /2)
3494 <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
3495 calculate the probability that some bin contains a probe
sequence of length at least k, we note that the
3497 l<subscript>i</subscript> are negatively-dependent
3498 (<xref linkend="biblio.dubhashi98neg"/>)
3500 I(.) denote the indicator function. Then</para>
3504 Probability Probe Sequence in Some Bin
3507 P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
3511 <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
3512 I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>
3514 <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
3515 l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
3516 p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>
3518 <para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
3519 - 1 ) <superscript>2</superscript> ) / 2 ) ,</para>
3521 <para>where (a) follows from the fact that the Chernoff bound can
3522 be applied to negatively-dependent variables (<xref
3523 linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3524 equation into the second one, and equating with 1/m, we
3528 <para>k ~ √ ( 2 α ln 2 m ln(m) )
3533 <section xml:id="resize_policies.impl">
3534 <info><title>Implementation</title></info>
3536 <para>This sub-subsection describes the implementation of the
3537 above in this library. It first describes resize policies and
3538 their decomposition into trigger and size policies, then
3539 describes pre-defined classes, and finally discusses controlled
access to the policies' internals.</para>
3542 <section xml:id="resize_policies.impl.decomposition">
3543 <info><title>Decomposition</title></info>
3546 <para>Each hash-based container is parametrized by a
3547 <classname>Resize_Policy</classname> parameter; the container derives
3548 <classname>public</classname>ly from <classname>Resize_Policy</classname>. For
3551 cc_hash_table<typename Key,
3554 typename Resize_Policy
3555 ...> : public Resize_Policy
3558 <para>As a container object is modified, it continuously notifies
3559 its <classname>Resize_Policy</classname> base of internal changes
3560 (e.g., collisions encountered and elements being
3561 inserted). It queries its <classname>Resize_Policy</classname> base whether
3562 it needs to be resized, and if so, to what size.</para>
<para>The graphic below shows a (possible) sequence diagram
of an insert operation. The user inserts an element; the hash
table notifies its resize policy that a search has started
(point A); in this case, a single collision is encountered,
and the table notifies its resize policy of this (point B); the
container next notifies its resize policy that the search
has ended (point C); it then queries its resize policy whether
a resize is needed, and if so, what the new size is (points D
to G); following the resize, it notifies the policy that a
resize has completed (point H); finally, the element is
inserted, and the policy is notified (point I).</para>
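<para>The protocol can be sketched with a toy container. Everything below is hypothetical and simplified: the hook names only loosely mirror the points of the diagram, and keys are kept in a single <classname>std::vector</classname> rather than a real hash table. It does show the structure described above: the container derives publicly from its resize policy, notifies it around each search, and queries it once per insert.</para>
<programlisting language="c++"><![CDATA[
#include <cstddef>
#include <vector>

// Hypothetical resize policy: records the notifications it receives and
// asks for a doubled table once the load would exceed 1/2.
class toy_resize_policy
{
public:
  std::size_t searches = 0, collisions = 0, resizes = 0, inserts = 0;

  void notify_search_start() { ++searches; }             // point A
  void notify_collision() { ++collisions; }               // point B
  void notify_search_end() {}                             // point C
  // Points D to G: is a resize needed, and if so, to what size?
  bool is_resize_needed(std::size_t n, std::size_t m) const
  { return 2 * (n + 1) > m; }
  std::size_t new_size(std::size_t m) const { return 2 * m; }
  void notify_resized(std::size_t) { ++resizes; }         // point H
  void notify_inserted(std::size_t) { ++inserts; }        // point I
};

// Hypothetical container deriving publicly from its resize policy. Keys
// live in one vector, so every key compared before a match counts as a
// collision, as if all keys shared a single chain.
template<typename Resize_Policy>
class toy_table : public Resize_Policy
{
  std::vector<long> keys_;
  std::size_t buckets_ = 8;   // nominal table size

public:
  std::size_t actual_size() const { return buckets_; }
  std::size_t size() const { return keys_.size(); }

  void insert(long k)
  {
    this->notify_search_start();
    for (long existing : keys_)
      {
        if (existing == k)
          {
            this->notify_search_end();
            return;                       // already present
          }
        this->notify_collision();
      }
    this->notify_search_end();
    // A single query per insert, so at most one resize.
    if (this->is_resize_needed(keys_.size(), buckets_))
      {
        buckets_ = this->new_size(buckets_);
        this->notify_resized(buckets_);
      }
    keys_.push_back(k);
    this->notify_inserted(keys_.size());
  }
};
]]></programlisting>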
3577 <title>Insert resize sequence diagram</title>
3580 <imagedata align="center" format="PNG" scale="100"
3581 fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
3584 <phrase>Insert resize sequence diagram</phrase>
<para>In practice, a resize policy can usually be decomposed orthogonally
into a size policy and a trigger policy. Consequently,
3592 the library contains a single class for instantiating a resize
3593 policy: <classname>hash_standard_resize_policy</classname>
3594 is parametrized by <classname>Size_Policy</classname> and
3595 <classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
3596 both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
3597 to these policies.</para>
3599 <para>The two graphics immediately below show sequence diagrams
3600 illustrating the interaction between the standard resize policy
3601 and its trigger and size policies, respectively.</para>
3604 <title>Standard resize policy trigger sequence
3608 <imagedata align="center" format="PNG" scale="100"
3609 fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
3612 <phrase>Standard resize policy trigger sequence
3619 <title>Standard resize policy size sequence
3623 <imagedata align="center" format="PNG" scale="100"
3624 fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
3627 <phrase>Standard resize policy size sequence
3636 <section xml:id="resize_policies.impl.predefined">
3637 <info><title>Predefined Policies</title></info>
3638 <para>The library includes the following
3639 instantiations of size and trigger policies:</para>
3642 <listitem><para><classname>hash_load_check_resize_trigger</classname>
3643 implements a load check trigger policy.</para></listitem>
3645 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3646 implements a collision check trigger policy.</para></listitem>
3648 <listitem><para><classname>hash_exponential_size_policy</classname>
3649 implements an exponential-size policy (which should be used
3650 with mask range hashing).</para></listitem>
<listitem><para><classname>hash_prime_size_policy</classname>
implements a size policy based on a sequence of primes (which should
be used with mod range hashing).</para></listitem>
3658 <para>The graphic below gives an overall picture of the resize-related
3659 classes. <classname>basic_hash_table</classname>
3660 is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3661 publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3662 <classname>hash_standard_resize_policy</classname>
3663 itself is parametrized by <classname>Trigger_Policy</classname> and
3664 <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3665 instantiated by <classname>hash_load_check_resize_trigger</classname>,
3666 or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3667 <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3668 or <classname>hash_prime_size_policy</classname>.</para>
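<para>As a sketch, two of the sensible pairings above can be written as follows, assuming GCC's libstdc++ headers and the parameter order Key, Mapped, Hash_Fn, Eq_Fn, Comb_Hash_Fn, Resize_Policy of <classname>cc_hash_table</classname>. The aliases and the helper <function>fill_and_check</function> are hypothetical.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
#include <functional>

namespace pb = __gnu_pbds;

// Exponential sizes + mask range hashing + load-check trigger.
typedef pb::cc_hash_table<
    int, int, std::hash<int>, std::equal_to<int>,
    pb::direct_mask_range_hashing<>,
    pb::hash_standard_resize_policy<
        pb::hash_exponential_size_policy<>,
        pb::hash_load_check_resize_trigger<> > >
  mask_load_table;

// Prime sizes + modulo range hashing + collision-check trigger.
typedef pb::cc_hash_table<
    int, int, std::hash<int>, std::equal_to<int>,
    pb::direct_mod_range_hashing<>,
    pb::hash_standard_resize_policy<
        pb::hash_prime_size_policy,
        pb::cc_hash_max_collision_check_resize_trigger<> > >
  mod_collision_table;

// Hypothetical helper: maps i to 2 * i for i in [0, n), then checks
// that every key can be found with the right value.
template<typename Table>
bool fill_and_check(Table& t, int n)
{
  for (int i = 0; i < n; ++i)
    t[i] = 2 * i;
  for (int i = 0; i < n; ++i)
    {
      auto it = t.find(i);
      if (it == t.end() || it->second != 2 * i)
        return false;
    }
  return t.size() == static_cast<typename Table::size_type>(n);
}
]]></programlisting>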
3672 <section xml:id="resize_policies.impl.internals">
<info><title>Controlling Access to Internals</title></info>
3675 <para>There are cases where (controlled) access to resize
3676 policies' internals is beneficial. E.g., it is sometimes
3677 useful to query a hash-table for the table's actual size (as
opposed to its <function>size()</function>, the number of values it
3679 currently holds); it is sometimes useful to set a table's
3680 initial size, externally resize it, or change load factors.</para>
3682 <para>Clearly, supporting such methods both decreases the
3683 encapsulation of hash-based containers, and increases the
3684 diversity between different associative-containers' interfaces.
3685 Conversely, omitting such methods can decrease containers'
3688 <para>In order to avoid, to the extent possible, the above
3689 conflict, the hash-based containers themselves do not address
3690 any of these questions; this is deferred to the resize policies,
3691 which are easier to change or replace. Thus, for example,
3692 neither <classname>cc_hash_table</classname> nor
3693 <classname>gp_hash_table</classname>
contains methods for querying the actual size of the table; this
3695 is deferred to <classname>hash_standard_resize_policy</classname>.</para>
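<para>A sketch of this deferral, assuming GCC's libstdc++ headers: the table type itself declares no size-query methods, but because it derives publicly from a <classname>hash_standard_resize_policy</classname> whose third template argument (<classname>External_Size_Access</classname>, described next) is <literal>true</literal>, <function>get_actual_size</function> and <function>resize</function> can be called on the table object. The aliases and the helper <function>reserve_and_report</function> are hypothetical.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
#include <cstddef>
#include <functional>

namespace pb = __gnu_pbds;

// Third argument (External_Size_Access) = true exposes
// get_actual_size() and resize() through the resize policy.
typedef pb::hash_standard_resize_policy<
    pb::hash_exponential_size_policy<>,
    pb::hash_load_check_resize_trigger<>,
    true>
  sized_resize_policy;

typedef pb::gp_hash_table<
    int, char, std::hash<int>, std::equal_to<int>,
    pb::direct_mask_range_hashing<>,
    pb::linear_probe_fn<>,
    sized_resize_policy>
  sized_table;

// Hypothetical helper: asks for room for at least n entries and reports
// the table's actual size (not its element count).
std::size_t reserve_and_report(sized_table& t, std::size_t n)
{
  t.resize(n);                 // a method of the resize policy base
  return t.get_actual_size();
}
]]></programlisting>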
3697 <para>Furthermore, the policies themselves are parametrized by
template arguments that determine the methods they support
(<xref linkend="biblio.alexandrescu01modern"/>
shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3702 is parametrized by <classname>External_Size_Access</classname> that
3703 determines whether it supports methods for querying the actual
3704 size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3705 is parametrized by <classname>External_Load_Access</classname> that
3706 determines whether it supports methods for querying or
3707 modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3708 is parametrized by <classname>External_Load_Access</classname> that
3709 determines whether it supports methods for querying the
3712 <para>Some operations, for example, resizing a container at
3713 run time, or changing the load factors of a load-check trigger
3714 policy, require the container itself to resize. As mentioned
3715 above, the hash-based containers themselves do not contain
3716 these types of methods, only their resize policies.
3717 Consequently, there must be some mechanism for a resize policy
3718 to manipulate the hash-based container. As the hash-based
3719 container is a subclass of the resize policy, this is done
3720 through virtual methods. Each hash-based container has a
3721 <classname>private</classname> <classname>virtual</classname> method:</para>
virtual void
do_resize(size_type new_size);
3728 <para>which resizes the container. Implementations of
3729 <classname>Resize_Policy</classname> can export public methods for resizing
3730 the container externally; these methods internally call
3731 <classname>do_resize</classname> to resize the table.</para>
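<para>The hook can be sketched in a few lines. The classes below are hypothetical stand-ins, not the library's declarations; they only show the pattern: the policy exports a public <function>resize</function>, the container overrides a private virtual <function>do_resize</function>, and the policy reaches the container through that override.</para>
<programlisting language="c++"><![CDATA[
#include <cstddef>

// Hypothetical resize policy exporting an external resize() method.
class external_resize_policy
{
public:
  void resize(std::size_t new_size) { do_resize(new_size); }
  virtual ~external_resize_policy() {}

private:
  // Implemented by the container that derives from this policy.
  virtual void do_resize(std::size_t new_size) = 0;
};

// Hypothetical container: the code that actually resizes lives here.
class tiny_table : public external_resize_policy
{
public:
  std::size_t capacity() const { return capacity_; }

private:
  std::size_t capacity_ = 8;

  // Private override: users can only reach it through resize().
  void do_resize(std::size_t new_size) override { capacity_ = new_size; }
};
]]></programlisting>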
3739 </section> <!-- resize policies -->
3741 <section xml:id="container.hash.details.policy_interaction">
3742 <info><title>Policy Interactions</title></info>
<para>Hash tables are unfortunately especially susceptible to the
choice of policies. One of the more complicated aspects of this
3747 is that poor combinations of good policies can form a poor
3748 container. Following are some considerations.</para>
3750 <section xml:id="policy_interaction.probesizetrigger">
3751 <info><title>probe/size/trigger</title></info>
3753 <para>Some combinations do not work well for probing containers.
3754 For example, combining a quadratic probe policy with an
3755 exponential size policy can yield a poor container: when an
3756 element is inserted, a trigger policy might decide that there
3757 is no need to resize, as the table still contains unused
3758 entries; the probe sequence, however, might never reach any of
3759 the unused entries.</para>
<para>Unfortunately, this library cannot detect such problems at
compilation time (deciding whether they occur is as hard as the halting problem). It therefore defines
an exception class, <classname>insert_error</classname>, which it throws
in this case.</para>
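<para>Two sketches follow, assuming GCC's libstdc++ headers (where <classname>insert_error</classname> is declared in <filename>ext/pb_ds/exception.hpp</filename>); the helpers <function>quadratic_reach</function> and <function>try_insert</function> and the alias <classname>quad_table</classname> are hypothetical. The first is the arithmetic behind the example above: with a table size of 16, the quadratic offsets i * i mod 16 take only four values, so a probe sequence visits at most four slots however many are free, while with a prime size p they take (p + 1) / 2 values. The second pairs <classname>quadratic_probe_fn</classname> with prime sizes and modulo range hashing, and still guards insertion against <classname>insert_error</classname>.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/exception.hpp>
#include <ext/pb_ds/hash_policy.hpp>
#include <cstddef>
#include <functional>
#include <set>

namespace pb = __gnu_pbds;

// Hypothetical helper: number of distinct offsets (i * i) mod m for i in
// [0, m), i.e. how many slots a quadratic probe sequence can ever reach.
std::size_t quadratic_reach(std::size_t m)
{
  std::set<std::size_t> offsets;
  for (std::size_t i = 0; i < m; ++i)
    offsets.insert((i * i) % m);
  return offsets.size();
}

// Quadratic probing paired with prime sizes and modulo range hashing.
typedef pb::gp_hash_table<
    int, int, std::hash<int>, std::equal_to<int>,
    pb::direct_mod_range_hashing<>,
    pb::quadratic_probe_fn<>,
    pb::hash_standard_resize_policy<
        pb::hash_prime_size_policy,
        pb::hash_load_check_resize_trigger<> > >
  quad_table;

// Hypothetical helper: inserts, reporting failure instead of throwing.
bool try_insert(quad_table& t, int key, int value)
{
  try
    {
      t[key] = value;
      return true;
    }
  catch (const pb::insert_error&)
    {
      return false;   // the probe sequence found no free slot
    }
}
]]></programlisting>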
3768 <section xml:id="policy_interaction.hashtrigger">
3769 <info><title>hash/trigger</title></info>
3771 <para>Some trigger policies are especially susceptible to poor
3772 hash functions. Suppose, as an extreme case, that the hash
3773 function transforms each key to the same hash value. After some
3774 inserts, a collision detecting policy will always indicate that
3775 the container needs to grow.</para>
3777 <para>The library, therefore, by design, limits each operation to
3778 one resize. For each <classname>insert</classname>, for example, it queries
3779 only once whether a resize is needed.</para>
3783 <section xml:id="policy_interaction.eqstorehash">
3784 <info><title>equivalence functors/storing hash values/hash</title></info>
3786 <para><classname>cc_hash_table</classname> and
3787 <classname>gp_hash_table</classname> are
3788 parametrized by an equivalence functor and by a
3789 <classname>Store_Hash</classname> parameter. If the latter parameter is
3790 <classname>true</classname>, then the container stores with each entry
a hash value, and in case of collisions compares the stored values first,
applying the equivalence functor only when they match. This can lower the
cost of collisions for some types, but increase the cost of
collisions for other types.</para>
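<para>A sketch of enabling hash storage, assuming GCC's libstdc++ headers and the parameter order Key, Mapped, Hash_Fn, Eq_Fn, Comb_Hash_Fn, Resize_Policy, Store_Hash. Keys with an expensive equivalence test, such as long strings, are a plausible candidate; whether storing hashes pays off for a given key type is best measured. The alias <classname>stored_hash_table</classname> is hypothetical.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
#include <functional>
#include <string>

namespace pb = __gnu_pbds;

// Store_Hash = true: each entry keeps its hash value, so a colliding
// entry whose stored hash differs can be skipped without calling
// Eq_Fn (here, a full string comparison).
typedef pb::cc_hash_table<
    std::string, int,
    std::hash<std::string>,
    std::equal_to<std::string>,
    pb::direct_mask_range_hashing<>,
    pb::hash_standard_resize_policy<>,   // default size and trigger policies
    true>                                // Store_Hash
  stored_hash_table;
]]></programlisting>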
3796 <para>If a ranged-hash function or ranged probe function is
3797 directly supplied, however, then it makes no sense to store the
hash value with each entry. This library's containers will
fail to compile, by design, if this is attempted.</para>
3803 <section xml:id="policy_interaction.sizeloadtrigger">
3804 <info><title>size/load-check trigger</title></info>
3806 <para>Assume a size policy issues an increasing sequence of sizes
a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
3808 example, an exponential size policy might issue the sequence of
3809 sizes 8, 16, 32, 64, ...</para>
3811 <para>If a load-check trigger policy is used, with loads
3812 α<subscript>min</subscript> and α<subscript>max</subscript>,
3813 respectively, then it is a good idea to have:</para>
3816 <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>
<listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
3821 <para>This will ensure that the amortized hash cost of each
3822 modifying operation is at most approximately 3.</para>
3824 <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
3825 any case, a bad choice, and α<subscript>min</subscript> >
α<subscript>max</subscript> is horrendous.</para>
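<para>The guideline can be expressed as a small check. The helper below is hypothetical: it accepts α<subscript>max</subscript> within 10% of 1 / q (an arbitrary tolerance chosen for the sketch) and α<subscript>min</subscript> below 1 / (2 q), and rejects α<subscript>min</subscript> ≥ α<subscript>max</subscript> outright. For q = 2 it accepts, e.g., α<subscript>min</subscript> = 0.125 and α<subscript>max</subscript> = 0.5.</para>
<programlisting language="c++"><![CDATA[
#include <cmath>

// Hypothetical check of the load-check guidelines for a size policy that
// grows by a factor q: alpha_max close to 1/q, alpha_min below 1/(2q).
// The 10% tolerance on alpha_max is an arbitrary choice for this sketch.
bool loads_suit_growth_factor(double q, double alpha_min, double alpha_max)
{
  if (q <= 1.0 || alpha_min >= alpha_max)
    return false;                       // degenerate or "horrendous" settings
  const bool max_ok = std::fabs(alpha_max - 1.0 / q) <= 0.1 / q;
  const bool min_ok = alpha_min < 1.0 / (2.0 * q);
  return max_ok && min_ok;
}
]]></programlisting>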
3832 </section> <!-- details -->
3834 </section> <!-- hash -->
3837 <section xml:id="pbds.design.container.tree">
3838 <info><title>tree</title></info>
3840 <section xml:id="container.tree.interface">
3841 <info><title>Interface</title></info>
3843 <para>The tree-based container has the following declaration:</para>
3848 typename Cmp_Fn = std::less<Key>,
3849 typename Tag = rb_tree_tag,
3851 typename Const_Node_Iterator,
3852 typename Node_Iterator,
3854 typename Allocator_>
3855 class Node_Update = null_node_update,
3856 typename Allocator = std::allocator<char> >
3860 <para>The parameters have the following meaning:</para>
3864 <para><classname>Key</classname> is the key type.</para></listitem>
3867 <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
<para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>
3873 <para><classname>Tag</classname> specifies which underlying data structure
3874 to use.</para></listitem>
3877 <para><classname>Node_Update</classname> is a policy for updating node
3878 invariants.</para></listitem>
3881 <para><classname>Allocator</classname> is an allocator
3882 type.</para></listitem>
3885 <para>The <classname>Tag</classname> parameter specifies which underlying
data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
<classname>ov_tree_tag</classname>
specifies an underlying red-black tree, splay tree, or
ordered-vector tree, respectively; any other tag is illegal.
Note that containers based on the former two contain more types
and methods than the latter (e.g.,
<classname>reverse_iterator</classname> and <classname>rbegin</classname>), and have different
exception and invalidation guarantees.</para>
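<para>As a sketch of how the tag selects the data structure while client code stays the same, the function below is a template over <classname>Tag</classname>; it also exercises the split and join methods described under Split and Join below. It assumes GCC's libstdc++ headers; the alias <classname>int_set</classname> and the helper <function>split_halves</function> are hypothetical.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cstddef>
#include <functional>
#include <utility>

namespace pb = __gnu_pbds;

// Hypothetical alias: a set of ints backed by the structure Tag selects.
template<typename Tag>
using int_set = pb::tree<int, pb::null_type, std::less<int>, Tag>;

// Hypothetical helper: fills a set with 1..n, splits off the keys larger
// than pivot, records both sizes, then joins the halves again.
template<typename Tag>
std::pair<std::size_t, std::size_t> split_halves(int n, int pivot,
                                                 std::size_t& joined_size)
{
  int_set<Tag> low, high;
  for (int i = 1; i <= n; ++i)
    low.insert(i);
  low.split(pivot, high);          // keys > pivot move into high
  std::pair<std::size_t, std::size_t> sizes(low.size(), high.size());
  low.join(high);                  // all of high's keys exceed low's
  joined_size = low.size();
  return sizes;
}
]]></programlisting>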
3897 <section xml:id="container.tree.details">
3898 <info><title>Details</title></info>
3900 <section xml:id="container.tree.node">
3901 <info><title>Node Invariants</title></info>
<para>Consider the two trees in the graphic below, labels A and B. The first
is a tree of floats; the second is a tree of pairs, each
signifying a geometric line interval. Each element in a tree is referred to as a node of the tree. Of course, each of
3907 these trees can support the usual queries: the first can easily
3908 search for <classname>0.4</classname>; the second can easily search for
3909 <classname>std::make_pair(10, 41)</classname>.</para>
3911 <para>Each of these trees can efficiently support other queries.
The first can efficiently determine that the 2nd key in the
tree is <constant>0.3</constant>; the second can efficiently determine
whether any of its intervals overlaps
<classname>std::make_pair(29, 42)</classname> (useful in geometric
3916 applications or distributed file systems with leases, for
3917 example). It should be noted that an <classname>std::set</classname> can
3918 only solve these types of problems with linear complexity.</para>
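<para>The first query is what the predefined <classname>tree_order_statistics_node_update</classname> adds to a tree, through its <function>find_by_order</function> (0-based) and <function>order_of_key</function> methods. A sketch assuming GCC's libstdc++ headers; the keys other than 0.3 and 0.4 are made up, and the alias <classname>ordered_set</classname> and the helper <function>make_example_set</function> are hypothetical.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>

namespace pb = __gnu_pbds;

// A red-black tree of doubles whose nodes also store subtree sizes,
// maintained by the predefined order-statistics node updater.
typedef pb::tree<double, pb::null_type, std::less<double>,
                 pb::rb_tree_tag,
                 pb::tree_order_statistics_node_update>
  ordered_set;

// Hypothetical helper: builds the set with illustrative keys.
ordered_set make_example_set()
{
  ordered_set s;
  const double keys[] = {0.7, 0.1, 0.4, 0.3};
  for (double k : keys)
    s.insert(k);
  return s;
}
]]></programlisting>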
3920 <para>In order to do so, each tree stores some metadata in
each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>). The first stores in
each node the size of the sub-tree rooted at the node; the
second stores at each node the maximal endpoint of the
intervals in the sub-tree rooted at the node.</para>
3927 <title>Tree node invariants</title>
3930 <imagedata align="center" format="PNG" scale="100"
3931 fileref="../images/pbds_tree_node_invariants.png"/>
3934 <phrase>Tree node invariants</phrase>
<para>Supporting such trees is difficult for a number of
reasons:</para>
3943 <listitem><para>There must be a way to specify what a node's metadata
3944 should be (if any).</para></listitem>
3946 <listitem><para>Various operations can invalidate node
3947 invariants. The graphic below shows how a right rotation,
3948 performed on A, results in B, with nodes x and y having
3949 corrupted invariants (the grayed nodes in C). The graphic shows
3950 how an insert, performed on D, results in E, with nodes x and y
3951 having corrupted invariants (the grayed nodes in F). It is not
3952 feasible to know outside the tree the effect of an operation on
3953 the nodes of the tree.</para></listitem>
3955 <listitem><para>The search paths of standard associative containers are
3956 defined by comparisons between keys, and not through
3957 metadata.</para></listitem>
3959 <listitem><para>It is not feasible to know in advance which methods trees
3960 can support. Besides the usual <classname>find</classname> method, the
3961 first tree can support a <classname>find_by_order</classname> method, while
3962 the second can support an <classname>overlaps</classname> method.</para></listitem>
3966 <title>Tree node invalidation</title>
3969 <imagedata align="center" format="PNG" scale="100"
3970 fileref="../images/pbds_tree_node_invalidations.png"/>
3973 <phrase>Tree node invalidation</phrase>
<para>These problems are solved by a combination of two means:
node iterators, and template-template node updater
parameters.</para>
3982 <section xml:id="container.tree.node.iterators">
3983 <info><title>Node Iterators</title></info>
3986 <para>Each tree-based container defines two additional iterator
3987 types, <classname>const_node_iterator</classname>
3988 and <classname>node_iterator</classname>.
3989 These iterators allow descending from a node to one of its
children. Node iterators allow search paths different from those
3991 determined by the comparison functor. The <classname>tree</classname>
3992 supports the methods:</para>
<para>The first pair returns node iterators corresponding to the
4008 root node of the tree; the latter pair returns node iterators
4009 corresponding to a just-after-leaf node.</para>
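<para>For instance, node iterators let client code walk the shape of the tree, which the comparison-based interface hides. The sketch below computes the height of a red-black tree, assuming GCC's libstdc++ headers, where node iterators provide <function>get_l_child</function> and <function>get_r_child</function>; the alias <classname>int_rb_set</classname> and both helpers are hypothetical.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <algorithm>
#include <cstddef>
#include <functional>

namespace pb = __gnu_pbds;

typedef pb::tree<int, pb::null_type, std::less<int>, pb::rb_tree_tag>
  int_rb_set;

// Hypothetical helper: number of nodes on the longest downward path from
// nd; end is the just-after-leaf node iterator.
template<typename Node_It>
std::size_t subtree_height(Node_It nd, Node_It end)
{
  if (nd == end)
    return 0;
  return 1 + std::max(subtree_height(nd.get_l_child(), end),
                      subtree_height(nd.get_r_child(), end));
}

// Hypothetical helper: height of the whole tree, starting at the root.
std::size_t tree_height(const int_rb_set& t)
{
  return subtree_height(t.node_begin(), t.node_end());
}
]]></programlisting>
<para>For 1000 keys the result lies between 10 (the minimum for any binary tree with that many nodes) and 2 log<subscript>2</subscript>(1001), the red-black bound.</para>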
4012 <section xml:id="container.tree.node.updator">
<info><title>Node Updater</title></info>
4015 <para>The tree-based containers are parametrized by a
4016 <classname>Node_Update</classname> template-template parameter. A
4017 tree-based container instantiates
4018 <classname>Node_Update</classname> to some
4019 <classname>node_update</classname> class, and publicly subclasses
4020 <classname>node_update</classname>. The graphic below shows this
4021 scheme, as well as some predefined policies (which are explained
4025 <title>A tree and its update policy</title>
4028 <imagedata align="center" format="PNG" scale="100"
4029 fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
4032 <phrase>A tree and its update policy</phrase>
4037 <para><classname>node_update</classname> (an instantiation of
4038 <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4039 the type of metadata it requires. For order statistics,
4040 e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4041 The tree defines within each node a <classname>metadata_type</classname>
4044 <para><classname>node_update</classname> must also define the following method
4045 for restoring node invariants:</para>
4048 operator()(node_iterator nd_it, const_node_iterator end_nd_it)
<para>In this method, <varname>nd_it</varname> is a
<classname>node_iterator</classname> corresponding to a node whose
A) descendants all have valid invariants, and B) own
invariants might be violated; <varname>end_nd_it</varname> is
a <classname>const_node_iterator</classname> corresponding to a
just-after-leaf node. This method should correct the node
invariants of the node pointed to by
<varname>nd_it</varname>. For example, say node x in
label A of the graphic below has an invalid invariant, but its children,
y and z, have valid invariants. After the invocation, all three
nodes should have valid invariants, as in label B.</para>
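<para>A minimal custom updater in this style is sketched below, assuming GCC's libstdc++ headers, <classname>int</classname> keys, and that the tree passes its const node iterator, node iterator, comparison functor and allocator as the four template arguments, in that order. Each node stores the sum of the keys in its subtree, which <function>prefix_sum</function> uses to add up all keys not greater than a bound by a single root-to-leaf descent. The names <classname>subtree_sum_update</classname>, <function>prefix_sum</function> and <classname>sum_tree</classname> are hypothetical; the pure virtual <function>node_begin</function> and <function>node_end</function> declarations rely on the technique explained later in this section for reaching the tree's methods.</para>
<programlisting language="c++"><![CDATA[
// Assumes GCC's libstdc++ pb_ds headers; all names below are hypothetical.
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>

namespace pb = __gnu_pbds;

template<typename Node_CItr, typename Node_Itr,
         typename Cmp_Fn, typename Alloc>
struct subtree_sum_update
{
  typedef long long metadata_type;   // sum of the keys in the subtree

  // Restores nd's invariant, assuming its children are already valid.
  void operator()(Node_Itr nd, Node_CItr end_nd)
  {
    long long sum = **nd;
    Node_Itr l = nd.get_l_child(), r = nd.get_r_child();
    if (l != end_nd) sum += l.get_metadata();
    if (r != end_nd) sum += r.get_metadata();
    const_cast<metadata_type&>(nd.get_metadata()) = sum;
  }

  // Sum of all keys not greater than x, by descending from the root.
  long long prefix_sum(int x) const
  {
    long long sum = 0;
    Node_CItr nd = node_begin();
    while (nd != node_end())
      {
        Node_CItr l = nd.get_l_child();
        if (x < **nd)
          nd = l;                      // this key and its right subtree exceed x
        else
          {
            sum += **nd;               // count this key ...
            if (l != node_end())
              sum += l.get_metadata(); // ... and its whole left subtree
            nd = nd.get_r_child();
          }
      }
    return sum;
  }

  // Provided by the tree, which derives from this class.
  virtual Node_CItr node_begin() const = 0;
  virtual Node_CItr node_end() const = 0;
  virtual ~subtree_sum_update() {}
};

typedef pb::tree<int, pb::null_type, std::less<int>, pb::rb_tree_tag,
                 subtree_sum_update>
  sum_tree;
]]></programlisting>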
4065 <title>Restoring node invariants</title>
4068 <imagedata align="center" format="PNG" scale="100"
4069 fileref="../images/pbds_restoring_node_invariants.png"/>
4072 <phrase>Restoring node invariants</phrase>
4077 <para>When a tree operation might invalidate some node invariant,
4078 it invokes this method in its <classname>node_update</classname> base to
4079 restore the invariant. For example, the graphic below shows
4080 an <function>insert</function> operation (point A); the tree performs some
4081 operations, and calls the update functor three times (points B,
4082 C, and D). (It is well known that any <function>insert</function>,
<function>erase</function>, <function>split</function> or <function>join</function> can restore
4084 all node invariants by a small number of node invariant updates (<xref linkend="biblio.clrs2001"/>)
4088 <title>Insert update sequence</title>
4091 <imagedata align="center" format="PNG" scale="100"
4092 fileref="../images/pbds_update_seq_diagram.png"/>
4095 <phrase>Insert update sequence</phrase>
4100 <para>To complete the description of the scheme, three questions
4101 need to be answered:</para>
4104 <listitem><para>How can a tree which supports order statistics define a
4105 method such as <classname>find_by_order</classname>?</para></listitem>
4107 <listitem><para>How can the node updater base access methods of the
4108 tree?</para></listitem>
4110 <listitem><para>How can the following cyclic dependency be resolved?
4111 <classname>node_update</classname> is a base class of the tree, yet it
4112 uses node iterators defined in the tree (its child).</para></listitem>
4115 <para>The first two questions are answered by the fact that
4116 <classname>node_update</classname> (an instantiation of
4117 <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4118 of the tree. Consequently:</para>
4121 <listitem><para>Any public methods of
4122 <classname>node_update</classname> are automatically methods of
4123 the tree (<xref linkend="biblio.alexandrescu01modern"/>).
4124 Thus an order-statistics node updater,
4125 <classname>tree_order_statistics_node_update</classname> defines
4126 the <function>find_by_order</function> method; any tree
4127 instantiated by this policy consequently supports this method as
4128 well.</para></listitem>
4130 <listitem><para>In C++, if a base class declares a method as
4131 <literal>virtual</literal>, it is
4132 <literal>virtual</literal> in its subclasses. If
4133 <classname>node_update</classname> needs to access one of the
4134 tree's methods, say the member function
4135 <function>end</function>, it simply declares that method as
4136 <literal>virtual</literal> abstract.</para></listitem>
4139 <para>The cyclic dependency is solved through template-template
4140 parameters. <classname>Node_Update</classname> is parametrized by
4141 the tree's node iterators, its comparison functor, and its
4142 allocator type. Thus, instantiations of
4143 <classname>Node_Update</classname> have all information
4146 <para>This library assumes that constructing a metadata object and
4147 modifying it are exception free. Suppose that during some method,
4148 say <classname>insert</classname>, a metadata-related operation
(e.g., changing the value of a metadata object) throws an exception. Ack!
4150 Rolling back the method is unusually complex.</para>
4152 <para>Previously, a distinction was made between redundant
4153 policies and null policies. Node invariants show a
4154 case where null policies are required.</para>
4156 <para>Assume a regular tree is required, one which need not
4157 support order statistics or interval overlap queries.
Seemingly, in this case a redundant policy, one which
doesn't affect nodes' contents, would suffice. This would lead
to the following drawbacks:</para>
4163 <listitem><para>Each node would carry a useless metadata object, wasting
4164 space.</para></listitem>
4166 <listitem><para>The tree cannot know if its
4167 <classname>Node_Update</classname> policy actually modifies a
node's metadata (deciding this is as hard as the halting problem). In the graphic
4169 below, assume the shaded node is inserted. The tree would have
4170 to traverse the useless path shown to the root, applying
4171 redundant updates all the way.</para></listitem>
4174 <title>Useless update path</title>
4177 <imagedata align="center" format="PNG" scale="100"
4178 fileref="../images/pbds_rationale_null_node_updator.png"/>
4181 <phrase>Useless update path</phrase>
<para>A null policy class, <classname>null_node_update</classname>,
solves both these problems. The tree detects that node
invariants are irrelevant, and defines its node types and operations accordingly.</para>
4195 <section xml:id="container.tree.details.split">
4196 <info><title>Split and Join</title></info>
4198 <para>Tree-based containers support split and join methods.
4199 It is possible to split a tree so that it passes
4200 all nodes with keys larger than a given key to a different
4201 tree. These methods have the following advantages over the
4202 alternative of externally inserting to the destination
4203 tree and erasing from the source tree:</para>
<listitem><para>These methods are efficient: red-black trees are split
4207 and joined in poly-logarithmic complexity; ordered-vector
4208 trees are split and joined at linear complexity. The
4209 alternatives have super-linear complexity.</para></listitem>
4211 <listitem><para>Aside from orders of growth, these operations perform
4212 few allocations and de-allocations. For red-black trees, allocations are not performed,
4213 and the methods are exception-free. </para></listitem>
4217 </section> <!-- details -->
4219 </section> <!-- tree -->
4222 <section xml:id="pbds.design.container.trie">
4223 <info><title>Trie</title></info>
4225 <section xml:id="container.trie.interface">
4226 <info><title>Interface</title></info>
4228 <para>The trie-based container has the following declaration:</para>
4230 template<typename Key,
4231 typename Mapped,
4232 typename E_Access_Traits = typename detail::default_trie_e_access_traits<Key>::type,
4233 typename Tag = pat_trie_tag,
4234 template<typename Const_Node_Iterator,
4235 typename Node_Iterator,
4236 typename E_Access_Traits_,
4237 typename Allocator_>
4238 class Node_Update = null_node_update,
4239 typename Allocator = std::allocator<char> >
4240 class trie;
4243 <para>The parameters have the following meaning:</para>
4246 <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4248 <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4250 <listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>
4252 <listitem><para><classname>Tag</classname> specifies which underlying data structure
4253 to use, and is described shortly.</para></listitem>
4255 <listitem><para><classname>Node_Update</classname> is a policy for updating node
4256 invariants. This is described below.</para></listitem>
4258 <listitem><para><classname>Allocator</classname> is an allocator
4259 type.</para></listitem>
4262 <para>The <classname>Tag</classname> parameter specifies which underlying
4263 data structure to use. Instantiating it with <classname>pat_trie_tag</classname> specifies an
4264 underlying PATRICIA trie (explained shortly); any other tag is
4265 currently illegal.</para>
4267 <para>Following is a description of a (PATRICIA) trie
4268 (this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4269 <xref linkend="biblio.filliatre2000ptset"/>).
4272 <para>A (PATRICIA) trie is similar to a tree, but with the
4273 following differences:</para>
4276 <listitem><para>It explicitly views keys as a sequence of elements.
4277 E.g., a trie can view a string as a sequence of
4278 characters; a trie can view a number as a sequence of
4279 bits.</para></listitem>
4281 <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4282 + 1, where n is the number of distinct
4283 elements.</para></listitem>
4285 <listitem><para>It stores values only at leaf nodes.</para></listitem>
4287 <listitem><para>Internal nodes have the properties that A) each has at
4288 least two children, and B) each shares the same prefix with
4289 all of its descendants.</para></listitem>
4292 <para>A (PATRICIA) trie has some useful properties:</para>
4295 <listitem><para>It can be configured to use large node fan-out, giving it
4296 very efficient find performance (albeit at the cost of insertion
4297 complexity and size).</para></listitem>
4299 <listitem><para>It works well for common-prefix keys.</para></listitem>
4301 <listitem><para>It can efficiently support queries such as which
4302 keys match a certain prefix. This is sometimes useful in file
4303 systems and routers, and for "type-ahead" aka predictive text matching
4304 on mobile devices.</para></listitem>
4310 <section xml:id="container.trie.details">
4311 <info><title>Details</title></info>
4313 <section xml:id="container.trie.details.etraits">
4314 <info><title>Element Access Traits</title></info>
4316 <para>A trie inherently views its keys as sequences of elements.
4317 For example, a trie can view a string as a sequence of
4318 characters. A trie needs to map each of n elements to a
4319 number in {0, ..., n - 1}. For example, a trie can map a
4320 character <varname>c</varname> to
4321 <programlisting>static_cast<size_t>(c)</programlisting>.</para>
4323 <para>Seemingly, then, a trie can assume that its keys support
4324 (const) iterators, and that the <classname>value_type</classname> of this
4325 iterator can be cast to a <classname>size_t</classname>. There are several
4326 reasons, though, to decouple the mechanism by which the trie
4327 accesses its keys' elements from the trie:</para>
4330 <listitem><para>In some cases, the numerical value of an element is
4331 inappropriate. Consider a trie storing DNA strings. It is
4332 logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4333 'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4335 <listitem><para>In some cases the keys' iterators are different from what
4336 is needed. For example, a trie can be used to search for
4337 common suffixes, by using strings'
4338 <classname>reverse_iterator</classname>. As another example, a trie mapping
4339 UNICODE strings would have a huge fan-out if each node
4340 branched on a UNICODE character; instead, one can define an
4341 iterator iterating over 8-bit (or smaller) groups.</para></listitem>
4345 consequently, parametrized by <classname>E_Access_Traits</classname>:
4346 traits which instruct how to access sequences' elements.
4347 <classname>string_trie_e_access_traits</classname>
4348 is a traits class for strings. Each such traits class defines some
4351 typename E_Access_Traits::const_iterator
4354 <para>is a const iterator iterating over a key's elements. The
4355 traits class must also define methods for obtaining iterators
4356 to the first element of a key and one past its last element.</para>
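<para>The two motivating cases above can be sketched without the library. The traits classes below are hypothetical and only mirror the ideas in this section (a <classname>const_iterator</classname> type, functions yielding a key's begin and end iterators, and a mapping from an element to a branch position); they are not the library's <classname>E_Access_Traits</classname> interface, whose exact member names may differ. <function>branch_path</function>, also hypothetical, lists the child index a trie would branch on for each element of a key.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical traits: DNA letters map to 0..3 instead of their char codes.
struct dna_access_traits
{
  typedef std::string::const_iterator const_iterator;
  static const std::size_t max_size = 4;  // distinct elements: A, C, G, T

  static const_iterator begin(const std::string& k) { return k.begin(); }
  static const_iterator end(const std::string& k) { return k.end(); }

  static std::size_t e_pos(char e)
  {
    switch (e)
      {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      default:  return 3;  // 'T'; keys are assumed to be valid DNA strings
      }
  }
};

// Hypothetical traits: read keys back to front, for common-suffix search.
struct reversed_char_traits
{
  typedef std::string::const_reverse_iterator const_iterator;
  static const std::size_t max_size = 256;

  static const_iterator begin(const std::string& k) { return k.rbegin(); }
  static const_iterator end(const std::string& k) { return k.rend(); }
  static std::size_t e_pos(char e) { return static_cast<unsigned char>(e); }
};

// The branch indices a trie using Traits would follow for key k.
template<typename Traits>
std::vector<std::size_t> branch_path(const std::string& k)
{
  std::vector<std::size_t> path;
  for (typename Traits::const_iterator it = Traits::begin(k);
       it != Traits::end(k); ++it)
    path.push_back(Traits::e_pos(*it));
  return path;
}
```

<para>With <classname>dna_access_traits</classname>, "GATTACA" branches on 2, 0, 3, 3, 0, 1, 0, so each node needs only 4 + 1 children rather than one per possible <type>char</type> value.</para>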
4358 <para>The graphic below shows a
4359 (PATRICIA) trie resulting from inserting the words: "I wish
4360 that I could ever see a poem lovely as a trie" (which,
4361 unfortunately, does not rhyme).</para>
4363 <para>The leaf nodes contain values; each internal node contains
4364 two <classname>typename E_Access_Traits::const_iterator</classname>
4365 objects, indicating the maximal common prefix of all keys in
4366 the sub-tree. For example, the shaded internal node roots a
4367 sub-tree with leaves "a" and "as". The maximal common prefix is
4368 "a". The internal node contains, consequently, two const
4369 iterators, one pointing to <varname>'a'</varname>, and the other to
4370 <varname>'s'</varname>.</para>
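<para>The iterator pair held by such an internal node can be computed as in the following sketch, which is not the library's code: given the keys of a sub-tree, the hypothetical <function>max_common_prefix</function> returns iterators into the first key that delimit the keys' maximal common prefix. For the keys "as" and "a", the result points to 'a' and 's', as in the example above.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::string::const_iterator e_iterator;

// Returns [first, last) delimiting, inside keys[0], the maximal common
// prefix of all keys. keys must be non-empty and must outlive the result.
std::pair<e_iterator, e_iterator>
max_common_prefix(const std::vector<std::string>& keys)
{
  const std::string& ref = keys.front();
  std::size_t len = ref.size();
  for (const std::string& k : keys)
    {
      std::size_t i = 0;
      while (i < len && i < k.size() && k[i] == ref[i])
        ++i;
      len = i;  // the common prefix can only shrink
    }
  return std::make_pair(ref.begin(),
                        ref.begin() + static_cast<std::ptrdiff_t>(len));
}
```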
4373 <title>A PATRICIA trie</title>
4376 <imagedata align="center" format="PNG" scale="100"
4377 fileref="../images/pbds_pat_trie.png"/>
4380 <phrase>A PATRICIA trie</phrase>
4387 <section xml:id="container.trie.details.node">
4388 <info><title>Node Invariants</title></info>
4390 <para>Trie-based containers support node invariants, as do
4391 tree-based containers. There are two minor
4392 differences, though, which, unfortunately, prevent them from
4393 sharing the same node-updating policies:</para>
4397 <para>A trie's <classname>Node_Update</classname> template-template
4398 parameter is parametrized by <classname>E_Access_Traits</classname>, while
4399 a tree's <classname>Node_Update</classname> template-template parameter is
4400 parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4402 <listitem><para>Tree-based containers store values in all nodes, while
4403 trie-based containers (at least in this implementation) store
4404 values in leaves.</para></listitem>
4407 <para>The graphic below shows the scheme, as well as some predefined
4408 policies (which are explained below).</para>
4411 <title>A trie and its update policy</title>
4414 <imagedata align="center" format="PNG" scale="100"
4415 fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
4418 <phrase>A trie and its update policy</phrase>
4424 <para>This library offers the following pre-defined trie node
4425 updating policies:</para>
4430 <classname>trie_order_statistics_node_update</classname>
4431 supports order statistics.
4435 <listitem><para><classname>trie_prefix_search_node_update</classname>
4436 supports searching for ranges that match a given prefix.</para></listitem>
4438 <listitem><para><classname>null_node_update</classname>
4439 is the null node updater.</para></listitem>
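<para>A prefix query with <classname>trie_prefix_search_node_update</classname> might look like the sketch below, using the words from the PATRICIA trie figure. It assumes GCC's libstdc++ headers, where the string traits class is spelled <classname>trie_string_access_traits</classname> and the set-like mapped marker is <classname>null_type</classname> (names not used elsewhere in this section). The helper <function>keys_with_prefix</function> is hypothetical; it sorts its result so that it does not depend on the trie's traversal order.</para>

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/trie_policy.hpp>
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using namespace __gnu_pbds;

// A set-like PATRICIA trie of strings whose nodes support prefix queries.
typedef trie<std::string, null_type, trie_string_access_traits<>,
             pat_trie_tag, trie_prefix_search_node_update> prefix_trie;

// Returns every key in t that starts with prefix, in sorted order.
std::vector<std::string> keys_with_prefix(prefix_trie& t,
                                          const std::string& prefix)
{
  std::vector<std::string> out;
  auto range = t.prefix_range(prefix);  // [first, second) share the prefix
  for (auto it = range.first; it != range.second; ++it)
    out.push_back(*it);
  std::sort(out.begin(), out.end());
  return out;
}
```

<para>After inserting "I wish that I could ever see a poem lovely as a trie", the prefix "a" matches "a" and "as", and "tr" matches only "trie".</para>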
4444 <section xml:id="container.trie.details.split">
4445 <info><title>Split and Join</title></info>
4446 <para>Trie-based containers support split and join methods; the
4447 rationale is the same as for tree-based containers supporting
4448 these methods.</para>
4451 </section> <!-- details -->
4453 </section> <!-- trie -->
4455 <!-- list_update -->
4456 <section xml:id="pbds.design.container.list">
4457 <info><title>List</title></info>
4459 <section xml:id="container.list.interface">
4460 <info><title>Interface</title></info>
4462 <para>The list-based container has the following declaration:</para>
4464 template<typename Key,
4465 typename Mapped,
4466 typename Eq_Fn = std::equal_to<Key>,
4467 typename Update_Policy = lu_move_to_front_policy<>,
4468 typename Allocator = std::allocator<char> >
4469 class list_update;
4472 <para>The parameters have the following meaning:</para>
4477 <classname>Key</classname> is the key type.
4483 <classname>Mapped</classname> is the mapped-policy.
4489 <classname>Eq_Fn</classname> is a key equivalence functor.
4495 <classname>Update_Policy</classname> is a policy updating positions in
4496 the list based on access patterns. It is described in the
4497 following subsection.
4503 <classname>Allocator</classname> is an allocator type.
4508 <para>A list-based associative container is a container that
4509 stores elements in a linked-list. It does not order the elements
4510 in any way related to the keys. List-based
4511 containers are primarily useful for creating "multimaps". In fact,
4512 list-based containers are designed in this library expressly for
4513 this purpose.</para>
4515 <para>List-based containers might also be useful for some rare
4516 cases where a key is encapsulated to the extent that only
4517 key-equivalence can be tested. Hash-based containers need to know
4518 how to transform a key into a size type, and tree-based containers
4519 need to know if some key is larger than another. List-based
4520 associative containers, conversely, only need to know if two keys
4521 are equivalent.</para>
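<para>To illustrate, the sketch below stores a key type that can only be compared for equivalence: it has no hash function and no ordering, so neither a hash-based nor a tree-based container could hold it. It assumes GCC's libstdc++ headers; <classname>opaque_key</classname>, <classname>opaque_key_eq</classname>, and <function>lookup</function> are hypothetical names.</para>

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/list_update_policy.hpp>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical key type: only equivalence is defined for it.
struct opaque_key
{
  std::string token;
};

struct opaque_key_eq
{
  bool operator()(const opaque_key& a, const opaque_key& b) const
  { return a.token == b.token; }
};

typedef __gnu_pbds::list_update<opaque_key, int, opaque_key_eq,
                                __gnu_pbds::lu_move_to_front_policy<> >
  opaque_map;

// Returns the value mapped to k, or fallback if k is absent. find is a
// linear scan; with move-to-front, a found entry is relinked at the head.
int lookup(opaque_map& m, const opaque_key& k, int fallback)
{
  auto it = m.find(k);
  return it == m.end() ? fallback : it->second;
}
```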
4523 <para>Since a list-based associative container does not order
4524 elements by keys, is it possible to order the list in some
4525 useful manner? Remarkably, many on-line competitive
4526 algorithms exist for reordering lists to reflect access
4527 prediction. (See <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
4532 <section xml:id="container.list.details">
4533 <info><title>Details</title></info>
4536 <section xml:id="container.list.details.ds">
4537 <info><title>Underlying Data Structure</title></info>
4539 <para>The graphic below shows a
4540 simple list of integer keys. If we search for the integer 6, we
4541 are paying an overhead: the link with key 6 is only the fifth
4542 link; if it were the first link, it could be accessed
4546 <title>A simple list</title>
4549 <imagedata align="center" format="PNG" scale="100"
4550 fileref="../images/pbds_simple_list.png"/>
4553 <phrase>A simple list</phrase>
4558 <para>List-update algorithms reorder lists as elements are
4559 accessed. They try to determine, by the access history, which
4560 keys to move to the front of the list. Some of these algorithms
4561 require adding some metadata alongside each entry.</para>
4563 <para>For example, in the graphic below, label A shows the counter
4564 algorithm. Each node contains both a key and a count metadata
4565 (shown in bold). When an element is accessed (e.g., 6), its count is
4566 incremented, as shown in label B. If the count reaches some
4567 predetermined value, say 10, as shown in label C, the count is set
4568 to 0 and the node is moved to the front of the list, as in label
4573 <title>The counter algorithm</title>
4576 <imagedata align="center" format="PNG" scale="100"
4577 fileref="../images/pbds_list_update.png"/>
4580 <phrase>The counter algorithm</phrase>
4588 <section xml:id="container.list.details.policies">
4589 <info><title>Policies</title></info>
4591 <para>This library allows instantiating lists with policies
4592 implementing any algorithm moving nodes to the front of the
4593 list (policies implementing algorithms interchanging nodes are
4594 unsupported).</para>
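<para>The counter algorithm and the <classname>Update_Policy</classname> requirements of this section can be sketched without the library. Both classes below are hypothetical: <classname>counter_policy_sketch</classname> mirrors the described interface (an <classname>update_metadata</classname> type, a nullary operator creating a new node's metadata, and a unary operator deciding whether an accessed node moves to the front), and <classname>lu_set_sketch</classname> is a minimal list that consults it, using <function>std::list::splice</function> to relink a node at the front in constant time. Neither is the library's <classname>lu_counter_policy</classname> or its list implementation.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical counter policy shaped after the Update_Policy requirements.
template<std::size_t Max_Count>
struct counter_policy_sketch
{
  typedef std::size_t update_metadata;

  // Called when a node is created: its initial access count.
  update_metadata operator()() const { return 0; }

  // Called when a node is accessed: bump the count; once it reaches
  // Max_Count, reset it and request a move to the front.
  bool operator()(update_metadata& count) const
  {
    if (++count < Max_Count)
      return false;
    count = 0;
    return true;
  }
};

// Hypothetical list-based set storing one metadata object per node.
template<typename Key, typename Update_Policy>
class lu_set_sketch
{
  typedef typename Update_Policy::update_metadata metadata;
  std::list<std::pair<Key, metadata> > m_nodes;
  Update_Policy m_policy;

public:
  // Appends k, if absent, with freshly created metadata.
  void insert(const Key& k)
  {
    for (const auto& n : m_nodes)
      if (n.first == k)
        return;
    m_nodes.push_back(std::make_pair(k, m_policy()));
  }

  // Linear search; on a hit, the policy decides whether to move the node.
  bool find(const Key& k)
  {
    for (auto it = m_nodes.begin(); it != m_nodes.end(); ++it)
      if (it->first == k)
        {
          if (m_policy(it->second))
            m_nodes.splice(m_nodes.begin(), m_nodes, it);  // O(1) relink
          return true;
        }
    return false;
  }

  // Keys in current list order, front first.
  std::vector<Key> keys() const
  {
    std::vector<Key> out;
    for (const auto& n : m_nodes)
      out.push_back(n.first);
    return out;
  }
};
```

<para>With <classname>counter_policy_sketch&lt;10&gt;</classname> and keys inserted as 1, 5, 3, 8, 6, the first nine finds of 6 leave it fifth; the tenth resets its count and moves it to the front.</para>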
4596 <para>Associative containers based on lists are parametrized by a
4597 <classname>Update_Policy</classname> parameter. This parameter defines the
4598 type of metadata each node contains, how to create the
4599 metadata, and how to decide, using this metadata, whether to
4600 move a node to the front of the list. A list-based associative
4601 container object derives (publicly) from its update policy.
4604 <para>An instantiation of <classname>Update_Policy</classname> must define
4605 internally <classname>update_metadata</classname> as the metadata it
4606 requires. Internally, each node of the list contains, besides
4607 the usual key and data, an instance of <classname>typename
4608 Update_Policy::update_metadata</classname>.</para>
4610 <para>An instantiation of <classname>Update_Policy</classname> must define
4611 internally two operators:</para>
4617 operator()(update_metadata &);
4620 <para>The first is called by the container object when creating a
4621 new node, to create the node's metadata. The second is called
4622 by the container object when a node is accessed (i.e.,
4623 when a find operation's key is equivalent to the key of the
4624 node), to determine whether to move the node to the front of
4628 <para>The library contains two predefined implementations of
4629 list-update policies. The first
4630 is <classname>lu_counter_policy</classname>, which implements the
4631 counter algorithm described above. The second is
4632 <classname>lu_move_to_front_policy</classname>,
4633 which unconditionally moves an accessed element to the front of
4634 the list. The latter type is very useful in this library,
4635 since there is no need to associate metadata with each element.
4636 (See <xref linkend="biblio.andrew04mtf"/>
4641 <section xml:id="container.list.details.mapped">
4642 <info><title>Use in Multimaps</title></info>
4644 <para>In this library, there are no equivalents for the standard's
4645 multimaps and multisets; instead one uses an associative
4646 container mapping primary keys to secondary keys.</para>
4648 <para>List-based containers are especially useful as associative
4649 containers for secondary keys. In fact, they are implemented
4650 here expressly for this purpose.</para>
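<para>A sketch of this multimap pattern, assuming GCC's libstdc++ headers and <classname>null_type</classname> for a set-like secondary container: a tree maps each primary key to a list-based container of secondary keys. The typedefs and the helpers <function>add_pair</function> and <function>secondary_count</function> are hypothetical.</para>

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <cassert>
#include <cstddef>
#include <string>

using namespace __gnu_pbds;

// Secondary keys: a set-like list with tiny per-container overhead.
typedef list_update<int, null_type> secondary_keys;

// Primary keys map to their secondary-key lists.
typedef tree<std::string, secondary_keys> multimap_t;

// Associates secondary with primary; returns false if already present.
bool add_pair(multimap_t& m, const std::string& primary, int secondary)
{
  return m[primary].insert(secondary).second;
}

// Number of secondary keys of primary; linear in that number, since
// list-based containers do not store their size.
std::size_t secondary_count(const multimap_t& m, const std::string& primary)
{
  auto it = m.find(primary);
  return it == m.end() ? 0 : it->second.size();
}
```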
4652 <para>To begin with, these containers use very little per-entry
4653 structure memory overhead, since they can be implemented as
4654 singly-linked lists. (Arrays use even lower per-entry memory
4655 overhead, but they are less flexible in moving around entries,
4656 and have weaker invalidation guarantees.)</para>
4658 <para>More importantly, though, list-based containers use very
4659 little per-container memory overhead. The memory overhead of an
4660 empty list-based container is practically that of a pointer.
4661 This is important when they are used as secondary
4662 associative-containers in situations where the average ratio of
4663 secondary keys to primary keys is low (or even 1).</para>
4665 <para>In order to reduce the per-container memory overhead as much
4666 as possible, they are implemented as closely as possible to
4667 singly-linked lists.</para>
4672 List-based containers do not internally store the number
4673 of values that they hold. This means that their <function>size</function>
4674 method has linear complexity (as was permitted for <classname>std::list</classname> before C++11).
4675 Note that finding the number of equivalent-key values in a
4676 standard multimap also has linear complexity (it is typically
4677 done via <function>std::distance</function> over the range returned by the
4678 multimap's <function>equal_range</function> method), but usually with
4685 Most associative-container objects each hold a policy
4686 object (e.g., a hash-based container object holds a
4687 hash functor). List-based containers, conversely, only have
4688 class-wide policy objects.
4696 </section> <!-- details -->
4698 </section> <!-- list -->
4701 <!-- priority_queue -->
4702 <section xml:id="pbds.design.container.priority_queue">
4703 <info><title>Priority Queue</title></info>
4705 <section xml:id="container.priority_queue.interface">
4706 <info><title>Interface</title></info>
4708 <para>The priority queue container has the following
4712 template<typename Value_Type,
4713 typename Cmp_Fn = std::less<Value_Type>,
4714 typename Tag = pairing_heap_tag,
4715 typename Allocator = std::allocator<char > >
4716 class priority_queue;
4719 <para>The parameters have the following meaning:</para>
4722 <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4724 <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>
4726 <listitem><para><classname>Tag</classname> specifies which underlying data structure
4727 to use.</para></listitem>
4729 <listitem><para><classname>Allocator</classname> is an allocator
4730 type.</para></listitem>
4733 <para>The <classname>Tag</classname> parameter specifies which underlying
4734 data structure to use. Instantiating it with <classname>pairing_heap_tag</classname>, <classname>binary_heap_tag</classname>,
4735 <classname>binomial_heap_tag</classname>,
4736 <classname>rc_binomial_heap_tag</classname>,
4737 or <classname>thin_heap_tag</classname>,
4738 specifies, respectively,
4739 an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4740 binary heap (<xref linkend="biblio.clrs2001"/>),
4741 binomial heap (<xref linkend="biblio.clrs2001"/>),
4742 a binomial heap with a redundant binary counter (<xref linkend="biblio.maverik_lowerbounds"/>),
4743 or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4747 As mentioned in the tutorial,
4748 <classname>__gnu_pbds::priority_queue</classname> shares most of its
4749 interface with <classname>std::priority_queue</classname>.
4750 For example, if <varname>q</varname> is a priority queue of type
4751 <classname>Q</classname>, then <function>q.top()</function> will
4752 return the "largest" value in the container (according to
4754 Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4755 has a larger (and very slightly different) interface than
4756 <classname>std::priority_queue</classname>, however, since
4757 <function>push</function> and <function>pop</function> alone are typically
4758 insufficient for manipulating priority queues.</para>
4760 <para>Different settings require different priority-queue
4761 implementations, which are described later; the section on traits
4762 discusses ways to differentiate between the traits of
4763 different implementations.</para>
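<para>Because every tag shares one interface, a function template like the hypothetical <function>drain</function> below can be instantiated with any of the five tags without other changes. The sketch assumes GCC's libstdc++ header ext/pb_ds/priority_queue.hpp; with <classname>std::less</classname> as the comparison functor, each instantiation yields the values from largest to smallest.</para>

```cpp
#include <ext/pb_ds/priority_queue.hpp>
#include <cassert>
#include <functional>
#include <vector>

// Pushes every value, then pops them all; only Tag selects the data
// structure, the calls are identical for all implementations.
template<typename Tag>
std::vector<int> drain(const std::vector<int>& values)
{
  __gnu_pbds::priority_queue<int, std::less<int>, Tag> q;
  for (int v : values)
    q.push(v);
  std::vector<int> out;
  while (!q.empty())
    {
      out.push_back(q.top());  // current "largest" value per std::less
      q.pop();
    }
  return out;
}
```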
4768 <section xml:id="container.priority_queue.details">
4769 <info><title>Details</title></info>
4771 <section xml:id="container.priority_queue.details.iterators">
4772 <info><title>Iterators</title></info>
4774 <para>There are many different underlying-data structures for
4775 implementing priority queues. Unfortunately, most such
4776 structures are oriented towards making <function>push</function> and
4777 <function>top</function> efficient, and consequently don't allow efficient
4778 access to other elements: for instance, they cannot support an efficient
4779 <function>find</function> method. In the use case where it
4780 is important to both access and "do something with" an
4781 arbitrary value, one would be out of luck. For example, many graph algorithms require
4782 modifying a value (typically increasing it in the sense of the
4783 priority queue's comparison functor).</para>
4785 <para>In order to access and manipulate an arbitrary value in a
4786 priority queue, one needs to reference the internals of the
4787 priority queue from some form of an associative container -
4788 this is unavoidable. Of course, in order to maintain the
4789 encapsulation of the priority queue, this needs to be done in a
4790 way that minimizes exposure to implementation internals.</para>
4792 <para>In this library the priority queue's <function>push</function>
4793 method returns an iterator which, while valid, can be used for subsequent <function>modify</function> and
4794 <function>erase</function> operations. This both preserves the priority
4795 queue's encapsulation, and allows accessing arbitrary values (since the
4796 returned iterators from the <function>push</function> operation can be
4797 stored in some form of associative container).</para>
4799 <para>Priority queues' iterators present a problem regarding their
4800 invalidation guarantees. One assumes that calling
4801 <function>operator++</function> on an iterator will associate it
4802 with the "next" value. Priority-queues are
4803 self-organizing: each operation changes what the "next" value
4804 means. Consequently, it does not make sense for <function>push</function>
4805 to return an iterator that can be incremented; this would have
4806 no possible use. Also, as in the case of hash-based containers,
4807 it is awkward to define whether a subsequent <function>push</function> operation
4808 invalidates a previously returned iterator: it invalidates it in the
4809 sense that its "next" value is not related to what it
4810 previously considered to be its "next" value. However, it might not
4811 invalidate it, in the sense that it can be
4812 de-referenced and used for <function>modify</function> and <function>erase</function>
4815 <para>As with the unordered associative
4816 containers, this library distinguishes between
4817 point-type and range-type iterators. A priority queue's <classname>iterator</classname> can always be
4818 converted to a <classname>point_iterator</classname>, and a
4819 <classname>const_iterator</classname> can always be converted to a
4820 <classname>point_const_iterator</classname>.</para>
4822 <para>The following snippet demonstrates manipulating an arbitrary
4825 // A priority queue of integers.
4826 priority_queue<int > p;
4828 // Insert some values into the priority queue.
4829 priority_queue<int >::point_iterator it = p.push(0);
4830 p.push(1);
4831 p.push(2);
4834 // Now modify a value.
4835 p.modify(it, 3);
4837 assert(p.top() == 3);
4841 <para>An alternative design could embed an
4842 associative container in a priority queue. It could, but most
4843 probably should not. To begin with,
4844 one could always encapsulate a priority queue and an associative
4845 container mapping values to priority queue iterators with no
4846 performance loss. One cannot, however, "un-encapsulate" a priority
4847 queue embedding an associative container, which might lead to
4848 performance loss. Assume that one needs to associate each value
4849 with some data unrelated to priority queues. Then, using
4850 this library's design, one could use an
4851 associative container mapping each value to a pair consisting of
4852 this data and a priority queue's iterator. The embedded
4853 design would require two associative containers. Similar
4854 problems might arise in cases where a value can reside
4855 simultaneously in many priority queues.</para>
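<para>The graph-algorithm use case mentioned above can be sketched as Dijkstra's shortest paths over a pairing heap. A plain vector plays the role of the associative container, mapping each vertex to the point iterator returned by <function>push</function>, and <function>modify</function> performs the decrease-key step. The sketch assumes GCC's libstdc++ headers and non-negative edge weights; <classname>edge</classname>, <function>dijkstra</function>, and the typedefs are illustrative names. A pairing heap is used because, as discussed in the section on traits, its point-type iterators remain valid while other values are pushed, popped, or modified.</para>

```cpp
#include <ext/pb_ds/priority_queue.hpp>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

struct edge { int to; long weight; };  // weights assumed non-negative

typedef std::pair<long, int> dist_vertex;  // (tentative distance, vertex)
// std::greater makes top() the smallest tentative distance.
typedef __gnu_pbds::priority_queue<dist_vertex, std::greater<dist_vertex>,
                                   __gnu_pbds::pairing_heap_tag> heap_t;

std::vector<long> dijkstra(const std::vector<std::vector<edge> >& g, int src)
{
  const long inf = std::numeric_limits<long>::max();
  std::vector<long> dist(g.size(), inf);
  // Vertex -> point iterator into the heap (valid while the vertex is queued).
  std::vector<heap_t::point_iterator> handle(g.size());
  std::vector<bool> queued(g.size(), false);
  heap_t q;

  dist[src] = 0;
  handle[src] = q.push(dist_vertex(0, src));
  queued[src] = true;
  while (!q.empty())
    {
      const int u = q.top().second;
      q.pop();
      queued[u] = false;  // u is final; its handle is no longer used
      for (std::size_t i = 0; i < g[u].size(); ++i)
        {
          const edge& e = g[u][i];
          const long nd = dist[u] + e.weight;
          if (nd >= dist[e.to])
            continue;
          dist[e.to] = nd;
          if (queued[e.to])
            q.modify(handle[e.to], dist_vertex(nd, e.to));  // decrease-key
          else
            {
              handle[e.to] = q.push(dist_vertex(nd, e.to));
              queued[e.to] = true;
            }
        }
    }
  return dist;  // unreachable vertices keep inf
}
```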
4860 <section xml:id="container.priority_queue.details.d">
4861 <info><title>Underlying Data Structure</title></info>
4863 <para>There are three main implementations of priority queues: the
4864 first employs a binary heap, typically one which uses a
4865 sequence; the second uses a tree (or forest of trees), which is
4866 typically less structured than an associative container's tree;
4867 the third simply uses an associative container. These are
4868 shown in the graphic below, in labels A1 and A2, label B, and label C.</para>
4871 <title>Underlying Priority-Queue Data-Structures.</title>
4874 <imagedata align="center" format="PNG" scale="100"
4875 fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
4878 <phrase>Underlying Priority-Queue Data-Structures.</phrase>
4883 <para>Roughly speaking, any value that is both pushed and popped
4884 from a priority queue must incur a logarithmic expense (in the
4885 amortized sense). Any comparison-based priority queue implementation that
4886 avoided this would violate the known Ω(n log n) lower bound on comparison-based
4887 sorting, since pushing n values and then popping them all sorts them (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
4890 <para>Most implementations do
4891 not differ in the asymptotic amortized complexity of
4892 <function>push</function> and <function>pop</function> operations, but they differ in
4893 the constants involved, in the complexity of other operations
4894 (e.g., <function>modify</function>), and in the worst-case
4895 complexity of single operations. In general, the more
4896 "structured" an implementation (i.e., the more internal
4897 invariants it possesses), the higher the amortized cost of its
4898 <function>push</function> and <function>pop</function> operations.</para>
4900 <para>This library implements different algorithms using a
4901 single class: <classname>priority_queue</classname>.
4902 Instantiating the <classname>Tag</classname> template parameter "selects"
4903 the implementation:</para>
4907 Instantiating <classname>Tag = binary_heap_tag</classname> creates
4908 a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
4909 selected by priority_queue
4910 if <classname>Value_Type</classname> is instantiated with a primitive type
4911 (e.g., an <type>int</type>); the latter is
4912 internally selected for all other types (e.g.,
4913 <classname>std::string</classname>). This implementation is relatively
4914 unstructured, and so has good <function>push</function> and <function>pop</function>
4915 performance; it is the "best-in-kind" for primitive
4916 types, e.g., <type>int</type>s. Conversely, it has
4917 high worst-case performance, and can support only linear-time
4918 <function>modify</function> and <function>erase</function> operations.</para></listitem>
4920 <listitem><para>Instantiating <classname>Tag =
4921 pairing_heap_tag</classname> creates a pairing heap of the form
4922 represented by label B in the graphic above. This
4923 implementation too is relatively unstructured, and so has good
4924 <function>push</function> and <function>pop</function>
4925 performance; it is the "best-in-kind" for non-primitive types,
4926 e.g., <classname>std::string</classname>s. It also has very good
4927 worst-case <function>push</function> and
4928 <function>join</function> performance (O(1)), but has high
4929 worst-case <function>pop</function>
4930 complexity.</para></listitem>
4932 <listitem><para>Instantiating <classname>Tag =
4933 binomial_heap_tag</classname> creates a binomial heap of the
4934 form represented by label B in the graphic above. This
4935 implementation is more structured than a pairing heap, and so
4936 has worse <function>push</function> and <function>pop</function>
4937 performance. Conversely, it has sub-linear worst-case bounds for
4938 operations such as <function>pop</function>, and so it might be preferred in
4939 cases where responsiveness is important.</para></listitem>
4941 <listitem><para>Instantiating <classname>Tag =
4942 rc_binomial_heap_tag</classname> creates a binomial heap of the
4943 form represented by label B above, accompanied by a redundant
4944 counter which governs the trees. This implementation is
4945 therefore more structured than a binomial heap, and so has worse
4946 <function>push</function> and <function>pop</function>
4947 performance. Conversely, it guarantees O(1)
4948 <function>push</function> complexity, and so it might be
4949 preferred in cases where the responsiveness of a binomial heap
4950 is insufficient.</para></listitem>
4952 <listitem><para>Instantiating <classname>Tag =
4953 thin_heap_tag</classname> creates a thin heap of the form
4954 represented by label B in the graphic above. This
4955 implementation too is more structured than a pairing heap, and
4956 so has worse <function>push</function> and
4957 <function>pop</function> performance. Conversely, it has better
4958 worst-case complexities than, and the same amortized complexities as, a Fibonacci
4959 heap, and so might be more appropriate for some graph
4960 algorithms.</para></listitem>
4963 <para>Of course, one can use any order-preserving associative
4964 container as a priority queue, as in label C of the graphic above, possibly by creating an adapter class
4965 over the associative container (much as
4966 <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
4967 This has the advantage that no cross-referencing is necessary
4968 at all; the priority queue itself is an associative container.
4969 Most associative containers are too structured to compete with
4970 priority queues in terms of <function>push</function> and <function>pop</function>
4977 <section xml:id="container.priority_queue.details.traits">
4978 <info><title>Traits</title></info>
4980 <para>It would be nice if all priority queues could
4981 share exactly the same behavior regardless of implementation. Sadly, this is not possible. Join operations are one example: joining
4982 two binary heaps might throw an exception (without corrupting
4983 either of the heaps on which it operates), but joining two pairing
4984 heaps is exception-free.</para>
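<para>For reference, a sketch of join and split on pairing heaps, assuming GCC's libstdc++ headers: <function>join</function> moves every value of the other queue into this one and leaves the other queue empty, and <function>split</function> moves the values satisfying a predicate into the other queue. The names <classname>pq_t</classname>, <classname>is_even</classname>, <function>merge_into</function>, and <function>extract_evens</function> are hypothetical.</para>

```cpp
#include <ext/pb_ds/priority_queue.hpp>
#include <cassert>
#include <functional>

typedef __gnu_pbds::priority_queue<int, std::less<int>,
                                   __gnu_pbds::pairing_heap_tag> pq_t;

// Predicate for split: values for which it returns true are moved out.
struct is_even
{
  bool operator()(int v) const { return v % 2 == 0; }
};

// Moves every value of src into dst; src is left empty.
void merge_into(pq_t& dst, pq_t& src) { dst.join(src); }

// Moves the even values of src into evens (assumed empty beforehand).
void extract_evens(pq_t& src, pq_t& evens) { src.split(is_even(), evens); }
```

<para>Whether such calls can throw for a given implementation is exactly what the traits described next expose.</para>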
4986 <para>Tags and traits are very useful for manipulating generic
4987 types. <classname>__gnu_pbds::priority_queue</classname>
4988 publicly defines <classname>container_category</classname> as one of the tags. Given any
4989 container <classname>Cntnr</classname>, the tag of the underlying
4990 data structure can be found via <classname>typename
4991 Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4995 <title>Priority-Queue Data-Structure Tags.</title>
4998 <imagedata align="center" format="PNG" scale="100"
4999 fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
5002 <phrase>Priority-Queue Data-Structure Tags.</phrase>
5008 <para>Additionally, a traits mechanism can be used to query a
5009 container type for its attributes. Given any container
5010 <classname>Cntnr</classname>, <programlisting>__gnu_pbds::container_traits<Cntnr></programlisting>
5011 is a traits class identifying the properties of the
5014 <para>To find if a container might throw if two of its objects are
5017 container_traits<Cntnr>::split_join_can_throw
5022 Different priority-queue implementations have different invalidation guarantees. This is
5023 especially important, since iterators are the only way to access an arbitrary
5024 value in a priority queue. Similarly to
5025 associative containers, one can use
5027 container_traits<Cntnr>::invalidation_guarantee
5029 to get the invalidation guarantee type of a priority queue.</para>
5031 <para>It is easy to understand from the graphic above what <classname>container_traits<Cntnr>::invalidation_guarantee</classname>
5032 will be for different implementations. All implementations of
5033 the type represented by label B have <classname>point_invalidation_guarantee</classname>:
5034 the container can freely reorganize its nodes internally, so
5035 range-type iterators are invalidated, but point-type iterators
5036 are always valid. Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
5037 the container can freely reallocate its array internally, so both
5038 point-type and range-type iterators might be invalidated.</para>
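<para>These tags and traits can be queried at compile time. The sketch below assumes GCC's libstdc++ headers (ext/pb_ds/priority_queue.hpp and ext/pb_ds/tag_and_trait.hpp) and C++11 type traits; the helpers <function>keeps_point_iterators</function> and <function>split_join_may_throw</function> are hypothetical. It relies on the invalidation-guarantee tags forming a hierarchy in which <classname>point_invalidation_guarantee</classname> derives from <classname>basic_invalidation_guarantee</classname>.</para>

```cpp
#include <ext/pb_ds/priority_queue.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>
#include <cassert>
#include <functional>
#include <type_traits>

using namespace __gnu_pbds;

typedef priority_queue<int, std::less<int>, pairing_heap_tag> pairing_q;
typedef priority_queue<int, std::less<int>, binary_heap_tag> binary_q;

// The underlying data structure's tag is exposed as container_category.
static_assert(std::is_same<pairing_q::container_category,
                           pairing_heap_tag>::value,
              "a pairing-heap queue reports pairing_heap_tag");

// True if Q's guarantee is at least point_invalidation_guarantee, i.e.
// point-type iterators survive operations on other values.
template<typename Q>
bool keeps_point_iterators()
{
  typedef typename container_traits<Q>::invalidation_guarantee guarantee;
  return std::is_base_of<point_invalidation_guarantee, guarantee>::value;
}

// True if joining or splitting two Q objects might throw.
template<typename Q>
bool split_join_may_throw()
{
  return container_traits<Q>::split_join_can_throw;
}
```

<para>A generic algorithm that stores point-type iterators, such as a decrease-key loop, can use such a check to reject binary heaps at compile time.</para>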
5041 This has major implications, and constitutes a good reason to avoid
5042 using binary heaps. A binary heap can perform <function>modify</function>
5043 or <function>erase</function> efficiently given a valid point-type
5044 iterator. However, in order to supply it with a valid point-type
5045 iterator, one needs to iterate (linearly) over all
5046 values, then supply the relevant iterator (recall that a
5047 range-type iterator can always be converted to a point-type
5048 iterator). This means that if the number of <function>modify</function> or
5049 <function>erase</function> operations is non-negligible (say
5050 super-logarithmic in the total sequence of operations), binary
5051 heaps will perform badly.
5056 </section> <!-- details -->
5058 </section> <!-- priority_queue -->
5062 </section> <!-- container -->
5064 </section> <!-- design -->
5069 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5070 href="test_policy_data_structures.xml">
5073 <!-- S05: Reference/Acknowledgments -->
5074 <section xml:id="pbds.ack">
5075 <info><title>Acknowledgments</title></info>
5076 <?dbhtml filename="policy_data_structures_ack.html"?>
5079 Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5080 Laboratories), and Benjamin Kosnik (Red Hat).
5084 This library was partially written at
5085 <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.haifa.il.ibm.com/">IBM's Haifa Research Labs</link>.
5086 It is based heavily on policy-based design and uses many useful
5087 techniques from Modern C++ Design: Generic Programming and Design
5088 Patterns Applied by Andrei Alexandrescu.
5092 Two ideas are borrowed from the SGI-STL implementation:
5098 The prime-based resize policies use a list of primes taken from
5099 the SGI-STL implementation.
5105 The red-black trees contain both a root node and a header node
5106 (containing metadata), connected in a way that forward and
5107 reverse iteration can be performed efficiently.
5113 Some test utilities borrow ideas from
5114 <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer</link>.
5118 We would like to thank Scott Meyers for useful comments (without
5119 attributing to him any flaws in the design or implementation of the
5122 <para>We would like to thank Matt Austern for the suggestion to
5123 include tries.</para>
5126 <!-- S06: Biblio -->
5127 <bibliography xml:id="pbds.biblio">
5133 <?dbhtml filename="policy_data_structures_biblio.html"?>
5136 <biblioentry xml:id="biblio.abrahams97exception">
5138 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5139 xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1997/N1075.pdf">
5140 STL Exception Handling Contract
5165 <biblioentry xml:id="biblio.alexandrescu01modern">
5167 Modern C++ Design: Generic Programming and Design Patterns Applied
5186 Addison-Wesley Publishing Company
5193 <biblioentry xml:id="biblio.andrew04mtf">
5195 MTF, Bit, and COMB: A Guide to Deterministic and Randomized
5196 Algorithms for the List Update Problem
5225 <biblioentry xml:id="biblio.austern00noset">
5227 Why You Shouldn't Use set - and What You Should Use Instead
5252 <biblioentry xml:id="biblio.austern01htprop">
5254 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5255 xlink:href="http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2001/n1326.html">
5256 A Proposal to Add Hashtables to the Standard Library
5282 <biblioentry xml:id="biblio.austern98segmentedit">
5284 Segmented iterators and hierarchical algorithms
5309 <biblioentry xml:id="biblio.dawestimer">
5311 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5312 xlink:href="http://www.boost.org/doc/libs/release/libs/timer/">
5336 <biblioentry xml:id="biblio.clearypool">
5338 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5339 xlink:href="http://www.boost.org/doc/libs/release/libs/pool/">
5364 <biblioentry xml:id="biblio.maddocktraits">
5366 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5367 xlink:href="http://www.boost.org/doc/libs/release/libs/type_traits/">
5368 Boost Type Traits Library
5401 <biblioentry xml:id="biblio.brodal96priority">
5403 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5404 xlink:href="http://portal.acm.org/citation.cfm?id=313883">
5405 Worst-case efficient priority queues
5423 <biblioentry xml:id="biblio.bulkamayheweff">
5425 Efficient C++ Programming Techniques
5456 Addison-Wesley Publishing Company
5462 <biblioentry xml:id="biblio.clrs2001">
5464 Introduction to Algorithms, 2nd edition
5522 <biblioentry xml:id="biblio.dubhashi98neg">
5524 Balls and bins: A study in negative dependence
5554 Random Structures and Algorithms 13
5561 <biblioentry xml:id="biblio.fagin79extendible">
5563 Extendible hashing - a fast access method for dynamic files
5614 ACM Trans. Database Syst. 4
5622 <biblioentry xml:id="biblio.filliatre2000ptset">
5624 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5625 xlink:href="http://cristal.inria.fr/~frisch/icfp06_contest/advtr/applyOmatic/ptset.ml">
5626 Ptset: Sets of integers implemented as Patricia trees
5649 <biblioentry xml:id="biblio.fredman86pairing">
5651 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5652 xlink:href="http://www.cs.cmu.edu/~sleator/papers/pairing-heaps.pdf">
5653 The pairing heap: a new form of self-adjusting heap
5705 <biblioentry xml:id="biblio.gof">
5707 Design Patterns - Elements of Reusable Object-Oriented Software
5756 Addison-Wesley Publishing Company
5763 <biblioentry xml:id="biblio.garg86order">
5765 Order-preserving key transformations
5795 ACM Trans. Database Syst. 11
5801 <biblioentry xml:id="biblio.hyslop02making">
5803 Making a real hash of things
5840 <biblioentry xml:id="biblio.jossutis01stl">
5842 The C++ Standard Library - A Tutorial and Reference
5860 Addison-Wesley Publishing Company
5866 <biblioentry xml:id="biblio.kt99fat_heaps">
5868 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5869 xlink:href="http://www.cs.princeton.edu/research/techreps/TR-597-99">
5870 New Heap Data Structures
5903 <biblioentry xml:id="biblio.kleft00sets">
5905 Are Set Iterators Mutable or Immutable?
5942 <biblioentry xml:id="biblio.knuth98sorting">
5944 The Art of Computer Programming - Sorting and Searching
5963 Addison-Wesley Publishing Company
5969 <biblioentry xml:id="biblio.liskov98data">
5971 Data abstraction and hierarchy
5996 <biblioentry xml:id="biblio.litwin80lh">
5998 Linear hashing: A new tool for file and table addressing
6017 Proceedings of International Conference on Very Large Data Bases
6023 <biblioentry xml:id="biblio.maverik_lowerbounds">
6025 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6026 xlink:href="http://magic.aladdin.cs.cmu.edu/2005/08/01/deamortization-part-2-binomial-heaps">
6027 Deamortization - Part 2: Binomial Heaps
6047 <biblioentry xml:id="biblio.meyers96more">
6049 More Effective C++: 35 New Ways to Improve Your Programs and Designs
6068 Addison-Wesley Publishing Company
6074 <biblioentry xml:id="biblio.meyers00nonmember">
6076 How Non-Member Functions Improve Encapsulation
6101 <biblioentry xml:id="biblio.meyers01stl">
6103 Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library
6122 Addison-Wesley Publishing Company
6128 <biblioentry xml:id="biblio.meyers02both">
6130 Class Template, Member Template - or Both?
6155 <biblioentry xml:id="biblio.motwani95random">
6157 Randomized Algorithms
6186 Cambridge University Press
6193 <biblioentry xml:id="biblio.mscom">
6195 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6196 xlink:href="http://www.microsoft.com/com">
6197 COM: Component Object Model Technologies
6208 <biblioentry xml:id="biblio.musser95rationale">
6210 Rationale for Adding Hash Tables to the C++ Standard Template Library
6230 <biblioentry xml:id="biblio.musser96stltutorial">
6232 STL Tutorial and Reference Guide
6262 Addison-Wesley Publishing Company
6270 <biblioentry xml:id="biblio.nelson96stlpq">
6272 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6273 xlink:href="http://www.dogma.net/markn/articles/pq_stl/priority.htm">Priority Queues and the STL
6300 <biblioentry xml:id="biblio.okasaki98mereable">
6302 Fast mergeable integer maps
6337 <biblioentry xml:id="biblio.sgi_stl">
6339 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6340 xlink:href="http://www.sgi.com/tech/stl">
6341 Standard Template Library Programmer's Guide
6363 <biblioentry xml:id="biblio.select_man">
6365 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6366 xlink:href="http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+select">
6374 <biblioentry xml:id="biblio.sleator84amortized">
6376 Amortized Efficiency of List Update Problems
6407 ACM Symposium on Theory of Computing
6413 <biblioentry xml:id="biblio.sleator85self">
6415 Self-Adjusting Binary Search Trees
6447 ACM Symposium on Theory of Computing
6453 <biblioentry xml:id="biblio.stepanov94standard">
6455 The Standard Template Library
6485 <biblioentry xml:id="biblio.stroustrup97cpp">
6487 The C++ Programming Language
6506 Addison-Wesley Publishing Company
6512 <biblioentry xml:id="biblio.vandevoorde2002cpptemplates">
6514 C++ Templates: The Complete Guide
6544 Addison-Wesley Publishing Company
6551 <biblioentry xml:id="biblio.wickland96thirty">
6553 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6554 xlink:href="http://myweb.wvnet.edu/~gsa00121/books/amongdead30.zip">
6555 Thirty Years Among the Dead
6575 National Psychological Institute