l4/pkg/libstdc++-v3/contrib/libstdc++-v3-4.7/doc/xml/manual/policy_data_structures.xml

   1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
   2          xml:id="manual.ext.containers.pbds" xreflabel="pbds">
   3   <info>
   4     <title>Policy-Based Data Structures</title>
   5     <keywordset>
   6       <keyword>
   7         ISO C++
   8       </keyword>
   9       <keyword>
  10         policy
  11       </keyword>
  12       <keyword>
  13         container
  14       </keyword>
  15       <keyword>
  16         data
  17       </keyword>
  18       <keyword>
  19         structure
  20       </keyword>
  21       <keyword>
  22         associated
  23       </keyword>
  24       <keyword>
  25         tree
  26       </keyword>
  27       <keyword>
  28         trie
  29       </keyword>
  30       <keyword>
  31         hash
  32       </keyword>
  33       <keyword>
  34         metaprogramming
  35       </keyword>
  36     </keywordset>
  37   </info>
  38   <?dbhtml filename="policy_data_structures.html"?>
  39
  40   <!-- 2006-04-01 Ami Tavory -->
  41   <!-- 2011-05-25 Benjamin Kosnik -->
  42
  43   <!-- S01: intro -->
  44   <section xml:id="pbds.intro">
  45     <info><title>Intro</title></info>
  46
  47     <para>
  48       This is a library of policy-based elementary data structures:
  49       associative containers and priority queues. It is designed for
  50       high performance, flexibility, semantic safety, and conformance to
  51       the corresponding containers in <literal>std</literal> and
  52       <literal>std::tr1</literal> (except for some points where it differs
  53       by design).
  54     </para>
  55     <para>
  56     </para>
  57
  58     <section xml:id="pbds.intro.issues">
  59       <info><title>Performance Issues</title></info>
  60       <para>
  61       </para>
  62
  63       <para>
  64         An attempt is made to categorize the wide variety of possible
  65         container designs in terms of performance-impacting factors. These
  66         performance factors are translated into design policies and
  67         incorporated into container design.
  68       </para>
  69
  70       <para>
  71         There is tension in unraveling these factors into a coherent set of
  72         policies. Every attempt is made to keep the set of factors
  73         minimal. However, in many cases multiple factors make for long
  74         template names. Every attempt is made to use aliases and typedefs in
  75         the source files, but the generated names for external symbols can
  76         still be long in binary files and debuggers.
  77       </para>
  78
  79       <para>
  80         In many cases, the longer names allow capabilities and behaviours
  81         controlled by macros to also be unambiguously emitted as distinct
  82         generated names.
  83       </para>
  84
  85       <para>
  86         Specific issues found while unraveling performance factors in the
  87         design of associative containers and priority queues follow.
  88       </para>
  89
  90       <section xml:id="pbds.intro.issues.associative">
  91         <info><title>Associative</title></info>
  92
  93         <para>
  94           Associative containers depend on their composite policies to a very
  95           large extent. Implicitly hard-wiring policies can hamper their
  96           performance and limit their functionality. An efficient hash-based
  97           container, for example, requires policies for testing key
  98           equivalence, hashing keys, translating hash values into positions
  99           within the hash table, and determining when and how to resize the
 100           table internally. A tree-based container can efficiently support
 101           order statistics, i.e. the ability to query what is the order of
 102           each key within the sequence of keys in the container, but only if
 103           the container is supplied with a policy to internally update
 104           meta-data. There are many other such examples.
 105         </para>
 106
 107         <para>
 108           Ideally, all associative containers would share the same
 109           interface. Unfortunately, underlying data structures and mapping
 110           semantics differentiate between different containers. For example,
 111           suppose one writes a generic function manipulating an associative
 112           container.
 113         </para>
 114
 115         <programlisting>
 116           template&lt;typename Cntnr&gt;
 117           void
 118           some_op_sequence(Cntnr&amp; r_cnt)
 119           {
 120           ...
 121           }
 122         </programlisting>
 123
 124         <para>
 125           Given this, what can one assume about the instantiating
 126           container? The answer varies according to its underlying data
 127           structure. If the underlying data structure of
 128           <literal>Cntnr</literal> is based on a tree or trie, then the order
 129           of elements is well defined; otherwise, it is not, in general. If
 130           the underlying data structure of <literal>Cntnr</literal> is based
 131           on a collision-chaining hash table, then modifying
 132           <literal>r_cnt</literal> will not invalidate its iterators' order;
 133           if the underlying data structure is a probing hash table, then this
 134           is not the case. If the underlying data structure is based on a tree
 135           or trie, then a reference to the container can efficiently be split;
 136           otherwise, it cannot, in general. If the underlying data structure
 137           is a red-black tree, then splitting a reference to the container is
 138           exception-free; if it is an ordered-vector tree, exceptions can be
 139           thrown.
 140         </para>
 141
 142       </section>
 143
 144       <section xml:id="pbds.intro.issues.priority_queue">
 145         <info><title>Priority Queue</title></info>
 146
 147         <para>
 148           Priority queues are useful when one needs to efficiently access a
 149           minimum (or maximum) value as the set of values changes.
 150         </para>
 151
 152         <para>
 153           Most useful data structures for priority queues have a relatively
 154           simple structure, as they are geared toward relatively simple
 155           requirements. Unfortunately, these structures do not support access
 156           to an arbitrary value, which turns out to be necessary in many
 157           algorithms, for example decreasing an arbitrary value in a graph
 158           algorithm. Therefore, some extra mechanism is necessary and must be
 159           invented for accessing arbitrary values. There are at least two
 160           alternatives: embedding an associative container in a priority
 161           queue, or allowing cross-referencing through iterators. The first
 162           solution adds significant overhead; the second solution requires a
 163           precise definition of iterator invalidation, which is the next
 164           point.
 165         </para>
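
             <para>
               Cross-referencing through iterators can be sketched with a
               stand-in container. The example below uses
               <classname>std::multiset</classname> in place of a real priority
               queue, because its iterators remain valid until the element they
               refer to is erased; <literal>pq_sketch</literal>,
               <literal>pq_handle</literal>, and <literal>modify_value</literal>
               are hypothetical names, not part of this library.
             </para>

             <programlisting>
               #include &lt;set&gt;

               // Stand-in priority queue: the smallest value is at begin().
               typedef std::multiset&lt;int&gt; pq_sketch;

               // A handle is an iterator kept by the client for later access.
               typedef pq_sketch::iterator pq_handle;

               // Changes the value behind handle by erasing and re-inserting
               // it, and returns the handle to the new element.
               inline pq_handle
               modify_value(pq_sketch&amp; pq, pq_handle handle, int new_value)
               {
                 pq.erase(handle);
                 return pq.insert(new_value);
               }
             </programlisting>

             <para>
               A graph algorithm could store one such handle per vertex and call
               <literal>modify_value</literal> whenever a tentative distance
               decreases. Whether a handle survives operations on other
               elements is exactly what an invalidation guarantee has to state.
             </para>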
 166
 167         <para>
 168           Priority queues, like hash-based containers, store values in an
 169           order that is meaningless and undefined externally. For example, a
 170           <code>push</code> operation can internally reorganize the
 171           values. Because of this characteristic, describing a priority
 172           queue's iterators is difficult: on one hand, the values to which
 173           iterators point can remain valid, but on the other, the logical
 174           order of iterators can change unpredictably.
 175         </para>
 176
 177         <para>
 178           Roughly speaking, any element that is both inserted into a priority
 179           queue (e.g., through <code>push</code>) and removed
 180           from it (e.g., through <code>pop</code>) incurs a
 181           logarithmic overhead (in the amortized sense). Different underlying
 182           data structures place the actual cost differently: some are
 183           optimized for amortized complexity, whereas others guarantee that
 184           specific operations only have a constant cost. One underlying data
 185           structure might be chosen if modifying a value is frequent
 186           (Dijkstra's shortest-path algorithm), whereas a different one might
 187           be chosen otherwise. Unfortunately, an array-based binary heap, an
 188           underlying data structure that optimizes (in the amortized sense)
 189           <code>push</code> and <code>pop</code> operations, differs from the
 190           others in terms of its invalidation guarantees. Other design
 191           decisions also impact the cost and placement of the overhead, at the
 192           expense of more differences in the kinds of operations that the
 193           underlying data structure can support. These differences pose a
 194           challenge when creating a uniform interface for priority queues.
 195         </para>
 196       </section>
 197     </section>
 198
 199     <section xml:id="pbds.intro.motivation">
 200       <info><title>Goals</title></info>
 201
 202       <para>
 203         Many fine associative-container libraries have already been written,
 204         most notably the C++ standard's associative containers. Why
 205         then write another library? This section shows some possible
 206         advantages of this library, when considering the challenges in
 207         the introduction. Many of these points stem from the fact that
 208         the ISO C++ process introduced associative containers in a
 209         two-step process (first standardizing tree-based containers,
 210         only then adding hash-based containers, which are fundamentally
 211         different), did not standardize priority queues as containers,
 212         and (in our opinion) overloaded the iterator concept.
 213       </para>
 214
 215       <section xml:id="pbds.intro.motivation.associative">
 216         <info><title>Associative</title></info>
 217         <para>
 218         </para>
 219
 220         <section xml:id="motivation.associative.policy">
 221           <info><title>Policy Choices</title></info>
 222           <para>
 223             Associative containers require a relatively large number of
 224             policies to function efficiently in various settings. In some
 225             cases this is needed for making their common operations more
 226             efficient, and in other cases this allows them to support a
 227             larger set of operations.
 228           </para>
 229
 230           <orderedlist>
 231             <listitem>
 232               <para>
 233                 Hash-based containers, for example, support look-up and
 234                 insertion methods (<function>find</function> and
 235                 <function>insert</function>). In order to locate elements
 236                 quickly, they are supplied a hash functor, which instructs
 237                 how to transform a key object into some size type; a hash
 238                 functor might transform <constant>"hello"</constant>
 239                 into <constant>1123002298</constant>. A hash table, though,
 240                 requires transforming each key object into a size-type
 241                 value in some specific domain; a hash table with 128
 242                 entries might transform <constant>"hello"</constant> into
 243                 position <constant>63</constant>. The policy by which the
 244                 hash value is transformed into a position within the table
 245                 can dramatically affect performance.  Hash-based containers
 246                 also do not resize naturally (as opposed to tree-based
 247                 containers, for example). The appropriate resize policy is
 248                 unfortunately intertwined with the policy that transforms
 249                 a hash value into a position within the table.
 250               </para>
 251             </listitem>
 252
 253             <listitem>
 254               <para>
 255                 Tree-based containers, for example, also support look-up and
 256                 insertion methods, and are primarily useful when maintaining
 257                 order between elements is important. In some cases, though,
 258                 one can utilize their balancing algorithms for completely
 259                 different purposes.
 260               </para>
 261
 262               <para>
 263                 Figure A shows a tree in which each node contains two entries:
 264                 a floating-point key, and some size-type
 265                 <emphasis>metadata</emphasis> (in bold beneath it) that is
 266                 the number of nodes in the sub-tree. (The root has key 0.99,
 267                 and has 5 nodes (including itself) in its sub-tree.) A
 268                 container based on this data structure can obviously answer
 269                 efficiently whether 0.3 is in the container object, but it
 270                 can also answer what is the order of 0.3 among all those in
 271                 the container object: see <xref linkend="biblio.clrs2001"/>.
 272
 273               </para>
 274
 275               <para>
 276                 As another example, Figure B shows a tree in which each node
 277                 contains two entries: a half-open geometric line interval,
 278                 and a number <emphasis>metadata</emphasis> (in bold beneath
 279                 it) that is the largest endpoint of all intervals in its
 280                 sub-tree.  (The root describes the interval <constant>[20,
 281                 36)</constant>, and the largest endpoint in its sub-tree is
 282                 99.) A container based on this data structure can obviously
 283                 answer efficiently whether <constant>[3, 41)</constant> is
 284                 in the container object, but it can also answer efficiently
 285                 whether the container object has intervals that intersect
 286                 <constant>[3, 41)</constant>. These types of queries are
 287                 very useful in geometric algorithms and lease-management
 288                 algorithms.
 289               </para>
 290
 291               <para>
 292                 It is important to note, however, that as the trees are
 293                 modified, their internal structure changes. To maintain
 294                 these invariants, one must supply some policy that is aware
 295                 of these changes.  Without this, it would be better to use a
 296                 linked list (in itself very efficient for these purposes).
 297               </para>
 298
 299             </listitem>
 300           </orderedlist>
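
               <para>
                 The split described in the first item, between hashing a key
                 and mapping the hash value to a table position, can be
                 sketched as follows. This is only an illustration: the names
                 <literal>mod_range_hashing</literal>,
                 <literal>mask_range_hashing</literal>, and
                 <literal>position_of</literal> are hypothetical and are not
                 part of this library's interface.
               </para>

               <programlisting>
                 #include &lt;cstddef&gt;
                 #include &lt;functional&gt;
                 #include &lt;string&gt;

                 // Hypothetical policy: reduce the hash value modulo the table size.
                 struct mod_range_hashing
                 {
                   std::size_t
                   operator()(std::size_t hash, std::size_t table_size) const
                   { return hash % table_size; }
                 };

                 // Hypothetical policy: keep the low bits of the hash value.
                 // Correct only when table_size is a power of two.
                 struct mask_range_hashing
                 {
                   std::size_t
                   operator()(std::size_t hash, std::size_t table_size) const
                   { return hash &amp; (table_size - 1); }
                 };

                 // Position of key: first hash it, then map the hash value into
                 // the range [0, table_size).
                 template&lt;typename Key, typename Hash_Fn, typename Range_Fn&gt;
                 std::size_t
                 position_of(const Key&amp; key, std::size_t table_size,
                             Hash_Fn hash_fn, Range_Fn range_fn)
                 { return range_fn(hash_fn(key), table_size); }
               </programlisting>

               <para>
                 Assuming a C++11 compiler, a call such as
                 <literal>position_of(std::string("hello"), 128,
                 std::hash&lt;std::string&gt;(), mod_range_hashing())</literal>
                 yields a position below 128. Masking is cheaper than a modulo
                 operation but restricts the table size to powers of two, which
                 is one way in which the resize policy and the range-hashing
                 policy become intertwined.
               </para>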
 301
 302           <figure>
 303             <title>Node Invariants</title>
 304             <mediaobject>
 305               <imageobject>
 306                 <imagedata align="center" format="PNG" scale="100"
 307                            fileref="../images/pbds_node_invariants.png"/>
 308               </imageobject>
 309               <textobject>
 310                 <phrase>Node Invariants</phrase>
 311               </textobject>
 312             </mediaobject>
 313           </figure>
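
               <para>
                 The sub-tree-size invariant of Figure A can be sketched with a
                 small hand-written tree. This is a simplified illustration
                 that assumes C++11 and performs no balancing, so there are no
                 rotations after which the metadata would have to be repaired;
                 the names <literal>os_node</literal>,
                 <literal>os_insert</literal>, and
                 <literal>order_of_key</literal> as defined here are
                 hypothetical.
               </para>

               <programlisting>
                 #include &lt;cstddef&gt;
                 #include &lt;memory&gt;

                 // Hypothetical node: a key plus the number of nodes in its sub-tree.
                 struct os_node
                 {
                   double key;
                   std::size_t subtree_size;
                   std::unique_ptr&lt;os_node&gt; left, right;

                   explicit os_node(double k) : key(k), subtree_size(1) { }
                 };

                 // Unbalanced insertion that updates the metadata on the way down.
                 // Assumes that key is not already present.
                 inline void
                 os_insert(std::unique_ptr&lt;os_node&gt;&amp; root, double key)
                 {
                   std::unique_ptr&lt;os_node&gt;* cur = &amp;root;
                   while (*cur)
                     {
                       ++(*cur)-&gt;subtree_size;
                       cur = key &lt; (*cur)-&gt;key ? &amp;(*cur)-&gt;left : &amp;(*cur)-&gt;right;
                     }
                   cur-&gt;reset(new os_node(key));
                 }

                 // Number of keys strictly smaller than key, found in a single
                 // walk from the root.
                 inline std::size_t
                 order_of_key(const std::unique_ptr&lt;os_node&gt;&amp; root, double key)
                 {
                   std::size_t order = 0;
                   const os_node* cur = root.get();
                   while (cur)
                     {
                       if (key &lt;= cur-&gt;key)
                         cur = cur-&gt;left.get();
                       else
                         {
                           // The left sub-tree and cur itself are all smaller.
                           order += (cur-&gt;left ? cur-&gt;left-&gt;subtree_size : 0) + 1;
                           cur = cur-&gt;right.get();
                         }
                     }
                   return order;
                 }
               </programlisting>

               <para>
                 The query touches one node per level, so its cost is bounded
                 by the height of the tree. In a balanced tree, every rotation
                 would also have to recompute
                 <literal>subtree_size</literal> for the nodes involved, which
                 is the job of the update policy mentioned above.
               </para>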
 314
 315         </section>
 316
 317         <section xml:id="motivation.associative.underlying">
 318           <info><title>Underlying Data Structures</title></info>
 319           <para>
 320             The standard C++ library contains associative containers based on
 321             red-black trees and collision-chaining hash tables. These are
 322             very useful, but they are not ideal for all types of
 323             settings.
 324           </para>
 325
 326           <para>
 327             The figure below shows the different underlying data structures
 328             currently supported in this library.
 329           </para>
 330
 331           <figure>
 332             <title>Underlying Associative Data Structures</title>
 333             <mediaobject>
 334               <imageobject>
 335                 <imagedata align="center" format="PNG" scale="100"
 336                            fileref="../images/pbds_different_underlying_dss_1.png"/>
 337               </imageobject>
 338               <textobject>
 339                 <phrase>Underlying Associative Data Structures</phrase>
 340               </textobject>
 341             </mediaobject>
 342           </figure>
 343
 344           <para>
 345             A shows a collision-chaining hash-table, B shows a probing
 346             hash-table, C shows a red-black tree, D shows a splay tree, E shows
 347             a tree based on an ordered vector (implicit in the order of the
 348             elements), F shows a PATRICIA trie, and G shows a list-based
 349             container with update policies.
 350           </para>
 351
 352           <para>
 353             Each of these data structures has some performance benefits, in
 354             terms of speed, size or both. For now, note that vector-based trees
 355             and probing hash tables manipulate memory more efficiently than
 356             red-black trees and collision-chaining hash tables, and that
 357             list-based associative containers are very useful for constructing
 358             "multimaps".
 359           </para>
 360
 361           <para>
 362             Now consider a function manipulating a generic associative
 363             container:
 364           </para>
 365           <programlisting>
 366             template&lt;class Cntnr&gt;
 367             int
 368             some_op_sequence(Cntnr &amp;r_cnt)
 369             {
 370             ...
 371             }
 372           </programlisting>
 373
 374           <para>
 375             Ideally, the underlying data structure
 376             of <classname>Cntnr</classname> would not affect what can be
 377             done with <varname>r_cnt</varname>.  Unfortunately, this is not
 378             the case.
 379           </para>
 380
 381           <para>
 382             For example, if <classname>Cntnr</classname>
 383             is <classname>std::map</classname>, then the function can
 384             use
 385           </para>
 386           <programlisting>
 387             std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar);
 388           </programlisting>
 389           <para>
 390             in order to apply <classname>foobar</classname> to all
 391             elements between <classname>foo</classname> and
 392             <classname>bar</classname>. If
 393             <classname>Cntnr</classname> is a hash-based container,
 394             then this call's results are undefined.
 395           </para>
 396
 397           <para>
 398             Also, if <classname>Cntnr</classname> is tree-based, the type
 399             and object of the comparison functor can be
 400             accessed. If <classname>Cntnr</classname> is hash-based, these
 401             queries are nonsensical.
 402           </para>
 403
 404           <para>
 405             There are various other differences based on the container's
 406             underlying data structure. For one, they can be constructed by,
 407             and queried for, different policies. Furthermore:
 408           </para>
 409
 410           <orderedlist>
 411             <listitem>
 412               <para>
 413                 Containers based on C, D, E and F store elements in a
 414                 meaningful order; the others store elements in a meaningless
 415                 (and probably time-varying) order. By implication, only
 416                 containers based on C, D, E and F can
 417                 support <function>erase</function> operations taking an
 418                 iterator and returning an iterator to the following element
 419                 without performance loss.
 420               </para>
 421             </listitem>
 422
 423             <listitem>
 424               <para>
 425                 Containers based on C, D, E, and F can be split and joined
 426                 efficiently, while the others cannot. Containers based on C
 427                 and D, furthermore, can guarantee that this is exception-free;
 428                 containers based on E cannot guarantee this.
 429               </para>
 430             </listitem>
 431
 432             <listitem>
 433               <para>
 434                 Containers based on all but E can guarantee that
 435                 erasing an element is exception free; containers based on E
 436                 cannot guarantee this. Containers based on all but B and E
 437                 can guarantee that modifying an object of their type does
 438                 not invalidate iterators or references to their elements,
 439                 while containers based on B and E cannot. Containers based
 440                 on C, D, and E can furthermore make a stronger guarantee,
 441                 namely that modifying an object of their type does not
 442                 affect the order of iterators.
 443               </para>
 444             </listitem>
 445           </orderedlist>
 446
 447           <para>
 448             A unified tag and traits system (as used for the C++ standard
 449             library iterators, for example) can ease generic manipulation of
 450             associative containers based on different underlying data
 451             structures.
 452           </para>
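
               <para>
                 Tag dispatching of this kind can be sketched with the standard
                 containers. The example assumes C++11 (for
                 <classname>std::unordered_map</classname>); the names
                 <literal>ordered_tag</literal>,
                 <literal>unordered_tag</literal>,
                 <literal>order_traits</literal>, and
                 <literal>supports_range_queries</literal> are hypothetical
                 and do not describe this library's own tags and traits.
               </para>

               <programlisting>
                 #include &lt;map&gt;
                 #include &lt;unordered_map&gt;

                 // Hypothetical tags: is the iteration order meaningful?
                 struct ordered_tag { };
                 struct unordered_tag { };

                 // Hypothetical traits class mapping a container type to a tag.
                 template&lt;typename Cntnr&gt;
                 struct order_traits;

                 template&lt;typename K, typename V, typename C, typename A&gt;
                 struct order_traits&lt;std::map&lt;K, V, C, A&gt; &gt;
                 { typedef ordered_tag category; };

                 template&lt;typename K, typename V, typename H, typename E, typename A&gt;
                 struct order_traits&lt;std::unordered_map&lt;K, V, H, E, A&gt; &gt;
                 { typedef unordered_tag category; };

                 // Overloads selected at compile time through the tag.
                 template&lt;typename Cntnr&gt;
                 bool
                 supports_range_queries(const Cntnr&amp;, ordered_tag)
                 { return true; }

                 template&lt;typename Cntnr&gt;
                 bool
                 supports_range_queries(const Cntnr&amp;, unordered_tag)
                 { return false; }

                 // Generic entry point: looks up the tag and dispatches on it.
                 template&lt;typename Cntnr&gt;
                 bool
                 supports_range_queries(const Cntnr&amp; r_cnt)
                 {
                   typedef typename order_traits&lt;Cntnr&gt;::category category;
                   return supports_range_queries(r_cnt, category());
                 }
               </programlisting>

               <para>
                 A generic function such as <function>some_op_sequence</function>
                 can use the same pattern to choose, at compile time, an
                 implementation that matches the guarantees of its container.
               </para>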
 453
 454         </section>
 455
 456         <section xml:id="motivation.associative.iterators">
 457           <info><title>Iterators</title></info>
 458           <para>
 459             Iterators are central to the design of the standard library
 460             containers, because of the container/algorithm/iterator
 461             decomposition that allows an algorithm to operate on a range
 462             through iterators of some sequence.  Iterators, then, are useful
 463             because they allow going over a
 464             specific <emphasis>sequence</emphasis>.  The standard library
 465             also uses iterators for accessing a
 466             specific <emphasis>element</emphasis>: when an associative
 467             container returns one through <function>find</function>. The
 468             standard library consistently uses the same types of iterators
 469             for both purposes: going over a range, and accessing a specific
 470             found element. Before the introduction of hash-based containers
 471             to the standard library, this made sense (with the exception of
 472             priority queues, which are discussed later).
 473           </para>
 474
 475           <para>
 476             When the standard associative containers are used together with
 477             non-order-preserving associative containers (and also with
 478             priority queues), there is a possible need for
 479             different types of iterators for self-organizing containers:
 480             the iterator concept seems overloaded to mean two different
 481             things (in some cases). <remark>XXX
 482             ds_gen.html#find_range: Design::Associative
 483             Containers::Data-Structure Genericity::Point-Type and Range-Type
 484             Methods</remark>.
 485           </para>
 486
 487           <section xml:id="associative.iterators.using">
 488             <info>
 489               <title>Using Point Iterators for Range Operations</title>
 490             </info>
 491             <para>
 492               Suppose <classname>cntnr</classname> is some associative
 493               container, and say <varname>c</varname> is an object of
 494               type <classname>cntnr</classname>. Then what will be the outcome
 495               of
 496             </para>
 497
 498             <programlisting>
 499               std::for_each(c.find(1), c.find(5), foo);
 500             </programlisting>
 501
 502             <para>
 503               If <varname>c</varname> is a tree-based container
 504               object, then an in-order walk will
 505               apply <classname>foo</classname> to the relevant elements,
 506               as in the graphic below, label A. If <varname>c</varname> is
 507               a hash-based container, then the order of elements between any
 508               two elements is undefined (and probably time-varying); there is
 509               no guarantee that the elements traversed will coincide with the
 510               <emphasis>logical</emphasis> elements between 1 and 5, as in
 511               label B.
 512             </para>
 513
 514             <figure>
 515               <title>Range Iteration in Different Data Structures</title>
 516               <mediaobject>
 517                 <imageobject>
 518                   <imagedata align="center" format="PNG" scale="100"
 519                              fileref="../images/pbds_point_iterators_range_ops_1.png"/>
 520                 </imageobject>
 521                 <textobject>
 522                   <phrase>Range Iteration in Different Data Structures</phrase>
 523                 </textobject>
 524               </mediaobject>
 525             </figure>
 526
 527             <para>
 528               In our opinion, this problem does not arise merely because
 529               red-black trees are order preserving while
 530               collision-chaining hash tables are (generally) not; it
 531               is more fundamental. Most of the standard's containers
 532               order sequences in a well-defined manner that is
 533               determined by their <emphasis>interface</emphasis>:
 534               calling <function>insert</function> on a tree-based
 535               container modifies its sequence in a predictable way, as
 536               does calling <function>push_back</function> on a list or
 537               a vector. Conversely, collision-chaining hash tables,
 538               probing hash tables, priority queues, and list-based
 539               containers (which are very useful for "multimaps") are
 540               self-organizing data structures; each
 541               operation modifies their sequences in a manner that is
 542               (practically) determined by their
 543               <emphasis>implementation</emphasis>.
 544             </para>
 545
 546             <para>
 547               Consequently, applying an algorithm to a sequence obtained from most
 548               containers may or may not make sense, but applying it to a
 549               sub-sequence of a self-organizing container does not.
 550             </para>
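
                 <para>
                   For an order-preserving container, the walk between two
                   found elements is well defined. The sketch below uses
                   <classname>std::map</classname>;
                   <literal>keys_between</literal> is a hypothetical helper
                   that assumes both keys are present and that the first does
                   not follow the second. The same loop is deliberately not
                   shown for a hash-based container, where the second
                   iterator might never be reached.
                 </para>

                 <programlisting>
                   #include &lt;map&gt;
                   #include &lt;vector&gt;

                   // Collects the keys visited when walking from c.find(lo)
                   // up to, but not including, c.find(hi).
                   inline std::vector&lt;int&gt;
                   keys_between(const std::map&lt;int, char&gt;&amp; c, int lo, int hi)
                   {
                     std::vector&lt;int&gt; visited;
                     const std::map&lt;int, char&gt;::const_iterator last = c.find(hi);
                     for (std::map&lt;int, char&gt;::const_iterator it = c.find(lo);
                          it != last; ++it)
                       visited.push_back(it-&gt;first);
                     return visited;
                   }
                 </programlisting>

                 <para>
                   The visited keys are exactly the logical elements in the
                   half-open range, in increasing order, regardless of the
                   order in which they were inserted.
                 </para>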
 551           </section>
 552
 553           <section xml:id="associative.iterators.cost">
 554             <info>
 555               <title>Cost to Point Iterators to Enable Range Operations</title>
 556             </info>
 557             <para>
 558               Suppose <varname>c</varname> is some collision-chaining
 559               hash-based container object, and one calls
 560             </para>
 561             <programlisting>c.find(3)</programlisting>
 562             <para>
 563               Then what composes the returned iterator?
 564             </para>
 565
 566             <para>
 567               In the graphic below, label A shows the simplest (and
 568               most efficient) implementation of a collision-chaining
 569               hash table.  The little box marked
 570               <classname>point_iterator</classname> shows an object
 571               that contains a pointer to the element's node. Note that
 572               this "iterator" has no way to move to the next element
 573               (it cannot support
 574               <function>operator++</function>). Conversely, the little
 575               box marked <classname>iterator</classname> stores a
 576               pointer to the element as well as some other
 577               information (the bucket number of the element). The
 578               second iterator, then, is "heavier" than the first one:
 579               it requires more time and space. If we were to use a
 580               different container to cross-reference into this
 581               hash-table using these iterators, it would take much
 582               more space. As noted above, nothing much can be done by
 583               incrementing these iterators, so why is this extra
 584               information needed?
 585             </para>
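
                 <para>
                   The difference in weight can be sketched with two small
                   structures. The layout is an assumption made for
                   illustration and is not taken from this library;
                   <literal>chain_node</literal>,
                   <literal>point_iterator_sketch</literal>, and
                   <literal>range_iterator_sketch</literal> are hypothetical
                   names.
                 </para>

                 <programlisting>
                   #include &lt;cstddef&gt;

                   // Hypothetical node of a collision-chaining hash table.
                   struct chain_node
                   {
                     int value;
                     chain_node* next;
                   };

                   // Point-type iterator: only knows the element's node.
                   struct point_iterator_sketch
                   {
                     chain_node* node;

                     int&amp; operator*() const { return node-&gt;value; }
                   };

                   // Range-type iterator: also records where to continue once
                   // the current chain is exhausted.
                   struct range_iterator_sketch
                   {
                     chain_node* node;
                     chain_node** buckets;
                     std::size_t bucket;
                     std::size_t num_buckets;

                     int&amp; operator*() const { return node-&gt;value; }

                     range_iterator_sketch&amp; operator++()
                     {
                       node = node-&gt;next;
                       // Skip to the next non-empty bucket, if there is one.
                       while (node == 0 &amp;&amp; ++bucket &lt; num_buckets)
                         node = buckets[bucket];
                       return *this;
                     }
                   };
                 </programlisting>

                 <para>
                   A container that cross-references many elements of the
                   table pays for the additional members in every stored
                   iterator, even if it never increments one.
                 </para>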
 586
 587             <para>
 588               Alternatively, one might create a collision-chaining hash-table
 589               in which the lists are linked, forming a monolithic total-element
 590               list, as in the graphic below, label B.  Here the iterators are as
 591               light as can be, but the hash-table's operations are more
 592               complicated.
 593             </para>
 594
 595             <figure>
 596               <title>Point Iteration in Hash Data Structures</title>
 597               <mediaobject>
 598                 <imageobject>
 599                   <imagedata align="center" format="PNG" scale="100"
 600                              fileref="../images/pbds_point_iterators_range_ops_2.png"/>
 601                 </imageobject>
 602                 <textobject>
 603                   <phrase>Point Iteration in Hash Data Structures</phrase>
 604                 </textobject>
 605               </mediaobject>
 606             </figure>
 607
 608             <para>
 609               It should be noted that containers based on collision-chaining
 610               hash-tables are not the only ones with this type of behavior;
 611               many other self-organizing data structures display it as well.
 612             </para>
 613           </section>
 614
 615           <section xml:id="associative.iterators.invalidation">
 616             <info><title>Invalidation Guarantees</title></info>
 617             <para>Consider the following snippet:</para>
 618             <programlisting>
 619               it = c.find(3);
 620               c.erase(5);
 621             </programlisting>
 622
 623             <para>
 624               Following the call to <classname>erase</classname>, what is the
 625               validity of <classname>it</classname>: can it be de-referenced?
 626               Can it be incremented?
 627             </para>
 628
 629             <para>
 630               The answer depends on the underlying data structure of the
 631               container. The graphic below shows three cases: A1 and A2 show
 632               a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
 633               show a collision-chaining hash table.
 634             </para>
 635
 636             <figure>
 637               <title>Effect of erase in different underlying data structures</title>
 638               <mediaobject>
 639                 <imageobject>
 640                   <imagedata align="center" format="PNG" scale="100"
 641                              fileref="../images/pbds_invalidation_guarantee_erase.png"/>
 642                 </imageobject>
 643                 <textobject>
 644                   <phrase>Effect of erase in different underlying data structures</phrase>
 645                 </textobject>
 646               </mediaobject>
 647             </figure>
 648
 649             <orderedlist>
 650               <listitem>
 651                 <para>
 652                   Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
 653                   be de-referenced and incremented. The sequence of iterators
 654                   changed, but in a way that is well-defined by the interface.
 655                 </para>
 656               </listitem>
 657
 658               <listitem>
 659                 <para>
 660                   Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
 661                   not valid at all - it cannot be de-referenced or
 662                   incremented; the order of iterators changed in a way that is
 663                   (practically) determined by the implementation and not by
 664                   the interface.
 665                 </para>
 666               </listitem>
 667
 668               <listitem>
 669                 <para>
 670                   Erasing 5 from C1 yields C2. Here the situation is more
 671                   complicated. On the one hand, there is no problem in
                       de-referencing <varname>it</varname>. On the other hand,
 673                   the order of iterators changed in a way that is
 674                   (practically) determined by the implementation and not by
 675                   the interface.
 676                 </para>
 677               </listitem>
 678             </orderedlist>
 679
 680             <para>
 681               So in the standard library containers, it is not always possible
 682               to express whether <varname>it</varname> is valid or not. This
 683               is true also for <function>insert</function>. Again, the
 684               iterator concept seems overloaded.
 685             </para>
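
                 <para>
                   As a minimal sketch of the first case only, the snippet can be
                   tried with <classname>std::map</classname>, which here stands in
                   for the red-black tree in the figure (an assumption about the
                   implementation, not a requirement of the interface). The helper
                   name <function>iterator_survives_erase</function> is
                   hypothetical.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;map&gt;

                   // Hypothetical helper: checks that an iterator to key 3 can still
                   // be de-referenced and incremented after key 5 is erased.
                   bool iterator_survives_erase()
                   {
                     std::map&lt;int, char&gt; c;
                     c[3] = 'a';
                     c[5] = 'b';
                     c[7] = 'c';

                     std::map&lt;int, char&gt;::iterator it = c.find(3);
                     c.erase(5);                 // erases a different element

                     assert(it-&gt;first == 3);     // de-referencing is still valid
                     ++it;                       // incrementing is still valid
                     return it != c.end() &amp;&amp; it-&gt;first == 7;
                   }
                 </programlisting>
                 <para>
                   After the erase, the successor of 3 is 7, which is the
                   well-defined change in the sequence of iterators described for
                   the tree case.
                 </para>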
 686           </section>
 687         </section> <!--iterators-->
 688
 689
 690         <section xml:id="motivation.associative.functions">
 691           <info><title>Functional</title></info>
 692           <para>
 693           </para>
 694
 695           <para>
                 The design of the functional overlay to the underlying data
                 structures differs slightly from some of the conventions used in
                 the C++ standard. A strict public interface should comprise
                 only those methods whose operations depend on the class's
                 internal structure; other operations are best designed as
                 external functions (see <xref linkend="biblio.meyers02both"/>).
                 By this rubric, the standard associative containers lack some
                 useful methods, and provide other methods which would be better
                 removed.
 705           </para>
 706
 707           <section xml:id="motivation.associative.functions.erase">
 708             <info><title><function>erase</function></title></info>
 709
 710             <orderedlist>
 711               <listitem>
 712                 <para>
 713                   Order-preserving standard associative containers provide the
 714                   method
 715                 </para>
 716                 <programlisting>
 717                   iterator
 718                   erase(iterator it)
 719                 </programlisting>
 720
 721                 <para>
 722                   which takes an iterator, erases the corresponding
 723                   element, and returns an iterator to the following
                       element. Standard hash-based associative containers
                       also provide this method. This seemingly increases
                       genericity between associative containers,
 727                   since it is possible to use
 728                 </para>
 729                 <programlisting>
 730                   typename C::iterator it = c.begin();
 731                   typename C::iterator e_it = c.end();
 732
                       while (it != e_it)
                         it = pred(*it) ? c.erase(it) : ++it;
 735                 </programlisting>
 736
 737                 <para>
                       in order to erase from a container object
                       <varname>c</varname> all elements which match a
                       predicate <varname>pred</varname>. However, in a
 741                   different sense this actually decreases genericity: an
 742                   integral implication of this method is that tree-based
 743                   associative containers' memory use is linear in the total
 744                   number of elements they store, while hash-based
 745                   containers' memory use is unbounded in the total number of
 746                   elements they store. Assume a hash-based container is
 747                   allowed to decrease its size when an element is
 748                   erased. Then the elements might be rehashed, which means
 749                   that there is no "next" element - it is simply
 750                   undefined. Consequently, it is possible to infer from the
 751                   fact that the standard library's hash-based containers
 752                   provide this method that they cannot downsize when
 753                   elements are erased. As a consequence, different code is
 754                   needed to manipulate different containers, assuming that
                       memory should be conserved. Therefore, this library's
                       non-order-preserving associative containers omit this
 757                   method.
 758                 </para>
 759               </listitem>
 760
 761               <listitem>
 762                 <para>
                       All of this library's associative containers include a conditional-erase method
 764                 </para>
 765                 <programlisting>
                       template&lt;class Pred&gt;
                       size_type
                       erase_if(Pred pred)
 771                 </programlisting>
 772                 <para>
 773                   which erases all elements matching a predicate. This is probably the
 774                   only way to ensure linear-time multiple-item erase which can
 775                   actually downsize a container.
 776                 </para>
 777               </listitem>
 778
 779               <listitem>
 780                 <para>
 781                   The standard associative containers provide methods for
 782                   multiple-item erase of the form
 783                 </para>
 784                 <programlisting>
 785                   size_type
 786                   erase(It b, It e)
 787                 </programlisting>
 788                 <para>
 789                   erasing a range of elements given by a pair of
                       iterators. For tree-based or trie-based containers, this can
                       be implemented more efficiently as a (small) sequence of split
 792                   and join operations. For other, unordered, containers, this
 793                   method isn't much better than an external loop. Moreover,
 794                   if <varname>c</varname> is a hash-based container,
 795                   then
 796                 </para>
 797                 <programlisting>
 798                   c.erase(c.find(2), c.find(5))
 799                 </programlisting>
 800                 <para>
 801                   is almost certain to do something
 802                   different than erasing all elements whose keys are between 2
 803                   and 5, and is likely to produce other undefined behavior.
 804                 </para>
 805               </listitem>
 806             </orderedlist>
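
                 <para>
                   The conditional-erase method can be sketched as follows, assuming
                   the header <filename>ext/pb_ds/assoc_container.hpp</filename> and
                   a predicate that receives the container's stored key/value pair.
                   The names <classname>key_is_even</classname> and
                   <function>erase_even_keys</function> are hypothetical and exist
                   only for this illustration.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;cstddef&gt;
                   #include &lt;utility&gt;
                   #include &lt;ext/pb_ds/assoc_container.hpp&gt;

                   // Hypothetical predicate: matches entries whose key is even.
                   struct key_is_even
                   {
                     bool operator()(const std::pair&lt;const int, char&gt;&amp; entry) const
                     { return entry.first % 2 == 0; }
                   };

                   // Hypothetical helper: fills a table with keys 0..9, erases the
                   // even keys in one call, and returns the number of erased entries.
                   std::size_t erase_even_keys()
                   {
                     __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
                     for (int i = 0; i &lt; 10; ++i)
                       c[i] = static_cast&lt;char&gt;('a' + i);

                     const std::size_t erased = c.erase_if(key_is_even());

                     assert(c.size() == 5u);
                     assert(c.find(4) == c.end());   // an even key is gone
                     assert(c.find(3) != c.end());   // an odd key remains
                     return erased;
                   }
                 </programlisting>
                 <para>
                   No iterator is held across the erasure, so the container is free
                   to reorganize itself while elements are removed.
                 </para>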
 807           </section> <!-- erase -->
 808
 809           <section xml:id="motivation.associative.functions.split">
 810             <info>
 811               <title>
 812                 <function>split</function> and <function>join</function>
 813               </title>
 814             </info>
 815             <para>
                   It is well known that tree-based and trie-based container
                   objects can be efficiently split or joined (see
                   <xref linkend="biblio.clrs2001"/>). Externally splitting or
                   joining trees is super-linear and, furthermore, can throw
                   exceptions. Split and join methods, consequently, seem good
                   choices for tree-based container methods, especially since, as
                   noted just before, they are efficient replacements for erasing
                   sub-sequences.
 824             </para>
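
                 <para>
                   A minimal sketch of these two methods on this library's
                   <classname>tree</classname> container follows. It assumes that
                   <function>split</function> moves all keys greater than its
                   first argument into the other container, and that
                   <function>join</function> requires the two key ranges not to
                   overlap. The helper name <function>split_then_join</function>
                   is hypothetical.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;cstddef&gt;
                   #include &lt;ext/pb_ds/assoc_container.hpp&gt;

                   typedef __gnu_pbds::tree&lt;int, char&gt; tree_t;

                   // Hypothetical helper: splits a tree holding keys 1..6 at key 3,
                   // checks both halves, joins them again, and returns the final size.
                   std::size_t split_then_join()
                   {
                     tree_t low;
                     for (int i = 1; i &lt;= 6; ++i)
                       low[i] = static_cast&lt;char&gt;('a' + i);

                     tree_t high;
                     low.split(3, high);   // keys greater than 3 move to high
                     assert(low.size() == 3u &amp;&amp; high.size() == 3u);
                     assert(low.find(4) == low.end() &amp;&amp; high.find(4) != high.end());

                     low.join(high);       // every key in high exceeds every key in low
                     assert(high.empty());
                     return low.size();
                   }
                 </programlisting>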
 825
 826           </section> <!-- split -->
 827
 828           <section xml:id="motivation.associative.functions.insert">
 829             <info>
 830               <title>
 831                 <function>insert</function>
 832               </title>
 833             </info>
 834             <para>
 835               The standard associative containers provide methods of the form
 836             </para>
 837             <programlisting>
 838               template&lt;class It&gt;
 839               size_type
 840               insert(It b, It e);
 841             </programlisting>
 842
 843             <para>
                   for inserting a range of elements given by a pair of
                   iterators. This can be implemented as an external loop or,
                   more efficiently for tree-based or trie-based containers,
                   as a join operation. Moreover, these methods seem
                   similar to constructors taking a range given by a pair of
                   iterators; the constructors, however, are transactional, whereas
                   the insert methods are not; this is possibly confusing.
 851             </para>
 852
 853           </section> <!-- insert -->
 854
 855           <section xml:id="motivation.associative.functions.compare">
 856             <info>
 857               <title>
 858                 <function>operator==</function> and <function>operator&lt;=</function>
 859               </title>
 860             </info>
 861
 862             <para>
                   Associative containers are parametrized by policies that allow
                   testing key equivalence: a hash-based container can do this through
 865               its equivalence functor, and a tree-based container can do this
 866               through its comparison functor. In addition, some standard
 867               associative containers have global function operators, like
 868               <function>operator==</function> and <function>operator&lt;=</function>,
 869               that allow comparing entire associative containers.
 870             </para>
 871
 872             <para>
 873               In our opinion, these functions are better left out. To begin
 874               with, they do not significantly improve over an external
 875               loop. More importantly, however, they are possibly misleading -
 876               <function>operator==</function>, for example, usually checks for
 877               equivalence, or interchangeability, but the associative
 878               container cannot check for values' equivalence, only keys'
 879               equivalence; also, are two containers considered equivalent if
                   they store the same values in a different order? This is an
                   arbitrary decision.
 882             </para>
 883           </section> <!-- compare -->
 884
 885         </section>  <!-- functional -->
 886
 887       </section> <!--associative-->
 888
 889       <section xml:id="pbds.intro.motivation.priority_queue">
 890         <info><title>Priority Queues</title></info>
 891
 892         <section xml:id="motivation.priority_queue.policy">
 893           <info><title>Policy Choices</title></info>
 894
 895           <para>
 896             Priority queues are containers that allow efficiently inserting
 897             values and accessing the maximal value (in the sense of the
 898             container's comparison functor). Their interface
 899             supports <function>push</function>
 900             and <function>pop</function>. The standard
                 container <classname>std::priority_queue</classname> indeed supports
 902             these methods, but little else. For algorithmic and
 903             software-engineering purposes, other methods are needed:
 904           </para>
 905
 906           <orderedlist>
 907             <listitem>
 908               <para>
 909                 Many graph algorithms (see
 910                 <xref linkend="biblio.clrs2001"/>) require increasing a
 911                 value in a priority queue (again, in the sense of the
 912                 container's comparison functor), or joining two
 913                 priority-queue objects.
 914               </para>
 915             </listitem>
 916
 917             <listitem>
 918               <para>The return type of <classname>priority_queue</classname>'s
 919               <function>push</function> method is a point-type iterator, which can
 920               be used for modifying or erasing arbitrary values. For
 921               example:</para>
 922               <programlisting>
 923                 priority_queue&lt;int&gt; p;
 924                 priority_queue&lt;int&gt;::point_iterator it = p.push(3);
 925                 p.modify(it, 4);
 926               </programlisting>
 927
 928               <para>These types of cross-referencing operations are necessary
 929               for making priority queues useful for different applications,
 930               especially graph applications.</para>
 931
 932             </listitem>
 933             <listitem>
 934               <para>
 935                 It is sometimes necessary to erase an arbitrary value in a
 936                 priority queue. For example, consider
 937                 the <function>select</function> function for monitoring
 938                 file descriptors:
 939               </para>
 940
 941               <programlisting>
 942                 int
 943                 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
 944                 struct timeval *timeout);
 945               </programlisting>
 946               <para>
                     As the <function>select</function> documentation states:
 948               </para>
 949               <para>
 950                 <quote>
 951                   The nfds argument specifies the range of file
 952                   descriptors to be tested. The select() function tests file
 953                 descriptors in the range of 0 to nfds-1.</quote>
 954               </para>
 955
 956               <para>
 957                 It stands to reason, therefore, that we might wish to
 958                 maintain a minimal value for <varname>nfds</varname>, and
 959                 priority queues immediately come to mind. Note, though, that
                     when a socket is closed, the minimal file descriptor might
 961                 change; in the absence of an efficient means to erase an
 962                 arbitrary value from a priority queue, we might as well
 963                 avoid its use altogether.
 964               </para>
 965
 966               <para>
 967                 The standard containers typically support iterators. It is
 968                 somewhat unusual
 969                 for <classname>std::priority_queue</classname> to omit them
                     (see <xref linkend="biblio.meyers01stl"/>). One might
                     ask why priority queues need to support iterators, since
 972                 they are self-organizing containers with a different purpose
 973                 than abstracting sequences. There are several reasons:
 974               </para>
 975               <orderedlist>
 976                 <listitem>
 977                   <para>
 978                     Iterators (even in self-organizing containers) are
 979                     useful for many purposes: cross-referencing
 980                     containers, serialization, and debugging code that uses
 981                     these containers.
 982                   </para>
 983                 </listitem>
 984
 985                 <listitem>
 986                   <para>
 987                     The standard library's hash-based containers support
 988                     iterators, even though they too are self-organizing
 989                     containers with a different purpose than abstracting
 990                     sequences.
 991                   </para>
 992                 </listitem>
 993
 994                 <listitem>
 995                   <para>
 996                     In standard-library-like containers, it is natural to specify the
 997                     interface of operations for modifying a value or erasing
                         a value (discussed previously) in terms of iterators.
 999                     It should be noted that the standard
1000                     containers also use iterators for accessing and
1001                     manipulating a specific value. In hash-based
1002                     containers, one checks the existence of a key by
1003                     comparing the iterator returned by <function>find</function> to the
1004                     iterator returned by <function>end</function>, and not by comparing a
1005                     pointer returned by <function>find</function> to <type>NULL</type>.
1006                   </para>
1007                 </listitem>
1008               </orderedlist>
1009             </listitem>
1010           </orderedlist>
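
               <para>
                 The operations listed above can be sketched together as
                 follows, assuming the header
                 <filename>ext/pb_ds/priority_queue.hpp</filename> and the
                 container's default underlying data structure. The helper name
                 <function>modify_erase_join</function> is hypothetical.
               </para>
               <programlisting>
                 #include &lt;cassert&gt;
                 #include &lt;ext/pb_ds/priority_queue.hpp&gt;

                 typedef __gnu_pbds::priority_queue&lt;int&gt; pq_t;

                 // Hypothetical helper: modifies and erases arbitrary values through
                 // the iterators returned by push, joins a second queue, and returns
                 // the resulting maximal value.
                 int modify_erase_join()
                 {
                   pq_t p;
                   pq_t::point_iterator it3 = p.push(3);
                   pq_t::point_iterator it7 = p.push(7);
                   p.push(5);
                   assert(p.top() == 7);

                   p.modify(it3, 10);   // 3 becomes 10, the new maximum
                   assert(p.top() == 10);

                   p.erase(it7);        // removes 7 without popping anything else
                   assert(p.size() == 2u);

                   pq_t q;
                   q.push(20);
                   p.join(q);           // the values of q move into p
                   assert(q.empty());
                   return p.top();
                 }
               </programlisting>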
1011
1012         </section>
1013
1014         <section xml:id="motivation.priority_queue.underlying">
1015           <info><title>Underlying Data Structures</title></info>
1016
1017           <para>
1018             There are three main implementations of priority queues: the
1019             first employs a binary heap, typically one which uses a
1020             sequence; the second uses a tree (or forest of trees), which is
1021             typically less structured than an associative container's tree;
1022             the third simply uses an associative container. These are
1023             shown in the figure below with labels A1 and A2, B, and C.
1024           </para>
1025
1026           <figure>
1027             <title>Underlying Priority Queue Data Structures</title>
1028             <mediaobject>
1029               <imageobject>
1030                 <imagedata align="center" format="PNG" scale="100"
1031                            fileref="../images/pbds_different_underlying_dss_2.png"/>
1032               </imageobject>
1033               <textobject>
1034                 <phrase>Underlying Priority Queue Data Structures</phrase>
1035               </textobject>
1036             </mediaobject>
1037           </figure>
1038
1039           <para>
1040             No single implementation can completely replace any of the
1041             others. Some have better <function>push</function>
1042             and <function>pop</function> amortized performance, some have
1043             better bounded (worst case) response time than others, some
1044             optimize a single method at the expense of others, etc. In
1045             general the "best" implementation is dictated by the specific
1046             problem.
1047           </para>
1048
1049           <para>
1050             As with associative containers, the more implementations
1051             co-exist, the more necessary a traits mechanism is for handling
1052             generic containers safely and efficiently. This is especially
1053             important for priority queues, since the invalidation guarantees
                 of one of the most useful data structures - binary heaps - are
                 markedly different from those of most of the others.
1056           </para>
1057
1058         </section>
1059
1060         <section xml:id="motivation.priority_queue.binary_heap">
1061           <info><title>Binary Heaps</title></info>
1062
1063
1064           <para>
1065             Binary heaps are one of the most useful underlying
1066             data structures for priority queues. They are very efficient in
1067             terms of memory (since they don't require per-value structure
1068             metadata), and have the best amortized <function>push</function> and
1069             <function>pop</function> performance for primitive types like
1070             <type>int</type>.
1071           </para>
1072
1073           <para>
1074             The standard library's <classname>priority_queue</classname>
1075             implements this data structure as an adapter over a sequence,
1076             typically
1077             <classname>std::vector</classname>
1078             or <classname>std::deque</classname>, which correspond to labels
1079             A1 and A2 respectively in the graphic above.
1080           </para>
1081
1082           <para>
                 This is indeed an elegant example of the adapter concept and
                 the algorithm/container/iterator decomposition (see <xref linkend="biblio.nelson96stlpq"/>).
                 There are several reasons, however, why a binary-heap priority queue
                 may be better implemented as a container instead of a
                 sequence adapter:
1088           </para>
1089
1090           <orderedlist>
1091             <listitem>
1092               <para>
1093                 <classname>std::priority_queue</classname> cannot erase values
1094                 from its adapted sequence (irrespective of the sequence
1095                 type). This means that the memory use of
1096                 an <classname>std::priority_queue</classname> object is always
1097                 proportional to the maximal number of values it ever contained,
1098                 and not to the number of values that it currently
1099                 contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1100                 This implementation of binary heaps acts very differently than
1101                 other underlying data structures (See also pairing heaps).
1102               </para>
1103             </listitem>
1104
1105             <listitem>
1106               <para>
1107                 Some combinations of adapted sequences and value types
1108                 are very inefficient or just don't make sense. If one uses
                     <classname>std::priority_queue&lt;std::string,
                     std::vector&lt;std::string&gt; &gt;</classname>, for example, then not only will each
                     operation perform a logarithmic number of
                     <classname>std::string</classname> assignments, but, furthermore, any
                     operation (including <function>pop</function>) can render the container
                     useless due to exceptions. Conversely, if one uses
                     <classname>std::priority_queue&lt;int, std::deque&lt;int&gt;
                     &gt;</classname>, then each operation incurs a logarithmic
                     number of indirect accesses (through pointers) unnecessarily.
                     It might be better to let the container make a conservative
                     deduction about whether to use the structure labeled A1 or A2 in the graphic above.
1120               </para>
1121             </listitem>
1122
1123             <listitem>
1124               <para>
1125                 There does not seem to be a systematic way to determine
1126                 what exactly can be done with the priority queue.
1127               </para>
1128               <orderedlist>
1129                 <listitem>
1130                   <para>
                         If <varname>p</varname> is a priority queue adapting an
                         <classname>std::vector</classname>, then it is possible to iterate over
                         all values by using <function>&amp;p.top()</function> and
                         <function>&amp;p.top() + p.size()</function>, but this will not work
                         if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
                         case, one cannot use <function>p.begin()</function> and
                         <function>p.end()</function>. If a different sequence is adapted, it
1138                     is even more difficult to determine what can be
1139                     done.
1140                   </para>
1141                 </listitem>
1142
1143                 <listitem>
1144                   <para>
1145                     If <varname>p</varname> is a priority queue adapting an
                         <classname>std::deque</classname>, then the reference returned by
1147                   </para>
1148                   <programlisting>
1149                     p.top()
1150                   </programlisting>
1151                   <para>
1152                     will remain valid until it is popped,
1153                     but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1154                     next <function>push</function> will invalidate it. If a different
1155                     sequence is adapted, it is even more difficult to
1156                     determine what can be done.
1157                   </para>
1158                 </listitem>
1159               </orderedlist>
1160             </listitem>
1161
1162             <listitem>
1163               <para>
1164                 Sequence-based binary heaps can still implement
1165                 linear-time <function>erase</function> and <function>modify</function> operations.
1166                 This means that if one needs to erase a small
1167                 (say logarithmic) number of values, then one might still
1168                 choose this underlying data structure. Using
1169                 <classname>std::priority_queue</classname>, however, this will generally
1170                 change the order of growth of the entire sequence of
1171                 operations.
1172               </para>
1173             </listitem>
1174           </orderedlist>
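
               <para>
                 To make the two approaches concrete, the sketch below declares
                 the standard adapter over each sequence and, for comparison,
                 this library's priority queue with a binary-heap tag. The use
                 of <classname>binary_heap_tag</classname> as the third template
                 argument is an assumption about this library's interface, and
                 the helper name <function>common_top</function> is
                 hypothetical.
               </para>
               <programlisting>
                 #include &lt;cassert&gt;
                 #include &lt;deque&gt;
                 #include &lt;functional&gt;
                 #include &lt;queue&gt;
                 #include &lt;vector&gt;
                 #include &lt;ext/pb_ds/priority_queue.hpp&gt;

                 // Hypothetical helper: pushes the same values into three binary-heap
                 // priority queues and returns their common maximal value.
                 int common_top()
                 {
                   // Standard adapter: value type first, adapted sequence second.
                   std::priority_queue&lt;int, std::vector&lt;int&gt; &gt; a1;   // label A1
                   std::priority_queue&lt;int, std::deque&lt;int&gt; &gt; a2;    // label A2

                   // Container approach: a tag, not an adapted sequence, selects
                   // the underlying data structure.
                   __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                              __gnu_pbds::binary_heap_tag&gt; b;

                   const int values[] = { 4, 9, 1 };
                   for (int i = 0; i &lt; 3; ++i)
                     {
                       a1.push(values[i]);
                       a2.push(values[i]);
                       b.push(values[i]);
                     }

                   assert(a1.top() == a2.top() &amp;&amp; a2.top() == b.top());
                   return b.top();
                 }
               </programlisting>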
1175
1176         </section>
1177       </section>
1178     </section> <!-- goals/motivation -->
1179   </section> <!-- intro -->
1180
1181   <!-- S02: Using -->
1182   <section xml:id="containers.pbds.using">
1183     <info><title>Using</title></info>
1184     <?dbhtml filename="policy_data_structures_using.html"?>
1185
1186     <section xml:id="pbds.using.prereq">
1187       <info><title>Prerequisites</title></info>
1188
           <para>The library contains only header files, and does not require any
           other libraries except the standard C++ library. All classes are
           defined in namespace <code>__gnu_pbds</code>. The library internally
           uses macros beginning with <code>PB_DS</code>, but
           <code>#undef</code>s anything it <code>#define</code>s (except for
           header guards). Compiling the library in an environment where macros
           beginning with <code>PB_DS</code> are defined may yield unpredictable
           results in compilation, execution, or both.</para>
1197
1198       <para>
1199         Further dependencies are necessary to create the visual output
1200         for the performance tests. To create these graphs, an
1201         additional package is needed: <command>pychart</command>.
1202       </para>
1203     </section>
1204
1205     <section xml:id="pbds.using.organization">
1206       <info><title>Organization</title></info>
1207
1208       <para>
1209         The various data structures are organized as follows.
1210       </para>
1211
1212       <itemizedlist>
1213         <listitem>
1214           <para>
1215             Branch-Based
1216           </para>
1217
1218           <itemizedlist>
1219             <listitem>
1220               <para>
1221                 <classname>basic_branch</classname>
                     is an abstract base class for branch-based
1223                 associative-containers
1224               </para>
1225             </listitem>
1226
1227             <listitem>
1228               <para>
1229                 <classname>tree</classname>
1230                 is a concrete base class for tree-based
1231                 associative-containers
1232               </para>
1233             </listitem>
1234
1235             <listitem>
1236               <para>
1237                 <classname>trie</classname>
                     is a concrete base class for trie-based
1239                 associative-containers
1240               </para>
1241             </listitem>
1242           </itemizedlist>
1243         </listitem>
1244
1245         <listitem>
1246           <para>
1247             Hash-Based
1248           </para>
1249           <itemizedlist>
1250             <listitem>
1251               <para>
1252                 <classname>basic_hash_table</classname>
1253                 is an abstract base class for hash-based
1254                 associative-containers
1255               </para>
1256             </listitem>
1257
1258             <listitem>
1259               <para>
1260                 <classname>cc_hash_table</classname>
                     is a concrete collision-chaining hash-based
                     associative-container
1263               </para>
1264             </listitem>
1265
1266             <listitem>
1267               <para>
1268                 <classname>gp_hash_table</classname>
                     is a concrete (general) probing hash-based
                     associative-container
1271               </para>
1272             </listitem>
1273           </itemizedlist>
1274         </listitem>
1275
1276         <listitem>
1277           <para>
1278             List-Based
1279           </para>
1280           <itemizedlist>
1281             <listitem>
1282               <para>
1283                 <classname>list_update</classname>
                     is a list-based update-policy associative container
1285               </para>
1286             </listitem>
1287           </itemizedlist>
1288         </listitem>
1289         <listitem>
1290           <para>
1291             Heap-Based
1292           </para>
1293           <itemizedlist>
1294             <listitem>
1295               <para>
1296                 <classname>priority_queue</classname>
                     is a priority queue
1298               </para>
1299             </listitem>
1300           </itemizedlist>
1301         </listitem>
1302       </itemizedlist>
1303
1304       <para>
1305         The hierarchy is composed naturally so that commonality is
1306         captured by base classes. Thus <function>operator[]</function>
1307         is defined at the base of any hierarchy, since all derived
1308         containers support it. Conversely <function>split</function> is
1309         defined in <classname>basic_branch</classname>, since only
1310         tree-like containers support it.
1311       </para>
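
           <para>
             As a rough illustration of the shared interface, the sketch
             below applies one generic function to four of the containers
             listed above, relying on the default policies of each. The
             helper names <function>fill_and_count</function> and
             <function>count_across_containers</function> are hypothetical.
           </para>
           <programlisting>
             #include &lt;cassert&gt;
             #include &lt;cstddef&gt;
             #include &lt;ext/pb_ds/assoc_container.hpp&gt;

             // Hypothetical generic helper: uses only operations that every
             // associative container in the hierarchy is expected to support.
             template&lt;typename Container&gt;
             std::size_t fill_and_count(Container&amp; c)
             {
               c[1] = 'a';
               c[2] = 'b';
               c[2] = 'c';                     // overwrites the value mapped to key 2
               assert(c.find(2) != c.end());
               assert(c.find(3) == c.end());
               return c.size();
             }

             // Hypothetical driver: one tree-based, two hash-based, and one
             // list-based container, each holding two entries afterwards.
             std::size_t count_across_containers()
             {
               __gnu_pbds::tree&lt;int, char&gt; t;
               __gnu_pbds::cc_hash_table&lt;int, char&gt; cc;
               __gnu_pbds::gp_hash_table&lt;int, char&gt; gp;
               __gnu_pbds::list_update&lt;int, char&gt; lu;

               return fill_and_count(t) + fill_and_count(cc)
                 + fill_and_count(gp) + fill_and_count(lu);
             }
           </programlisting>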
1312
1313       <para>
1314         In addition, there are the following diagnostics classes,
1315         used to report errors specific to this library's data
1316         structures.
1317       </para>
1318
1319       <figure>
1320         <title>Exception Hierarchy</title>
1321         <mediaobject>
1322           <imageobject>
1323             <imagedata align="center" format="PDF" scale="75"
1324                        fileref="../images/pbds_exception_hierarchy.pdf"/>
1325           </imageobject>
1326           <imageobject>
1327             <imagedata align="center" format="PNG" scale="100"
1328                        fileref="../images/pbds_exception_hierarchy.png"/>
1329           </imageobject>
1330           <textobject>
1331             <phrase>Exception Hierarchy</phrase>
1332           </textobject>
1333         </mediaobject>
1334       </figure>
1335
1336     </section>
1337
1338     <section xml:id="pbds.using.tutorial">
1339       <info><title>Tutorial</title></info>
1340
1341       <section xml:id="pbds.using.tutorial.basic">
1342         <info><title>Basic Use</title></info>
1343
1344         <para>
          For the most part, the policy-based containers in
1346           namespace <literal>__gnu_pbds</literal> have the same interface as
1347           the equivalent containers in the standard C++ library, except for
1348           the names used for the container classes themselves. For example,
1349           this shows basic operations on a collision-chaining hash-based
1350           container:
1351         </para>
        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          int main()
          {
            // Collision-chaining hash table mapping int keys to char values.
            __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
            c[2] = 'b';
            // Key 1 was never inserted, so find() returns end().
            assert(c.find(1) == c.end());
          }
        </programlisting>
1362
1363         <para>
1364           The container is called
1365           <classname>__gnu_pbds::cc_hash_table</classname> instead of
1366           <classname>std::unordered_map</classname>, since <quote>unordered
1367           map</quote> does not necessarily mean a hash-based map as implied by
1368           the C++ library (C++11 or TR1). For example, list-based associative
1369           containers, which are very useful for the construction of
1370           "multimaps," are also unordered.
1371         </para>
1372
1373         <para>This snippet shows a red-black tree based container:</para>
1374
        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          int main()
          {
            // Tree-based map; the default underlying structure is a red-black tree.
            __gnu_pbds::tree&lt;int, char&gt; c;
            c[2] = 'b';
            // Key 2 was inserted above, so find() does not return end().
            assert(c.find(2) != c.end());
          }
        </programlisting>
1385
1386         <para>The container is called <classname>tree</classname> instead of
1387         <classname>map</classname> since the underlying data structures are
1388         being named with specificity.
1389         </para>
1390
1391         <para>
1392           The member function naming convention is to strive to be the same as
1393           the equivalent member functions in other C++ standard library
1394           containers. The familiar methods are unchanged:
1395           <function>begin</function>, <function>end</function>,
1396           <function>size</function>, <function>empty</function>, and
1397           <function>clear</function>.
1398         </para>
1399
1400         <para>
          This isn't to say that things are exactly as one would expect, given
          the container requirements and interfaces in the C++ standard.
1403         </para>
1404
1405         <para>
          The names of containers' policies and policy accessors
          differ from the usual ones. For example, if <type>hash_type</type> is
        some type of hash-based container, then</para>
1409
1410         <programlisting>
1411           hash_type::hash_fn
1412         </programlisting>
1413
1414         <para>
1415           gives the type of its hash functor, and if <varname>obj</varname> is
1416           some hash-based container object, then
1417         </para>
1418
1419         <programlisting>
1420           obj.get_hash_fn()
1421         </programlisting>
1422
1423         <para>will return a reference to its hash-functor object.</para>
1424
1425
1426         <para>
1427           Similarly, if <type>tree_type</type> is some type of tree-based
1428           container, then
1429         </para>
1430
1431         <programlisting>
1432           tree_type::cmp_fn
1433         </programlisting>
1434
1435         <para>
1436           gives the type of its comparison functor, and if
1437           <varname>obj</varname> is some tree-based container object,
1438           then
1439         </para>
1440
1441         <programlisting>
1442           obj.get_cmp_fn()
1443         </programlisting>
1444
1445         <para>will return a reference to its comparison-functor object.</para>
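
        <para>
          As a minimal sketch of the naming convention described above,
          the following helpers read the policy types and accessors of
          two containers. The names <type>hash_type</type>,
          <type>tree_type</type>, <function>hash_with_container</function>
          and <function>precedes</function> are chosen for this
          illustration only, and the header name assumes the
          <filename>.hpp</filename> headers shipped with GCC.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;cstddef&gt;
          #include &lt;functional&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          // Container types chosen only for this illustration.
          typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; hash_type;
          typedef __gnu_pbds::tree&lt;int, char, std::greater&lt;int&gt; &gt; tree_type;

          // Hash a key with the functor stored in the container.
          std::size_t
          hash_with_container(const hash_type&amp; obj, int key)
          {
            const hash_type::hash_fn&amp; h = obj.get_hash_fn();
            return h(key);
          }

          // Ask the container's own comparison functor whether a precedes b.
          bool
          precedes(const tree_type&amp; obj, int a, int b)
          {
            const tree_type::cmp_fn&amp; cmp = obj.get_cmp_fn();
            return cmp(a, b);
          }
        </programlisting>

        <para>
          Since <type>tree_type</type> is instantiated here with
          <classname>std::greater</classname>,
          <function>precedes</function> reports that larger keys come
          first.
        </para>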
1446
1447         <para>
1448           It would be nice to give names consistent with those in the existing
1449           C++ standard (inclusive of TR1). Unfortunately, these standard
1450           containers don't consistently name types and methods. For example,
1451           <classname>std::tr1::unordered_map</classname> uses
1452           <type>hasher</type> for the hash functor, but
1453           <classname>std::map</classname> uses <type>key_compare</type> for
          the comparison functor. The accessors differ as well:
          <classname>std::tr1::unordered_map</classname> uses
          <function>hash_function</function> for accessing the hash functor, but
          <classname>std::map</classname> uses <function>key_comp</function>
          for accessing the comparison functor.
1458         </para>
1459
1460         <para>
1461           Instead, <literal>__gnu_pbds</literal> attempts to be internally
1462           consistent, and uses standard-derived terminology if possible.
1463         </para>
1464
1465         <para>
1466           Another source of difference is in scope:
1467           <literal>__gnu_pbds</literal> contains more types of associative
1468           containers than the standard C++ library, and more opportunities
1469           to configure these new containers, since different types of
1470           associative containers are useful in different settings.
1471         </para>
1472
1473         <para>
1474           Namespace <literal>__gnu_pbds</literal> contains different classes for
1475           hash-based containers, tree-based containers, trie-based containers,
1476           and list-based containers.
1477         </para>
1478
1479         <para>
1480           Since associative containers share parts of their interface, they
1481           are organized as a class hierarchy.
1482         </para>
1483
1484         <para>Each type or method is defined in the most-common ancestor
1485         in which it makes sense.
1486         </para>
1487
1488         <para>For example, all associative containers support iteration
1489         expressed in the following form:
1490         </para>
1491
1492         <programlisting>
1493           const_iterator
1494           begin() const;
1495
1496           iterator
1497           begin();
1498
1499           const_iterator
1500           end() const;
1501
1502           iterator
1503           end();
1504         </programlisting>
1505
1506         <para>
1507           But not all containers contain or use hash functors. Yet, both
1508           collision-chaining and (general) probing hash-based associative
1509           containers have a hash functor, so
1510           <classname>basic_hash_table</classname> contains the interface:
1511         </para>
1512
1513         <programlisting>
1514           const hash_fn&amp;
1515           get_hash_fn() const;
1516
1517           hash_fn&amp;
1518           get_hash_fn();
1519         </programlisting>
1520
1521         <para>
1522           so all hash-based associative containers inherit the same
1523           hash-functor accessor methods.
1524         </para>
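
        <para>
          The following sketch shows how generic code might rely on
          this layering. The function names
          <function>count_by_iteration</function> and
          <function>hash_of</function> are hypothetical; the first uses
          only the iteration interface common to all associative
          containers, while the second uses the hash-functor accessor
          and so is meant for hash-based containers only.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;cstddef&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          // Works for any associative container in the hierarchy, since
          // begin() and end() belong to the common interface.
          template&lt;typename Cntnr&gt;
          std::size_t
          count_by_iteration(const Cntnr&amp; c)
          {
            std::size_t n = 0;
            for (typename Cntnr::const_iterator it = c.begin(); it != c.end(); ++it)
              ++n;
            return n;
          }

          // Compiles only for containers that provide get_hash_fn(),
          // that is, for hash-based containers.
          template&lt;typename Hash_Cntnr&gt;
          std::size_t
          hash_of(const Hash_Cntnr&amp; c, const typename Hash_Cntnr::key_type&amp; key)
          { return c.get_hash_fn()(key); }
        </programlisting>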
1525
1526       </section> <!--basic use -->
1527
1528       <section xml:id="pbds.using.tutorial.configuring">
1529         <info>
1530           <title>
1531             Configuring via Template Parameters
1532           </title>
1533         </info>
1534
1535         <para>
1536           In general, each of this library's containers is
1537           parametrized by more policies than those of the standard library. For
1538           example, the standard hash-based container is parametrized as
1539           follows:
1540         </para>
1541         <programlisting>
1542           template&lt;typename Key, typename Mapped, typename Hash,
          typename Pred, typename Allocator, bool Cache_Hash_Code&gt;
1544           class unordered_map;
1545         </programlisting>
1546
1547         <para>
1548           and so can be configured by key type, mapped type, a functor
1549           that translates keys to unsigned integral types, an equivalence
1550           predicate, an allocator, and an indicator whether to store hash
          values with each entry. This library's collision-chaining
1552           hash-based container is parametrized as
1553         </para>
1554         <programlisting>
1555           template&lt;typename Key, typename Mapped, typename Hash_Fn,
1556           typename Eq_Fn, typename Comb_Hash_Fn,
          typename Resize_Policy, bool Store_Hash,
1558           typename Allocator&gt;
1559           class cc_hash_table;
1560         </programlisting>
1561
1562         <para>
1563           and so can be configured by the first four types of
1564           <classname>std::tr1::unordered_map</classname>, then a
1565           policy for translating the key-hash result into a position
1566           within the table, then a policy by which the table resizes,
1567           an indicator whether to store hash values with each entry,
1568           and an allocator (which is typically the last template
1569           parameter in standard containers).
1570         </para>
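
        <para>
          As an illustration, a collision-chaining table might be
          configured explicitly as sketched below. The choice of
          policies (modulo range hashing with prime table sizes and a
          load-check resize trigger), the typedef name
          <type>mod_hash_table</type>, and the helper
          <function>fill_with_letters</function> are assumptions made
          for this example. It also assumes a C++11 compiler for
          <classname>std::hash</classname>. The trailing
          <classname>Store_Hash</classname> and
          <classname>Allocator</classname> parameters keep their
          defaults.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;functional&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;ext/pb_ds/hash_policy.hpp&gt;

          typedef __gnu_pbds::cc_hash_table&lt;
            int,                                        // Key
            char,                                       // Mapped
            std::hash&lt;int&gt;,                             // Hash_Fn
            std::equal_to&lt;int&gt;,                         // Eq_Fn
            __gnu_pbds::direct_mod_range_hashing&lt;&gt;,     // Comb_Hash_Fn
            __gnu_pbds::hash_standard_resize_policy&lt;    // Resize_Policy
              __gnu_pbds::hash_prime_size_policy,
              __gnu_pbds::hash_load_check_resize_trigger&lt;&gt; &gt; &gt;
            mod_hash_table;

          // Insert the keys 0 .. n-1, each mapped to a lowercase letter.
          void
          fill_with_letters(mod_hash_table&amp; c, int n)
          {
            for (int i = 0; i &lt; n; ++i)
              c[i] = static_cast&lt;char&gt;('a' + i % 26);
          }
        </programlisting>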
1571
1572         <para>
1573           Nearly all policy parameters have default values, so this
1574           need not be considered for casual use. It is important to
1575           note, however, that hash-based containers' policies can
1576           dramatically alter their performance in different settings,
1577           and that tree-based containers' policies can make them
1578           useful for other purposes than just look-up.
1579         </para>
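
        <para>
          One such use beyond look-up is order statistics. The sketch
          below assumes the
          <classname>tree_order_statistics_node_update</classname>
          policy from <filename>tree_policy.hpp</filename>; the typedef
          <type>order_statistics_set</type> and the helpers
          <function>rank_of</function> and
          <function>kth_smallest</function> are names invented for
          this example.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;cstddef&gt;
          #include &lt;functional&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;ext/pb_ds/tree_policy.hpp&gt;

          // A red-black tree "set" whose nodes also track subtree sizes.
          typedef __gnu_pbds::tree&lt;
            int, __gnu_pbds::null_type, std::less&lt;int&gt;,
            __gnu_pbds::rb_tree_tag,
            __gnu_pbds::tree_order_statistics_node_update&gt;
            order_statistics_set;

          // Number of stored keys strictly smaller than key.
          std::size_t
          rank_of(const order_statistics_set&amp; s, int key)
          { return s.order_of_key(key); }

          // The k-th smallest key (zero-based); k must be less than s.size().
          int
          kth_smallest(const order_statistics_set&amp; s, std::size_t k)
          { return *s.find_by_order(k); }
        </programlisting>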
1580
1581
1582         <para>As opposed to associative containers, priority queues have
1583         relatively few configuration options. The priority queue is
1584         parametrized as follows:</para>
1585         <programlisting>
          template&lt;typename Value_Type, typename Cmp_Fn, typename Tag,
1587           typename Allocator&gt;
1588           class priority_queue;
1589         </programlisting>
1590
1591         <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1592         <classname>Allocator</classname> parameters are the container's value type,
1593         comparison-functor type, and allocator type, respectively;
1594         these are very similar to the standard's priority queue. The
1595         <classname>Tag</classname> parameter is different: there are a number of
1596         pre-defined tag types corresponding to binary heaps, binomial
1597         heaps, etc., and <classname>Tag</classname> should be instantiated
1598         by one of them.</para>
1599
1600         <para>Note that as opposed to the
1601         <classname>std::priority_queue</classname>,
1602         <classname>__gnu_pbds::priority_queue</classname> is not a
1603         sequence-adapter; it is a regular container.</para>
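
        <para>
          A short sketch of both points follows: the
          <classname>Tag</classname> parameter selects a binomial heap,
          and the queue is iterated and modified directly, as a regular
          container. The names <type>heap_type</type>,
          <function>sum_of</function> and
          <function>push_then_change</function> are hypothetical.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;functional&gt;
          #include &lt;ext/pb_ds/priority_queue.hpp&gt;

          typedef __gnu_pbds::priority_queue&lt;
            int, std::less&lt;int&gt;, __gnu_pbds::binomial_heap_tag&gt;
            heap_type;

          // Sum all values by iterating over the queue itself; this is
          // possible because the queue is a container, not an adapter.
          int
          sum_of(const heap_type&amp; q)
          {
            int total = 0;
            for (heap_type::const_iterator it = q.begin(); it != q.end(); ++it)
              total += *it;
            return total;
          }

          // Push a value, change it through the iterator returned by
          // push(), and return the resulting top of the queue.
          int
          push_then_change(heap_type&amp; q, int value, int changed)
          {
            heap_type::point_iterator it = q.push(value);
            q.modify(it, changed);
            return q.top();
          }
        </programlisting>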
1604
1605       </section>
1606
1607       <section xml:id="pbds.using.tutorial.traits">
1608         <info>
1609           <title>
1610             Querying Container Attributes
1611           </title>
1612         </info>
1613         <para></para>
1614
        <para>A container's underlying data structure
        affects its performance; unfortunately, it can also affect
        its interface. When manipulating associative containers
        generically, it is often useful to be able to statically
        determine what they can support and what they cannot.
1620         </para>
1621
1622         <para>Happily, the standard provides a good solution to a similar
1623         problem - that of the different behavior of iterators. If
1624         <classname>It</classname> is an iterator, then
1625         </para>
1626         <programlisting>
1627           typename std::iterator_traits&lt;It&gt;::iterator_category
1628         </programlisting>
1629
1630         <para>is one of a small number of pre-defined tag classes, and
1631         </para>
1632         <programlisting>
1633           typename std::iterator_traits&lt;It&gt;::value_type
1634         </programlisting>
1635
1636         <para>is the value type to which the iterator "points".</para>
1637
1638         <para>
1639           Similarly, in this library, if <type>C</type> is a
1640           container, then <classname>container_traits</classname> is a
1641           trait class that stores information about the kind of
1642           container that is implemented.
1643         </para>
1644         <programlisting>
1645           typename container_traits&lt;C&gt;::container_category
1646         </programlisting>
1647         <para>
1648           is one of a small number of predefined tag structures that
1649           uniquely identifies the type of underlying data structure.
1650         </para>
1651
1652         <para>In most cases, however, the exact underlying data
1653         structure is not really important, but what is important is
1654         one of its other attributes: whether it guarantees storing
1655         elements by key order, for example. For this one can
1656         use</para>
1657         <programlisting>
          container_traits&lt;C&gt;::order_preserving
1659         </programlisting>
1660         <para>
1661           Also,
1662         </para>
1663         <programlisting>
1664           typename container_traits&lt;C&gt;::invalidation_guarantee
1665         </programlisting>
1666
1667         <para>is the container's invalidation guarantee. Invalidation
1668         guarantees are especially important regarding priority queues,
1669         since in this library's design, iterators are practically the
1670         only way to manipulate them.</para>
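
        <para>
          The queries above might be wrapped as in the following
          sketch. The helper names are invented for this example, and
          the code assumes a C++11 compiler for
          <filename>&lt;type_traits&gt;</filename>. It also assumes
          that <literal>order_preserving</literal> is a compile-time
          constant of the trait class and that the
          invalidation-guarantee tags form an inheritance chain, with
          stronger guarantees deriving from weaker ones.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;type_traits&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;ext/pb_ds/tag_and_trait.hpp&gt;

          // True if C guarantees storing its elements by key order.
          template&lt;typename C&gt;
          bool
          keeps_key_order()
          { return __gnu_pbds::container_traits&lt;C&gt;::order_preserving; }

          // True if C is backed by the data structure identified by Tag.
          template&lt;typename C, typename Tag&gt;
          bool
          has_category()
          {
            typedef typename __gnu_pbds::container_traits&lt;C&gt;::container_category
              category;
            return std::is_same&lt;category, Tag&gt;::value;
          }

          // True if C offers the point invalidation guarantee or a stronger one.
          template&lt;typename C&gt;
          bool
          keeps_point_iterators_valid()
          {
            typedef typename __gnu_pbds::container_traits&lt;C&gt;::invalidation_guarantee
              guarantee;
            return std::is_base_of&lt;__gnu_pbds::point_invalidation_guarantee,
                                   guarantee&gt;::value;
          }
        </programlisting>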
1671       </section>
1672
1673       <section xml:id="pbds.using.tutorial.point_range_iteration">
1674         <info>
1675           <title>
1676             Point and Range Iteration
1677           </title>
1678         </info>
1679         <para></para>
1680
1681         <para>This library differentiates between two types of methods
1682         and iterators: point-type, and range-type. For example,
1683         <function>find</function> and <function>insert</function> are point-type methods, since
1684         they each deal with a specific element; their returned
1685         iterators are point-type iterators. <function>begin</function> and
1686         <function>end</function> are range-type methods, since they are not used to
1687         find a specific element, but rather to go over all elements in
1688         a container object; their returned iterators are range-type
1689         iterators.
1690         </para>
1691
1692         <para>Most containers store elements in an order that is
1693         determined by their interface. Correspondingly, it is fine that
1694         their point-type iterators are synonymous with their range-type
1695         iterators. For example, in the following snippet
1696         </para>
1697         <programlisting>
1698           std::for_each(c.find(1), c.find(5), foo);
1699         </programlisting>
1700         <para>
1701           two point-type iterators (returned by <function>find</function>) are used
1702           for a range-type purpose - going over all elements whose key is
1703           between 1 and 5.
1704         </para>
1705
1706         <para>
1707           Conversely, the above snippet makes no sense for
1708           self-organizing containers - ones that order (and reorder)
1709           their elements by implementation. It would be nice to have a
1710           uniform iterator system that would allow the above snippet to
1711           compile only if it made sense.
1712         </para>
1713
1714         <para>
1715           This could trivially be done by specializing
1716           <function>std::for_each</function> for the case of iterators returned by
1717           <classname>std::tr1::unordered_map</classname>, but this would only solve the
1718           problem for one algorithm and one container. Fundamentally, the
1719           problem is that one can loop using a self-organizing
1720           container's point-type iterators.
1721         </para>
1722
1723         <para>
1724           This library's containers define two families of
1725           iterators: <type>point_const_iterator</type> and
1726           <type>point_iterator</type> are the iterator types returned by
1727           point-type methods; <type>const_iterator</type> and
1728           <type>iterator</type> are the iterator types returned by range-type
1729           methods.
1730         </para>
1731         <programlisting>
1732           class &lt;- some container -&gt;
1733           {
1734           public:
1735           ...
1736
1737           typedef &lt;- something -&gt; const_iterator;
1738
1739           typedef &lt;- something -&gt; iterator;
1740
1741           typedef &lt;- something -&gt; point_const_iterator;
1742
1743           typedef &lt;- something -&gt; point_iterator;
1744
1745           ...
1746
1747           public:
1748           ...
1749
1750           const_iterator begin () const;
1751
1752           iterator begin();
1753
1754           point_const_iterator find(...) const;
1755
1756           point_iterator find(...);
1757           };
1758         </programlisting>
1759
1760         <para>For
        containers whose interface defines sequence order, it
1762         is very simple: point-type and range-type iterators are exactly
1763         the same, which means that the above snippet will compile if it
1764         is used for an order-preserving associative container.
1765         </para>
1766
1767         <para>
          For self-organizing containers (hash-based containers being
          one example), however, the preceding snippet will
          not compile, because their point-type iterators do not support
          <function>operator++</function>.
1772         </para>
1773
1774         <para>In any case, both for order-preserving and self-organizing
1775         containers, the following snippet will compile:
1776         </para>
1777         <programlisting>
1778           typename Cntnr::point_iterator it = c.find(2);
1779         </programlisting>
1780
1781         <para>
          because <function>find</function> returns a point-type iterator
          for every container, and a range-type iterator can always be
          converted to a point-type iterator.
1784         </para>
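
        <para>
          A generic look-up helper written only in terms of point-type
          iterators could look like the sketch below. The function name
          <function>value_or</function> is hypothetical. It is expected
          to compile for order-preserving and self-organizing
          containers alike, since it never increments the iterator.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          // Return the value mapped to key, or fallback if key is absent.
          template&lt;typename Cntnr&gt;
          typename Cntnr::mapped_type
          value_or(const Cntnr&amp; c, const typename Cntnr::key_type&amp; key,
                   const typename Cntnr::mapped_type&amp; fallback)
          {
            // find() is a point-type method, so it yields a point-type iterator.
            typename Cntnr::point_const_iterator it = c.find(key);
            return it == c.end() ? fallback : it-&gt;second;
          }
        </programlisting>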
1785
        <para>Distinguishing between iterator types also
1787         raises the point that a container's iterators might have
1788         different invalidation rules concerning their de-referencing
1789         abilities and movement abilities. This now corresponds exactly
1790         to the question of whether point-type and range-type iterators
1791         are valid. As explained above, <classname>container_traits</classname> allows
1792         querying a container for its data structure attributes. The
1793         iterator-invalidation guarantees are certainly a property of
1794         the underlying data structure, and so
1795         </para>
1796         <programlisting>
1797           container_traits&lt;C&gt;::invalidation_guarantee
1798         </programlisting>
1799
1800         <para>
1801           gives one of three pre-determined types that answer this
1802           query.
1803         </para>
1804
1805       </section>
1806     </section> <!-- tutorial -->
1807
1808     <section xml:id="pbds.using.examples">
1809       <info><title>Examples</title></info>
1810       <para>
1811         Additional code examples are provided in the source
1812         distribution, as part of the regression and performance
1813         testsuite.
1814       </para>
1815
1816       <section xml:id="pbds.using.examples.basic">
1817         <info><title>Intermediate Use</title></info>
1818
1819         <itemizedlist>
1820           <listitem>
1821             <para>
1822               Basic use of maps:
1823               <filename>basic_map.cc</filename>
1824             </para>
1825           </listitem>
1826
1827           <listitem>
1828             <para>
1829               Basic use of sets:
1830               <filename>basic_set.cc</filename>
1831             </para>
1832           </listitem>
1833
1834           <listitem>
1835             <para>
1836               Conditionally erasing values from an associative container object:
1837               <filename>erase_if.cc</filename>
1838             </para>
1839           </listitem>
1840
1841           <listitem>
1842             <para>
1843               Basic use of multimaps:
1844               <filename>basic_multimap.cc</filename>
1845             </para>
1846           </listitem>
1847
1848           <listitem>
1849             <para>
1850               Basic use of multisets:
1851               <filename>basic_multiset.cc</filename>
1852             </para>
1853           </listitem>
1854
1855           <listitem>
1856             <para>
1857               Basic use of priority queues:
1858               <filename>basic_priority_queue.cc</filename>
1859             </para>
1860           </listitem>
1861
1862           <listitem>
1863             <para>
1864               Splitting and joining priority queues:
1865               <filename>priority_queue_split_join.cc</filename>
1866             </para>
1867           </listitem>
1868
1869           <listitem>
1870             <para>
1871               Conditionally erasing values from a priority queue:
1872               <filename>priority_queue_erase_if.cc</filename>
1873             </para>
1874           </listitem>
1875         </itemizedlist>
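
        <para>
          As a rough sketch of conditional erasure from an associative
          container (this is not the code of
          <filename>erase_if.cc</filename>), the predicate
          <classname>has_even_key</classname> and the helper
          <function>erase_even_keys</function> below are names made up
          for this example.
        </para>

        <programlisting>
          #include &lt;cassert&gt;
          #include &lt;cstddef&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          typedef __gnu_pbds::tree&lt;int, char&gt; map_type;

          // Hypothetical predicate: selects entries whose key is even.
          struct has_even_key
          {
            bool
            operator()(const map_type::value_type&amp; entry) const
            { return entry.first % 2 == 0; }
          };

          // Remove all entries with even keys; returns how many were removed.
          std::size_t
          erase_even_keys(map_type&amp; c)
          { return c.erase_if(has_even_key()); }
        </programlisting>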
1876
1877       </section>
1878
1879       <section xml:id="pbds.using.examples.query">
1880         <info><title>Querying with <classname>container_traits</classname> </title></info>
1881         <itemizedlist>
1882           <listitem>
1883             <para>
1884               Using <classname>container_traits</classname> to query
1885               about underlying data structure behavior:
1886               <filename>assoc_container_traits.cc</filename>
1887             </para>
1888           </listitem>
1889
1890           <listitem>
1891             <para>
1892               A non-compiling example showing wrong use of finding keys in
1893               hash-based containers: <filename>hash_find_neg.cc</filename>
1894             </para>
1895           </listitem>
1896           <listitem>
1897             <para>
1898               Using <classname>container_traits</classname>
1899               to query about underlying data structure behavior:
1900               <filename>priority_queue_container_traits.cc</filename>
1901             </para>
1902           </listitem>
1903
1904         </itemizedlist>
1905
1906       </section>
1907
1908       <section xml:id="pbds.using.examples.container">
1909         <info><title>By Container Method</title></info>
1910         <para></para>
1911
1912         <section xml:id="pbds.using.examples.container.hash">
1913           <info><title>Hash-Based</title></info>
1914
1915           <section xml:id="pbds.using.examples.container.hash.resize">
1916             <info><title>size Related</title></info>
1917
1918             <itemizedlist>
1919               <listitem>
1920                 <para>
1921                   Setting the initial size of a hash-based container
1922                   object:
1923                   <filename>hash_initial_size.cc</filename>
1924                 </para>
1925               </listitem>
1926
1927               <listitem>
1928                 <para>
1929                   A non-compiling example showing how not to resize a
1930                   hash-based container object:
1931                   <filename>hash_resize_neg.cc</filename>
1932                 </para>
1933               </listitem>
1934
1935               <listitem>
1936                 <para>
                  Resizing a hash-based container object:
1938                   <filename>hash_resize.cc</filename>
1939                 </para>
1940               </listitem>
1941
1942               <listitem>
1943                 <para>
1944                   Showing an illegal resize of a hash-based container
1945                   object:
1946                   <filename>hash_illegal_resize.cc</filename>
1947                 </para>
1948               </listitem>
1949
1950               <listitem>
1951                 <para>
1952                   Changing the load factors of a hash-based container
1953                   object: <filename>hash_load_set_change.cc</filename>
1954                 </para>
1955               </listitem>
1956             </itemizedlist>
1957           </section>
1958
1959           <section xml:id="pbds.using.examples.container.hash.hashor">
1960             <info><title>Hashing Function Related</title></info>
1961             <para></para>
1962
1963             <itemizedlist>
1964               <listitem>
1965                 <para>
1966                   Using a modulo range-hashing function for the case of an
1967                   unknown skewed key distribution:
1968                   <filename>hash_mod.cc</filename>
1969                 </para>
1970               </listitem>
1971
1972               <listitem>
1973                 <para>
1974                   Writing a range-hashing functor for the case of a known
1975                   skewed key distribution:
1976                   <filename>shift_mask.cc</filename>
1977                 </para>
1978               </listitem>
1979
1980               <listitem>
1981                 <para>
1982                   Storing the hash value along with each key:
1983                   <filename>store_hash.cc</filename>
1984                 </para>
1985               </listitem>
1986
1987               <listitem>
1988                 <para>
1989                   Writing a ranged-hash functor:
1990                   <filename>ranged_hash.cc</filename>
1991                 </para>
1992               </listitem>
1993             </itemizedlist>
1994
1995           </section>
1996
1997         </section>
1998
1999         <section xml:id="pbds.using.examples.container.branch">
2000           <info><title>Branch-Based</title></info>
2001
2002
2003           <section xml:id="pbds.using.examples.container.branch.split">
2004             <info><title>split or join Related</title></info>
2005
2006             <itemizedlist>
2007               <listitem>
2008                 <para>
2009                   Joining two tree-based container objects:
2010                   <filename>tree_join.cc</filename>
2011                 </para>
2012               </listitem>
2013
2014               <listitem>
2015                 <para>
2016                   Splitting a PATRICIA trie container object:
2017                   <filename>trie_split.cc</filename>
2018                 </para>
2019               </listitem>
2020
2021               <listitem>
2022                 <para>
2023                   Order statistics while joining two tree-based container
2024                   objects:
2025                   <filename>tree_order_statistics_join.cc</filename>
2026                 </para>
2027               </listitem>
2028             </itemizedlist>
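
            <para>
              The following sketch (not taken from the testsuite files
              listed above) splits a tree-based set at a key and joins
              the two parts again. The names <type>set_type</type> and
              <function>split_and_rejoin</function> are hypothetical.
              It assumes that <function>split</function> moves all keys
              larger than the given key into the other container, and
              that <function>join</function> requires the key ranges of
              the two containers not to overlap.
            </para>

            <programlisting>
              #include &lt;cassert&gt;
              #include &lt;cstddef&gt;
              #include &lt;ext/pb_ds/assoc_container.hpp&gt;

              typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type&gt; set_type;

              // Split s at pivot, join the parts again, and return how many
              // keys were temporarily moved out of s.
              std::size_t
              split_and_rejoin(set_type&amp; s, int pivot)
              {
                set_type upper;
                s.split(pivot, upper);   // s keeps keys up to pivot; upper gets the rest
                const std::size_t moved = upper.size();
                s.join(upper);           // key ranges do not overlap, so join is allowed
                return moved;
              }
            </programlisting>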
2029
2030           </section>
2031
2032           <section xml:id="pbds.using.examples.container.branch.invariants">
2033             <info><title>Node Invariants</title></info>
2034
2035             <itemizedlist>
2036               <listitem>
2037                 <para>
2038                   Using trees for order statistics:
2039                   <filename>tree_order_statistics.cc</filename>
2040                 </para>
2041               </listitem>
2042
2043               <listitem>
2044                 <para>
2045                   Augmenting trees to support operations on line
2046                   intervals:
2047                   <filename>tree_intervals.cc</filename>
2048                 </para>
2049               </listitem>
2050             </itemizedlist>
2051
2052           </section>
2053
2054           <section xml:id="pbds.using.examples.container.branch.trie">
2055             <info><title>trie</title></info>
2056             <itemizedlist>
2057               <listitem>
2058                 <para>
2059                   Using a PATRICIA trie for DNA strings:
2060                   <filename>trie_dna.cc</filename>
2061                 </para>
2062               </listitem>
2063
2064               <listitem>
2065                 <para>
2066                   Using a PATRICIA
2067                   trie for finding all entries whose key matches a given prefix:
2068                   <filename>trie_prefix_search.cc</filename>
2069                 </para>
2070               </listitem>
2071             </itemizedlist>
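
            <para>
              A prefix search might be sketched as follows; this is not
              the code of <filename>trie_prefix_search.cc</filename>.
              The example assumes the
              <classname>trie_string_access_traits</classname> and
              <classname>trie_prefix_search_node_update</classname>
              policies from <filename>trie_policy.hpp</filename>; the
              names <type>prefix_trie</type> and
              <function>count_with_prefix</function> are invented for
              this illustration.
            </para>

            <programlisting>
              #include &lt;cassert&gt;
              #include &lt;cstddef&gt;
              #include &lt;string&gt;
              #include &lt;utility&gt;
              #include &lt;ext/pb_ds/assoc_container.hpp&gt;
              #include &lt;ext/pb_ds/trie_policy.hpp&gt;

              // A PATRICIA trie "set" of strings that supports prefix queries.
              typedef __gnu_pbds::trie&lt;
                std::string, __gnu_pbds::null_type,
                __gnu_pbds::trie_string_access_traits&lt;&gt;,
                __gnu_pbds::pat_trie_tag,
                __gnu_pbds::trie_prefix_search_node_update&gt;
                prefix_trie;

              // Count the stored keys that start with prefix.
              std::size_t
              count_with_prefix(const prefix_trie&amp; t, const std::string&amp; prefix)
              {
                std::pair&lt;prefix_trie::const_iterator,
                          prefix_trie::const_iterator&gt; range = t.prefix_range(prefix);
                std::size_t n = 0;
                for (prefix_trie::const_iterator it = range.first;
                     it != range.second; ++it)
                  ++n;
                return n;
              }
            </programlisting>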
2072
2073           </section>
2074
2075         </section>
2076
2077         <section xml:id="pbds.using.examples.container.priority_queue">
2078           <info><title>Priority Queues</title></info>
2079           <itemizedlist>
2080             <listitem>
2081               <para>
2082                 Cross referencing an associative container and a priority
2083                 queue: <filename>priority_queue_xref.cc</filename>
2084               </para>
2085             </listitem>
2086
2087             <listitem>
2088               <para>
2089                 Cross referencing a vector and a priority queue using a
2090                 very simple version of Dijkstra's shortest path
2091                 algorithm:
2092                 <filename>priority_queue_dijkstra.cc</filename>
2093               </para>
2094             </listitem>
2095           </itemizedlist>
2096
2097         </section>
2098
2099
2100       </section>
2101
2102     </section>
2103
2104   </section> <!-- using -->
2105
2106   <!-- S03: Design -->
2107
2108
2109 <section xml:id="containers.pbds.design">
2110   <info><title>Design</title></info>
2111   <?dbhtml filename="policy_data_structures_design.html"?>
2112   <para></para>
2113
2114   <section xml:id="pbds.design.concepts">
2115     <info><title>Concepts</title></info>
2116
2117     <section xml:id="pbds.design.concepts.null_type">
2118       <info><title>Null Policy Classes</title></info>
2119
2120       <para>
2121         Associative containers are typically parametrized by various
2122         policies. For example, a hash-based associative container is
        parametrized by a hash-functor, transforming each key into a
2124         non-negative numerical type. Each such value is then further mapped
2125         into a position within the table. The mapping of a key into a
2126         position within the table is therefore a two-step process.
2127       </para>
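
      <para>
        The two-step mapping can be sketched in plain C++ as follows.
        This is an illustration only, not the library's
        implementation: the functors <classname>string_hash</classname>
        and <classname>mod_range_hashing</classname> and the function
        <function>position_of</function> are hypothetical names.
      </para>

      <programlisting>
        #include &lt;cassert&gt;
        #include &lt;cstddef&gt;
        #include &lt;string&gt;

        // Step 1: a hash functor maps a key to a non-negative number.
        struct string_hash
        {
          std::size_t
          operator()(const std::string&amp; key) const
          {
            std::size_t h = 0;
            for (std::string::size_type i = 0; i &lt; key.size(); ++i)
              h = h * 31 + static_cast&lt;unsigned char&gt;(key[i]);
            return h;
          }
        };

        // Step 2: a range-hashing functor maps that number to a table position.
        struct mod_range_hashing
        {
          std::size_t
          operator()(std::size_t hash, std::size_t table_size) const
          { return hash % table_size; }
        };

        // Composing the two steps gives the position of a key in a table
        // with table_size entries (table_size must be non-zero).
        std::size_t
        position_of(const std::string&amp; key, std::size_t table_size)
        { return mod_range_hashing()(string_hash()(key), table_size); }
      </programlisting>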
2128
2129       <para>
2130         In some cases, instantiations are redundant. For example, when the
2131         keys are integers, it is possible to use a redundant hash policy,
2132         which transforms each key into its value.
2133       </para>
2134
2135       <para>
2136         In some other cases, these policies are irrelevant.  For example, a
2137         hash-based associative container might transform keys into positions
2138         within a table by a different method than the two-step method
2139         described above. In such a case, the hash functor is simply
2140         irrelevant.
2141       </para>
2142
2143       <para>
2144         When a policy is either redundant or irrelevant, it can be replaced
2145         by <classname>null_type</classname>.
2146       </para>
2147
2148       <para>
        For example, a <emphasis>set</emphasis> is an associative
        container with one of its template parameters (the one for the
        mapped type) replaced with <classname>null_type</classname>. This
        technique also simplifies node updates in tree and trie data
        structures, and hash and probe functions in hash data
        structures.
2155       </para>
2156     </section>
2157
2158     <section xml:id="pbds.design.concepts.associative_semantics">
2159       <info><title>Map and Set Semantics</title></info>
2160
2161       <section xml:id="concepts.associative_semantics.set_vs_map">
2162         <info>
2163           <title>
2164             Distinguishing Between Maps and Sets
2165           </title>
2166         </info>
2167
2168         <para>
2169           Anyone familiar with the standard knows that there are four kinds
2170           of associative containers: maps, sets, multimaps, and
2171           multisets. The map datatype associates each key to
2172           some data.
2173         </para>
2174
2175         <para>
          Sets are associative containers that simply store keys;
          they do not map them to anything. In the standard, each map class
          has a corresponding set class. E.g.,
          <classname>std::map&lt;int, char&gt;</classname> maps each
          <classname>int</classname> to a <classname>char</classname>, but
          <classname>std::set&lt;int&gt;</classname> simply stores
          <classname>int</classname>s. In this library, however, there are no
          distinct classes for maps and sets. Instead, an associative
          container's <classname>Mapped</classname> template parameter is a policy: if
          it is instantiated by <classname>null_type</classname>, then the
          container is a "set"; otherwise, it is a "map". E.g.,
2187         </para>
2188         <programlisting>
2189           cc_hash_table&lt;int, char&gt;
2190         </programlisting>
2191         <para>
          is a "map" mapping each <type>int</type> value to a
          <type>char</type>, but
2194         </para>
2195         <programlisting>
2196           cc_hash_table&lt;int, null_type&gt;
2197         </programlisting>
2198         <para>
2199           is a type that uniquely stores <type>int</type> values.
2200         </para>
        <para>Once the <classname>Mapped</classname> template parameter is instantiated
        by <classname>null_type</classname>, the "set" acts very similarly
        to the standard's sets: it does not map each key to a distinct
        <classname>null_type</classname> object. Also, the container's
        <type>value_type</type> is essentially its <type>key_type</type>,
        just as with the standard's sets.</para>
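
        <para>
          The following sketch contrasts the two instantiations. It
          assumes GCC 4.7 or later, where the policy is named
          <classname>__gnu_pbds::null_type</classname>; the typedef and
          function names are illustrative only.
        </para>
        <programlisting>
          #include &lt;cstddef&gt;
          #include &lt;utility&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; int_char_map;
          typedef __gnu_pbds::cc_hash_table&lt;int, __gnu_pbds::null_type&gt; int_set;

          // A "map" stores key/mapped pairs; returns the number of distinct keys.
          inline std::size_t
          fill_map(int_char_map&amp; m)
          {
            m.insert(std::make_pair(1, 'a'));
            m[2] = 'b';                        // operator[] is available for "maps"
            m.insert(std::make_pair(1, 'z'));  // key 1 is already present: no effect
            return m.size();
          }

          // A "set" stores bare keys; returns the number of distinct keys.
          inline std::size_t
          fill_set(int_set&amp; s)
          {
            s.insert(1);
            s.insert(2);
            s.insert(1);                       // duplicate key: no effect
            return s.size();
          }
        </programlisting>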
2208
2209         <para>
          The standard's multimaps and multisets allow, respectively,
          non-uniquely mapping keys and non-uniquely storing keys. As
          discussed, this might be necessary because (1) a key might be
          decomposed into a primary key and a secondary key, (2) a
          key might appear more than once, or (3) any combination
          of (1) and (2). Correspondingly, one should use (1) "maps"
          mapping primary keys to secondary keys, (2) "maps" mapping keys
          to size types, or (3) any combination of (1) and (2). Thus, for
          example, a <classname>std::multiset&lt;int&gt;</classname> might be
          used to store multiple instances of integers, but using this
          library's containers, one might use
2223         </para>
2224         <programlisting>
2225           tree&lt;int, size_t&gt;
2226         </programlisting>
2227
2228         <para>
2229           i.e., a <classname>map</classname> of <type>int</type>s to
2230           <type>size_t</type>s.
2231         </para>
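
        <para>
          A minimal sketch of such a counting "multiset" follows. The
          helper functions are hypothetical and only illustrate one way
          to maintain the counts; they are not part of the library.
        </para>
        <programlisting>
          #include &lt;cstddef&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          // Each key is mapped to the number of times it logically occurs.
          typedef __gnu_pbds::tree&lt;int, std::size_t&gt; int_multiset;

          inline void
          add_instance(int_multiset&amp; ms, int key)
          { ++ms[key]; }  // operator[] inserts a zero count for a new key

          inline std::size_t
          count_instances(const int_multiset&amp; ms, int key)
          {
            int_multiset::point_const_iterator it = ms.find(key);
            return it == ms.end() ? std::size_t(0) : it-&gt;second;
          }

          // Removes one logical instance; returns false if the key is absent.
          inline bool
          remove_instance(int_multiset&amp; ms, int key)
          {
            int_multiset::point_iterator it = ms.find(key);
            if (it == ms.end())
              return false;
            if (--it-&gt;second == 0)
              ms.erase(key);  // drop the entry once its count reaches zero
            return true;
          }
        </programlisting>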
2232         <para>
2233           These "multimaps" and "multisets" might be confusing to
2234           anyone familiar with the standard's <classname>std::multimap</classname> and
2235           <classname>std::multiset</classname>, because there is no clear
2236           correspondence between the two. For example, in some cases
2237           where one uses <classname>std::multiset</classname> in the standard, one might use
2238           in this library a "multimap" of "multisets" - i.e., a
2239           container that maps primary keys each to an associative
2240           container that maps each secondary key to the number of times
2241           it occurs.
2242         </para>
2243
2244         <para>
2245           When one uses a "multimap," one should choose with care the
2246           type of container used for secondary keys.
2247         </para>
2248       </section> <!-- map vs set -->
2249
2250
2251       <section xml:id="concepts.associative_semantics.multi">
2252         <info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
2253
2254         <para>
          Brace oneself: this library does not contain containers like
2256           <classname>std::multimap</classname> or
2257           <classname>std::multiset</classname>. Instead, these data
2258           structures can be synthesized via manipulation of the
2259           <classname>Mapped</classname> template parameter.
2260         </para>
2261         <para>
          One maps the unique part of a key (the primary key) into an
          associative container of the (originally) non-unique parts of
          the key (the secondary keys). A primary associative-container
2265           is an associative container of primary keys; a secondary
2266           associative-container is an associative container of
2267           secondary keys.
2268         </para>
2269
2270         <para>
          Let us step back a bit and start from the beginning.
2272         </para>
2273
2274
2275         <para>
2276           Maps (or sets) allow mapping (or storing) unique-key values.
2277           The standard library also supplies associative containers which
2278           map (or store) multiple values with equivalent keys:
2279           <classname>std::multimap</classname>, <classname>std::multiset</classname>,
          <classname>std::tr1::unordered_multimap</classname>, and
          <classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
2282           be used, then why we think it is best to avoid them.
2283         </para>
2284
2285         <para>
          Suppose one builds a simple bank-account application that
          records, for each client (identified by a <classname>std::string</classname>)
          and account-id (an <type>unsigned long</type>),
          the balance in the account (a
          <type>float</type>). Suppose further that ordering this
          information is not useful, so a hash-based container is
          preferable to a tree-based container. Then one can use
2293         </para>
2294
2295         <programlisting>
2296           std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2297         </programlisting>
2298
2299         <para>
2300           which hashes every combination of client and account-id. This
2301           might work well, except for the fact that it is now impossible
2302           to efficiently list all of the accounts of a specific client
2303           (this would practically require iterating over all
2304           entries). Instead, one can use
2305         </para>
2306
2307         <programlisting>
2308           std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2309         </programlisting>
2310
2311         <para>
2312           which hashes every client, and decides equivalence based on
2313           client only. This will ensure that all accounts belonging to a
2314           specific user are stored consecutively.
2315         </para>
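
        <para>
          As a hedged sketch, the elided template arguments could be
          filled in with functors that look at the client only. The names
          <classname>client_hash</classname>,
          <classname>client_equal</classname>, and
          <function>count_accounts</function> are hypothetical; the
          sketch assumes GCC's <filename>tr1</filename> headers.
        </para>
        <programlisting>
          #include &lt;cstddef&gt;
          #include &lt;string&gt;
          #include &lt;utility&gt;
          #include &lt;tr1/functional&gt;
          #include &lt;tr1/unordered_map&gt;

          typedef std::pair&lt;std::string, unsigned long&gt; account_key;

          // Hashes the client name only; the account-id is ignored.
          struct client_hash
          {
            std::size_t
            operator()(const account_key&amp; k) const
            { return std::tr1::hash&lt;std::string&gt;()(k.first); }
          };

          // Two keys are equivalent if they name the same client.
          struct client_equal
          {
            bool
            operator()(const account_key&amp; a, const account_key&amp; b) const
            { return a.first == b.first; }
          };

          typedef std::tr1::unordered_multimap&lt;account_key, float,
                                               client_hash, client_equal&gt; accounts_t;

          // Counts the accounts of one client by walking its equal_range.
          inline std::size_t
          count_accounts(const accounts_t&amp; accounts, const std::string&amp; client)
          {
            typedef accounts_t::const_iterator iter;
            // The account-id of the probe key is irrelevant to the lookup.
            std::pair&lt;iter, iter&gt; range = accounts.equal_range(account_key(client, 0UL));
            std::size_t n = 0;
            for (; range.first != range.second; ++range.first)
              ++n;
            return n;
          }
        </programlisting>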
2316
2317         <para>
          Also, suppose one wants a priority queue of integers
          (a container that supports <function>push</function>,
          <function>pop</function>, and <function>top</function> operations, the last of which
          returns the largest <type>int</type>) that also supports
          operations such as <function>find</function> and <function>lower_bound</function>. A
          reasonable solution is to build an adapter over
          <classname>std::set&lt;int&gt;</classname>. In this adapter,
          <function>push</function> will just call the tree-based
          associative container's <function>insert</function> method; <function>pop</function>
          will call its <function>end</function> method, and use it to reach the
          preceding element (which must be the largest). This might
          work well, except that the container object cannot hold
          multiple instances of the same integer (<function>push(4)</function>
          will be a no-op if <constant>4</constant> is already in the
          container object). If multiple keys are necessary, then one
          might build the adapter over a
          <classname>std::multiset&lt;int&gt;</classname>.
2335         </para>
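
        <para>
          Such an adapter might look like the following sketch. The class
          name <classname>searchable_int_pq</classname> and its
          <function>contains</function> member are hypothetical;
          <function>top</function> and <function>pop</function> assume a
          non-empty container.
        </para>
        <programlisting>
          #include &lt;cstddef&gt;
          #include &lt;set&gt;

          class searchable_int_pq
          {
          public:
            void
            push(int v)
            { m_set.insert(v); }  // a multiset keeps repeated values

            int
            top() const
            { return *m_set.rbegin(); }  // the last element is the largest

            void
            pop()
            {
              // The element preceding end() is the largest one.
              std::multiset&lt;int&gt;::iterator last = m_set.end();
              --last;
              m_set.erase(last);
            }

            bool
            contains(int v) const
            { return m_set.find(v) != m_set.end(); }

            std::size_t
            size() const
            { return m_set.size(); }

          private:
            std::multiset&lt;int&gt; m_set;
          };
        </programlisting>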
2336
2337         <para>
2338           The standard library's non-unique-mapping containers are useful
          when (1) a key can be decomposed into a primary key and a
2340           secondary key, (2) a key is needed multiple times, or (3) any
2341           combination of (1) and (2).
2342         </para>
2343
2344         <para>
2345           The graphic below shows how the standard library's container
2346           design works internally; in this figure nodes shaded equally
2347           represent equivalent-key values. Equivalent keys are stored
2348           consecutively using the properties of the underlying data
2349           structure: binary search trees (label A) store equivalent-key
2350           values consecutively (in the sense of an in-order walk)
2351           naturally; collision-chaining hash tables (label B) store
          equivalent-key values in the same bucket, and the bucket can be
2353           arranged so that equivalent-key values are consecutive.
2354         </para>
2355
2356         <figure>
2357           <title>Non-unique Mapping Standard Containers</title>
2358           <mediaobject>
2359             <imageobject>
2360               <imagedata align="center" format="PNG" scale="100"
2361                          fileref="../images/pbds_embedded_lists_1.png"/>
2362             </imageobject>
2363             <textobject>
2364               <phrase>Non-unique Mapping Standard Containers</phrase>
2365             </textobject>
2366           </mediaobject>
2367         </figure>
2368
2369         <para>
          Put differently, the standard's non-unique mapping
2371           associative-containers are associative containers that map
2372           primary keys to linked lists that are embedded into the
2373           container. The graphic below shows again the two
2374           containers from the first graphic above, this time with
2375           the embedded linked lists of the grayed nodes marked
2376           explicitly.
2377         </para>
2378
2379         <figure xml:id="fig.pbds_embedded_lists_2">
2380           <title>
2381             Effect of embedded lists in
2382             <classname>std::multimap</classname>
2383           </title>
2384           <mediaobject>
2385             <imageobject>
2386               <imagedata align="center" format="PNG" scale="100"
2387                          fileref="../images/pbds_embedded_lists_2.png"/>
2388             </imageobject>
2389             <textobject>
2390               <phrase>
2391                 Effect of embedded lists in
2392                 <classname>std::multimap</classname>
2393               </phrase>
2394             </textobject>
2395           </mediaobject>
2396         </figure>
2397
2398         <para>
2399           These embedded linked lists have several disadvantages.
2400         </para>
2401
2402         <orderedlist>
2403           <listitem>
2404             <para>
2405               The underlying data structure embeds the linked lists
2406               according to its own consideration, which means that the
2407               search path for a value might include several different
              equivalent-key values. For example, the search path for
              the black node in either label A or label B of the first
              graphic includes more than a single gray node.
2411             </para>
2412           </listitem>
2413
2414           <listitem>
2415             <para>
2416               The links of the linked lists are the underlying data
2417               structures' nodes, which typically are quite structured.  In
              the case of tree-based containers (the graphic above, label
              A), each "link" is actually a node with three pointers (one
2420               to a parent and two to children), and a
2421               relatively-complicated iteration algorithm. The linked
2422               lists, therefore, can take up quite a lot of memory, and
2423               iterating over all values equal to a given key (through the
2424               return value of the standard
2425               library's <function>equal_range</function>) can be
2426               expensive.
2427             </para>
2428           </listitem>
2429
2430           <listitem>
2431             <para>
2432               The primary key is stored multiply; this uses more memory.
2433             </para>
2434           </listitem>
2435
2436           <listitem>
2437             <para>
2438               Finally, the interface of this design excludes several
2439               useful underlying data structures. Of all the unordered
2440               self-organizing data structures, practically only
2441               collision-chaining hash tables can (efficiently) guarantee
2442               that equivalent-key values are stored consecutively.
2443             </para>
2444           </listitem>
2445         </orderedlist>
2446
2447         <para>
2448           The above reasons hold even when the ratio of secondary keys to
2449           primary keys (or average number of identical keys) is small, but
2450           when it is large, there are more severe problems:
2451         </para>
2452
2453         <orderedlist>
2454           <listitem>
2455             <para>
              The underlying data structures order the links inside each
              embedded linked list according to their internal
              considerations, which effectively means that the links of
              each list are unordered. Irrespective of the underlying data
              structure, searching for a specific value can degrade to
              linear complexity.
2462             </para>
2463           </listitem>
2464
2465           <listitem>
2466             <para>
2467               Similarly to the above point, it is impossible to apply
2468               to the secondary keys considerations that apply to primary
2469               keys. For example, it is not possible to maintain secondary
2470               keys by sorted order.
2471             </para>
2472           </listitem>
2473
2474           <listitem>
2475             <para>
2476               While the interface "understands" that all equivalent-key
2477               values constitute a distinct list (through
2478               <function>equal_range</function>), the underlying data
2479               structure typically does not. This means that operations such
2480               as erasing from a tree-based container all values whose keys
              are equivalent to a given key can be super-linear in the
              size of the tree; this is also true for several other
              operations that target a specific list.
2484             </para>
2485           </listitem>
2486
2487         </orderedlist>
2488
2489         <para>
2490           In this library, all associative containers map
2491           (or store) unique-key values. One can (1) map primary keys to
2492           secondary associative-containers (containers of
          secondary keys) or non-associative containers, (2) map identical
2494           keys to a size-type representing the number of times they
2495           occur, or (3) any combination of (1) and (2). Instead of
2496           allowing multiple equivalent-key values, this library
2497           supplies associative containers based on underlying
2498           data structures that are suitable as secondary
2499           associative-containers.
2500         </para>
2501
2502         <para>
          In the figure below, labels A and B show this library's
          equivalents of the underlying data structures labeled A and B,
          respectively, in the first graphic above. Each shaded
          box represents some size-type or secondary
          associative-container.
2508         </para>
2509
2510         <figure>
2511           <title>Non-unique Mapping Containers</title>
2512           <mediaobject>
2513             <imageobject>
2514               <imagedata align="center" format="PNG" scale="100"
2515                          fileref="../images/pbds_embedded_lists_3.png"/>
2516             </imageobject>
2517             <textobject>
2518               <phrase>Non-unique Mapping Containers</phrase>
2519             </textobject>
2520           </mediaobject>
2521         </figure>
2522
2523         <para>
          In the first example above, then, one would use an associative
          container mapping each client to an associative container which
          maps each account-id to a balance (see
          <filename>example/basic_multimap.cc</filename>); in the second
          example, one would use an associative container mapping
          each <classname>int</classname> to some size-type indicating the
          number of times it logically occurs
          (see <filename>example/basic_multiset.cc</filename>).
2532         </para>
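
        <para>
          For the bank-account example, a "multimap" could be sketched as
          below. The choice of a tree as the secondary container and the
          helper function names are assumptions made for illustration;
          they are not taken from
          <filename>example/basic_multimap.cc</filename>.
        </para>
        <programlisting>
          #include &lt;cstddef&gt;
          #include &lt;string&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          // Secondary container: account-id to balance, kept in key order.
          typedef __gnu_pbds::tree&lt;unsigned long, float&gt; accounts_t;
          // Primary container: client to that client's accounts.
          typedef __gnu_pbds::cc_hash_table&lt;std::string, accounts_t&gt; bank_t;

          inline void
          set_balance(bank_t&amp; bank, const std::string&amp; client,
                      unsigned long account_id, float balance)
          { bank[client][account_id] = balance; }

          // Listing one client's accounts touches only that client's entry.
          inline std::size_t
          count_accounts(const bank_t&amp; bank, const std::string&amp; client)
          {
            bank_t::point_const_iterator it = bank.find(client);
            return it == bank.end() ? std::size_t(0) : it-&gt;second.size();
          }

          inline float
          total_balance(const bank_t&amp; bank, const std::string&amp; client)
          {
            float total = 0;
            bank_t::point_const_iterator it = bank.find(client);
            if (it != bank.end())
              for (accounts_t::const_iterator a = it-&gt;second.begin();
                   a != it-&gt;second.end(); ++a)
                total += a-&gt;second;
            return total;
          }
        </programlisting>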
2533
2534         <para>
2535           See the discussion in list-based container types for containers
2536           especially suited as secondary associative-containers.
2537         </para>
2538       </section>
2539
2540     </section> <!-- map and set semantics -->
2541
2542     <section xml:id="pbds.design.concepts.iterator_semantics">
2543       <info><title>Iterator Semantics</title></info>
2544
2545       <section xml:id="concepts.iterator_semantics.point_and_range">
2546         <info><title>Point and Range Iterators</title></info>
2547
2548         <para>
          Iterator concepts are bifurcated in this design into
          point-type and range-type iteration.
2551         </para>
2552
2553         <para>
2554           A point-type iterator is an iterator that refers to a specific
2555           element as returned through an
2556           associative-container's <function>find</function> method.
2557         </para>
2558
2559         <para>
          A range-type iterator is an iterator that is used to go over a
          sequence of elements, as returned by a container's
          <function>begin</function> method.
2563         </para>
2564
2565         <para>
2566           A point-type method is a method that
2567           returns a point-type iterator; a range-type method is a method
2568           that returns a range-type iterator.
2569         </para>
2570
        <para>For most containers, these types are synonymous. For
        self-organizing containers, such as hash-based containers or
        priority queues, they are inherently different (in any
        implementation, including that of C++ standard library
        components); this design makes the difference explicit by
        making them distinct types.
        </para>
2578       </section>
2579
2580
2581       <section xml:id="concepts.iterator_semantics.both">
2582         <info><title>Distinguishing Point and Range Iterators</title></info>
2583
        <para>When using this library, it is necessary to differentiate
2585         between two types of methods and iterators: point-type methods and
2586         iterators, and range-type methods and iterators. Each associative
2587         container's interface includes the methods:</para>
2588         <programlisting>
2589           point_const_iterator
2590           find(const_key_reference r_key) const;
2591
2592           point_iterator
2593           find(const_key_reference r_key);
2594
2595           std::pair&lt;point_iterator,bool&gt;
2596           insert(const_reference r_val);
2597         </programlisting>
2598
        <para>The relationship between these iterator types varies between
        container types. The figure below shows the invariants between
        point-type and range-type iterators. <emphasis>A</emphasis> shows
        the most general invariant: <literal>iterator</literal> can
        always be converted to <literal>point_iterator</literal>.
        <emphasis>B</emphasis> shows invariants for order-preserving
        containers: point-type iterators are synonymous with range-type
        iterators. Orthogonally, <emphasis>C</emphasis> shows invariants
        for "set" containers: iterators are synonymous with const
        iterators.</para>
2608
2609         <figure>
2610           <title>Point Iterator Hierarchy</title>
2611           <mediaobject>
2612             <imageobject>
2613               <imagedata align="center" format="PNG" scale="100"
2614                          fileref="../images/pbds_point_iterator_hierarchy.png"/>
2615             </imageobject>
2616             <textobject>
2617               <phrase>Point Iterator Hierarchy</phrase>
2618             </textobject>
2619           </mediaobject>
2620         </figure>
2621
2622
        <para>Note that point-type iterators in self-organizing containers
        (such as hash-based associative containers) lack movement
        operators, such as <literal>operator++</literal>. In fact, this
        is the reason why this library differs from the standard C++ library's
        design on this point.</para>
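
        <para>The sketch below uses both kinds of iterators on a
        hash-based "map". The function names are hypothetical;
        <function>find</function> and <function>insert</function> yield
        point-type iterators, whereas <function>begin</function> and
        <function>end</function> yield range-type iterators.</para>
        <programlisting>
          #include &lt;cstddef&gt;
          #include &lt;utility&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; map_t;

          // Point-type method: insert reports whether the key was new.
          inline bool
          insert_new(map_t&amp; m, int key, char c)
          {
            std::pair&lt;map_t::point_iterator, bool&gt; r = m.insert(std::make_pair(key, c));
            return r.second;
          }

          // Point-type method: find refers to a single entry. Per the text,
          // the returned point_iterator has no movement operators here.
          inline bool
          replace_mapped(map_t&amp; m, int key, char c)
          {
            map_t::point_iterator it = m.find(key);
            if (it == m.end())
              return false;
            it-&gt;second = c;
            return true;
          }

          // Range-type iteration: begin/end iterators can be incremented.
          inline std::size_t
          count_entries(const map_t&amp; m)
          {
            std::size_t n = 0;
            for (map_t::const_iterator it = m.begin(); it != m.end(); ++it)
              ++n;
            return n;
          }
        </programlisting>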
2628
        <para>Typically, one can determine an iterator's movement
        capabilities using
        <literal>std::iterator_traits&lt;It&gt;::iterator_category</literal>,
        which is a <literal>struct</literal> indicating the iterator's
        movement capabilities. Unfortunately, none of the standard predefined
        categories reflects an iterator's <emphasis>not</emphasis> having any
        movement capabilities whatsoever. Consequently,
        <literal>pb_ds</literal> adds a type
        <literal>trivial_iterator_tag</literal> (whose name is taken from
        a concept in C++ standardese), which is the category of iterators
        with no movement capabilities. All other standard C++ library
        tags, such as <literal>forward_iterator_tag</literal>, retain their
        common use.</para>
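
        <para>A hedged sketch of dispatching on the iterator category
        follows. The helper names are hypothetical, and the sketch reads
        the iterator's nested <literal>iterator_category</literal>
        directly rather than going through
        <literal>std::iterator_traits</literal>.</para>
        <programlisting>
          #include &lt;iterator&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;ext/pb_ds/tag_and_trait.hpp&gt;

          // Forward iterators (and anything derived from that tag) can move.
          inline bool
          can_advance_impl(std::forward_iterator_tag)
          { return true; }

          // Iterators tagged as trivial have no movement capabilities.
          inline bool
          can_advance_impl(__gnu_pbds::trivial_iterator_tag)
          { return false; }

          template&lt;typename It&gt;
          bool
          can_advance()
          { return can_advance_impl(typename It::iterator_category()); }
        </programlisting>
        <para>Under these assumptions,
        <literal>can_advance&lt;cc_hash_table&lt;int, char&gt;::point_iterator&gt;()</literal>
        is expected to select the <literal>trivial_iterator_tag</literal>
        overload, while the range-type <literal>iterator</literal> of the
        same container selects the other one.</para>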
2642
2643       </section>
2644
2645       <section xml:id="pbds.design.concepts.invalidation">
2646         <info><title>Invalidation Guarantees</title></info>
2647         <para>
2648           If one manipulates a container object, then iterators previously
2649           obtained from it can be invalidated. In some cases a
2650           previously-obtained iterator cannot be de-referenced; in other cases,
2651           the iterator's next or previous element might have changed
          unpredictably. This corresponds exactly to the question of whether a
          point-type or range-type iterator (see previous concept) is valid or
          not. In this design, one can query a container (at compile time) about
          its invalidation guarantees.
2656         </para>
2657
2658
2659         <para>
2660           Given three different types of associative containers, a modifying
2661           operation (in that example, <function>erase</function>) invalidated
2662           iterators in three different ways: the iterator of one container
2663           remained completely valid - it could be de-referenced and
2664           incremented; the iterator of a different container could not even be
2665           de-referenced; the iterator of the third container could be
2666           de-referenced, but its "next" iterator changed unpredictably.
2667         </para>
2668
2669         <para>
          Distinguishing between point and range types allows fine-grained
2671           invalidation guarantees, because these questions correspond exactly
2672           to the question of whether point-type iterators and range-type
2673           iterators are valid. The graphic below shows tags corresponding to
2674           different types of invalidation guarantees.
2675         </para>
2676
2677         <figure>
2678           <title>Invalidation Guarantee Tags Hierarchy</title>
2679           <mediaobject>
2680             <imageobject>
2681               <imagedata align="center" format="PDF" scale="75"
2682                          fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2683             </imageobject>
2684             <imageobject>
2685               <imagedata align="center" format="PNG" scale="100"
2686                          fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
2687             </imageobject>
2688             <textobject>
2689               <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2690             </textobject>
2691           </mediaobject>
2692         </figure>
2693
2694         <itemizedlist>
2695           <listitem>
2696             <para>
2697               <classname>basic_invalidation_guarantee</classname>
2698               corresponds to a basic guarantee that a point-type iterator,
2699               a found pointer, or a found reference, remains valid as long
2700               as the container object is not modified.
2701             </para>
2702           </listitem>
2703
2704           <listitem>
2705             <para>
2706               <classname>point_invalidation_guarantee</classname>
2707               corresponds to a guarantee that a point-type iterator, a
2708               found pointer, or a found reference, remains valid even if
2709               the container object is modified.
2710             </para>
2711           </listitem>
2712
2713           <listitem>
2714             <para>
2715               <classname>range_invalidation_guarantee</classname>
2716               corresponds to a guarantee that a range-type iterator remains
2717               valid even if the container object is modified.
2718             </para>
2719           </listitem>
2720         </itemizedlist>
2721
2722         <para>To find the invalidation guarantee of a
2723         container, one can use</para>
2724         <programlisting>
2725           typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
2726         </programlisting>
2727
        <para>Note that this hierarchy corresponds to the logic it
        represents: if a container has range-invalidation guarantees,
        then it must also have point-invalidation guarantees;
        correspondingly, its invalidation guarantee (in this case
        <classname>range_invalidation_guarantee</classname>)
        can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
        This means that this hierarchy can be used easily with
        standard metaprogramming techniques, by specializing on the
        type of <literal>invalidation_guarantee</literal>.</para>
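
        <para>One way to specialize on the guarantee is overloading on
        the tag type, as in the sketch below. The function names are
        hypothetical. Because
        <classname>range_invalidation_guarantee</classname> derives from
        <classname>point_invalidation_guarantee</classname>, a container
        with the stronger guarantee also selects the second
        overload.</para>
        <programlisting>
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;ext/pb_ds/tag_and_trait.hpp&gt;

          // Only the basic guarantee: point iterators may be invalidated
          // by any modification.
          inline bool
          point_iterators_survive_impl(__gnu_pbds::basic_invalidation_guarantee)
          { return false; }

          // Point (or range) guarantee: point iterators stay valid.
          inline bool
          point_iterators_survive_impl(__gnu_pbds::point_invalidation_guarantee)
          { return true; }

          template&lt;typename Cntnr&gt;
          bool
          point_iterators_survive()
          {
            typedef typename __gnu_pbds::container_traits&lt;Cntnr&gt;::invalidation_guarantee
              guarantee;
            return point_iterators_survive_impl(guarantee());
          }
        </programlisting>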
2737
2738         <para>
2739           These types of problems were addressed, in a more general
2740           setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2741           our opinion, an invalidation-guarantee hierarchy would solve
2742           these problems in all container types - not just associative
2743           containers.
2744         </para>
2745
2746       </section>
2747     </section> <!-- iterator semantics -->
2748
2749     <section xml:id="pbds.design.concepts.genericity">
2750       <info><title>Genericity</title></info>
2751
2752       <para>
2753         The design attempts to address the following problem of
2754         data-structure genericity. When writing a function manipulating
2755         a generic container object, what is the behavior of the object?
2756         Suppose one writes
2757       </para>
2758       <programlisting>
2759         template&lt;typename Cntnr&gt;
2760         void
2761         some_op_sequence(Cntnr &amp;r_container)
2762         {
2763         ...
2764         }
2765       </programlisting>
2766
2767       <para>
2768         then one needs to address the following questions in the body
2769         of <function>some_op_sequence</function>:
2770       </para>
2771
2772       <itemizedlist>
2773         <listitem>
2774           <para>
2775             Which types and methods does <literal>Cntnr</literal> support?
            Containers based on hash tables can be queried for the
2777             hash-functor type and object; this is meaningless for tree-based
2778             containers. Containers based on trees can be split, joined, or
2779             can erase iterators and return the following iterator; this
2780             cannot be done by hash-based containers.
2781           </para>
2782         </listitem>
2783
2784         <listitem>
2785           <para>
2786             What are the exception and invalidation guarantees
2787             of <literal>Cntnr</literal>? A container based on a probing
2788             hash-table invalidates all iterators when it is modified; this
2789             is not the case for containers based on node-based
2790             trees. Containers based on a node-based tree can be split or
2791             joined without exceptions; this is not the case for containers
2792             based on vector-based trees.
2793           </para>
2794         </listitem>
2795
2796         <listitem>
2797           <para>
            How does the container maintain its elements? Tree-based and
            trie-based containers store elements by key order; others,
            typically, do not. Containers based on splay trees or lists
            with update policies "cache" "frequently accessed" elements;
            containers based on most other underlying data structures do
            not.
2804           </para>
2805         </listitem>
2806         <listitem>
2807           <para>
2808             How does one query a container about characteristics and
2809             capabilities? What is the relationship between two different
            data structures, if any?
2811           </para>
2812         </listitem>
2813       </itemizedlist>
2814
2815       <para>The remainder of this section explains these issues in
2816       detail.</para>
2817
2818
2819       <section xml:id="concepts.genericity.tag">
2820         <info><title>Tag</title></info>
2821         <para>
2822           Tags are very useful for manipulating generic types. For example, if
2823           <literal>It</literal> is an iterator class, then <literal>typename
2824           It::iterator_category</literal> or <literal>typename
2825           std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2826           yield its category, and <literal>typename
2827           std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2828           value type.
2829         </para>
2830
2831         <para>
2832           This library contains a container tag hierarchy corresponding to the
2833           diagram below.
2834         </para>
2835
2836         <figure>
2837           <title>Container Tag Hierarchy</title>
2838           <mediaobject>
2839             <imageobject>
2840               <imagedata align="center" format="PDF" scale="75"
2841                          fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2842             </imageobject>
2843             <imageobject>
2844               <imagedata align="center" format="PNG" scale="100"
2845                          fileref="../images/pbds_container_tag_hierarchy.png"/>
2846             </imageobject>
2847             <textobject>
2848               <phrase>Container Tag Hierarchy</phrase>
2849             </textobject>
2850           </mediaobject>
2851         </figure>
2852
2853         <para>
2854           Given any container <type>Cntnr</type>, the tag of
2855           the underlying data structure can be found via <literal>typename
2856           Cntnr::container_category</literal>.
2857         </para>
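        <para>
          As a minimal sketch of how such a tag can drive compile-time
          dispatch, the following overloads classify a container by the
          closest base of its <literal>container_category</literal>. The
          names <literal>family</literal>, <literal>classify</literal>,
          and <literal>family_of</literal> are hypothetical helpers
          introduced for illustration; they are not part of this library.
          The sketch assumes a C++11 compiler.
        </para>
        <programlisting><![CDATA[
```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// Illustrative classification; not part of the library.
enum class family { hash, tree, other };

// One overload per tag of interest. Overload resolution selects the
// overload whose parameter is the closest base of the container's tag.
constexpr family classify(__gnu_pbds::basic_hash_tag) { return family::hash; }
constexpr family classify(__gnu_pbds::tree_tag)       { return family::tree; }
constexpr family classify(__gnu_pbds::container_tag)  { return family::other; }

// Dispatch on Cntnr::container_category at compile time.
template<typename Cntnr>
constexpr family family_of()
{ return classify(typename Cntnr::container_category()); }

typedef __gnu_pbds::cc_hash_table<int, char> hash_map_t;
typedef __gnu_pbds::tree<int, char>          tree_map_t;
typedef __gnu_pbds::list_update<int, char>   list_map_t;
```
]]></programlisting>
        <para>
          With these definitions, <literal>family_of</literal> applied to
          the hash-based, tree-based, and list-based typedefs selects the
          first, second, and third overload, respectively.
        </para>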
2858
2859       </section> <!-- tag -->
2860
2861       <section xml:id="concepts.genericity.traits">
2862         <info><title>Traits</title></info>
2864
        <para>Additionally, a traits mechanism can be used to query a
        container type for its attributes. Given any container
        <literal>Cntnr</literal>, <literal>container_traits&lt;Cntnr&gt;</literal>
        is a traits class identifying the properties of the
        container.</para>
2870
2871         <para>To find if a container can throw when a key is erased (which
2872         is true for vector-based trees, for example), one can
2873         use
2874         </para>
2875         <programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2876
2877         <para>
2878           Some of the definitions in <classname>container_traits</classname>
2879           are dependent on other
2880           definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2881           is <constant>true</constant> (which is the case for containers
2882           based on trees and tries), then the container can be split or
2883           joined; in this
2884           case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2885           indicates whether splits or joins can throw exceptions (which is
2886           true for vector-based trees);
2887           otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2888           will yield a compilation error. (This is somewhat similar to a
2889           compile-time version of the COM model).
2890         </para>
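        <para>
          The following is a small sketch of querying these traits at
          compile time. The typedef names and the
          <literal>traits_summary</literal> wrapper are hypothetical and
          exist only for illustration; only
          <literal>order_preserving</literal> and
          <literal>erase_can_throw</literal> are queried.
        </para>
        <programlisting><![CDATA[
```cpp
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>

// Red-black tree (the default tag), vector-based tree, and a
// collision-chaining hash table.
typedef __gnu_pbds::tree<int, int>                  rb_map_t;
typedef __gnu_pbds::tree<int, int, std::less<int>,
                         __gnu_pbds::ov_tree_tag>   ov_map_t;
typedef __gnu_pbds::cc_hash_table<int, int>         cc_map_t;

// Hypothetical wrapper exposing two of the traits as booleans.
template<typename Cntnr>
struct traits_summary
{
  typedef __gnu_pbds::container_traits<Cntnr> traits;
  static const bool ordered     = traits::order_preserving;
  static const bool erase_throw = traits::erase_can_throw;
};
```
]]></programlisting>
        <para>
          Here <literal>traits_summary&lt;ov_map_t&gt;::erase_throw</literal>
          is expected to be <constant>true</constant>, matching the remark
          above about vector-based trees, while the hash table is not
          order preserving.
        </para>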
2891
2892       </section> <!-- traits -->
2893
2894     </section> <!-- genericity -->
2895   </section> <!-- concepts -->
2896
2897   <section xml:id="pbds.design.container">
2898     <info><title>By Container</title></info>
2899
2900     <!-- hash -->
2901     <section xml:id="pbds.design.container.hash">
2902       <info><title>hash</title></info>
2903
2904       <!--
2905
2906 // hash policies
2907 /// general terms / background
2908 /// range hashing policies
2909 /// ranged-hash policies
2910 /// implementation
2911
2912 // resize policies
2913 /// general
2914 /// size policies
2915 /// trigger policies
2916 /// implementation
2917
2918 // policy interactions
2919 /// probe/size/trigger
2920 /// hash/trigger
2921 /// eq/hash/storing hash values
2922 /// size/load-check trigger
2923       -->
2924       <section xml:id="container.hash.interface">
2925         <info><title>Interface</title></info>
2926
2927
2928
2929         <para>
2930           The collision-chaining hash-based container has the
2931         following declaration.</para>
2932         <programlisting>
          template&lt;
          typename Key,
          typename Mapped,
          typename Hash_Fn = std::hash&lt;Key&gt;,
          typename Eq_Fn = std::equal_to&lt;Key&gt;,
          typename Comb_Hash_Fn = direct_mask_range_hashing&lt;&gt;,
          typename Resize_Policy = /* default explained below */,
          bool Store_Hash = false,
          typename Allocator = std::allocator&lt;char&gt; &gt;
          class cc_hash_table;
2943         </programlisting>
2944
2945         <para>The parameters have the following meaning:</para>
2946
2947         <orderedlist>
2948           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2949
2950           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2951
2952           <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2953
2954           <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2955
          <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
          it describes how to translate hash values into positions
          within the table.</para></listitem>
2959
2960           <listitem><para><classname>Resize_Policy</classname> describes how a container object
2961           should change its internal size. </para></listitem>
2962
2963           <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2964           should be stored with each entry. </para></listitem>
2965
2966           <listitem><para><classname>Allocator</classname> is an allocator
2967           type.</para></listitem>
2968         </orderedlist>
2969
2970         <para>The probing hash-based container has the following
2971         declaration.</para>
2972         <programlisting>
          template&lt;
          typename Key,
          typename Mapped,
          typename Hash_Fn = std::hash&lt;Key&gt;,
          typename Eq_Fn = std::equal_to&lt;Key&gt;,
          typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;,
          typename Probe_Fn = /* default explained below */,
          typename Resize_Policy = /* default explained below */,
          bool Store_Hash = false,
          typename Allocator = std::allocator&lt;char&gt; &gt;
          class gp_hash_table;
2984         </programlisting>
2985
2986         <para>The parameters are identical to those of the
2987         collision-chaining container, except for the following.</para>
2988
2989         <orderedlist>
2990           <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2991           sequence into a sequence of positions within the table.</para></listitem>
2992
2993           <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2994         </orderedlist>
2995
2996         <para>Some of the default template values depend on the values of
2997         other parameters, and are explained below.</para>
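        <para>
          As an illustration of the two declarations above, the sketch
          below instantiates one table of each kind with some policies
          spelled out and the rest left at their defaults. The typedef
          names and the helpers <function>count_word</function> and
          <function>check_tables</function> are hypothetical; the choice
          of <classname>direct_mod_range_hashing</classname> and
          <classname>linear_probe_fn</classname> is only one possible
          combination.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Collision-chaining table using modulo range hashing instead of the
// bit-mask shown in the declaration; other parameters keep defaults.
typedef __gnu_pbds::cc_hash_table<
  std::string, int,
  std::hash<std::string>,
  std::equal_to<std::string>,
  __gnu_pbds::direct_mod_range_hashing<> > word_count_t;

// Probing table using bit-mask range hashing and a linear probe.
typedef __gnu_pbds::gp_hash_table<
  int, int,
  std::hash<int>,
  std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  __gnu_pbds::linear_probe_fn<> > probe_map_t;

// Hypothetical helper: record one occurrence of w, return its new count.
inline int count_word(word_count_t& counts, const std::string& w)
{ return ++counts[w]; }

// Exercises both tables through their map-like interface.
inline void check_tables()
{
  word_count_t counts;
  count_word(counts, "tree");
  assert(count_word(counts, "tree") == 2);
  assert(counts.find("trie") == counts.end());

  probe_map_t squares;
  squares[7] = 49;
  assert(squares.find(7)->second == 49);
  assert(squares.size() == 1);
}
```
]]></programlisting>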
2998
2999       </section>
3000       <section xml:id="container.hash.details">
3001         <info><title>Details</title></info>
3002
3003         <section xml:id="container.hash.details.hash_policies">
3004           <info><title>Hash Policies</title></info>
3005
3006           <section xml:id="details.hash_policies.general">
3007             <info><title>General</title></info>
3008
3009             <para>Following is an explanation of some functions which hashing
3010             involves. The graphic below illustrates the discussion.</para>
3011
3012             <figure>
3013               <title>Hash functions, ranged-hash functions, and
3014               range-hashing functions</title>
3015               <mediaobject>
3016                 <imageobject>
3017                   <imagedata align="center" format="PNG" scale="100"
3018                              fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
3019                 </imageobject>
3020                 <textobject>
3021                   <phrase>Hash functions, ranged-hash functions, and
3022                   range-hashing functions</phrase>
3023                 </textobject>
3024               </mediaobject>
3025             </figure>
3026
3027             <para>Let U be a domain (e.g., the integers, or the
3028             strings of 3 characters). A hash-table algorithm needs to map
3029             elements of U "uniformly" into the range [0,..., m -
3030             1] (where m is a non-negative integral value, and
3031             is, in general, time varying). I.e., the algorithm needs
3032             a ranged-hash function</para>
3033
3034             <para>
3035               f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3036             </para>
3037
3038             <para>such that for any u in U ,</para>
3039
3040             <para>0 ≤ f(u, m) ≤ m - 1</para>
3041
            <para>and which has "good uniformity" properties (see
            <xref linkend="biblio.knuth98sorting"/>).
            One
            common solution is to use the composition of the hash
            function</para>
3047
3048             <para>h : U → Z<subscript>+</subscript> ,</para>
3049
3050             <para>which maps elements of U into the non-negative
3051             integrals, and</para>
3052
3053             <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
3054             Z<subscript>+</subscript>,</para>
3055
3056             <para>which maps a non-negative hash value, and a non-negative
3057             range upper-bound into a non-negative integral in the range
3058             between 0 (inclusive) and the range upper bound (exclusive),
3059             i.e., for any r in Z<subscript>+</subscript>,</para>
3060
3061             <para>0 ≤ g(r, m) ≤ m - 1</para>
3062
3063
            <para>The resulting ranged-hash function is</para>
3065
3066             <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3067             <equation>
3068               <title>Ranged Hash Function</title>
3069               <mathphrase>
3070                 f(u , m) = g(h(u), m)
3071               </mathphrase>
3072             </equation>
3073
3074             <para>From the above, it is obvious that given g and
3075             h, f can always be composed (however the converse
3076             is not true). The standard's hash-based containers allow specifying
3077             a hash function, and use a hard-wired range-hashing function;
3078             the ranged-hash function is implicitly composed.</para>
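            <para>The composition f(u, m) = g(h(u), m) can be sketched
            with two small functors. This is an illustration only: the
            names <literal>identity_hash</literal>,
            <literal>mod_range_hashing</literal>, and
            <literal>composed_ranged_hash</literal> are hypothetical and
            do not correspond to classes in this library.</para>
            <programlisting><![CDATA[
```cpp
#include <cstddef>

// h : U -> Z+. Here U is the unsigned integers and h is the identity.
struct identity_hash
{
  constexpr std::size_t operator()(std::size_t u) const { return u; }
};

// g : Z+ x Z+ -> Z+. The division method, g(r, m) = r mod m.
struct mod_range_hashing
{
  constexpr std::size_t operator()(std::size_t r, std::size_t m) const
  { return r % m; }
};

// f(u, m) = g(h(u), m), synthesized from any h and g.
template<typename Hash_Fn, typename Range_Fn>
struct composed_ranged_hash
{
  Hash_Fn  h;
  Range_Fn g;

  template<typename U>
  constexpr std::size_t operator()(const U& u, std::size_t m) const
  { return g(h(u), m); }
};
```
]]></programlisting>
            <para>Any pair of functors with these call signatures can be
            plugged in; the result always lies between 0 and m - 1
            because the range-hashing step is applied last.</para>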
3079
3080             <para>The above describes the case where a key is to be mapped
3081             into a single position within a hash table, e.g.,
3082             in a collision-chaining table. In other cases, a key is to be
3083             mapped into a sequence of positions within a table,
3084             e.g., in a probing table. Similar terms apply in this
3085             case: the table requires a ranged probe function,
            mapping a key into a sequence of positions within the table.
3087             This is typically achieved by composing a hash function
3088             mapping the key into a non-negative integral type, a
3089             probe function transforming the hash value into a
3090             sequence of hash values, and a range-hashing function
3091             transforming the sequence of hash values into a sequence of
3092             positions.</para>
3093
3094           </section>
3095
3096           <section xml:id="details.hash_policies.range">
3097             <info><title>Range Hashing</title></info>
3098
3099             <para>Some common choices for range-hashing functions are the
3100             division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3101             as</para>
3102
3103             <equation>
3104               <title>Range-Hashing, Division Method</title>
3105               <mathphrase>
3106                 g(r, m) = r mod m
3107               </mathphrase>
3108             </equation>
3109
3110
3111
3112             <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3113
3114             <para>and</para>
3115
3116             <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3117
            <para>respectively, for some positive integrals u and
            v (typically powers of 2), and some a. Each of
            these range-hashing functions works best in a different
            setting.</para>
3122
            <para>The division method (see above) is a
            very common choice. However, even this single method can be
            implemented in two very different ways: using the
            low-level % (modulo) operation (for any m), or using the
            low-level &amp; (bit-mask) operation (for the case where
            m is a power of 2), i.e.,</para>
3130
3131             <equation>
3132               <title>Division via Prime Modulo</title>
3133               <mathphrase>
3134                 g(r, m) = r % m
3135               </mathphrase>
3136             </equation>
3137
3138             <para>and</para>
3139
3140             <equation>
3141               <title>Division via Bit Mask</title>
3142               <mathphrase>
                g(r, m) = r &amp; (m - 1), (with m =
                2<superscript>k</superscript> for some k)
3145               </mathphrase>
3146             </equation>
3147
3148
3149             <para>respectively.</para>
3150
            <para>The % (modulo) implementation has the advantage that for
            m a prime far from a power of 2, g(r, m) is
            affected by all the bits of r (minimizing the chance of
            collision). It has the disadvantage of using the costly modulo
            operation. This method is hard-wired into SGI's
            implementation.</para>
3157
            <para>The &amp; (bit-mask) implementation has the advantage of
            relying on the fast bit-wise and operation. It has the
            disadvantage that g(r, m) is affected only by the
            low order bits of r. This method is hard-wired into
            Dinkumware's implementation.</para>
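            <para>The trade-off between the two implementations can be
            seen in a short sketch. The function names are hypothetical;
            the table sizes 16 and 13 are arbitrary examples of a power
            of 2 and a prime.</para>
            <programlisting><![CDATA[
```cpp
#include <cstddef>

// Division via modulo: valid for any m > 0.
constexpr std::size_t range_mod(std::size_t r, std::size_t m)
{ return r % m; }

// Division via bit mask: equals r % m only when m is a power of 2.
constexpr std::size_t range_mask(std::size_t r, std::size_t m)
{ return r & (m - 1); }

// True when m is a power of 2, i.e. when range_mask may be used.
constexpr bool is_power_of_two(std::size_t m)
{ return m != 0 && (m & (m - 1)) == 0; }
```
]]></programlisting>
            <para>For m = 16 both functions agree, but the hash values
            16, 32, and 48 differ only in their high-order bits and all
            map to position 0 under the mask. With the prime m = 13 the
            modulo version sends them to positions 3, 6, and 9.</para>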
3163
3164
3165           </section>
3166
3167           <section xml:id="details.hash_policies.ranged">
3168             <info><title>Ranged Hash</title></info>
3169
            <para>In some cases it is beneficial to allow the
            client to directly specify a ranged-hash function. It is
            true that the writer of the ranged-hash function cannot rely
            on the values of m having specific numerical properties
            suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
            the values of m are determined by a resize policy with
            possibly orthogonal considerations.</para>
3177
            <para>There are two cases where a ranged-hash function can be
            superior. The first is when using perfect hashing; the
            second is when the values of m can be used to estimate
            the "general" number of distinct values required. The latter is
            described in the following.</para>
3183
3184             <para>Let</para>
3185
3186             <para>
3187               s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3188             </para>
3189
3190             <para>be a string of t characters, each of which is from
3191             domain S. Consider the following ranged-hash
3192             function:</para>
3193             <equation>
3194               <title>
3195                 A Standard String Hash Function
3196               </title>
3197               <mathphrase>
3198                 f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
3199                 0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3200               </mathphrase>
3201             </equation>
3202
3203
3204             <para>where a is some non-negative integral value. This is
3205             the standard string-hashing function used in SGI's
3206             implementation (with a = 5). Its advantage is that
3207             it takes into account all of the characters of the string.</para>
3208
            <para>Now assume that s is the string representation of a
            long DNA sequence (and so S = {'A', 'C', 'G',
            'T'}). In this case, scanning the entire string might be
            prohibitively expensive. A possible alternative might be to use
            only the first k characters of the string, where</para>
3214
3215             <para>|S|<superscript>k</superscript> ≥ m ,</para>
3216
3217             <para>i.e., using the hash function</para>
3218
3219             <equation>
3220               <title>
3221                 Only k String DNA Hash
3222               </title>
3223               <mathphrase>
3224                 f<subscript>2</subscript>(s, m) = ∑ <subscript>i
3225                 = 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3226               </mathphrase>
3227             </equation>
3228
3229             <para>requiring scanning over only</para>
3230
3231             <para>k = log<subscript>4</subscript>( m )</para>
3232
3233             <para>characters.</para>
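            <para>A minimal sketch of the two ranged-hash functions
            follows. It assumes a = 5 and table sizes that fit in 32
            bits, so that the intermediate products cannot overflow; all
            function names are hypothetical. Reducing modulo m at every
            step gives the same result as reducing the whole sum
            once.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sum of s_i * a^i over the first n characters of s, modulo m.
inline std::size_t poly_hash(const std::string& s, std::size_t n,
                             std::size_t m, std::size_t a = 5)
{
  assert(m > 0);
  unsigned long long sum = 0, power = 1 % m;
  for (std::size_t i = 0; i < n && i < s.size(); ++i)
    {
      sum = (sum + static_cast<unsigned char>(s[i]) * power) % m;
      power = (power * a) % m;
    }
  return static_cast<std::size_t>(sum);
}

// Smallest k with alphabet^k >= m, i.e. the ceiling of log_alphabet(m).
inline std::size_t prefix_length(std::size_t m, std::size_t alphabet = 4)
{
  std::size_t k = 0;
  for (unsigned long long reach = 1; reach < m; reach *= alphabet)
    ++k;
  return k;
}

// f1: scans the whole string.
inline std::size_t full_hash(const std::string& s, std::size_t m)
{ return poly_hash(s, s.size(), m); }

// f2: scans only the first k characters, with k chosen from m.
inline std::size_t prefix_hash(const std::string& s, std::size_t m)
{ return poly_hash(s, prefix_length(m), m); }
```
]]></programlisting>
            <para>For m = 16 the prefix length is 2, so two sequences
            sharing their first two characters hash to the same position
            under <function>prefix_hash</function> regardless of their
            total length.</para>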
3234
            <para>Other, more elaborate hash functions might scan k
            characters starting at a random position (determined at each
            resize), or scan k random positions (determined at
            each resize), i.e., using</para>
3239
            <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
            r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
            a<superscript>i</superscript> mod m ,</para>

            <para>or</para>

            <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
            1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
            m ,</para>
3249
3250             <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3251             each in the (inclusive) range [0,...,t-1].</para>
3252
            <para>It should be noted that the above functions cannot be
            decomposed into a hash function and a range-hashing function,
            as in the Ranged Hash Function equation above.</para>
3255
3256
3257           </section>
3258
3259           <section xml:id="details.hash_policies.implementation">
3260             <info><title>Implementation</title></info>
3261
3262             <para>This sub-subsection describes the implementation of
3263             the above in this library. It first explains range-hashing
3264             functions in collision-chaining tables, then ranged-hash
3265             functions in collision-chaining tables, then probing-based
3266             tables, and finally lists the relevant classes in this
3267             library.</para>
3268
3269             <section xml:id="hash_policies.implementation.collision-chaining">
3270               <info><title>
3271                 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3272               </title></info>
3273
3274
3275               <para><classname>cc_hash_table</classname> is
3276               parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3277               hash functor and a combining hash functor, respectively.</para>
3278
3279               <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3280               range-hashing functor. <classname>cc_hash_table</classname>
3281               synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
              <classname>Comb_Hash_Fn</classname>. The figure below shows an <function>insert</function> sequence
3283               diagram for this case. The user inserts an element (point A),
3284               the container transforms the key into a non-negative integral
3285               using the hash functor (points B and C), and transforms the
3286               result into a position using the combining functor (points D
3287               and E).</para>
3288
3289               <figure>
3290                 <title>Insert hash sequence diagram</title>
3291                 <mediaobject>
3292                   <imageobject>
3293                     <imagedata align="center" format="PNG" scale="100"
3294                                fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
3295                   </imageobject>
3296                   <textobject>
3297                     <phrase>Insert hash sequence diagram</phrase>
3298                   </textobject>
3299                 </mediaobject>
3300               </figure>
3301
              <para>If <classname>cc_hash_table</classname>'s
              hash functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
              a ranged-hash function. The graphic below shows an <function>insert</function> sequence
              diagram. The user inserts an element (point A), the container
              transforms the key into a position using the combining functor
              (points B and C).</para>
3308
3309               <figure>
3310                 <title>Insert hash sequence diagram with a null policy</title>
3311                 <mediaobject>
3312                   <imageobject>
3313                     <imagedata align="center" format="PNG" scale="100"
3314                                fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
3315                   </imageobject>
3316                   <textobject>
3317                     <phrase>Insert hash sequence diagram with a null policy</phrase>
3318                   </textobject>
3319                 </mediaobject>
3320               </figure>
3321
3322             </section>
3323
3324             <section xml:id="hash_policies.implementation.probe">
3325               <info><title>
3326                 Probing tables
3327               </title></info>
3328               <para><classname>gp_hash_table</classname> is parametrized by
3329               <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3330               and <classname>Comb_Probe_Fn</classname>. As before, if
3331               <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3332               are both <classname>null_type</classname>, then
3333               <classname>Comb_Probe_Fn</classname> is a ranged-probe
3334               functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3335               functor, <classname>Probe_Fn</classname> is a functor for offsets
3336               from a hash value, and <classname>Comb_Probe_Fn</classname>
3337               transforms a probe sequence into a sequence of positions within
3338               the table.</para>
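              <para>The three-way composition for probing tables can be
              sketched independently of the library. The functors and
              <function>probe_position</function> below are hypothetical
              stand-ins for a probe function and a combining function;
              the sketch assumes that the table size m is a power of 2 so
              that a bit mask can be used.</para>
              <programlisting><![CDATA[
```cpp
#include <cstddef>

// Probe functions: offset of the i-th probe from the hash value.
struct linear_offsets
{
  constexpr std::size_t operator()(std::size_t i) const { return i; }
};

struct quadratic_offsets
{
  constexpr std::size_t operator()(std::size_t i) const { return i * i; }
};

// Range hashing by bit mask; m is assumed to be a power of 2.
constexpr std::size_t mask_range(std::size_t r, std::size_t m)
{ return r & (m - 1); }

// Position of the i-th probe for a key whose hash value is hash:
// the probe function yields an offset, and the range-hashing step
// maps hash + offset into [0, m - 1].
template<typename Probe_Fn>
constexpr std::size_t
probe_position(std::size_t hash, std::size_t i, std::size_t m,
               Probe_Fn probe = Probe_Fn())
{ return mask_range(hash + probe(i), m); }
```
]]></programlisting>
              <para>For a hash value of 5 in a table of size 8, the
              linear sequence visits positions 5, 6, 7, 0, 1, and so on,
              while the quadratic sequence visits 5, 6, 1, 6 for the
              first four probes.</para>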
3339
3340             </section>
3341
3342             <section xml:id="hash_policies.implementation.predefined">
3343               <info><title>
3344                 Pre-Defined Policies
3345               </title></info>
3346
3347               <para>This library contains some pre-defined classes
3348               implementing range-hashing and probing functions:</para>
3349
3350               <orderedlist>
3351                 <listitem><para><classname>direct_mask_range_hashing</classname>
3352                 and <classname>direct_mod_range_hashing</classname>
3353                 are range-hashing functions based on a bit-mask and a modulo
3354                 operation, respectively.</para></listitem>
3355
3356                 <listitem><para><classname>linear_probe_fn</classname>, and
3357                 <classname>quadratic_probe_fn</classname> are
3358                 a linear probe and a quadratic probe function,
3359                 respectively.</para></listitem>
3360               </orderedlist>
3361
3362               <para>
3363                 The graphic below shows the relationships.
3364               </para>
3365               <figure>
3366                 <title>Hash policy class diagram</title>
3367                 <mediaobject>
3368                   <imageobject>
3369                     <imagedata align="center" format="PNG" scale="100"
3370                                fileref="../images/pbds_hash_policy_cd.png"/>
3371                   </imageobject>
3372                   <textobject>
3373                     <phrase>Hash policy class diagram</phrase>
3374                   </textobject>
3375                 </mediaobject>
3376               </figure>
3377
3378
3379             </section>
3380
3381           </section> <!-- impl -->
3382
3383         </section>
3384
3385         <section xml:id="container.hash.details.resize_policies">
3386           <info><title>Resize Policies</title></info>
3387
3388           <section xml:id="resize_policies.general">
3389             <info><title>General</title></info>
3390
            <para>Hash tables, as opposed to trees, do not naturally grow or
3392             shrink. It is necessary to specify policies to determine how
3393             and when a hash table should change its size. Usually, resize
3394             policies can be decomposed into orthogonal policies:</para>
3395
3396             <orderedlist>
3397               <listitem><para>A size policy indicating how a hash table
3398               should grow (e.g., it should multiply by powers of
3399               2).</para></listitem>
3400
3401               <listitem><para>A trigger policy indicating when a hash
3402               table should grow (e.g., a load factor is
3403               exceeded).</para></listitem>
3404             </orderedlist>
3405
3406           </section>
3407
3408           <section xml:id="resize_policies.size">
3409             <info><title>Size Policies</title></info>
3410
3411
3412             <para>Size policies determine how a hash table changes size. These
3413             policies are simple, and there are relatively few sensible
3414             options. An exponential-size policy (with the initial size and
3415             growth factors both powers of 2) works well with a mask-based
3416             range-hashing function, and is the
3417             hard-wired policy used by Dinkumware. A
3418             prime-list based policy works well with a modulo-prime range
3419             hashing function and is the hard-wired policy used by SGI's
3420             implementation.</para>
3421
3422           </section>
3423
3424           <section xml:id="resize_policies.trigger">
3425             <info><title>Trigger Policies</title></info>
3426
3427             <para>Trigger policies determine when a hash table changes size.
3428             Following is a description of two policies: load-check
3429             policies, and collision-check policies.</para>
3430
            <para>Load-check policies are straightforward. The user specifies
            two factors, α<subscript>min</subscript> and
            α<subscript>max</subscript>, and the hash table maintains the
            invariant that</para>

            <para>α<subscript>min</subscript> ≤ (number of
            stored elements) / (hash-table size) ≤
            α<subscript>max</subscript><remark>load factor min max</remark></para>
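            <para>The load-factor invariant suggests a simple decision
            rule, sketched below. The enumeration and
            <function>check_load</function> are hypothetical and only
            illustrate the test a load-check trigger has to perform; they
            are not this library's interface.</para>
            <programlisting><![CDATA[
```cpp
#include <cstddef>

enum class resize_action { none, grow, shrink };

// n: number of stored elements; m: current hash-table size.
// The table should grow when n / m exceeds load_max and shrink when
// n / m drops below load_min; otherwise the invariant holds.
constexpr resize_action
check_load(std::size_t n, std::size_t m, double load_min, double load_max)
{
  return n > load_max * m ? resize_action::grow
       : n < load_min * m ? resize_action::shrink
       : resize_action::none;
}
```
]]></programlisting>
            <para>With bounds of 0.125 and 0.5 and a table of size 16,
            for instance, 9 elements call for growing, 1 element calls
            for shrinking, and 4 elements leave the table as it
            is.</para>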
3439
3440             <para>Collision-check policies work in the opposite direction of
3441             load-check policies. They focus on keeping the number of
3442             collisions moderate and hoping that the size of the table will
3443             not grow very large, instead of keeping a moderate load-factor
3444             and hoping that the number of collisions will be small. A
3445             maximal collision-check policy resizes when the longest
3446             probe-sequence grows too large.</para>
3447
            <para>Consider the graphic below. Let the size of the hash table
            be denoted by m, the length of a probe sequence be denoted by k,
            and some load factor be denoted by α. We would like to
            calculate the minimal k such that, if there were α
            m elements in the hash table, a probe sequence of length k would
            be found with probability at most 1/m.</para>
3454
3455             <figure>
3456               <title>Balls and bins</title>
3457               <mediaobject>
3458                 <imageobject>
3459                   <imagedata align="center" format="PNG" scale="100"
3460                              fileref="../images/pbds_balls_and_bins.png"/>
3461                 </imageobject>
3462                 <textobject>
3463                   <phrase>Balls and bins</phrase>
3464                 </textobject>
3465               </mediaobject>
3466             </figure>
3467
3468             <para>Denote the probability that a probe sequence of length
3469             k appears in bin i by p<subscript>i</subscript>, the
3470             length of the probe sequence of bin i by
3471             l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3472
3473
3474
            <equation>
              <title>
                Probability of Probe Sequence of Length k
              </title>
              <mathphrase>
                p<subscript>1</subscript> =
                P(l<subscript>1</subscript> ≥ k) =
                P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1 ) ) ≤ (a)
                e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) / 2 )
              </mathphrase>
            </equation>
3493
            <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
            calculate the probability that some bin contains a probe
            sequence of length at least k, we note that the
            l<subscript>i</subscript> are negatively-dependent
            (<xref linkend="biblio.dubhashi98neg"/>). Let
            I(.) denote the indicator function. Then</para>
3501
            <equation>
              <title>
                Probability Probe Sequence in Some Bin
              </title>
              <mathphrase>
                P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
                P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
                I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =
                P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
                I(l<subscript>i</subscript> ≥ k) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
                p<subscript>1</subscript>) - 1 ) ) ≤ (a)
                e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
                - 1 )<superscript>2</superscript> ) / 2 ) ,
              </mathphrase>
            </equation>
3520
3521             <para>where (a) follows from the fact that the Chernoff bound can
3522             be applied to negatively-dependent variables (<xref
3523             linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3524             equation into the second one, and equating with 1/m, we
3525             obtain</para>
3526
3527
3528             <para>k ~ √ ( 2 α ln ( 2 m ln(m) )
3529             ) .</para>
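            <para>As a rough numerical illustration, the sketch below
            evaluates the estimate above, reading it as
            k ~ √(2 α ln(2 m ln(m))) with the inner logarithm taken over
            2 m ln(m). The function name
            <function>max_probe_length_estimate</function> is hypothetical
            and not part of this library; the value is only an approximation
            under the uniform-distribution assumption.</para>

            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cmath>

// Evaluates sqrt(2 * alpha * ln(2 * m * ln(m))) for load factor alpha
// and m bins. Requires m >= 2 so that both logarithms are positive.
inline double max_probe_length_estimate(double alpha, double m)
{
  assert(alpha > 0.0 && m >= 2.0);
  return std::sqrt(2.0 * alpha * std::log(2.0 * m * std::log(m)));
}
```
]]></programlisting>

            <para>For α = 1 and m = 1024 bins, this gives an estimate of
            roughly 4.4; the estimate grows with the square root of α and
            only very slowly with m.</para>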
3530
3531           </section>
3532
3533           <section xml:id="resize_policies.impl">
3534             <info><title>Implementation</title></info>
3535
3536             <para>This sub-subsection describes the implementation of the
3537             above in this library. It first describes resize policies and
3538             their decomposition into trigger and size policies, then
3539             describes pre-defined classes, and finally discusses controlled
3540             access to the policies' internals.</para>
3541
3542             <section xml:id="resize_policies.impl.decomposition">
3543               <info><title>Decomposition</title></info>
3544
3545
3546               <para>Each hash-based container is parametrized by a
3547               <classname>Resize_Policy</classname> parameter; the container derives
3548               <literal>public</literal>ly from <classname>Resize_Policy</classname>. For
3549               example:</para>
3550               <programlisting>
3551                 cc_hash_table&lt;typename Key,
3552                 typename Mapped,
3553                 ...
3554                 typename Resize_Policy
3555                 ...&gt; : public Resize_Policy
3556               </programlisting>
3557
3558               <para>As a container object is modified, it continuously notifies
3559               its <classname>Resize_Policy</classname> base of internal changes
3560               (e.g., collisions encountered and elements being
3561               inserted). It queries its <classname>Resize_Policy</classname> base whether
3562               it needs to be resized, and if so, to what size.</para>
3563
3564               <para>The graphic below shows a (possible) sequence diagram
3565               of an insert operation. The user inserts an element; the hash
3566               table notifies its resize policy that a search has started
3567               (point A); in this case, a single collision is encountered -
3568               the table notifies its resize policy of this (point B); the
3569               container finally notifies its resize policy that the search
3570               has ended (point C); it then queries its resize policy whether
3571               a resize is needed, and if so, what is the new size (points D
3572               to G); following the resize, it notifies the policy that a
3573               resize has completed (point H); finally, the element is
3574               inserted, and the policy notified (point I).</para>
3575
3576               <figure>
3577                 <title>Insert resize sequence diagram</title>
3578                 <mediaobject>
3579                   <imageobject>
3580                     <imagedata align="center" format="PNG" scale="100"
3581                                fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
3582                   </imageobject>
3583                   <textobject>
3584                     <phrase>Insert resize sequence diagram</phrase>
3585                   </textobject>
3586                 </mediaobject>
3587               </figure>
3588
3589
3590               <para>In practice, a resize policy can usually be decomposed
3591               orthogonally into a size policy and a trigger policy. Consequently,
3592               the library contains a single class for instantiating a resize
3593               policy: <classname>hash_standard_resize_policy</classname>
3594               is parametrized by <classname>Size_Policy</classname> and
3595               <classname>Trigger_Policy</classname>, derives <literal>public</literal>ly from
3596               both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
3597               to these policies.</para>
3598
3599               <para>The two graphics immediately below show sequence diagrams
3600               illustrating the interaction between the standard resize policy
3601               and its trigger and size policies, respectively.</para>
3602
3603               <figure>
3604                 <title>Standard resize policy trigger sequence
3605                 diagram</title>
3606                 <mediaobject>
3607                   <imageobject>
3608                     <imagedata align="center" format="PNG" scale="100"
3609                                fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
3610                   </imageobject>
3611                   <textobject>
3612                     <phrase>Standard resize policy trigger sequence
3613                     diagram</phrase>
3614                   </textobject>
3615                 </mediaobject>
3616               </figure>
3617
3618               <figure>
3619                 <title>Standard resize policy size sequence
3620                 diagram</title>
3621                 <mediaobject>
3622                   <imageobject>
3623                     <imagedata align="center" format="PNG" scale="100"
3624                                fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
3625                   </imageobject>
3626                   <textobject>
3627                     <phrase>Standard resize policy size sequence
3628                     diagram</phrase>
3629                   </textobject>
3630                 </mediaobject>
3631               </figure>
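
              <para>The decomposition can be illustrated with a minimal,
              self-contained sketch. Every name in it
              (<classname>toy_doubling_size_policy</classname>,
              <classname>toy_load_trigger</classname>,
              <classname>toy_resize_policy</classname>,
              <classname>toy_table</classname>) is hypothetical; these are
              not the library's classes, and the notification interface is
              greatly simplified. It only shows a resize policy deriving
              publicly from a size policy and a trigger policy, and a
              container deriving publicly from the resize policy.</para>

              <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size policy: proposes twice the current size.
struct toy_doubling_size_policy
{
  std::size_t larger_size(std::size_t current) const { return current * 2; }
};

// Hypothetical trigger policy: asks for growth once the load exceeds 1/2.
class toy_load_trigger
{
public:
  toy_load_trigger() : m_grow(false) {}
  void notify_inserted(std::size_t num_entries, std::size_t table_size)
  { m_grow = 2 * num_entries > table_size; }
  bool is_resize_needed() const { return m_grow; }
  void notify_resized() { m_grow = false; }
private:
  bool m_grow;
};

// The resize policy derives publicly from both policies and delegates.
template<typename Size_Policy, typename Trigger_Policy>
struct toy_resize_policy : public Size_Policy, public Trigger_Policy
{
  std::size_t new_size(std::size_t current) const
  { return Size_Policy::larger_size(current); }
};

// A stand-in "container" that tracks only counts, to show the call order:
// query the policy, resize if asked, then insert and notify.
template<typename Resize_Policy>
class toy_table : public Resize_Policy
{
public:
  toy_table() : m_entries(0), m_size(8) {}
  void insert_one()
  {
    if (this->is_resize_needed())
      {
        m_size = this->new_size(m_size);
        this->notify_resized();
      }
    ++m_entries;
    this->notify_inserted(m_entries, m_size);
  }
  std::size_t table_size() const { return m_size; }
private:
  std::size_t m_entries;
  std::size_t m_size;
};

// Inserts n elements into a fresh toy table; returns the final table size.
inline std::size_t toy_table_size_after(std::size_t n)
{
  toy_table<toy_resize_policy<toy_doubling_size_policy, toy_load_trigger> > t;
  for (std::size_t i = 0; i < n; ++i)
    t.insert_one();
  return t.table_size();
}
```
]]></programlisting>

              <para>Because the two policies are independent base classes,
              either one can be swapped without touching the other or the
              container.</para>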
3632
3633
3634             </section>
3635
3636             <section xml:id="resize_policies.impl.predefined">
3637               <info><title>Predefined Policies</title></info>
3638               <para>The library includes the following
3639               instantiations of size and trigger policies:</para>
3640
3641               <orderedlist>
3642                 <listitem><para><classname>hash_load_check_resize_trigger</classname>
3643                 implements a load check trigger policy.</para></listitem>
3644
3645                 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3646                 implements a collision check trigger policy.</para></listitem>
3647
3648                 <listitem><para><classname>hash_exponential_size_policy</classname>
3649                 implements an exponential-size policy (which should be used
3650                 with mask range hashing).</para></listitem>
3651
3652                 <listitem><para><classname>hash_prime_size_policy</classname>
3653                 implements a size policy based on a sequence of primes
3654                 (which should
3655                 be used with mod range hashing).</para></listitem>
3656               </orderedlist>
3657
3658               <para>The graphic below gives an overall picture of the resize-related
3659               classes. <classname>basic_hash_table</classname>
3660               is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3661               publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3662               <classname>hash_standard_resize_policy</classname>
3663               itself is parametrized by <classname>Trigger_Policy</classname> and
3664               <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3665               instantiated by <classname>hash_load_check_resize_trigger</classname>,
3666               or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3667               <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3668               or <classname>hash_prime_size_policy</classname>.</para>
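
              <para>A possible instantiation that combines these predefined
              policies is sketched below. It pairs the prime size policy with
              mod range hashing, as recommended above. The use of
              <classname>std::hash</classname> as the hash functor, the
              typedef names, and the helper
              <function>count_after_inserts</function> are assumptions made
              for illustration.</para>

              <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Size policy: sequence of primes. Trigger policy: load check with its
// default loads.
typedef __gnu_pbds::hash_standard_resize_policy<
  __gnu_pbds::hash_prime_size_policy,
  __gnu_pbds::hash_load_check_resize_trigger<> > prime_resize_policy;

// Collision-chaining table using mod range hashing with the policy above.
typedef __gnu_pbds::cc_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mod_range_hashing<>,
  prime_resize_policy> prime_sized_map;

// Inserts n distinct keys and returns the number of stored entries.
inline std::size_t count_after_inserts(int n)
{
  prime_sized_map m;
  for (int i = 0; i < n; ++i)
    m[i] = i * i;
  return m.size();
}
```
]]></programlisting>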
3669
3670             </section>
3671
3672             <section xml:id="resize_policies.impl.internals">
3673               <info><title>Controlling Access to Internals</title></info>
3674
3675               <para>There are cases where (controlled) access to resize
3676               policies' internals is beneficial. E.g., it is sometimes
3677               useful to query a hash-table for the table's actual size (as
3678               opposed to its <function>size()</function> - the number of values it
3679               currently holds); it is sometimes useful to set a table's
3680               initial size, externally resize it, or change load factors.</para>
3681
3682               <para>Clearly, supporting such methods both decreases the
3683               encapsulation of hash-based containers, and increases the
3684               diversity between different associative-containers' interfaces.
3685               Conversely, omitting such methods can decrease containers'
3686               flexibility.</para>
3687
3688               <para>In order to avoid, to the extent possible, the above
3689               conflict, the hash-based containers themselves do not address
3690               any of these questions; this is deferred to the resize policies,
3691               which are easier to change or replace. Thus, for example,
3692               neither <classname>cc_hash_table</classname> nor
3693               <classname>gp_hash_table</classname>
3694               contains methods for querying the actual size of the table; this
3695               is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3696
3697               <para>Furthermore, the policies themselves are parametrized by
3698               template arguments that determine the methods they support
3699               (
3700               <xref linkend="biblio.alexandrescu01modern"/>
3701               shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3702               is parametrized by <classname>External_Size_Access</classname> that
3703               determines whether it supports methods for querying the actual
3704               size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3705               is parametrized by <classname>External_Load_Access</classname> that
3706               determines whether it supports methods for querying or
3707               modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3708               is parametrized by <classname>External_Load_Access</classname> that
3709               determines whether it supports methods for querying the
3710               load.</para>
3711
3712               <para>Some operations, for example, resizing a container at
3713               run time, or changing the load factors of a load-check trigger
3714               policy, require the container itself to resize. As mentioned
3715               above, the hash-based containers themselves do not contain
3716               these types of methods, only their resize policies.
3717               Consequently, there must be some mechanism for a resize policy
3718               to manipulate the hash-based container. As the hash-based
3719               container is a subclass of the resize policy, this is done
3720               through virtual methods. Each hash-based container has a
3721               <literal>private</literal> <literal>virtual</literal> method:</para>
3722               <programlisting>
3723                 virtual void
3724                 do_resize
3725                 (size_type new_size);
3726               </programlisting>
3727
3728               <para>which resizes the container. Implementations of
3729               <classname>Resize_Policy</classname> can export public methods for resizing
3730               the container externally; these methods internally call
3731               <function>do_resize</function> to resize the table.</para>
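
              <para>The following sketch shows how such externally visible
              methods might be used when
              <classname>External_Size_Access</classname> is
              <literal>true</literal>. It relies on the
              <function>resize</function> and
              <function>get_actual_size</function> methods of
              <classname>hash_standard_resize_policy</classname>; the typedef
              and helper names, and the choice of
              <classname>std::hash</classname>, are assumptions for
              illustration.</para>

              <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Resize policy with External_Size_Access enabled (third argument).
typedef __gnu_pbds::hash_standard_resize_policy<
  __gnu_pbds::hash_exponential_size_policy<>,
  __gnu_pbds::hash_load_check_resize_trigger<>,
  true> resizable_policy;

typedef __gnu_pbds::cc_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  resizable_policy> resizable_map;

// Stores ten entries, resizes the table externally, and returns the
// table's actual size (as opposed to size(), the number of entries).
inline std::size_t actual_size_after_resize(std::size_t requested)
{
  resizable_map m;
  for (int i = 0; i < 10; ++i)
    m[i] = i;
  m.resize(requested);
  return m.get_actual_size();
}

// Checks that the entries are still present after an external resize.
inline bool entries_survive_resize(std::size_t requested)
{
  resizable_map m;
  for (int i = 0; i < 10; ++i)
    m[i] = i;
  m.resize(requested);
  return m.size() == 10 && m.find(7) != m.end();
}
```
]]></programlisting>

              <para>The methods are reached through the container object
              only because the container derives publicly from its resize
              policy; the container class itself declares neither of
              them.</para>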
3732
3733
3734             </section>
3735
3736           </section>
3737
3738
3739         </section> <!-- resize policies -->
3740
3741         <section xml:id="container.hash.details.policy_interaction">
3742           <info><title>Policy Interactions</title></info>
3745           <para>Hash tables are, unfortunately, especially sensitive to the
3746           choice of policies. One of the more complicated aspects of this
3747           is that poor combinations of good policies can form a poor
3748           container. Following are some considerations.</para>
3749
3750           <section xml:id="policy_interaction.probesizetrigger">
3751             <info><title>probe/size/trigger</title></info>
3752
3753             <para>Some combinations do not work well for probing containers.
3754             For example, combining a quadratic probe policy with an
3755             exponential size policy can yield a poor container: when an
3756             element is inserted, a trigger policy might decide that there
3757             is no need to resize, as the table still contains unused
3758             entries; the probe sequence, however, might never reach any of
3759             the unused entries.</para>
3760
3761             <para>Unfortunately, this library cannot detect such problems at
3762             compilation (detecting them in general is as hard as the halting
3763             problem). It therefore defines an exception class,
3764             <classname>insert_error</classname>, which is thrown in this case.</para>
3765
3766           </section>
3767
3768           <section xml:id="policy_interaction.hashtrigger">
3769             <info><title>hash/trigger</title></info>
3770
3771             <para>Some trigger policies are especially susceptible to poor
3772             hash functions. Suppose, as an extreme case, that the hash
3773             function transforms each key to the same hash value. After some
3774             inserts, a collision detecting policy will always indicate that
3775             the container needs to grow.</para>
3776
3777             <para>The library, therefore, by design, limits each operation to
3778             one resize. For each <classname>insert</classname>, for example, it queries
3779             only once whether a resize is needed.</para>
3780
3781           </section>
3782
3783           <section xml:id="policy_interaction.eqstorehash">
3784             <info><title>equivalence functors/storing hash values/hash</title></info>
3785
3786             <para><classname>cc_hash_table</classname> and
3787             <classname>gp_hash_table</classname> are
3788             parametrized by an equivalence functor and by a
3789             <classname>Store_Hash</classname> parameter. If the latter parameter is
3790             <literal>true</literal>, then the container stores with each entry
3791             a hash value, and uses this value in case of collisions to
3792             determine whether to apply the equivalence functor. This can lower the
3793             cost of collisions for some types, but increase the cost of
3794             collisions for other types.</para>
3795
3796             <para>If a ranged-hash function or ranged probe function is
3797             directly supplied, however, then it makes no sense to store the
3798             hash value with each entry. This library's container will
3799             fail at compilation, by design, if this is attempted.</para>
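
            <para>As a sketch, a table keyed by strings, where comparing keys
            is relatively expensive, might enable
            <classname>Store_Hash</classname> as shown below. The typedef and
            helper names and the choice of <classname>std::hash</classname>
            are assumptions for illustration; whether storing hash values
            pays off depends on the key type and should be measured.</para>

            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// The seventh template argument is Store_Hash. A plain hash functor is
// combined with a range-hashing policy, so storing hash values is allowed.
typedef __gnu_pbds::cc_hash_table<
  std::string, int,
  std::hash<std::string>, std::equal_to<std::string>,
  __gnu_pbds::direct_mask_range_hashing<>,
  __gnu_pbds::hash_standard_resize_policy<>,
  true> stored_hash_map;

// Builds a small table and returns the value mapped to key, or -1.
inline int demo_lookup(const std::string& key)
{
  stored_hash_map m;
  m["alpha"] = 1;
  m["beta"] = 2;
  auto it = m.find(key);
  return it == m.end() ? -1 : it->second;
}
```
]]></programlisting>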
3800
3801           </section>
3802
3803           <section xml:id="policy_interaction.sizeloadtrigger">
3804             <info><title>size/load-check trigger</title></info>
3805
3806             <para>Assume a size policy issues an increasing sequence of sizes
3807             a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
3808             example, an exponential size policy might issue the sequence of
3809             sizes 8, 16, 32, 64, ...</para>
3810
3811             <para>If a load-check trigger policy is used, with loads
3812             α<subscript>min</subscript> and α<subscript>max</subscript>,
3813             respectively, then it is a good idea to have:</para>
3814
3815             <orderedlist>
3816               <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>
3817
3818               <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
3819             </orderedlist>
3820
3821             <para>This will ensure that the amortized hash cost of each
3822             modifying operation is at most approximately 3.</para>
3823
3824             <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
3825             any case, a bad choice, and α<subscript>min</subscript> &gt;
3826             α<subscript>max</subscript> is horrendous.</para>
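
            <para>A minimal sketch of choosing loads along these lines
            follows. It assumes an exponential size policy whose sizes
            double (q = 2), so that α<subscript>max</subscript> = 0.5 and
            α<subscript>min</subscript> = 0.125 satisfy both conditions. It
            relies on the <function>set_loads</function> and
            <function>get_loads</function> methods that
            <classname>hash_load_check_resize_trigger</classname> exposes
            when <classname>External_Load_Access</classname> is
            <literal>true</literal>; the typedef and helper names are
            hypothetical.</para>

            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Load-check trigger with External_Load_Access enabled.
typedef __gnu_pbds::hash_load_check_resize_trigger<true> adjustable_trigger;

// Exponential size policy; its sizes are assumed to grow by q = 2.
typedef __gnu_pbds::hash_standard_resize_policy<
  __gnu_pbds::hash_exponential_size_policy<>,
  adjustable_trigger> adjustable_resize_policy;

typedef __gnu_pbds::cc_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  adjustable_resize_policy> adjustable_map;

// Sets (alpha_min, alpha_max) on a fresh table and returns the loads the
// table reports afterwards.
inline std::pair<float, float>
loads_after_set(float alpha_min, float alpha_max)
{
  adjustable_map m;
  m.set_loads(std::make_pair(alpha_min, alpha_max));
  return m.get_loads();
}
```
]]></programlisting>

            <para>With q = 2, the pair (0.125, 0.5) gives
            α<subscript>max</subscript> = 1 / q and
            α<subscript>min</subscript> = 1 / (4 q), which is below
            1 / (2 q).</para>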
3827
3828           </section>
3829
3830         </section>
3831
3832       </section> <!-- details -->
3833
3834     </section> <!-- hash -->
3835
3836     <!-- tree -->
3837     <section xml:id="pbds.design.container.tree">
3838       <info><title>tree</title></info>
3839
3840       <section xml:id="container.tree.interface">
3841         <info><title>Interface</title></info>
3842
3843         <para>The tree-based container has the following declaration:</para>
3844         <programlisting>
3845           template&lt;
3846           typename Key,
3847           typename Mapped,
3848           typename Cmp_Fn = std::less&lt;Key&gt;,
3849           typename Tag = rb_tree_tag,
3850           template&lt;
3851           typename Const_Node_Iterator,
3852           typename Node_Iterator,
3853           typename Cmp_Fn_,
3854           typename Allocator_&gt;
3855           class Node_Update = null_node_update,
3856           typename Allocator = std::allocator&lt;char&gt; &gt;
3857           class tree;
3858         </programlisting>
3859
3860         <para>The parameters have the following meaning:</para>
3861
3862         <orderedlist>
3863           <listitem>
3864           <para><classname>Key</classname> is the key type.</para></listitem>
3865
3866           <listitem>
3867           <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3868
3869           <listitem>
3870           <para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>
3871
3872           <listitem>
3873             <para><classname>Tag</classname> specifies which underlying data structure
3874           to use.</para></listitem>
3875
3876           <listitem>
3877             <para><classname>Node_Update</classname> is a policy for updating node
3878           invariants.</para></listitem>
3879
3880           <listitem>
3881             <para><classname>Allocator</classname> is an allocator
3882           type.</para></listitem>
3883         </orderedlist>
3884
3885         <para>The <classname>Tag</classname> parameter specifies which underlying
3886         data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
3887         <classname>ov_tree_tag</classname>,
3888         specifies an underlying red-black tree, splay tree, or
3889         ordered-vector tree, respectively; any other tag is illegal.
3890         Note that containers based on the former two contain more types
3891         and methods than those based on the latter (e.g.,
3892         <classname>reverse_iterator</classname> and <function>rbegin</function>), and have different
3893         exception and invalidation guarantees.</para>
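
        <para>The following sketch instantiates the container as a set with
        each of the three tags and checks that iteration order is the same.
        It assumes that <classname>null_type</classname> is the name of the
        null mapped policy in the installed headers; the helper
        <function>keys_in_order</function> is hypothetical.</para>

        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// Inserts the keys into a tree-based set that uses the given Tag, then
// returns the keys in iteration order (sorted, duplicates dropped).
template<typename Tag>
std::vector<int> keys_in_order(const std::vector<int>& keys)
{
  typedef __gnu_pbds::tree<int, __gnu_pbds::null_type,
                           std::less<int>, Tag> set_type;
  set_type s;
  for (std::size_t i = 0; i < keys.size(); ++i)
    s.insert(keys[i]);

  const set_type& cs = s;
  std::vector<int> out;
  for (typename set_type::const_iterator it = cs.begin();
       it != cs.end(); ++it)
    out.push_back(*it);
  return out;
}
```
]]></programlisting>

        <para>Only the <classname>Tag</classname> argument changes between
        the instantiations; the calling code is the same for a red-black
        tree, a splay tree, and an ordered-vector tree.</para>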
3894
3895       </section>
3896
3897       <section xml:id="container.tree.details">
3898         <info><title>Details</title></info>
3899
3900         <section xml:id="container.tree.node">
3901           <info><title>Node Invariants</title></info>
3902
3903
3904           <para>Consider the two trees in the graphic below, labeled A and B. The first
3905           is a tree of floats; the second is a tree of pairs, each
3906           signifying a geometric line interval. Each element in a tree is referred to as a node of the tree. Of course, each of
3907           these trees can support the usual queries: the first can easily
3908           search for <constant>0.4</constant>; the second can easily search for
3909           <code>std::make_pair(10, 41)</code>.</para>
3910
3911           <para>Each of these trees can efficiently support other queries.
3912           The first can efficiently determine that the 2nd key in the
3913           tree is <constant>0.3</constant>; the second can efficiently determine
3914           whether any of its intervals overlaps
3915           <code>std::make_pair(29,42)</code> (useful in geometric
3916           applications or distributed file systems with leases, for
3917           example).  It should be noted that an <classname>std::set</classname> can
3918           only solve these types of problems with linear complexity.</para>
3919
3920           <para>In order to do so, each tree stores some metadata in
3921           each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>). The first stores in
3922           each node the size of the sub-tree rooted at the node; the
3923           second stores at each node the maximal endpoint of the
3924           intervals at the sub-tree rooted at the node.</para>
3925
3926           <figure>
3927             <title>Tree node invariants</title>
3928             <mediaobject>
3929               <imageobject>
3930                 <imagedata align="center" format="PNG" scale="100"
3931                            fileref="../images/pbds_tree_node_invariants.png"/>
3932               </imageobject>
3933               <textobject>
3934                 <phrase>Tree node invariants</phrase>
3935               </textobject>
3936             </mediaobject>
3937           </figure>
3938
3939           <para>Supporting such trees is difficult for a number of
3940           reasons:</para>
3941
3942           <orderedlist>
3943             <listitem><para>There must be a way to specify what a node's metadata
3944             should be (if any).</para></listitem>
3945
3946             <listitem><para>Various operations can invalidate node
3947             invariants.  The graphic below shows how a right rotation,
3948             performed on A, results in B, with nodes x and y having
3949             corrupted invariants (the grayed nodes in C). The same graphic also shows
3950             how an insert, performed on D, results in E, with nodes x and y
3951             having corrupted invariants (the grayed nodes in F). It is not
3952             feasible to know outside the tree the effect of an operation on
3953             the nodes of the tree.</para></listitem>
3954
3955             <listitem><para>The search paths of standard associative containers are
3956             defined by comparisons between keys, and not through
3957             metadata.</para></listitem>
3958
3959             <listitem><para>It is not feasible to know in advance which methods trees
3960             can support. Besides the usual <classname>find</classname> method, the
3961             first tree can support a <classname>find_by_order</classname> method, while
3962             the second can support an <classname>overlaps</classname> method.</para></listitem>
3963           </orderedlist>
3964
3965           <figure>
3966             <title>Tree node invalidation</title>
3967             <mediaobject>
3968               <imageobject>
3969                 <imagedata align="center" format="PNG" scale="100"
3970                            fileref="../images/pbds_tree_node_invalidations.png"/>
3971               </imageobject>
3972               <textobject>
3973                 <phrase>Tree node invalidation</phrase>
3974               </textobject>
3975             </mediaobject>
3976           </figure>
3977
3978           <para>These problems are solved by a combination of two means:
3979           node iterators, and template-template node updater
3980           parameters.</para>
3981
3982           <section xml:id="container.tree.node.iterators">
3983             <info><title>Node Iterators</title></info>
3984
3985
3986             <para>Each tree-based container defines two additional iterator
3987             types, <classname>const_node_iterator</classname>
3988             and <classname>node_iterator</classname>.
3989             These iterators allow descending from a node to one of its
3990             children. Node iterators allow search paths different from those
3991             determined by the comparison functor. The <classname>tree</classname>
3992             supports the methods:</para>
3993             <programlisting>
3994               const_node_iterator
3995               node_begin() const;
3996
3997               node_iterator
3998               node_begin();
3999
4000               const_node_iterator
4001               node_end() const;
4002
4003               node_iterator
4004               node_end();
4005             </programlisting>
4006
4007             <para>The first pair returns node iterators corresponding to the
4008             root node of the tree; the latter pair returns node iterators
4009             corresponding to a just-after-leaf node.</para>
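
            <para>As a sketch of descending through a tree with node
            iterators, the helpers below count the nodes reachable from the
            root and compute the height of the tree. They use
            <function>node_begin</function>, <function>node_end</function>,
            and the node iterators' <function>get_l_child</function> and
            <function>get_r_child</function> methods. The helper names are
            hypothetical, and <classname>null_type</classname> is assumed to
            be the null mapped policy of the installed headers.</para>

            <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// A set of ints with the default tag (red-black tree).
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type> int_set;

// Counts the nodes in the sub-tree rooted at nd; end is the
// just-after-leaf node iterator.
template<typename Node_It>
std::size_t count_nodes(Node_It nd, Node_It end)
{
  if (nd == end)
    return 0;
  return 1 + count_nodes(nd.get_l_child(), end)
           + count_nodes(nd.get_r_child(), end);
}

// Height (in nodes) of the sub-tree rooted at nd.
template<typename Node_It>
std::size_t subtree_height(Node_It nd, Node_It end)
{
  if (nd == end)
    return 0;
  return 1 + std::max(subtree_height(nd.get_l_child(), end),
                      subtree_height(nd.get_r_child(), end));
}

// Builds a set holding 0..n-1 and counts its nodes from the root.
inline std::size_t nodes_below_root(int n)
{
  int_set s;
  for (int i = 0; i < n; ++i)
    s.insert(i);
  const int_set& cs = s;
  return count_nodes(cs.node_begin(), cs.node_end());
}

// Builds a set holding 0..n-1 and returns the height of its tree.
inline std::size_t height_of(int n)
{
  int_set s;
  for (int i = 0; i < n; ++i)
    s.insert(i);
  const int_set& cs = s;
  return subtree_height(cs.node_begin(), cs.node_end());
}
```
]]></programlisting>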
4010           </section>
4011
4012           <section xml:id="container.tree.node.updator">
4013             <info><title>Node Updater</title></info>
4014
4015             <para>The tree-based containers are parametrized by a
4016             <classname>Node_Update</classname> template-template parameter. A
4017             tree-based container instantiates
4018             <classname>Node_Update</classname> to some
4019             <classname>node_update</classname> class, and publicly subclasses
4020             <classname>node_update</classname>. The graphic below shows this
4021             scheme, as well as some predefined policies (which are explained
4022             below).</para>
4023
4024             <figure>
4025               <title>A tree and its update policy</title>
4026               <mediaobject>
4027                 <imageobject>
4028                   <imagedata align="center" format="PNG" scale="100"
4029                              fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
4030                 </imageobject>
4031                 <textobject>
4032                   <phrase>A tree and its update policy</phrase>
4033                 </textobject>
4034               </mediaobject>
4035             </figure>
4036
4037             <para><classname>node_update</classname> (an instantiation of
4038             <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4039             the type of metadata it requires. For order statistics,
4040             e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4041             The tree defines within each node a <classname>metadata_type</classname>
4042             object.</para>
4043
4044             <para><classname>node_update</classname> must also define the following method
4045             for restoring node invariants:</para>
4046             <programlisting>
4047               void
4048               operator()(node_iterator nd_it, const_node_iterator end_nd_it);
4049             </programlisting>
4050
4051             <para>In this method, <varname>nd_it</varname> is a
4052             <classname>node_iterator</classname> corresponding to a node whose
4053             A) all descendants have valid invariants, and B) its own
4054             invariants might be violated; <classname>end_nd_it</classname> is
4055             a <classname>const_node_iterator</classname> corresponding to a
4056             just-after-leaf node. This method should correct the node
4057             invariants of the node pointed to by
4058             <classname>nd_it</classname>. For example, say node x in the
4059             graphic below, labeled A, has an invalid invariant, but its children,
4060             y and z, have valid invariants. After the invocation, all three
4061             nodes should have valid invariants, as in label B.</para>
4062
4063
4064             <figure>
4065               <title>Restoring node invariants</title>
4066               <mediaobject>
4067                 <imageobject>
4068                   <imagedata align="center" format="PNG" scale="100"
4069                              fileref="../images/pbds_restoring_node_invariants.png"/>
4070                 </imageobject>
4071                 <textobject>
4072                   <phrase>Restoring node invariants</phrase>
4073                 </textobject>
4074               </mediaobject>
4075             </figure>
4076
4077             <para>When a tree operation might invalidate some node invariant,
4078             it invokes this method in its <classname>node_update</classname> base to
4079             restore the invariant. For example, the graphic below shows
4080             an <function>insert</function> operation (point A); the tree performs some
4081             operations, and calls the update functor three times (points B,
4082             C, and D). (It is well known that any <function>insert</function>,
4083             <function>erase</function>, <function>split</function> or <function>join</function>, can restore
4084             all node invariants by a small number of node invariant updates (<xref linkend="biblio.clrs2001"/>).)
4085             </para>
4086
4087             <figure>
4088               <title>Insert update sequence</title>
4089               <mediaobject>
4090                 <imageobject>
4091                   <imagedata align="center" format="PNG" scale="100"
4092                              fileref="../images/pbds_update_seq_diagram.png"/>
4093                 </imageobject>
4094                 <textobject>
4095                   <phrase>Insert update sequence</phrase>
4096                 </textobject>
4097               </mediaobject>
4098             </figure>
4099
4100             <para>To complete the description of the scheme, three questions
4101             need to be answered:</para>
4102
4103             <orderedlist>
4104               <listitem><para>How can a tree which supports order statistics define a
4105               method such as <classname>find_by_order</classname>?</para></listitem>
4106
4107               <listitem><para>How can the node updater base access methods of the
4108               tree?</para></listitem>
4109
4110               <listitem><para>How can the following cyclic dependency be resolved?
4111               <classname>node_update</classname> is a base class of the tree, yet it
4112               uses node iterators defined in the tree (its child).</para></listitem>
4113             </orderedlist>
4114
4115             <para>The first two questions are answered by the fact that
4116             <classname>node_update</classname> (an instantiation of
4117             <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4118             of the tree. Consequently:</para>
4119
4120             <orderedlist>
4121               <listitem><para>Any public methods of
4122               <classname>node_update</classname> are automatically methods of
4123               the tree (<xref linkend="biblio.alexandrescu01modern"/>).
4124               Thus an order-statistics node updater,
4125               <classname>tree_order_statistics_node_update</classname> defines
4126               the <function>find_by_order</function> method; any tree
4127               instantiated by this policy consequently supports this method as
4128               well.</para></listitem>
4129
4130               <listitem><para>In C++, if a base class declares a method as
4131               <literal>virtual</literal>, it is
4132               <literal>virtual</literal> in its subclasses. If
4133               <classname>node_update</classname> needs to access one of the
4134               tree's methods, say the member function
4135               <function>end</function>, it simply declares that method as
4136               pure <literal>virtual</literal>; the tree then defines it.</para></listitem>
4137             </orderedlist>
4138
4139             <para>The cyclic dependency is solved through template-template
4140             parameters. <classname>Node_Update</classname> is parametrized by
4141             the tree's node iterators, its comparison functor, and its
4142             allocator type. Thus, instantiations of
4143             <classname>Node_Update</classname> have all information
4144             required.</para>
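            <para>The points above can be illustrated with a minimal,
            self-contained sketch. It is not the library's implementation:
            <classname>nth_element_update</classname> and
            <classname>toy_container</classname> are hypothetical names, the
            container wraps a <classname>std::vector</classname> rather than a
            tree, and the policy takes a single template parameter instead of
            the iterators, comparison functor, and allocator listed above.
            It only shows a policy that is a public base class, that reaches
            the container's <function>begin</function> and
            <function>end</function> through pure virtual declarations, and
            that is passed as a template-template parameter.</para>

            <programlisting>
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Hypothetical policy. It is parametrized by the container's iterator
// type, so it never needs to name the container itself.
template&lt;typename Const_Iterator&gt;
class nth_element_update
{
public:
  // A public method of the policy becomes a method of every container
  // that derives publicly from it. This toy version walks linearly.
  Const_Iterator
  find_by_order(std::size_t order) const
  {
    Const_Iterator it = begin();
    while (order != 0)
      {
        if (it == end())
          break;
        ++it;
        --order;
      }
    return it;
  }

protected:
  // Methods of the container that the policy needs are declared pure
  // virtual here and defined by the container.
  virtual Const_Iterator begin() const = 0;
  virtual Const_Iterator end() const = 0;

  virtual ~nth_element_update() { }
};

// Toy container standing in for the tree. Node_Update is a
// template-template parameter instantiated with the container's iterator.
template&lt;typename Value,
         template&lt;typename Const_Iterator&gt; class Node_Update&gt;
class toy_container
  : public Node_Update&lt;typename std::vector&lt;Value&gt;::const_iterator&gt;
{
public:
  typedef typename std::vector&lt;Value&gt;::const_iterator const_iterator;

  void push_back(const Value&amp; v) { m_values.push_back(v); }

  virtual const_iterator begin() const { return m_values.begin(); }
  virtual const_iterator end() const { return m_values.end(); }

private:
  std::vector&lt;Value&gt; m_values;
};
            </programlisting>

            <para>An object of type
            <classname>toy_container&lt;int, nth_element_update&gt;</classname>
            can then call <function>find_by_order</function> directly, although
            the container itself never declares that method.</para>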
4145
4146             <para>This library assumes that constructing a metadata object and
4147             modifying it are exception free. Suppose that during some method,
4148             say <function>insert</function>, a metadata-related operation
4149             (e.g., changing the value of a metadata object) were to throw an
4150             exception. Rolling back the method would be unusually complex.</para>
4151
4152             <para>Previously, a distinction was made between redundant
4153             policies and null policies. Node invariants show a
4154             case where null policies are required.</para>
4155
4156             <para>Assume a regular tree is required, one which need not
4157             support order statistics or interval overlap queries.
4158             Seemingly, in this case a redundant policy (a policy which
4159             doesn't affect nodes' contents) would suffice. This, however,
4160             would lead to the following drawbacks:</para>
4161
4162             <orderedlist>
4163               <listitem><para>Each node would carry a useless metadata object, wasting
4164               space.</para></listitem>
4165
4166               <listitem><para>The tree cannot know if its
4167               <classname>Node_Update</classname> policy actually modifies a
4168               node's metadata (in general this is undecidable, as the halting
4169               problem reduces to it). In the graphic below, assume the shaded
4170               node is inserted. The tree would have to traverse the useless path
4171               shown to the root, applying redundant updates all the way.</para></listitem>
4172             </orderedlist>
4173             <figure>
4174               <title>Useless update path</title>
4175               <mediaobject>
4176                 <imageobject>
4177                   <imagedata align="center" format="PNG" scale="100"
4178                              fileref="../images/pbds_rationale_null_node_updator.png"/>
4179                 </imageobject>
4180                 <textobject>
4181                   <phrase>Useless update path</phrase>
4182                 </textobject>
4183               </mediaobject>
4184             </figure>
4185
4186
4187             <para>A null policy class, <classname>null_node_update</classname>,
4188             solves both of these problems. The tree detects that node invariants
4189             are irrelevant, and then neither stores metadata nor performs updates.</para>
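
            <para>As a hedged illustration of the difference, the following
            sketch declares one tree with the default null policy and one with
            <classname>tree_order_statistics_node_update</classname>. It
            assumes the <filename>ext/pb_ds</filename> headers shipped with
            GCC; the typedef names and the helper
            <function>make_ranked_set</function> are introduced here for
            illustration only.</para>

            <programlisting>
#include &lt;functional&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/tree_policy.hpp&gt;

// A plain set: Node_Update is left at its default, the null policy.
typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type&gt; plain_set;

// The same set with order statistics. Its public interface gains
// find_by_order and order_of_key from the policy.
typedef __gnu_pbds::tree&lt;
  int,
  __gnu_pbds::null_type,
  std::less&lt;int&gt;,
  __gnu_pbds::rb_tree_tag,
  __gnu_pbds::tree_order_statistics_node_update&gt; ranked_set;

// Illustrative helper (not part of the library).
inline ranked_set
make_ranked_set()
{
  ranked_set s;
  s.insert(30);
  s.insert(10);
  s.insert(20);
  return s;
}
            </programlisting>

            <para>On the resulting set,
            <function>find_by_order(0)</function> refers to the smallest key
            and <function>order_of_key(20)</function> gives the number of keys
            smaller than 20; <classname>plain_set</classname> offers neither
            method.</para>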
4190
4191           </section>
4192
4193         </section>
4194
4195         <section xml:id="container.tree.details.split">
4196           <info><title>Split and Join</title></info>
4197
4198           <para>Tree-based containers support split and join methods.
4199           It is possible to split a tree so that it passes
4200           all nodes with keys larger than a given key to a different
4201           tree. These methods have the following advantages over the
4202           alternative of externally inserting to the destination
4203           tree and erasing from the source tree:</para>
4204
4205           <orderedlist>
4206             <listitem><para>These methods are efficient - red-black trees are split
4207             and joined in poly-logarithmic complexity; ordered-vector
4208             trees are split and joined at linear complexity. The
4209             alternatives have super-linear complexity.</para></listitem>
4210
4211             <listitem><para>Aside from orders of growth, these operations perform
4212             few allocations and de-allocations. For red-black trees, allocations are not performed,
4213             and the methods are exception-free. </para></listitem>
4214           </orderedlist>
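
          <para>A short sketch of these methods, assuming the
          <filename>ext/pb_ds</filename> headers shipped with GCC. The helpers
          <function>split_above</function> and <function>rejoin</function>
          are hypothetical wrappers around the container's own
          <function>split</function> and <function>join</function>.</para>

          <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/tree_policy.hpp&gt;

typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type&gt; int_set;

// Moves every key larger than pivot out of src and returns those keys
// as a separate tree.
inline int_set
split_above(int_set&amp; src, int pivot)
{
  int_set upper;
  src.split(pivot, upper);
  return upper;
}

// Moves all keys of upper back into lower. All keys of one tree must be
// smaller than all keys of the other, which holds after split_above.
inline void
rejoin(int_set&amp; lower, int_set&amp; upper)
{
  lower.join(upper);
}
          </programlisting>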
4215         </section>
4216
4217       </section> <!-- details -->
4218
4219     </section> <!-- tree -->
4220
4221     <!-- trie -->
4222     <section xml:id="pbds.design.container.trie">
4223       <info><title>Trie</title></info>
4224
4225       <section xml:id="container.trie.interface">
4226         <info><title>Interface</title></info>
4227
4228         <para>The trie-based container has the following declaration:</para>
4229         <programlisting>
4230           template&lt;typename Key,
4231           typename Mapped,
4232           typename E_Access_Traits = typename detail::default_trie_access_traits&lt;Key&gt;::type,
4233           typename Tag = pat_trie_tag,
4234           template&lt;typename Const_Node_Iterator,
4235           typename Node_Iterator,
4236           typename E_Access_Traits_,
4237           typename Allocator_&gt;
4238           class Node_Update = null_node_update,
4239           typename Allocator = std::allocator&lt;char&gt; &gt;
4240           class trie;
4241         </programlisting>
4242
4243         <para>The parameters have the following meaning:</para>
4244
4245         <orderedlist>
4246           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4247
4248           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4249
4250           <listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>
4251
4252           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4253           to use, and is described shortly.</para></listitem>
4254
4255           <listitem><para><classname>Node_Update</classname> is a policy for updating node
4256           invariants. This is described below.</para></listitem>
4257
4258           <listitem><para><classname>Allocator</classname> is an allocator
4259           type.</para></listitem>
4260         </orderedlist>
4261
4262         <para>The <classname>Tag</classname> parameter specifies which underlying
4263         data structure to use. Instantiating it by <classname>pat_trie_tag</classname> specifies an
4264         underlying PATRICIA trie (explained shortly); any other tag is
4265         currently illegal.</para>
4266
4267         <para>Following is a description of a (PATRICIA) trie
4268         (this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4269         <xref linkend="biblio.filliatre2000ptset"/>).
4270         </para>
4271
4272         <para>A (PATRICIA) trie is similar to a tree, but with the
4273         following differences:</para>
4274
4275         <orderedlist>
4276           <listitem><para>It explicitly views keys as a sequence of elements.
4277           E.g., a trie can view a string as a sequence of
4278           characters; a trie can view a number as a sequence of
4279           bits.</para></listitem>
4280
4281           <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4282           + 1, where n is the number of distinct
4283           elements.</para></listitem>
4284
4285           <listitem><para>It stores values only at leaf nodes.</para></listitem>
4286
4287           <listitem><para>Internal nodes have the properties that A) each has at
4288           least two children, and B) each shares the same prefix with
4289           any of its descendants.</para></listitem>
4290         </orderedlist>
4291
4292         <para>A (PATRICIA) trie has some useful properties:</para>
4293
4294         <orderedlist>
4295           <listitem><para>It can be configured to use large node fan-out, giving it
4296           very efficient find performance (albeit at the cost of insertion
4297           complexity and size).</para></listitem>
4298
4299           <listitem><para>It works well for common-prefix keys.</para></listitem>
4300
4301           <listitem><para>It can efficiently support queries such as which
4302           keys match a certain prefix. This is sometimes useful in file
4303           systems and routers, and for "type-ahead" (predictive text) matching
4304           on mobile devices.</para></listitem>
4305         </orderedlist>
4306
4307
4308       </section>
4309
4310       <section xml:id="container.trie.details">
4311         <info><title>Details</title></info>
4312
4313         <section xml:id="container.trie.details.etraits">
4314           <info><title>Element Access Traits</title></info>
4315
4316           <para>A trie inherently views its keys as sequences of elements.
4317           For example, a trie can view a string as a sequence of
4318           characters. A trie needs to map each of n elements to a
4319           number in {0, ..., n - 1}. For example, a trie can map a
4320           character <varname>c</varname> to
4321           <programlisting>static_cast&lt;size_t&gt;(c)</programlisting>.</para>
4322
4323           <para>Seemingly, then, a trie can assume that its keys support
4324           (const) iterators, and that the <classname>value_type</classname> of this
4325           iterator can be cast to a <classname>size_t</classname>. There are several
4326           reasons, though, to decouple the mechanism by which the trie
4327           accesses its keys' elements from the trie:</para>
4328
4329           <orderedlist>
4330             <listitem><para>In some cases, the numerical value of an element is
4331             inappropriate. Consider a trie storing DNA strings. It is
4332             logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4333             'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4334
4335             <listitem><para>In some cases the keys' iterators are different from what
4336             is needed. For example, a trie can be used to search for
4337             common suffixes, by using strings'
4338             <classname>reverse_iterator</classname>. As another example, a trie mapping
4339             UNICODE strings would have a huge fan-out if each node
4340             branched on a UNICODE character; instead, one can define an
4341             iterator iterating over groups of 8 bits (or fewer).</para></listitem>
4342           </orderedlist>
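
          <para>The DNA case can be sketched as follows. The struct
          <classname>dna_access_sketch</classname> is hypothetical and only
          illustrates the idea of mapping elements to positions
          independently of their character codes; the member names
          (<function>begin</function>, <function>end</function>,
          <function>e_pos</function>, <varname>max_size</varname>) are
          assumptions, and the exact set of members the library requires of
          an access-traits class should be taken from its headers.</para>

          <programlisting>
#include &lt;cstddef&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;

// Hypothetical traits-like class for keys that are DNA strings.
struct dna_access_sketch
{
  typedef std::string::const_iterator const_iterator;

  // Number of distinct elements.
  enum { max_size = 4 };

  static const_iterator
  begin(const std::string&amp; key)
  { return key.begin(); }

  static const_iterator
  end(const std::string&amp; key)
  { return key.end(); }

  // Maps an element to its position, regardless of its character code.
  static std::size_t
  e_pos(char e)
  {
    switch (e)
      {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      case 'T': return 3;
      default:
        throw std::invalid_argument("not a DNA base");
      }
  }
};
          </programlisting>

          <para>Here <function>e_pos('T')</function> yields 3, which a plain
          cast of the character to <classname>size_t</classname> would
          not.</para>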
4343
4344           <para>The trie is,
4345           consequently, parametrized by <classname>E_Access_Traits</classname>:
4346           traits which instruct how to access sequences' elements.
4347           <classname>string_trie_e_access_traits</classname>
4348           is a traits class for strings. Each such traits class defines some
4349           types. For example,</para>
4350           <programlisting>
4351             typename E_Access_Traits::const_iterator
4352           </programlisting>
4353
4354           <para>is a const iterator iterating over a key's elements. The
4355           traits class must also define methods for obtaining an iterator
4356           to the first and last element of a key.</para>
4357
4358           <para>The graphic below shows a
4359           (PATRICIA) trie resulting from inserting the words: "I wish
4360           that I could ever see a poem lovely as a trie" (which,
4361           unfortunately, does not rhyme).</para>
4362
4363           <para>The leaf nodes contain values; each internal node contains
4364           two <classname>typename E_Access_Traits::const_iterator</classname>
4365           objects, indicating the maximal common prefix of all keys in
4366           the sub-tree. For example, the shaded internal node roots a
4367           sub-tree with leaves "a" and "as". The maximal common prefix is
4368           "a". The internal node contains, consequently, two const
4369           iterators, one pointing to <varname>'a'</varname>, and the other to
4370           <varname>'s'</varname>.</para>
4371
4372           <figure>
4373             <title>A PATRICIA trie</title>
4374             <mediaobject>
4375               <imageobject>
4376                 <imagedata align="center" format="PNG" scale="100"
4377                            fileref="../images/pbds_pat_trie.png"/>
4378               </imageobject>
4379               <textobject>
4380                 <phrase>A PATRICIA trie</phrase>
4381               </textobject>
4382             </mediaobject>
4383           </figure>
4384
4385         </section>
4386
4387         <section xml:id="container.trie.details.node">
4388           <info><title>Node Invariants</title></info>
4389
4390           <para>Trie-based containers support node invariants, as do
4391           tree-based containers. There are two minor
4392           differences, though, which, unfortunately, prevent them from
4393           sharing the same node-updating policies:</para>
4394
4395           <orderedlist>
4396             <listitem>
4397               <para>A trie's <classname>Node_Update</classname> template-template
4398               parameter is parametrized by <classname>E_Access_Traits</classname>, while
4399               a tree's <classname>Node_Update</classname> template-template parameter is
4400             parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4401
4402             <listitem><para>Tree-based containers store values in all nodes, while
4403             trie-based containers (at least in this implementation) store
4404             values in leaves.</para></listitem>
4405           </orderedlist>
4406
4407           <para>The graphic below shows the scheme, as well as some predefined
4408           policies (which are explained below).</para>
4409
4410           <figure>
4411             <title>A trie and its update policy</title>
4412             <mediaobject>
4413               <imageobject>
4414                 <imagedata align="center" format="PNG" scale="100"
4415                            fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
4416               </imageobject>
4417               <textobject>
4418                 <phrase>A trie and its update policy</phrase>
4419               </textobject>
4420             </mediaobject>
4421           </figure>
4422
4423
4424           <para>This library offers the following pre-defined trie node
4425           updating policies:</para>
4426
4427           <orderedlist>
4428             <listitem>
4429               <para>
4430                 <classname>trie_order_statistics_node_update</classname>
4431                 supports order statistics.
4432               </para>
4433             </listitem>
4434
4435             <listitem><para><classname>trie_prefix_search_node_update</classname>
4436             supports searching for ranges that match a given prefix.</para></listitem>
4437
4438             <listitem><para><classname>null_node_update</classname>
4439             is the null node updater.</para></listitem>
4440           </orderedlist>
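
          <para>A hedged sketch of prefix search with
          <classname>trie_prefix_search_node_update</classname>, assuming the
          <filename>ext/pb_ds</filename> headers shipped with GCC. The
          string access traits are spelled
          <classname>trie_string_access_traits</classname> here, which is an
          assumption about the installed headers and differs from the name
          used in the text above. The helper
          <function>keys_with_prefix</function> is hypothetical.</para>

          <programlisting>
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/trie_policy.hpp&gt;

typedef __gnu_pbds::trie&lt;
  std::string,
  __gnu_pbds::null_type,
  __gnu_pbds::trie_string_access_traits&lt;&gt;,
  __gnu_pbds::pat_trie_tag,
  __gnu_pbds::trie_prefix_search_node_update&gt; word_trie;

// Illustrative helper: collects the keys of t that start with prefix.
inline std::vector&lt;std::string&gt;
keys_with_prefix(const word_trie&amp; t, const std::string&amp; prefix)
{
  typedef word_trie::const_iterator const_iterator;

  // prefix_range is supplied by the node-update policy.
  const std::pair&lt;const_iterator, const_iterator&gt; range =
    t.prefix_range(prefix);

  std::vector&lt;std::string&gt; result;
  for (const_iterator it = range.first; it != range.second; ++it)
    result.push_back(*it);
  return result;
}
          </programlisting>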
4441
4442         </section>
4443
4444         <section xml:id="container.trie.details.split">
4445           <info><title>Split and Join</title></info>
4446           <para>Trie-based containers support split and join methods; the
4447           rationale is the same as that for tree-based containers supporting
4448           these methods.</para>
4449         </section>
4450
4451       </section> <!-- details -->
4452
4453     </section> <!-- trie -->
4454
4455     <!-- list_update -->
4456     <section xml:id="pbds.design.container.list">
4457       <info><title>List</title></info>
4458
4459       <section xml:id="container.list.interface">
4460         <info><title>Interface</title></info>
4461
4462         <para>The list-based container has the following declaration:</para>
4463         <programlisting>
4464           template&lt;typename Key,
4465           typename Mapped,
4466           typename Eq_Fn = std::equal_to&lt;Key&gt;,
4467           typename Update_Policy = lu_move_to_front_policy&lt;&gt;,
4468           typename Allocator = std::allocator&lt;char&gt; &gt;
4469           class list_update;
4470         </programlisting>
4471
4472         <para>The parameters have the following meaning:</para>
4473
4474         <orderedlist>
4475           <listitem>
4476             <para>
4477               <classname>Key</classname> is the key type.
4478             </para>
4479           </listitem>
4480
4481           <listitem>
4482             <para>
4483               <classname>Mapped</classname> is the mapped-policy.
4484             </para>
4485           </listitem>
4486
4487           <listitem>
4488             <para>
4489               <classname>Eq_Fn</classname> is a key equivalence functor.
4490             </para>
4491           </listitem>
4492
4493           <listitem>
4494             <para>
4495               <classname>Update_Policy</classname> is a policy updating positions in
4496               the list based on access patterns. It is described in the
4497               following subsection.
4498             </para>
4499           </listitem>
4500
4501           <listitem>
4502             <para>
4503               <classname>Allocator</classname> is an allocator type.
4504             </para>
4505           </listitem>
4506         </orderedlist>
4507
4508         <para>A list-based associative container is a container that
4509         stores elements in a linked-list. It does not order the elements
4510         by any particular order related to the keys.  List-based
4511         containers are primarily useful for creating "multimaps". In fact,
4512         list-based containers are designed in this library expressly for
4513         this purpose.</para>
4514
4515         <para>List-based containers might also be useful for some rare
4516         cases, where a key is encapsulated to the extent that only
4517         key-equivalence can be tested. Hash-based containers need to know
4518         how to transform a key into a size type, and tree-based containers
4519         need to know if some key is larger than another.  List-based
4520         associative containers, conversely, only need to know if two keys
4521         are equivalent.</para>
4522
4523         <para>Since a list-based associative container does not order
4524         elements by keys, is it possible to order the list in some
4525         useful manner? Remarkably, many on-line competitive
4526         algorithms exist for reordering lists to reflect access
4527         prediction. (See <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
4528         </para>
4529
4530       </section>
4531
4532       <section xml:id="container.list.details">
4533         <info><title>Details</title></info>
4534         <para>
4535         </para>
4536         <section xml:id="container.list.details.ds">
4537           <info><title>Underlying Data Structure</title></info>
4538
4539           <para>The graphic below shows a
4540           simple list of integer keys. If we search for the integer 6, we
4541           are paying an overhead: the link with key 6 is only the fifth
4542           link; if it were the first link, it could be accessed
4543           faster.</para>
4544
4545           <figure>
4546             <title>A simple list</title>
4547             <mediaobject>
4548               <imageobject>
4549                 <imagedata align="center" format="PNG" scale="100"
4550                            fileref="../images/pbds_simple_list.png"/>
4551               </imageobject>
4552               <textobject>
4553                 <phrase>A simple list</phrase>
4554               </textobject>
4555             </mediaobject>
4556           </figure>
4557
4558           <para>List-update algorithms reorder lists as elements are
4559           accessed. They try to determine, by the access history, which
4560           keys to move to the front of the list. Some of these algorithms
4561           require adding some metadata alongside each entry.</para>
4562
4563           <para>For example, in the graphic below label A shows the counter
4564           algorithm. Each node contains both a key and a count metadata
4565           (shown in bold). When an element is accessed (e.g. 6) its count is
4566           incremented, as shown in label B. If the count reaches some
4567           predetermined value, say 10, as shown in label C, the count is set
4568           to 0 and the node is moved to the front of the list, as in label
4569           D.
4570           </para>
4571
4572           <figure>
4573             <title>The counter algorithm</title>
4574             <mediaobject>
4575               <imageobject>
4576                 <imagedata align="center" format="PNG" scale="100"
4577                            fileref="../images/pbds_list_update.png"/>
4578               </imageobject>
4579               <textobject>
4580                 <phrase>The counter algorithm</phrase>
4581               </textobject>
4582             </mediaobject>
4583           </figure>
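
          <para>The counter algorithm described above can be sketched
          independently of the library. The class
          <classname>counter_list</classname> is hypothetical; it stores
          integer keys in a <classname>std::list</classname>, starts every
          count at 0, and takes the threshold as a constructor
          argument.</para>

          <programlisting>
#include &lt;cstddef&gt;
#include &lt;list&gt;
#include &lt;utility&gt;

// Hypothetical list of integer keys implementing the counter algorithm.
class counter_list
{
  // Each entry holds a key and its count metadata.
  typedef std::list&lt;std::pair&lt;int, std::size_t&gt; &gt; entry_list;

public:
  explicit counter_list(std::size_t max_count)
  : m_max_count(max_count) { }

  void
  insert(int key)
  { m_entries.push_back(std::make_pair(key, std::size_t(0))); }

  // Returns true if key is present. Each successful access increments
  // the entry's count; on reaching max_count the count is reset and
  // the entry is moved to the front.
  bool
  find(int key)
  {
    for (entry_list::iterator it = m_entries.begin();
         it != m_entries.end(); ++it)
      {
        if (it-&gt;first != key)
          continue;
        if (++it-&gt;second == m_max_count)
          {
            it-&gt;second = 0;
            m_entries.splice(m_entries.begin(), m_entries, it);
          }
        return true;
      }
    return false;
  }

  // Key of the first entry; the list must not be empty.
  int
  front_key() const
  { return m_entries.front().first; }

private:
  entry_list  m_entries;
  std::size_t m_max_count;
};
          </programlisting>

          <para>With a threshold of 2, the second successful
          <function>find</function> of a key moves its entry to the front of
          the list.</para>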
4584
4585
4586         </section>
4587
4588         <section xml:id="container.list.details.policies">
4589           <info><title>Policies</title></info>
4590
4591           <para>This library allows instantiating lists with policies
4592           implementing any algorithm moving nodes to the front of the
4593           list (policies implementing algorithms interchanging nodes are
4594           unsupported).</para>
4595
4596           <para>Associative containers based on lists are parametrized by an
4597           <classname>Update_Policy</classname> parameter. This parameter defines the
4598           type of metadata each node contains, how to create the
4599           metadata, and how to decide, using this metadata, whether to
4600           move a node to the front of the list. A list-based associative
4601           container object derives (publicly) from its update policy.
4602           </para>
4603
4604           <para>An instantiation of <classname>Update_Policy</classname> must define
4605           internally <classname>update_metadata</classname> as the metadata it
4606           requires. Internally, each node of the list contains, besides
4607           the usual key and data, an instance of <classname>typename
4608           Update_Policy::update_metadata</classname>.</para>
4609
4610           <para>An instantiation of <classname>Update_Policy</classname> must define
4611           internally two operators:</para>
4612           <programlisting>
4613             update_metadata
4614             operator()();
4615
4616             bool
4617             operator()(update_metadata &amp;);
4618           </programlisting>
4619
4620           <para>The first is called by the container object, when creating a
4621           new node, to create the node's metadata. The second is called
4622           by the container object, when a node is accessed (i.e.,
4623           when a find operation's key is equivalent to the key of the
4624           node), to determine whether to move the node to the front of
4625           the list.
4626           </para>
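
          <para>A minimal sketch of a policy with this shape, using the
          names from the text. The struct
          <classname>mtf_policy_sketch</classname> is hypothetical and is not
          plugged into the container here; the installed headers may spell
          the metadata typedef differently or require further
          members.</para>

          <programlisting>
// Hypothetical move-to-front policy in the shape described above.
struct mtf_policy_sketch
{
  // No per-node information is needed, so the metadata is empty.
  struct update_metadata { };

  // Called when a node is created: builds the node's metadata.
  update_metadata
  operator()() const
  { return update_metadata(); }

  // Called when a node is accessed: true means "move it to the front".
  bool
  operator()(update_metadata&amp;) const
  { return true; }
};
          </programlisting>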
4627
4628           <para>The library contains two predefined implementations of
4629           list-update policies. The first
4630           is <classname>lu_counter_policy</classname>, which implements the
4631           counter algorithm described above. The second is
4632           <classname>lu_move_to_front_policy</classname>,
4633           which unconditionally moves an accessed element to the front of
4634           the list. The latter type is very useful in this library,
4635           since there is no need to associate metadata with each element.
4636           (See <xref linkend="biblio.andrew04mtf"/>.)
4637           </para>
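
          <para>The two predefined policies might be selected as follows,
          assuming the <filename>ext/pb_ds</filename> headers shipped with
          GCC. The typedef names and the helper
          <function>lookup_or</function> are introduced here for
          illustration, and the threshold of 3 is arbitrary.</para>

          <programlisting>
#include &lt;functional&gt;
#include &lt;string&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/list_update_policy.hpp&gt;

// Moves an entry to the front each time its access count reaches 3.
typedef __gnu_pbds::list_update&lt;
  std::string, int,
  std::equal_to&lt;std::string&gt;,
  __gnu_pbds::lu_counter_policy&lt;3&gt; &gt; counter_map;

// Moves an entry to the front on every access.
typedef __gnu_pbds::list_update&lt;
  std::string, int,
  std::equal_to&lt;std::string&gt;,
  __gnu_pbds::lu_move_to_front_policy&lt;&gt; &gt; mtf_map;

// Illustrative helper. Map is taken by non-const reference because a
// successful find may reorder the list.
template&lt;typename Map&gt;
int
lookup_or(Map&amp; m, const std::string&amp; key, int fallback)
{
  if (m.find(key) == m.end())
    return fallback;
  return m[key];
}
          </programlisting>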
4638
4639         </section>
4640
4641         <section xml:id="container.list.details.mapped">
4642           <info><title>Use in Multimaps</title></info>
4643
4644           <para>In this library, there are no equivalents for the standard's
4645           multimaps and multisets; instead one uses an associative
4646           container mapping primary keys to secondary keys.</para>
4647
4648           <para>List-based containers are especially useful as associative
4649           containers for secondary keys. In fact, they are implemented
4650           here expressly for this purpose.</para>
4651
4652           <para>To begin with, these containers use very little per-entry
4653           structure memory overhead, since they can be implemented as
4654           singly-linked lists. (Arrays use even lower per-entry memory
4655           overhead, but they are less flexible in moving around entries,
4656           and have weaker invalidation guarantees).</para>
4657
4658           <para>More importantly, though, list-based containers use very
4659           little per-container memory overhead. The memory overhead of an
4660           empty list-based container is practically that of a pointer.
4661           This is important for when they are used as secondary
4662           associative-containers in situations where the average ratio of
4663           secondary keys to primary keys is low (or even 1).</para>
4664
4665           <para>In order to reduce the per-container memory overhead as much
4666           as possible, they are implemented as closely as possible to
4667           singly-linked lists.</para>
4668
4669           <orderedlist>
4670             <listitem>
4671               <para>
4672                 List-based containers do not store internally the number
4673                 of values that they hold. This means that their <function>size</function>
4674                 method has linear complexity (just like <classname>std::list</classname>).
4675                 Note that finding the number of equivalent-key values in a
4676                 standard multimap also has linear complexity (because it must be
4677                 done via <function>std::distance</function> of the
4678                 multimap's <function>equal_range</function> method), but usually with
4679                 higher constants.
4680               </para>
4681             </listitem>
4682
4683             <listitem>
4684               <para>
4685                 Most associative-container objects each hold a policy
4686                 object (a hash-based container object holds a
4687                 hash functor). List-based containers, conversely, only have
4688                 class-wide policy objects.
4689               </para>
4690             </listitem>
4691           </orderedlist>
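
          <para>A sketch of the arrangement described in this section,
          assuming the <filename>ext/pb_ds</filename> headers shipped with
          GCC. The choice of a tree for the primary keys, the typedef names,
          and the helpers <function>add_value</function> and
          <function>count_values</function> are illustrative
          assumptions.</para>

          <programlisting>
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;

// Secondary keys live in a list-based set.
typedef __gnu_pbds::list_update&lt;int, __gnu_pbds::null_type&gt; secondary_keys;

// Primary keys map to their set of secondary keys.
typedef __gnu_pbds::tree&lt;std::string, secondary_keys&gt; multimap_sketch;

inline void
add_value(multimap_sketch&amp; m, const std::string&amp; primary, int secondary)
{ m[primary].insert(secondary); }

// Number of secondary keys stored under primary. size() on the list
// walks the list, so this is linear in that number.
inline std::size_t
count_values(multimap_sketch&amp; m, const std::string&amp; primary)
{
  if (m.find(primary) == m.end())
    return 0;
  return m[primary].size();
}
          </programlisting>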
4692
4693
4694         </section>
4695
4696       </section> <!-- details -->
4697
4698     </section> <!-- list -->
4699
4700
4701     <!-- priority_queue -->
4702     <section xml:id="pbds.design.container.priority_queue">
4703       <info><title>Priority Queue</title></info>
4704
4705       <section xml:id="container.priority_queue.interface">
4706         <info><title>Interface</title></info>
4707
4708         <para>The priority queue container has the following
4709         declaration:
4710         </para>
4711         <programlisting>
4712           template&lt;typename  Value_Type,
4713           typename  Cmp_Fn = std::less&lt;Value_Type&gt;,
4714           typename  Tag = pairing_heap_tag,
4715           typename  Allocator = std::allocator&lt;char &gt; &gt;
4716           class priority_queue;
4717         </programlisting>
4718
4719         <para>The parameters have the following meaning:</para>
4720
4721         <orderedlist>
4722           <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4723
4724           <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>
4725
4726           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4727           to use.</para></listitem>
4728
4729           <listitem><para><classname>Allocator</classname> is an allocator
4730           type.</para></listitem>
4731         </orderedlist>
4732
4733         <para>The <classname>Tag</classname> parameter specifies which underlying
4734         data structure to use. Instantiating it by <classname>pairing_heap_tag</classname>, <classname>binary_heap_tag</classname>,
4735         <classname>binomial_heap_tag</classname>,
4736         <classname>rc_binomial_heap_tag</classname>,
4737         or <classname>thin_heap_tag</classname>,
4738         specifies, respectively,
4739         an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4740         binary heap (<xref linkend="biblio.clrs2001"/>),
4741         binomial heap (<xref linkend="biblio.clrs2001"/>),
4742         a binomial heap with a redundant binary counter (<xref linkend="biblio.maverik_lowerbounds"/>),
4743         or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4744         </para>
4745
4746         <para>
4747           As mentioned in the tutorial,
4748           <classname>__gnu_pbds::priority_queue</classname> shares most of the
4749           same interface with <classname>std::priority_queue</classname>.
4750           E.g. if <varname>q</varname> is a priority queue of type
4751           <classname>Q</classname>, then <function>q.top()</function> will
4752           return the "largest" value in the container (according to
4753           <classname>typename
4754           Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4755           has a larger (and very slightly different) interface than
4756           <classname>std::priority_queue</classname>, however, since typically
4757           <classname>push</classname> and <classname>pop</classname> are deemed
4758         insufficient for manipulating priority-queues. </para>
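
        <para>The shared part of the interface can be sketched as follows,
        assuming the <filename>ext/pb_ds</filename> headers shipped with GCC.
        The helper <function>largest_of_three</function> is hypothetical; it
        takes the tag as a template argument so that the same code can be
        instantiated with any of the tags listed above.</para>

        <programlisting>
#include &lt;functional&gt;
#include &lt;ext/pb_ds/priority_queue.hpp&gt;

// Illustrative helper: pushes three values into a queue backed by the
// data structure selected by Tag, and returns the value reported by top().
template&lt;typename Tag&gt;
int
largest_of_three(int a, int b, int c)
{
  __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;, Tag&gt; q;
  q.push(a);
  q.push(b);
  q.push(c);
  return q.top();
}
        </programlisting>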
4759
4760         <para>Different settings require different priority-queue
4761         implementations, which are described later; the section on
4762         traits discusses ways to differentiate between the traits of
4763         the different implementations.</para>
4764
4765
4766       </section>
4767
4768       <section xml:id="container.priority_queue.details">
4769         <info><title>Details</title></info>
4770
4771         <section xml:id="container.priority_queue.details.iterators">
4772           <info><title>Iterators</title></info>
4773
4774           <para>There are many different underlying data structures for
4775           implementing priority queues. Unfortunately, most such
4776           structures are oriented towards making <function>push</function> and
4777           <function>top</function> efficient, and consequently don't allow efficient
4778           access of other elements: for instance, they cannot support an efficient
4779           <function>find</function> method. In the use case where it
4780           is important to both access and "do something with" an
4781           arbitrary value, one would be out of luck. For example, many graph algorithms require
4782           modifying a value (typically increasing it in the sense of the
4783           priority queue's comparison functor).</para>
4784
4785           <para>In order to access and manipulate an arbitrary value in a
4786           priority queue, one needs to reference the internals of the
4787           priority queue from some form of an associative container -
4788           this is unavoidable. Of course, in order to maintain the
4789           encapsulation of the priority queue, this needs to be done in a
4790           way that minimizes exposure to implementation internals.</para>
4791
          <para>In this library the priority queue's <function>push</function>
          method returns an iterator, which if valid can be used for subsequent <function>modify</function> and
          <function>erase</function> operations. This both preserves the priority
          queue's encapsulation, and allows accessing arbitrary values (since the
          iterators returned by the <function>push</function> operation can be
          stored in some form of associative container).</para>
4798
4799           <para>Priority queues' iterators present a problem regarding their
4800           invalidation guarantees. One assumes that calling
4801           <function>operator++</function> on an iterator will associate it
4802           with the "next" value. Priority-queues are
4803           self-organizing: each operation changes what the "next" value
4804           means. Consequently, it does not make sense that <function>push</function>
4805           will return an iterator that can be incremented - this can have
4806           no possible use. Also, as in the case of hash-based containers,
4807           it is awkward to define if a subsequent <function>push</function> operation
4808           invalidates a prior returned iterator: it invalidates it in the
4809           sense that its "next" value is not related to what it
4810           previously considered to be its "next" value. However, it might not
4811           invalidate it, in the sense that it can be
4812           de-referenced and used for <function>modify</function> and <function>erase</function>
4813           operations.</para>
4814
          <para>As with the unordered associative
          containers, this library distinguishes between
          point-type and range-type iterators. A priority queue's <classname>iterator</classname> can always be
4818           converted to a <classname>point_iterator</classname>, and a
4819           <classname>const_iterator</classname> can always be converted to a
4820           <classname>point_const_iterator</classname>.</para>
4821
4822           <para>The following snippet demonstrates manipulating an arbitrary
4823           value:</para>
          <programlisting>
            // Assumes #include &lt;ext/pb_ds/priority_queue.hpp&gt;, #include &lt;cassert&gt;,
            // and using namespace __gnu_pbds.

            // A priority queue of integers.
            priority_queue&lt;int&gt; p;

            // Insert some values, keeping the iterator returned for the first one.
            priority_queue&lt;int&gt;::point_iterator it = p.push(0);

            p.push(1);
            p.push(2);

            // Now modify the value that the stored iterator refers to.
            p.modify(it, 3);

            // The modified value is now the largest, so it is on top.
            assert(p.top() == 3);
          </programlisting>
4839
4840
4841           <para>It should be noted that an alternative design could embed an
4842           associative container in a priority queue. Could, but most
4843           probably should not. To begin with, it should be noted that one
4844           could always encapsulate a priority queue and an associative
4845           container mapping values to priority queue iterators with no
4846           performance loss. One cannot, however, "un-encapsulate" a priority
4847           queue embedding an associative container, which might lead to
          performance loss. Assume that one needs to associate each value
          with some data unrelated to priority queues. Then, using
          this library's design, one could use a single
          associative container mapping each value to a pair consisting of
          this data and a priority queue's iterator. The embedded
          design would require two associative containers. Similar
4854           problems might arise in cases where a value can reside
4855           simultaneously in many priority queues.</para>
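
          <para>The cross-referencing design described above might be
          sketched as follows. This is only an illustration: the names
          <classname>entry</classname>, <classname>entry_queue</classname>,
          and <classname>task_table</classname> are hypothetical and not part
          of the library, and a pairing heap is chosen here as an assumption
          so that stored point-type iterators stay usable. A single
          <classname>std::map</classname> holds both the unrelated data and
          the priority queue's iterator for each value.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <ext/pb_ds/priority_queue.hpp>

// Queue values are (priority, name) pairs; the largest priority is on top.
typedef std::pair<int, std::string> entry;
typedef __gnu_pbds::priority_queue<entry, std::less<entry>,
                                   __gnu_pbds::pairing_heap_tag> entry_queue;

// Hypothetical wrapper: one associative container maps each name to a pair
// of unrelated data (a note) and the iterator returned by push().
class task_table
{
public:
  // Adds a task; returns false if the name is already present.
  bool add(const std::string& name, int priority, const std::string& note)
  {
    if (records_.find(name) != records_.end())
      return false;
    entry_queue::point_iterator it = queue_.push(entry(priority, name));
    records_.insert(std::make_pair(name, record(note, it)));
    return true;
  }

  // Changes the priority of a known task through its stored iterator.
  bool reprioritize(const std::string& name, int priority)
  {
    std::map<std::string, record>::iterator pos = records_.find(name);
    if (pos == records_.end())
      return false;
    queue_.modify(pos->second.where, entry(priority, name));
    return true;
  }

  // Both accessors require a non-empty table.
  const std::string& top_name() const
  { return queue_.top().second; }

  const std::string& note_of_top() const
  { return records_.find(queue_.top().second)->second.note; }

private:
  struct record
  {
    record(const std::string& n, entry_queue::point_iterator w)
    : note(n), where(w) { }

    std::string note;                   // data unrelated to the queue
    entry_queue::point_iterator where;  // handle into the queue
  };

  entry_queue queue_;
  std::map<std::string, record> records_;
};
```
]]></programlisting>
          <para>With this sketch, adding tasks with priorities 1, 2, and 3
          and then calling <function>reprioritize</function> on the first
          one with priority 10 moves it to the top without searching the
          priority queue.</para>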
4856
4857         </section>
4858
4859
4860         <section xml:id="container.priority_queue.details.d">
4861           <info><title>Underlying Data Structure</title></info>
4862
4863           <para>There are three main implementations of priority queues: the
4864           first employs a binary heap, typically one which uses a
4865           sequence; the second uses a tree (or forest of trees), which is
4866           typically less structured than an associative container's tree;
          the third simply uses an associative container. These are
          shown in the graphic below under labels A1 and A2, label B, and label C, respectively.</para>
4869
4870           <figure>
4871             <title>Underlying Priority-Queue Data-Structures.</title>
4872             <mediaobject>
4873               <imageobject>
4874                 <imagedata align="center" format="PNG" scale="100"
4875                            fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
4876               </imageobject>
4877               <textobject>
4878                 <phrase>Underlying Priority-Queue Data-Structures.</phrase>
4879               </textobject>
4880             </mediaobject>
4881           </figure>
4882
4883           <para>Roughly speaking, any value that is both pushed and popped
4884           from a priority queue must incur a logarithmic expense (in the
4885           amortized sense). Any priority queue implementation that would
          avoid this would violate known bounds on comparison-based
4887           sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
4888           </para>
4889
4890           <para>Most implementations do
4891           not differ in the asymptotic amortized complexity of
4892           <function>push</function> and <function>pop</function> operations, but they differ in
4893           the constants involved, in the complexity of other operations
4894           (e.g., <function>modify</function>), and in the worst-case
4895           complexity of single operations. In general, the more
4896           "structured" an implementation (i.e., the more internal
4897           invariants it possesses) - the higher its amortized complexity
4898           of <function>push</function> and <function>pop</function> operations.</para>
4899
          <para>This library implements different algorithms using a
          single class: <classname>priority_queue</classname>.
          Instantiating the <classname>Tag</classname> template parameter "selects"
          the implementation:</para>
4904
4905           <orderedlist>
            <listitem><para>
              Instantiating <classname>Tag = binary_heap_tag</classname> creates
              a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
              selected by <classname>priority_queue</classname>
              if <classname>Value_Type</classname> is instantiated by a primitive type
              (e.g., an <type>int</type>); the latter is
              internally selected for all other types (e.g.,
              <classname>std::string</classname>). This implementation is relatively
              unstructured, and so has good <function>push</function> and <function>pop</function>
              performance; it is the "best-in-kind" for primitive
              types, e.g., <type>int</type>s. Conversely, it has
              poor worst-case performance, and can support only linear-time
            <function>modify</function> and <function>erase</function> operations.</para></listitem>
4919
            <listitem><para>Instantiating <classname>Tag =
            pairing_heap_tag</classname> creates a pairing heap of the form
            represented by label B in the graphic above. This
            implementation too is relatively unstructured, and so has good
            <function>push</function> and <function>pop</function>
            performance; it is the "best-in-kind" for non-primitive types,
            e.g., <classname>std::string</classname>s. It also has very good
            worst-case <function>push</function> and
            <function>join</function> performance (O(1)), but has high
            worst-case <function>pop</function>
            complexity.</para></listitem>
4931
            <listitem><para>Instantiating <classname>Tag =
            binomial_heap_tag</classname> creates a binomial heap of the
            form represented by label B in the graphic above. This
            implementation is more structured than a pairing heap, and so
            has worse <function>push</function> and <function>pop</function>
            performance. Conversely, it has sub-linear worst-case bounds for
            operations such as <function>pop</function>, and so it might be preferred in
            cases where responsiveness is important.</para></listitem>
4940
            <listitem><para>Instantiating <classname>Tag =
            rc_binomial_heap_tag</classname> creates a binomial heap of the
            form represented by label B above, accompanied by a redundant
            counter which governs the trees. This implementation is
            therefore more structured than a binomial heap, and so has worse
            <function>push</function> and <function>pop</function>
            performance. Conversely, it guarantees O(1)
            <function>push</function> complexity, and so it might be
            preferred in cases where the responsiveness of a binomial heap
            is insufficient.</para></listitem>
4951
            <listitem><para>Instantiating <classname>Tag =
            thin_heap_tag</classname> creates a thin heap of the form
            represented by label B in the graphic above. This
            implementation too is more structured than a pairing heap, and
            so has worse <function>push</function> and
            <function>pop</function> performance. Conversely, its worst-case
            complexity is better than that of a Fibonacci heap while its
            amortized complexity is identical, and so it might be more
            appropriate for some graph algorithms.</para></listitem>
4961           </orderedlist>
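
          <para>As a minimal sketch of how the <classname>Tag</classname>
          parameter selects the implementation, the helper below (a
          hypothetical name, <function>sorted_descending</function>, not part
          of the library) pushes the same values into a priority queue
          instantiated with a caller-chosen tag and then pops them all. Only
          the tag changes between instantiations; the observable order of
          popped values is the same for every implementation.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: pushes the values into a priority queue that uses the
// given Tag, then pops everything and returns the values in pop order.
template<typename Tag>
std::vector<int> sorted_descending(const std::vector<int>& values)
{
  typedef __gnu_pbds::priority_queue<int, std::less<int>, Tag> queue_type;

  queue_type q;
  for (std::size_t i = 0; i < values.size(); ++i)
    q.push(values[i]);

  std::vector<int> out;
  while (!q.empty())
    {
      out.push_back(q.top());  // largest remaining value
      q.pop();
    }
  return out;
}
```
]]></programlisting>
          <para>For example,
          <literal>sorted_descending&lt;__gnu_pbds::thin_heap_tag&gt;(v)</literal>
          and
          <literal>sorted_descending&lt;__gnu_pbds::binary_heap_tag&gt;(v)</literal>
          return the same sequence; they differ only in the cost
          characteristics listed above.</para>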
4962
          <para>Of course, one can use any order-preserving associative
          container as a priority queue, as shown by label C in the graphic above, possibly by creating an adapter class
          over the associative container (much as
          <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
          This has the advantage that no cross-referencing is necessary
          at all; the priority queue itself is an associative container.
          However, most associative containers are too structured to compete with
          priority queues in terms of <function>push</function> and <function>pop</function>
          performance.</para>
4972
4973
4974
4975         </section>
4976
4977         <section xml:id="container.priority_queue.details.traits">
4978           <info><title>Traits</title></info>
4979
          <para>It would be nice if all priority queues could
          share exactly the same behavior regardless of implementation. Sadly, this is not possible. One example is the join operation: joining
          two binary heaps might throw an exception (although it will not corrupt
          either of the heaps on which it operates), but joining two pairing
          heaps is exception-free.</para>
4985
4986           <para>Tags and traits are very useful for manipulating generic
4987           types. <classname>__gnu_pbds::priority_queue</classname>
4988           publicly defines <classname>container_category</classname> as one of the tags. Given any
4989           container <classname>Cntnr</classname>, the tag of the underlying
4990           data structure can be found via <classname>typename
4991           Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4992           </para>
4993
4994           <figure>
4995             <title>Priority-Queue Data-Structure Tags.</title>
4996             <mediaobject>
4997               <imageobject>
4998                 <imagedata align="center" format="PNG" scale="100"
4999                            fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
5000               </imageobject>
5001               <textobject>
5002                 <phrase>Priority-Queue Data-Structure Tags.</phrase>
5003               </textobject>
5004             </mediaobject>
5005           </figure>
5006
5007
5008           <para>Additionally, a traits mechanism can be used to query a
5009           container type for its attributes. Given any container
5010           <classname>Cntnr</classname>, then <programlisting>__gnu_pbds::container_traits&lt;Cntnr&gt;</programlisting>
5011           is a traits class identifying the properties of the
5012           container.</para>
5013
5014           <para>To find if a container might throw if two of its objects are
5015           joined, one can use
5016           <programlisting>
5017             container_traits&lt;Cntnr&gt;::split_join_can_throw
5018           </programlisting>
5019           </para>
5020
5021           <para>
5022             Different priority-queue implementations have different invalidation guarantees. This is
5023             especially important, since there is no way to access an arbitrary
5024             value of priority queues except for iterators. Similarly to
5025             associative containers, one can use
5026             <programlisting>
5027               container_traits&lt;Cntnr&gt;::invalidation_guarantee
5028             </programlisting>
5029           to get the invalidation guarantee type of a priority queue.</para>
5030
          <para>The graphic of underlying data structures above makes it easy to see what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
          will be for the different implementations. All implementations of
          the type represented by label B have <classname>point_invalidation_guarantee</classname>:
          the container can freely reorganize its nodes internally, so
          range-type iterators are invalidated, but point-type iterators
          remain valid. Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
          the container can freely reallocate its array internally, so both
          point-type and range-type iterators might be invalidated.</para>
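
          <para>The two traits discussed above can be queried generically.
          The following is a sketch only; the helper names
          (<function>join_can_throw</function>,
          <function>point_iterators_survive</function>, and
          <function>keeps_point_iterators_valid</function>) are hypothetical
          and not part of the library. It reads
          <classname>split_join_can_throw</classname> directly, and uses
          overloading on the invalidation-guarantee tag to tell apart
          containers whose point-type iterators stay valid.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// Hypothetical helper: true if joining two Cntnr objects might throw.
template<typename Cntnr>
bool join_can_throw()
{ return __gnu_pbds::container_traits<Cntnr>::split_join_can_throw; }

// Hypothetical overloads, selected by the invalidation-guarantee tag.
// The point overload is an exact match for point_invalidation_guarantee,
// so it is preferred over the conversion to the basic tag.
inline bool
point_iterators_survive(__gnu_pbds::basic_invalidation_guarantee)
{ return false; }

inline bool
point_iterators_survive(__gnu_pbds::point_invalidation_guarantee)
{ return true; }

// Hypothetical helper: true if Cntnr keeps point-type iterators valid
// while the container reorganizes itself.
template<typename Cntnr>
bool keeps_point_iterators_valid()
{
  typedef typename __gnu_pbds::container_traits<Cntnr>::invalidation_guarantee
    guarantee;
  return point_iterators_survive(guarantee());
}
```
]]></programlisting>
          <para>Instantiated with a binary-heap priority queue, the first
          helper reports that joins might throw and the last reports that
          point-type iterators are not kept valid; with a pairing heap both
          answers are reversed, matching the description above.</para>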
5039
5040           <para>
5041             This has major implications, and constitutes a good reason to avoid
5042             using binary heaps. A binary heap can perform <function>modify</function>
5043             or <function>erase</function> efficiently given a valid point-type
5044             iterator. However, in order to supply it with a valid point-type
5045             iterator, one needs to iterate (linearly) over all
5046             values, then supply the relevant iterator (recall that a
5047             range-type iterator can always be converted to a point-type
5048             iterator). This means that if the number of <function>modify</function> or
5049             <function>erase</function> operations is non-negligible (say
5050             super-logarithmic in the total sequence of operations) - binary
5051             heaps will perform badly.
5052           </para>
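
          <para>The linear search that a binary heap forces can be sketched
          as below. This is an illustration under assumptions, not the
          library's recommended usage: <classname>int_heap</classname> and
          <function>replace_value</function> are hypothetical names. Because
          a stored point-type iterator cannot be trusted after the heap
          reallocates, the helper walks the range-type iterators to locate
          the value and passes the matching one to
          <function>modify</function>.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

typedef __gnu_pbds::priority_queue<int, std::less<int>,
                                   __gnu_pbds::binary_heap_tag> int_heap;

// Hypothetical helper: replaces one occurrence of old_value with new_value.
// Returns false if old_value is not present. The search is linear.
inline bool replace_value(int_heap& q, int old_value, int new_value)
{
  for (int_heap::iterator it = q.begin(); it != q.end(); ++it)
    if (*it == old_value)
      {
        // The range-type iterator converts to the point-type iterator
        // that modify() expects.
        q.modify(it, new_value);
        return true;  // stop at once: the iterator may no longer be valid
      }
  return false;
}
```
]]></programlisting>
          <para>Each call costs time linear in the size of the queue, which
          is why a workload with many <function>modify</function> or
          <function>erase</function> operations is better served by one of
          the node-based heaps and stored point-type iterators.</para>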
5053
5054         </section>
5055
5056       </section> <!-- details -->
5057
5058     </section> <!-- priority_queue -->
5059
5060
5061
5062   </section> <!-- container -->
5063
5064   </section> <!-- design -->
5065
5066
5067
5068   <!-- S04: Test -->
5069   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5070               href="test_policy_data_structures.xml">
5071   </xi:include>
5072
5073   <!-- S05: Reference/Acknowledgments -->
5074   <section xml:id="pbds.ack">
5075     <info><title>Acknowledgments</title></info>
5076     <?dbhtml filename="policy_data_structures_biblio.html"?>
5077
5078     <para>
5079       Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5080       Laboratories), and Benjamin Kosnik (Red Hat).
5081     </para>
5082
5083     <para>
5084       This library was partially written at
5085       <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.haifa.il.ibm.com/">IBM's Haifa Research Labs</link>.
5086       It is based heavily on policy-based design and uses many useful
5087       techniques from Modern C++ Design: Generic Programming and Design
5088       Patterns Applied by Andrei Alexandrescu.
5089     </para>
5090
5091     <para>
5092       Two ideas are borrowed from the SGI-STL implementation:
5093     </para>
5094
5095     <orderedlist>
5096       <listitem>
5097         <para>
5098           The prime-based resize policies use a list of primes taken from
5099           the SGI-STL implementation.
5100         </para>
5101       </listitem>
5102
5103       <listitem>
5104         <para>
5105           The red-black trees contain both a root node and a header node
5106           (containing metadata), connected in a way that forward and
5107           reverse iteration can be performed efficiently.
5108         </para>
5109       </listitem>
5110     </orderedlist>
5111
5112     <para>
5113       Some test utilities borrow ideas from
5114       <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer</link>.
5115     </para>
5116
5117     <para>
5118       We would like to thank Scott Meyers for useful comments (without
5119       attributing to him any flaws in the design or implementation of the
5120       library).
5121     </para>
5122     <para>We would like to thank Matt Austern for the suggestion to
5123     include tries.</para>
5124   </section>
5125
5126   <!-- S06: Biblio -->
5127   <bibliography xml:id="pbds.biblio">
5128     <info>
5129       <title>
5130         Bibliography
5131       </title>
5132     </info>
5133     <?dbhtml filename="policy_data_structures_biblio.html"?>
5134
5135     <!-- 01 -->
5136     <biblioentry xml:id="biblio.abrahams97exception">
5137       <title>
5138         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5139               xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1997/N1075.pdf">
5140           STL Exception Handling Contract
5141         </link>
5142       </title>
5143       <date>1997</date>
5144
5145       <author>
5146         <personname>
5147           <firstname>
5148             Dave
5149           </firstname>
5150           <surname>
5151             Abrahams
5152           </surname>
5153         </personname>
5154       </author>
5155
5156       <publisher>
5157         <publishername>
5158           ISO SC22/WG21
5159         </publishername>
5160       </publisher>
5161     </biblioentry>
5162
5163
5164     <!-- 02 -->
5165     <biblioentry xml:id="biblio.alexandrescu01modern">
5166       <title>
5167         Modern C++ Design: Generic Programming and Design Patterns Applied
5168       </title>
5169       <date>
5170         2001
5171       </date>
5172
5173       <author>
5174         <personname>
5175           <firstname>
5176             Andrei
5177           </firstname>
5178           <surname>
5179             Alexandrescu
5180           </surname>
5181         </personname>
5182       </author>
5183
5184       <publisher>
5185         <publishername>
5186           Addison-Wesley Publishing Company
5187         </publishername>
5188       </publisher>
5189     </biblioentry>
5190
5191
5192     <!-- 03 -->
5193     <biblioentry xml:id="biblio.andrew04mtf">
5194       <title>
5195         MTF, Bit, and COMB: A Guide to Deterministic and Randomized
5196         Algorithms for the List Update Problem
5197       </title>
5198
5199       <authorgroup>
5200         <author>
5201           <personname>
5202             <firstname>
5203               K.
5204             </firstname>
5205             <surname>
5206               Andrew
5207             </surname>
5208           </personname>
5209         </author>
5210
5211         <author>
5212           <personname>
5213             <firstname>
5214               D.
5215             </firstname>
5216             <surname>
5217               Gleich
5218             </surname>
5219           </personname>
5220         </author>
5221       </authorgroup>
5222     </biblioentry>
5223
5224     <!-- 04 -->
5225     <biblioentry xml:id="biblio.austern00noset">
5226       <title>
5227         Why You Shouldn't Use set - and What You Should Use Instead
5228       </title>
5229       <date>
5230         April, 2000
5231       </date>
5232
5233       <author>
5234         <personname>
5235           <firstname>
5236             Matthew
5237           </firstname>
5238           <surname>
5239             Austern
5240           </surname>
5241         </personname>
5242       </author>
5243
5244       <publisher>
5245         <publishername>
5246           C++ Report
5247         </publishername>
5248       </publisher>
5249     </biblioentry>
5250
5251     <!-- 05 -->
5252     <biblioentry xml:id="biblio.austern01htprop">
5253       <title>
5254         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5255               xlink:href="http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2001/n1326.html">
5256           A Proposal to Add Hashtables to the Standard Library
5257         </link>
5258       </title>
5259       <date>
5260         2001
5261       </date>
5262
5263       <author>
5264         <personname>
5265           <firstname>
5266             Matthew
5267           </firstname>
5268           <surname>
5269             Austern
5270           </surname>
5271         </personname>
5272       </author>
5273
5274       <publisher>
5275         <publishername>
5276           ISO SC22/WG21
5277         </publishername>
5278       </publisher>
5279     </biblioentry>
5280
5281     <!-- 06 -->
5282     <biblioentry xml:id="biblio.austern98segmentedit">
5283       <title>
5284         Segmented iterators and hierarchical algorithms
5285       </title>
5286       <date>
5287         April, 1998
5288       </date>
5289
5290       <author>
5291         <personname>
5292           <firstname>
5293             Matthew
5294           </firstname>
5295           <surname>
5296             Austern
5297           </surname>
5298         </personname>
5299       </author>
5300
5301       <publisher>
5302         <publishername>
5303           Generic Programming
5304         </publishername>
5305       </publisher>
5306     </biblioentry>
5307
5308     <!-- 07 -->
5309     <biblioentry xml:id="biblio.dawestimer">
5310       <title>
5311         <link xmlns:xlink="http://www.w3.org/1999/xlink"
              xlink:href="http://www.boost.org/doc/libs/release/libs/timer/">
5313           Boost Timer Library
5314         </link>
5315       </title>
5316
5317       <author>
5318         <personname>
5319           <firstname>
5320             Beeman
5321           </firstname>
5322           <surname>
5323             Dawes
5324           </surname>
5325         </personname>
5326       </author>
5327
5328       <publisher>
5329         <publishername>
5330           Boost
5331         </publishername>
5332       </publisher>
5333     </biblioentry>
5334
5335     <!-- 08 -->
5336     <biblioentry xml:id="biblio.clearypool">
5337       <title>
5338         <link xmlns:xlink="http://www.w3.org/1999/xlink"
              xlink:href="http://www.boost.org/doc/libs/release/libs/pool/">
5340           Boost Pool Library
5341         </link>
5342       </title>
5343
5344       <author>
5345         <personname>
5346           <firstname>
5347             Stephen
5348           </firstname>
5349           <surname>
5350             Cleary
5351           </surname>
5352         </personname>
5353       </author>
5354
5355       <publisher>
5356         <publishername>
5357           Boost
5358         </publishername>
5359       </publisher>
5360     </biblioentry>
5361
5362
5363     <!-- 09 -->
5364     <biblioentry xml:id="biblio.maddocktraits">
5365       <title>
5366         <link xmlns:xlink="http://www.w3.org/1999/xlink"
              xlink:href="http://www.boost.org/doc/libs/release/libs/type_traits/">
5368           Boost Type Traits Library
5369         </link>
5370       </title>
5371       <authorgroup>
5372         <author>
5373           <personname>
            <firstname>
              John
            </firstname>
            <surname>
              Maddock
            </surname>
5380           </personname>
5381         </author>
5382         <author>
5383           <personname>
5384             <firstname>
5385               Stephen
5386             </firstname>
5387             <surname>
5388               Cleary
5389             </surname>
5390           </personname>
5391         </author>
5392       </authorgroup>
5393       <publisher>
5394         <publishername>
5395           Boost
5396         </publishername>
5397       </publisher>
5398     </biblioentry>
5399
5400     <!-- 10 -->
5401     <biblioentry xml:id="biblio.brodal96priority">
5402       <title>
5403         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5404               xlink:href="http://portal.acm.org/citation.cfm?id=313883">
5405           Worst-case efficient priority queues
5406         </link>
5407       </title>
5408
5409       <author>
5410         <personname>
5411           <firstname>
5412             Gerth
5413           </firstname>
5414           <surname>
5415             Stolting Brodal
5416           </surname>
5417         </personname>
5418       </author>
5419
5420     </biblioentry>
5421
5422     <!-- 11 -->
5423     <biblioentry xml:id="biblio.bulkamayheweff">
5424       <title>
5425         Efficient C++ Programming Techniques
5426       </title>
5427       <date>
5428         1997
5429       </date>
5430
5431       <authorgroup>
5432         <author>
5433           <personname>
5434             <firstname>
5435               D.
5436             </firstname>
5437             <surname>
5438               Bulka
5439             </surname>
5440           </personname>
5441         </author>
5442         <author>
5443           <personname>
5444             <firstname>
5445               D.
5446             </firstname>
5447             <surname>
5448               Mayhew
5449             </surname>
5450           </personname>
5451         </author>
5452       </authorgroup>
5453
5454       <publisher>
5455         <publishername>
5456           Addison-Wesley Publishing Company
5457         </publishername>
5458       </publisher>
5459     </biblioentry>
5460
5461     <!-- 12 -->
5462     <biblioentry xml:id="biblio.clrs2001">
5463       <title>
5464         Introduction to Algorithms, 2nd edition
5465       </title>
5466       <date>
5467         2001
5468       </date>
5469       <authorgroup>
5470         <author>
5471           <personname>
5472             <firstname>
5473               T. H.
5474             </firstname>
5475             <surname>
5476               Cormen
5477             </surname>
5478           </personname>
5479         </author>
5480
5481         <author>
5482           <personname>
5483             <firstname>
5484               C. E.
5485             </firstname>
5486             <surname>
5487               Leiserson
5488             </surname>
5489           </personname>
5490         </author>
5491
5492         <author>
5493           <personname>
5494             <firstname>
5495               R. L.
5496             </firstname>
5497             <surname>
5498               Rivest
5499             </surname>
5500           </personname>
5501         </author>
5502
5503         <author>
5504           <personname>
5505             <firstname>
5506               C.
5507             </firstname>
5508             <surname>
5509               Stein
5510             </surname>
5511           </personname>
5512         </author>
5513       </authorgroup>
5514       <publisher>
5515         <publishername>
5516           MIT Press
5517         </publishername>
5518       </publisher>
5519     </biblioentry>
5520
5521     <!-- 13 -->
5522     <biblioentry xml:id="biblio.dubhashi98neg">
5523       <title>
5524         Balls and bins: A study in negative dependence
5525       </title>
5526       <date>
5527         1998
5528       </date>
5529       <authorgroup>
5530         <author>
5531           <personname>
5532             <firstname>
5533               D.
5534             </firstname>
5535             <surname>
              Dubhashi
5537             </surname>
5538           </personname>
5539         </author>
5540         <author>
5541           <personname>
5542             <firstname>
5543               D.
5544             </firstname>
5545             <surname>
5546               Ranjan
5547             </surname>
5548           </personname>
5549         </author>
5550       </authorgroup>
5551
5552       <publisher>
5553         <publishername>
5554           Random Structures and Algorithms 13
5555         </publishername>
5556       </publisher>
5557     </biblioentry>
5558
5559
5560     <!-- 14 -->
5561     <biblioentry xml:id="biblio.fagin79extendible">
5562       <title>
5563         Extendible hashing - a fast access method for dynamic files
5564       </title>
5565       <date>
5566         1979
5567       </date>
5568
5569       <authorgroup>
5570         <author>
5571           <personname>
5572             <firstname>
5573               R.
5574             </firstname>
5575             <surname>
5576               Fagin
5577             </surname>
5578           </personname>
5579         </author>
5580         <author>
5581           <personname>
5582             <firstname>
5583               J.
5584             </firstname>
5585             <surname>
5586               Nievergelt
5587             </surname>
5588           </personname>
5589         </author>
5590         <author>
5591           <personname>
5592             <firstname>
5593               N.
5594             </firstname>
5595             <surname>
5596               Pippenger
5597             </surname>
5598           </personname>
5599         </author>
5600         <author>
5601           <personname>
5602             <firstname>
5603               H. R.
5604             </firstname>
5605             <surname>
5606               Strong
5607             </surname>
5608           </personname>
5609         </author>
5610       </authorgroup>
5611
5612       <publisher>
5613         <publishername>
5614           ACM Trans. Database Syst. 4
5615         </publishername>
5616       </publisher>
5617     </biblioentry>
5618
5619
5620
5621     <!-- 15 -->
5622     <biblioentry xml:id="biblio.filliatre2000ptset">
5623       <title>
5624         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5625               xlink:href="http://cristal.inria.fr/~frisch/icfp06_contest/advtr/applyOmatic/ptset.ml">
5626           Ptset: Sets of integers implemented as Patricia trees
5627         </link>
5628       </title>
5629
5630       <date>
5631         2000
5632       </date>
5633
5634       <author>
5635         <personname>
5636           <firstname>
5637             Jean-Christophe
5638           </firstname>
5639           <surname>
5640             Filliatre
5641           </surname>
5642         </personname>
5643       </author>
5644     </biblioentry>
5645
5646
5647
5648     <!-- 16 -->
5649     <biblioentry xml:id="biblio.fredman86pairing">
5650       <title>
5651         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5652               xlink:href="http://www.cs.cmu.edu/~sleator/papers/pairing-heaps.pdf">
5653           The pairing heap: a new form of self-adjusting heap
5654         </link>
5655       </title>
5656       <date>
5657         1986
5658       </date>
5659       <authorgroup>
5660         <author>
5661           <personname>
5662             <firstname>
5663               M. L.
5664             </firstname>
5665             <surname>
5666               Fredman
5667             </surname>
5668           </personname>
5669         </author>
5670         <author>
5671           <personname>
5672             <firstname>
5673               R.
5674             </firstname>
5675             <surname>
5676               Sedgewick
5677             </surname>
5678           </personname>
5679         </author>
5680         <author>
5681           <personname>
5682             <firstname>
5683               D. D.
5684             </firstname>
5685             <surname>
5686               Sleator
5687             </surname>
5688           </personname>
5689         </author>
5690         <author>
5691           <personname>
5692             <firstname>
5693               R. E.
5694             </firstname>
5695             <surname>
5696               Tarjan
5697             </surname>
5698           </personname>
5699         </author>
5700       </authorgroup>
5701     </biblioentry>
5702
5703
5704     <!-- 17 -->
5705     <biblioentry xml:id="biblio.gof">
5706       <title>
5707         Design Patterns - Elements of Reusable Object-Oriented Software
5708       </title>
5709       <date>
5710         1995
5711       </date>
5712       <authorgroup>
5713         <author>
5714           <personname>
5715             <firstname>
5716               E.
5717             </firstname>
5718             <surname>
5719               Gamma
5720             </surname>
5721           </personname>
5722         </author>
5723         <author>
5724           <personname>
5725             <firstname>
5726               R.
5727             </firstname>
5728             <surname>
5729               Helm
5730             </surname>
5731           </personname>
5732         </author>
5733         <author>
5734           <personname>
5735             <firstname>
5736               R.
5737             </firstname>
5738             <surname>
5739               Johnson
5740             </surname>
5741           </personname>
5742         </author>
5743         <author>
5744           <personname>
5745             <firstname>
5746               J.
5747             </firstname>
5748             <surname>
5749               Vlissides
5750             </surname>
5751           </personname>
5752         </author>
5753       </authorgroup>
5754       <publisher>
5755         <publishername>
5756           Addison-Wesley Publishing Company
5757         </publishername>
5758       </publisher>
5759     </biblioentry>
5760
5761
5762     <!-- 18 -->
5763     <biblioentry xml:id="biblio.garg86order">
5764       <title>
5765         Order-preserving key transformations
5766       </title>
5767       <date>
5768         1986
5769       </date>
5770       <authorgroup>
5771         <author>
5772           <personname>
5773             <firstname>
5774               A. K.
5775             </firstname>
5776             <surname>
5777               Garg
5778             </surname>
5779           </personname>
5780         </author>
5781         <author>
5782           <personname>
5783             <firstname>
5784               C. C.
5785             </firstname>
5786             <surname>
5787               Gotlieb
5788             </surname>
5789           </personname>
5790         </author>
5791       </authorgroup>
5792
5793       <publisher>
5794         <publishername>
5795           ACM Trans. Database Syst. 11
5796         </publishername>
5797       </publisher>
5798     </biblioentry>
5799
5800     <!-- 19 -->
5801     <biblioentry xml:id="biblio.hyslop02making">
5802       <title>
5803         Making a real hash of things
5804       </title>
5805       <date>
5806         May 2002
5807       </date>
5808       <authorgroup>
5809         <author>
5810           <personname>
5811             <firstname>
5812               J.
5813             </firstname>
5814             <surname>
5815               Hyslop
5816             </surname>
5817           </personname>
5818         </author>
5819         <author>
5820           <personname>
5821             <firstname>
5822               Herb
5823             </firstname>
5824             <surname>
5825               Sutter
5826             </surname>
5827           </personname>
5828         </author>
5829       </authorgroup>
5830
5831       <publisher>
5832         <publishername>
5833           C++ Report
5834         </publishername>
5835       </publisher>
5836     </biblioentry>
5837
5838
5839     <!-- 20 -->
5840     <biblioentry xml:id="biblio.jossutis01stl">
5841       <title>
5842         The C++ Standard Library - A Tutorial and Reference
5843       </title>
5844       <date>
5845         2001
5846       </date>
5847
5848       <author>
5849         <personname>
5850           <firstname>
5851             N. M.
5852           </firstname>
5853           <surname>
5854             Josuttis
5855           </surname>
5856         </personname>
5857       </author>
5858       <publisher>
5859         <publishername>
5860           Addison-Wesley Publishing Company
5861         </publishername>
5862       </publisher>
5863     </biblioentry>
5864
5865     <!-- 21 -->
5866     <biblioentry xml:id="biblio.kt99fat_heaps">
5867       <title>
5868         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5869               xlink:href="http://www.cs.princeton.edu/research/techreps/TR-597-99">
5870           New Heap Data Structures
5871         </link>
5872       </title>
5873       <date>
5874         1999
5875       </date>
5876
5877       <authorgroup>
5878         <author>
5879           <personname>
5880             <firstname>
5881               Haim
5882             </firstname>
5883             <surname>
5884               Kaplan
5885             </surname>
5886           </personname>
5887         </author>
5888         <author>
5889           <personname>
5890             <firstname>
5891               Robert E.
5892             </firstname>
5893             <surname>
5894               Tarjan
5895             </surname>
5896           </personname>
5897         </author>
5898       </authorgroup>
5899     </biblioentry>
5900
5901
5902     <!-- 22 -->
5903     <biblioentry xml:id="biblio.kleft00sets">
5904       <title>
5905         Are Set Iterators Mutable or Immutable?
5906       </title>
5907       <date>
5908         October 2000
5909       </date>
5910       <authorgroup>
5911         <author>
5912           <personname>
5913             <firstname>
5914               Angelika
5915             </firstname>
5916             <surname>
5917               Langer
5918             </surname>
5919           </personname>
5920         </author>
5921
5922         <author>
5923           <personname>
5924             <firstname>
5925               Klaus
5926             </firstname>
5927             <surname>
5928               Kreft
5929             </surname>
5930           </personname>
5931         </author>
5932       </authorgroup>
5933
5934       <publisher>
5935         <publishername>
5936           C/C++ Users Journal
5937         </publishername>
5938       </publisher>
5939     </biblioentry>
5940
5941     <!-- 23 -->
5942     <biblioentry xml:id="biblio.knuth98sorting">
5943       <title>
5944         The Art of Computer Programming - Sorting and Searching
5945       </title>
5946       <date>
5947         1998
5948       </date>
5949
5950       <author>
5951         <personname>
5952           <firstname>
5953             D. E.
5954           </firstname>
5955           <surname>
5956             Knuth
5957           </surname>
5958         </personname>
5959       </author>
5960
5961       <publisher>
5962         <publishername>
5963           Addison-Wesley Publishing Company
5964         </publishername>
5965       </publisher>
5966     </biblioentry>
5967
5968     <!-- 24 -->
5969     <biblioentry xml:id="biblio.liskov98data">
5970       <title>
5971         Data abstraction and hierarchy
5972       </title>
5973       <date>
5974         May 1988
5975       </date>
5976
5977       <author>
5978         <personname>
5979           <firstname>
5980             B.
5981           </firstname>
5982           <surname>
5983             Liskov
5984           </surname>
5985         </personname>
5986       </author>
5987
5988       <publisher>
5989         <publishername>
5990           SIGPLAN Notices 23
5991         </publishername>
5992       </publisher>
5993     </biblioentry>
5994
5995     <!-- 25 -->
5996     <biblioentry xml:id="biblio.litwin80lh">
5997       <title>
5998         Linear hashing: A new tool for file and table addressing
5999       </title>
6000       <date>
6001         June 1980
6002       </date>
6003
6004       <author>
6005         <personname>
6006           <firstname>
6007             W.
6008           </firstname>
6009           <surname>
6010             Litwin
6011           </surname>
6012         </personname>
6013       </author>
6014
6015       <publisher>
6016         <publishername>
6017           Proceedings of International Conference on Very Large Data Bases
6018         </publishername>
6019       </publisher>
6020     </biblioentry>
6021
6022     <!-- 26 -->
6023     <biblioentry xml:id="biblio.maverik_lowerbounds">
6024       <title>
6025         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6026               xlink:href="http://magic.aladdin.cs.cmu.edu/2005/08/01/deamortization-part-2-binomial-heaps">
6027           Deamortization - Part 2: Binomial Heaps
6028         </link>
6029       </title>
6030       <date>
6031         2005
6032       </date>
6033
6034       <author>
6035         <personname>
6036           <firstname>
6037             Maverick
6038           </firstname>
6039           <surname>
6040             Woo
6041           </surname>
6042         </personname>
6043       </author>
6044     </biblioentry>
6045
6046     <!-- 27 -->
6047     <biblioentry xml:id="biblio.meyers96more">
6048       <title>
6049         More Effective C++: 35 New Ways to Improve Your Programs and Designs
6050       </title>
6051       <date>
6052         1996
6053       </date>
6054
6055       <author>
6056         <personname>
6057           <firstname>
6058             Scott
6059           </firstname>
6060           <surname>
6061             Meyers
6062           </surname>
6063         </personname>
6064       </author>
6065
6066       <publisher>
6067         <publishername>
6068           Addison-Wesley Publishing Company
6069         </publishername>
6070       </publisher>
6071     </biblioentry>
6072
6073     <!-- 28 -->
6074     <biblioentry xml:id="biblio.meyers00nonmember">
6075       <title>
6076         How Non-Member Functions Improve Encapsulation
6077       </title>
6078       <date>
6079         2000
6080       </date>
6081
6082       <author>
6083         <personname>
6084           <firstname>
6085             Scott
6086           </firstname>
6087           <surname>
6088             Meyers
6089           </surname>
6090         </personname>
6091       </author>
6092
6093       <publisher>
6094         <publishername>
6095           C/C++ Users Journal
6096         </publishername>
6097       </publisher>
6098     </biblioentry>
6099
6100     <!-- 29 -->
6101     <biblioentry xml:id="biblio.meyers01stl">
6102       <title>
6103         Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library
6104       </title>
6105       <date>
6106         2001
6107       </date>
6108
6109       <author>
6110         <personname>
6111           <firstname>
6112             Scott
6113           </firstname>
6114           <surname>
6115             Meyers
6116           </surname>
6117         </personname>
6118       </author>
6119
6120       <publisher>
6121         <publishername>
6122           Addison-Wesley Publishing Company
6123         </publishername>
6124       </publisher>
6125     </biblioentry>
6126
6127     <!-- 30 -->
6128     <biblioentry xml:id="biblio.meyers02both">
6129       <title>
6130         Class Template, Member Template - or Both?
6131       </title>
6132       <date>
6133         2003
6134       </date>
6135
6136       <author>
6137         <personname>
6138           <firstname>
6139             Scott
6140           </firstname>
6141           <surname>
6142             Meyers
6143           </surname>
6144         </personname>
6145       </author>
6146
6147       <publisher>
6148         <publishername>
6149           C/C++ Users Journal
6150         </publishername>
6151       </publisher>
6152     </biblioentry>
6153
6154     <!-- 31 -->
6155     <biblioentry xml:id="biblio.motwani95random">
6156       <title>
6157         Randomized Algorithms
6158       </title>
6159       <date>
6160         1995
6161       </date>
6162       <authorgroup>
6163         <author>
6164           <personname>
6165             <firstname>
6166               R.
6167             </firstname>
6168             <surname>
6169               Motwani
6170             </surname>
6171           </personname>
6172         </author>
6173         <author>
6174           <personname>
6175             <firstname>
6176               P.
6177             </firstname>
6178             <surname>
6179               Raghavan
6180             </surname>
6181           </personname>
6182         </author>
6183       </authorgroup>
6184       <publisher>
6185         <publishername>
6186           Cambridge University Press
6187         </publishername>
6188       </publisher>
6189     </biblioentry>
6190
6191
6192     <!-- 32 -->
6193     <biblioentry xml:id="biblio.mscom">
6194       <title>
6195         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6196               xlink:href="http://www.microsoft.com/com">
6197           COM: Component Object Model Technologies
6198         </link>
6199       </title>
6200       <publisher>
6201         <publishername>
6202           Microsoft
6203         </publishername>
6204       </publisher>
6205     </biblioentry>
6206
6207     <!-- 33 -->
6208     <biblioentry xml:id="biblio.musser95rationale">
6209       <title>
6210         Rationale for Adding Hash Tables to the C++ Standard Template Library
6211       </title>
6212       <date>
6213         1995
6214       </date>
6215
6216       <author>
6217         <personname>
6218           <firstname>
6219             David R.
6220           </firstname>
6221           <surname>
6222             Musser
6223           </surname>
6224         </personname>
6225       </author>
6226
6227     </biblioentry>
6228
6229     <!-- 35 -->
6230     <biblioentry xml:id="biblio.musser96stltutorial">
6231       <title>
6232         STL Tutorial and Reference Guide
6233       </title>
6234       <date>
6235         1996
6236       </date>
6237
6238       <authorgroup>
6239         <author>
6240           <personname>
6241             <firstname>
6242               David R.
6243             </firstname>
6244             <surname>
6245               Musser
6246             </surname>
6247           </personname>
6248         </author>
6249         <author>
6250           <personname>
6251             <firstname>
6252               A.
6253             </firstname>
6254             <surname>
6255               Saini
6256             </surname>
6257           </personname>
6258         </author>
6259       </authorgroup>
6260       <publisher>
6261         <publishername>
6262           Addison-Wesley Publishing Company
6263         </publishername>
6264       </publisher>
6265
6266     </biblioentry>
6267
6268
6269     <!-- 36 -->
6270     <biblioentry xml:id="biblio.nelson96stlpq">
6271       <title>
6272         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6273               xlink:href="http://www.dogma.net/markn/articles/pq_stl/priority.htm">Priority Queues and the STL
6274         </link>
6275       </title>
6276       <date>
6277         January 1996
6278       </date>
6279
6280       <author>
6281         <personname>
6282           <firstname>
6283             Mark
6284           </firstname>
6285           <surname>
6286             Nelson
6287           </surname>
6288         </personname>
6289       </author>
6290
6291       <publisher>
6292         <publishername>
6293           Dr. Dobb's Journal
6294         </publishername>
6295       </publisher>
6296     </biblioentry>
6297
6298
6299     <!-- 37 -->
6300     <biblioentry xml:id="biblio.okasaki98mereable">
6301       <title>
6302         Fast mergeable integer maps
6303       </title>
6304       <date>
6305         September 1998
6306       </date>
6307       <authorgroup>
6308         <author>
6309           <personname>
6310             <firstname>
6311               C.
6312             </firstname>
6313             <surname>
6314               Okasaki
6315             </surname>
6316           </personname>
6317         </author>
6318         <author>
6319           <personname>
6320             <firstname>
6321               A.
6322             </firstname>
6323             <surname>
6324               Gill
6325             </surname>
6326           </personname>
6327         </author>
6328       </authorgroup>
6329       <publisher>
6330         <publishername>
6331           Workshop on ML
6332         </publishername>
6333       </publisher>
6334     </biblioentry>
6335
6336     <!-- 38 -->
6337     <biblioentry xml:id="biblio.sgi_stl">
6338       <title>
6339         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6340               xlink:href="http://www.sgi.com/tech/stl">
6341           Standard Template Library Programmer's Guide
6342         </link>
6343       </title>
6344       <author>
6345         <personname>
6346           <firstname>
6347             Matt
6348           </firstname>
6349           <surname>
6350             Austern
6351           </surname>
6352         </personname>
6353       </author>
6354
6355       <publisher>
6356         <publishername>
6357           SGI
6358         </publishername>
6359       </publisher>
6360     </biblioentry>
6361
6362     <!-- 39 -->
6363     <biblioentry xml:id="biblio.select_man">
6364       <title>
6365         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6366               xlink:href="http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+select">
6367           select
6368         </link>
6369       </title>
6370     </biblioentry>
6371
6372
6373     <!-- 40 -->
6374     <biblioentry xml:id="biblio.sleator84amortized">
6375       <title>
6376         Amortized Efficiency of List Update Problems
6377       </title>
6378       <date>
6379         1984
6380       </date>
6381       <authorgroup>
6382         <author>
6383           <personname>
6384             <firstname>
6385               D. D.
6386             </firstname>
6387             <surname>
6388               Sleator
6389             </surname>
6390           </personname>
6391         </author>
6392
6393         <author>
6394           <personname>
6395             <firstname>
6396               R. E.
6397             </firstname>
6398             <surname>
6399               Tarjan
6400             </surname>
6401           </personname>
6402         </author>
6403       </authorgroup>
6404
6405       <publisher>
6406         <publishername>
6407           ACM Symposium on Theory of Computing
6408         </publishername>
6409       </publisher>
6410     </biblioentry>
6411
6412     <!-- 41 -->
6413     <biblioentry xml:id="biblio.sleator85self">
6414       <title>
6415         Self-Adjusting Binary Search Trees
6416       </title>
6417       <date>
6418         1985
6419       </date>
6420
6421       <authorgroup>
6422         <author>
6423           <personname>
6424             <firstname>
6425               D. D.
6426             </firstname>
6427             <surname>
6428               Sleator
6429             </surname>
6430           </personname>
6431         </author>
6432
6433         <author>
6434           <personname>
6435             <firstname>
6436               R. E.
6437             </firstname>
6438             <surname>
6439               Tarjan
6440             </surname>
6441           </personname>
6442         </author>
6443       </authorgroup>
6444
6445       <publisher>
6446         <publishername>
6447           ACM Symposium on Theory of Computing
6448         </publishername>
6449       </publisher>
6450     </biblioentry>
6451
6452     <!-- 42 -->
6453     <biblioentry xml:id="biblio.stepanov94standard">
6454       <title>
6455         The Standard Template Library
6456       </title>
6457       <date>
6458         1994
6459       </date>
6460       <authorgroup>
6461         <author>
6462           <personname>
6463             <firstname>
6464               A. A.
6465             </firstname>
6466             <surname>
6467               Stepanov
6468             </surname>
6469           </personname>
6470         </author>
6471         <author>
6472           <personname>
6473             <firstname>
6474               M.
6475             </firstname>
6476             <surname>
6477               Lee
6478             </surname>
6479           </personname>
6480         </author>
6481       </authorgroup>
6482     </biblioentry>
6483
6484     <!-- 43 -->
6485     <biblioentry xml:id="biblio.stroustrup97cpp">
6486       <title>
6487         The C++ Programming Language
6488       </title>
6489       <date>
6490         1997
6491       </date>
6492
6493       <author>
6494         <personname>
6495           <firstname>
6496             Bjarne
6497           </firstname>
6498           <surname>
6499             Stroustrup
6500           </surname>
6501         </personname>
6502       </author>
6503
6504       <publisher>
6505         <publishername>
6506           Addison-Wesley Publishing Company
6507         </publishername>
6508       </publisher>
6509     </biblioentry>
6510
6511     <!-- 44 -->
6512     <biblioentry xml:id="biblio.vandevoorde2002cpptemplates">
6513       <title>
6514         C++ Templates: The Complete Guide
6515       </title>
6516       <date>
6517         2002
6518       </date>
6519       <authorgroup>
6520         <author>
6521           <personname>
6522             <firstname>
6523               D.
6524             </firstname>
6525             <surname>
6526               Vandevoorde
6527             </surname>
6528           </personname>
6529         </author>
6530
6531         <author>
6532           <personname>
6533             <firstname>
6534               N. M.
6535             </firstname>
6536             <surname>
6537               Josuttis
6538             </surname>
6539           </personname>
6540         </author>
6541       </authorgroup>
6542       <publisher>
6543         <publishername>
6544           Addison-Wesley Publishing Company
6545         </publishername>
6546       </publisher>
6547     </biblioentry>
6548
6549
6550     <!-- 45 -->
6551     <biblioentry xml:id="biblio.wickland96thirty">
6552       <title>
6553         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6554               xlink:href="http://myweb.wvnet.edu/~gsa00121/books/amongdead30.zip">
6555           Thirty Years Among the Dead
6556         </link>
6557       </title>
6558       <date>
6559         1996
6560       </date>
6561
6562       <author>
6563         <personname>
6564           <firstname>
6565             C. A.
6566           </firstname>
6567           <surname>
6568             Wickland
6569           </surname>
6570         </personname>
6571       </author>
6572
6573       <publisher>
6574         <publishername>
6575           National Psychological Institute
6576         </publishername>
6577       </publisher>
6578     </biblioentry>
6579
6580
6581   </bibliography>
6582
6583 </chapter>