%mathml; ]> Generic AVL Tree Implementation (GAVL) Pavel Pisa
pisa@cmp.felk.cvut.cz
2003 Pavel Pisa Implementation of AVL tree with some interesting features. All algorithms for insertion, deletion, traversal etc. are implemented without function call recursion or tree node stacks. The only exception is routine to check correctness of tree used for debugging purposes. The implementation has more advantages. It can work with partially unbalanced trees with balance factor (height difference) outside range <-1,1>. The balancing subroutine is common for insertion and deletion. Provided macros enables to build trees with strict item type checking which avoids necessity to use type-casting from (void *).
Introduction There are more concepts how to store data items sorted by some key values. The most commonly used ones are: sorted conntinuous arrays of items or pointers to items binary search trees (splash tree, AVL-tree, RB-tree) more childs per node search trees (B-tree) The binary search trees are considered most efficient for in memory operations when frequent tree updates are present. If update frequency is very low, sorted arrays of pointers could be more efficient. If the node retrieval operations are time consuming the n-nary search trees could be used. The provided GAVL library tries to offer fast and API friendly set of functions and macros usable for in-memory stored AVL trees of user specified items. The tree node chaining informations must be added to each stored item for holding of the tree node topology. The GAVL library can use user specified field of items or small malloced structure for holding of node chaining information. The tree search, insert and delete functions can work with and enforce tree with unique items or can be used in mode when more items with same search key value are allowed. The balancing and other functions are written such way, that they tolerate even degraded tree which does not fulfill AVL tree definition. The operations do not result in further degradation or program exceptions. Even future tree operations tends to enhance balancing if the height differences (balance factors) of degraded tree has been correctly updated. Functions Description Generic AVL tree !F../../ulut/ul_gavl.h !F../../ulut/ul_gavlprim.c !F../../ulut/ul_gavl.c Custom AVL Tree Instances The provided macros allows to define all AVL tree functions for tree with user defined root and item types. The next defined function names are prefixed by custom selected cust_prefix: cust_item_t * cust_prefix_node2item const cust_root_t * root const gavl_node_t * node Conversion from AVL tree node to user custom defined item cust_key_t * cust_prefix_node2key const cust_root_t * root const gavl_node_t * node Conversion from AVL tree node to pointer to defined ordering key int cust_prefix_search_node const cust_root_t * root const cust_item_t * key gavl_node_t ** nodep Search for node or place for node by key cust_item_t * cust_prefix_find const cust_root_t * root const cust_key_t * key Pointer to item with key value or NULL cust_item_t * cust_prefix_find_first const cust_root_t * root const cust_key_t * key Same as above, but first matching item is found. cust_item_t * cust_prefix_find_after const cust_root_t * root const cust_key_t * key Finds the first item with key value higher than value pointed by key or returns NULL. int cust_prefix_insert cust_root_t * root cust_item_t * item Insert new item into AVL tree. If the operation is successful positive value is returned. If key with same key value already exists negative value is returned int cust_prefix_delete_node cust_root_t * root gavl_node_t * node Deletes/unlinks node from custom AVL tree int cust_prefix_delete cust_root_t * root cust_item_t * item Deletes/unlinks item from custom AVL tree gavl_node_t * cust_prefix_first_node const cust_root_t * root Returns the first node or NULL if the tree is empty gavl_node_t * cust_prefix_last_node const cust_root_t * root Returns last node or NULL if the tree is empty cust_item_t * cust_prefix_first const cust_root_t * root Returns pointer to the first item of the tree or NULL cust_item_t * cust_prefix_last const cust_root_t * root Returns pointer to last item of the tree or NULL cust_item_t * cust_prefix_next const cust_root_t * root const cust_item_t * item Returns pointer to next item of the tree or NULL if there is no more items cust_item_t * cust_prefix_prev const cust_root_t * root const cust_item_t * item Returns pointer to previous item of the tree or NULL if there is no more items int cust_prefix_is_empty const cust_root_t * root Returns non-zero value if there is no node in the tree cust_item_t * cust_prefix_cut_first cust_root_t * root Returns the first item or NULL if the tree is empty. The returned item is unlinked from the tree without tree rebalance GAVL_CUST_NODE_INT_IMP GAVL_CUST_NODE_INT_IMP Implementation of new custom tree with internal node Synopsis GAVL_CUST_NODE_INT_IMP cust_prefix cust_root_t cust_item_t cust_key_t cust_root_node cust_item_node cust_item_key cust_cmp_fnc Arguments cust_prefix defines prefix for builded function names cust_root_t user defined structure type of root of the tree cust_item_t user defined structure type of items stored in the tree cust_key_t type of the key used for sorting of the items cust_root_node the field of the root structure pointing to the tree root node cust_item_node the field of item structure used for chaining of items cust_item_key the key field of item structure defining order of items cust_cmp_fnc the keys compare function Description There are two macros designed for building custom AVL trees. The macro GAVL_CUST_NODE_INT_DEC declares functions for custom tree manipulations and is intended for use in header files. The macro GAVL_CUST_NODE_INT_IMP builds implementations for non-inlined functions declared by GAVL_CUST_NODE_INT_DEC. The cust_cmp_fnc is used for comparison of item keys in the search and insert functions. The types of two input arguments of cust_cmp_fnc functions must correspond with cust_key_t type. Return value should be positive for case when the first pointed key value is greater then second, negative for reverse case and zero for equal pointed values. Examples of GAVL Usage Generic AVL Tree This chapter describes use of generic AVL tree. This tree has advantage, that same functions could operate with any user provided items, but operations return type is void* and needs type-casting to user item type. #include <malloc.h> #include "ul_gavl.h" The above code fragment includes basic GAVL declarations. typedef struct test1_item { int my_val; /* more user data ... */ } test1_item_t; New item type is declared. Type have to contain or be connected to some key value. The integer number my_val will be used as key value in this example. The tree root instance definition follows. gavl_root_t test1_tree={ .root_node = NULL, .node_offs = -1, .key_offs = UL_OFFSETOF(test1_item_t, my_val), .cmp_fnc = gavl_cmp_int }; The field root_node have to be initialized to NULL. The value -1 for the node_offs field directs GAVL functions to allocate external node structure for each inserted item. The macro UL_OFFSETOF computes offset of my_val field in test1_item_t item. The last assignment select one of predefined functions for key values comparison. Almost same declarations and definitions could be used if the node chaining structure is contained inside item structure. That is shown in the next code fragments. typedef struct test2_item { gavl_node_t my_node; int my_val; /* more user data ... */ } test2_item_t; The item declaration contains field with gavl_node_t type in this case. This field is used for storage of tree topology information. gavl_root_t test2_tree={ .root_node = NULL, .node_offs = UL_OFFSETOF(test2_item_t, my_node), .key_offs = UL_OFFSETOF(test2_item_t, my_val), .cmp_fnc = gavl_cmp_int }; The field node_offs contains offset of the node structure inside item now. Next fragments are part of the function working with defined tree. There would be no difference in the functions calls for items with external nodes. cust2_item_t *item; int i; gavl_node_t *node; Declare some local variables first. Insert 100 items into tree then. for(i=0;i<items_cnt;i++){ item=malloc(sizeof(test2_item_t)); item->my_val=i; if(gavl_insert(&test2_tree,item,0)<0) printf("gavl_insert error\n"); } The tree is expected to store items with unique key values. If more items with same key values should be allowed, then the value of mode parameter in gavl_insert function call have to be changed from 0 to GAVL_FAFTER. for(i=0;i<102;i++){ if(!(item=(cust2_item_t *)gavl_find(&test2_tree,&i))) printf("no item %d\n",i); else if(item->my_val!=i) printf("gavl_find mismatch\n"); } Some test of retrieving item corresponding to specified key. The tree in order traversal is in the next code sample. node=gavl_first_node(&test2_tree) while(node){ item=(cust2_item_t *)gavl_node2item(&test2_tree,node); /* do something with item * insert and delete allowed there except delete of the actual item */ node=gavl_next_node(node); /* do something with item * insert and delete allowed there except delete of next item */ } Example of the item deletion is in the next fragment. The function gavl_delete guarantees that trial of deleting item, which is not part of the tree, is detected and negative value is returned. There is one prerequisite for this behavior for items with internal nodes that have never been part of any tree. The field parent of node structure stored in the item have to be initialized to NULL value. for(i=0;i<80;i++){ if(!(item=(cust2_item_t *)gavl_find(&test2_tree,&i))) continue; if(gavl_delete(&test2_tree,item)<0) printf("gavl_delete error\n"); free(item); } The function gavl_cut_first can significantly simplify deletion of all nodes of the tree without need of traversal over the tree. This function does not rebalance the tree, but keeps tree consistent. while((item=(cust2_item_t *)gavl_cut_first(&test2_tree))!=NULL) free(item); Customized AVL Tree The function templates are provided for faster and type-safe custom tree definition and manipulation. These templates expect user defined items with internal node structures and key comparison function with key pointer input parameters. The type of these parameters is required to be pointer to the key type. Next code fragment is intended to be part of the declarations of user defined custom tree and items for some data storage purpose. The header with these definitions should be included from all modules directly working with specified tree. #include "ul_gavl.h" typedef int cust_key_t; typedef struct cust2_item { cust2_key_t my_val; gavl_node_t my_node; /* more user item data ... */ } cust2_item_t; typedef struct cust2_root { gavl_cust_root_field_t my_root; /* more user root/tree data ... */ } cust2_root_t; int cust_cmp_fnc(cust_key_t *a, cust_key_t *b); As can be seen from above code fragment, the key field with user declared type (cust_key_t) and internal node structure are required for items. The only one field with type gavl_cust_root_field_t is required for the custom tree root structure. The use of the macro GAVL_CUST_NODE_INT_DEC declares all tree manipulation functions for the custom tree. The function names are prefixed by prefix specified as the first parameter of macro invocation. This declaration can be included in user header files as well. GAVL_CUST_NODE_INT_DEC(cust2, cust2_root_t, cust2_item_t, cust2_key_t,\ my_root, my_node, my_val, cust_cmp_fnc) The implementation of the custom tree functions is realized by use of macro GAVL_CUST_NODE_INT_IMP with same parameters as for GAVL_CUST_NODE_INT_DEC macro. #include "ul_gavlcust.h" GAVL_CUST_NODE_INT_IMP(cust2, cust2_root_t, cust2_item_t, cust2_key_t,\ my_root, my_node, my_val, cust_cmp_fnc) cust2_root_t cust2_tree; Static tree root instance is included at last line of above code fragment as well. The next sample function for the custom tree shows same operation as have been described for generic tree. void test_cust_tree(void) { int i; cust2_item_t *item; /* build tree */ for(i=1;i<=100;i++){ item=malloc(sizeof(cust2_item_t)); item->my_val=i; if(cust2_insert(&cust2_tree,item)<0) printf("cust2_insert error\n"); } /* traverse tree */ printf("Custom tree cust2:\n"); for(item=cust2_first(&cust2_tree);item;item=cust2_next(&cust2_tree,item)) printf("%d ",item->my_val); printf("\n"); /* delete part of the tree */ for(i=1;i<=80;i++){ item=cust2_find(&cust2_tree,&i); if(cust2_delete(&cust2_tree,item)<0) printf("cust2_delete error\n"); free(item); } /* traverse in reverse order */ printf("Custom tree cust2:\n"); for(item=cust2_last(&cust2_tree);item;item=cust2_prev(&cust2_tree,item)) printf("%d ",item->my_val); printf("\n"); /* delete all remaining items */ while((item=cust2_cut_first(&cust2_tree))!=NULL) free(item); } Used Algorithms Background Tree Balancing Operations This section describes used algorithms for keeping AVL tree balanced. Next convention is used for height and balance computations for node $\textrm{X}$ X . The height of subtree starting at $\textrm{X}$ X is labeled as $x=height(\textrm{X})$ x = height (X ) . Height difference (balance factor) for node $\textrm{X}$ X is labeled as $\Delta x=height(left(\textrm{X}))-height(right(\textrm{X}))$ Δx = height(left (X)) - height(right(X)) . Tree Balancing Case 1
Tree balancing case 1 Before operation After operation
\begin{eqnarray*} \Delta s & = & n-b\qquad \geq 2\\ \Delta n & = & a-p\qquad \geq 0 \end{eqnarray*} Δs = n - b 2 Δn = a - p 0 The height of subtree $\textrm{S}$ S is defined by highest branch, which is branch grovinggrowing from node $\textrm{A}$ . This leads to next equations. \begin{eqnarray*} n & = & a+1\\ s & = & a+2\\ p & = & a-\Delta n\\ b & = & a+1-\Delta s \end{eqnarray*} n = a + 1 s = a + 2 p = a - Δn b = a + 1 - Δs The height of branches $\textrm{N}$ N and $\textrm{S}$ S is marked as $n1$ n1 and $s1$ s1 after balancing. \begin{eqnarray*} \Delta s1 & = & p-b\\ \Delta n1 & = & a-s1 \end{eqnarray*} Δs1 = p - b Δn1 = a - s1 \begin{eqnarray*} s1 & = & \max (p,b)+1\\ s1 & = & \max (a-\Delta n,a+1-\Delta s)+1\\ s1 & = & a+1+\max (-\Delta n,1-\Delta s)\\ s1 & = & a+1-\min (\Delta n,\Delta s-1) \end{eqnarray*} s1 = max(p,b) + 1 s1 = max(a - Δn,a + 1 - Δs) + 1 s1 = a + 1 + max(-Δn, 1 - Δs) s1 = a + 1 - min(Δn, Δs - 1) \begin{eqnarray*} \Delta s1 & = & a-\Delta n-a-1+\Delta s\\ \Delta s1 & = & \Delta s-\Delta n-1\\ \Delta n1 & = & a-a-1+\min (\Delta n,\Delta s-1)\\ \Delta n1 & = & \min (\Delta n-1,\Delta s-2)\\ \Delta n1 & = & \left\langle \begin{array}{cc} \Delta n-1 & \qquad \Delta s\geq \Delta n+1\\ \Delta s-2 & \qquad \Delta s\leq \Delta n+1\end{array} \right. \end{eqnarray*} Δs1 = a - Δn - a - 1 + Δs Δs1 = Δs - Δn - 1 Δn1 = a - a - 1 + min(Δn, Δs - 1) Δn1 = min(Δn - 1, Δs - 2) Δn1 = Δn - 1Δs Δn + 1 Δs - 2 Δs Δn + 1 Because balancing in case 1 does not guarantee that new tree has lower height than original tree, it is necessary to compute tree height change (tree height lowering). \begin{eqnarray*} s-n1 & = & s-(\max (a,s1)+1)\\ & = & a+2-(\max (a,a+1+\max (-\Delta n,1-\Delta s))+1)\\ & = & 1-\max (0,\max (1-\Delta n,2-\Delta s))\\ & = & \min (1,\Delta n,\Delta s-1)\qquad \Delta n\geq 0,\Delta s\geq 2\\ & = & \min (1,\Delta n)\\ s-n1 & = & \left\langle \begin{array}{cc} 1 & \qquad \Delta n\neq 0\\ 0 & \qquad \Delta n=1\end{array} \right. \end{eqnarray*} s - n1 = s - (max(a,s1) + 1) = a + 2 - (max(a,a + 1 + max(-Δn, 1 - Δs)) + 1) = 1 - max(0, max(1 - Δn, 2 - Δs)) = min(1, Δn, Δs - 1)Δn 0, Δs 2 = min(1, Δn) s - n1 = 1 Δn0 0 Δn = 1
Tree Balancing Case 2
Tree balancing case 2 Before operation After operation
This case has advantage, that it reduces tree height for all node height combinations. Tree height is given by branch $\textrm{P}$ P $\textrm{B}$ B or $\textrm{P}$ P $\textrm{C}$ C . \begin{eqnarray*} \Delta s & = & n-d\qquad \geq 2\\ \Delta n & = & a-p\qquad \leq -1 \end{eqnarray*} Δs = n - d 2 Δn = a - p -1 \begin{eqnarray*} n & = & p+1\\ s & = & p+2\\ d & = & p+1-\Delta s\\ a & = & p+\Delta n \end{eqnarray*} n = p + 1 s = p + 2 d = p + 1 - Δs a = p + Δn \begin{eqnarray*} \textrm{for}\, b\geq c\qquad c & = & b-\Delta p\qquad p=b+1\\ \textrm{for}\, c\geq b\qquad b & = & c+\Delta p\qquad p=c+1 \end{eqnarray*} forb cc = b - Δpp = b + 1 forc bb = c + Δpp = c + 1 When $b\geq c$ b c then the height differences $\Delta s1$ Δs1 , $\Delta n1$ Δn1 and $\Delta p1$ Δp1 for nodes $\textrm{S}$ S , $\textrm{N}$ N and $\textrm{P}$ S of the balanced tree can be computed from next equations. \begin{eqnarray*} b\geq c & , & \Delta p\geq 0\\ n1 & = & b+1\\ p1 & = & b+2\\ s1 & = & \max (c,d)+1=\max (b-\Delta p,b+2-\Delta s)+1\\ \Delta s1 & = & c-d=b-\Delta p-b-2+\Delta s\\ \Delta s1 & = & \Delta s-2-\Delta p\\ \Delta n1 & = & a-b=b+1+\Delta n-b\\ \Delta n1 & = & \Delta n+1\\ \Delta p1 & = & n1-s1=b+1-\max (b-\Delta p,b+2-\Delta s)-1\\ \Delta p1 & = & \min (\Delta p,\Delta s-2)\qquad \Delta p\geq 0,\Delta s\geq 2 \end{eqnarray*} b c , Δp 0 n1 = b + 1 p1 = b + 2 s1 = max(c,d) + 1 = max(b - Δp,b + 2 - Δs) + 1 Δs1 = c - d = b - Δp - b - 2 + Δs Δs1 = Δs - 2 - Δp Δn1 = a - b = b + 1 + Δn - b Δn1 = Δn + 1 Δp1 = n1 - s1 = b + 1 - max(b - Δp,b + 2 - Δs) - 1 Δp1 = min(Δp, Δs - 2)Δp 0, Δs 2 When $c\geq b$ c b then the height differences $\Delta s1$ Δs1 , $\Delta n1$ Δn1 and $\Delta p1$ Δp1 for nodes $\textrm{S}$ S , $\textrm{N}$ N and $\textrm{P}$ P of the balanced tree can be computed from next equations. \begin{eqnarray*} c\geq b & , & \Delta p\leq 0\\ s1 & = & c+1\\ p1 & = & c+2\\ n1 & = & \max (a,b)+1=\max (c+1+\Delta n,c+\Delta p)+1\\ \Delta s1 & = & c-d=c-c-2+\Delta s\\ \Delta s1 & = & \Delta s-2\\ \Delta n1 & = & a-b=c+1+\Delta n-c-\Delta p\\ \Delta n1 & = & \Delta n+1-\Delta p\\ \Delta p1 & = & n1-s1=\max (c+1+\Delta n,c+\Delta p)+1-c-1\\ \Delta p1 & = & \max (\Delta n,\Delta s-1)\qquad \Delta n\leq -1,\Delta p\leq 0 \end{eqnarray*} c b , Δp 0 s1 = c + 1 p1 = c + 2 n1 = max(a,b) + 1 = max(c + 1 + Δn,c + Δp) + 1 Δs1 = c - d = c - c - 2 + Δs Δs1 = Δs - 2 Δn1 = a - b = c + 1 + Δn - c - Δp Δn1 = Δn + 1 - Δp Δp1 = n1 - s1 = max(c + 1 + Δn,c + Δp) + 1 - c - 1 Δp1 = max(Δn, Δs - 1)Δn -1, Δp 0
Height of the AVL TreeMinimal number of nodes $T$ T can be computed recursively. \[ T(h)=T(h-1)+T(h-2)+1,\, T(0)=0,\, T(1)=1\] T(h) = T(h - 1) + T(h - 2) + 1,T(0) = 0,T(1) = 1 1, 2, 4, 7, 12, 20, 33, 54, 88, 143, 232, 376, 609, 986 \[ T(h)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{(h+2)}-1\] T(h) = 1 5 1 + 5 2 (h+2) - 1 \[ T(h)=\frac{1}{\sqrt{5}}\, 2^{(h+2)\, log_{2}\left(\frac{1+\sqrt{5}}{2}\right)}-1\] T(h) = 1 52(h+2)log21+5 2 - 1 \[ h=\left(log_{2}(T+1)-log_{2}\left(\frac{1}{\sqrt{5}}\right)\right)log_{2}^{-1}\left(\frac{1+\sqrt{5}}{2}\right)-2\] h = log2(T + 1) - log2 1 5 log2-1 1 + 5 2 - 2 \[ log_{2}^{-1}\left(\frac{1+\sqrt{5}}{2}\right)=1.44042\] log2-1 1 + 5 2 = 1.44042 An AVL tree with n nodes has height between log2 (n + 1) and 1.44 * log2 (n + 2) - 0.328. An AVL tree with height h has between pow (2, (h + .328) / 1.44) - 1 and pow (2, h) - 1 nodes.A red-black tree with n nodes has height at least log2 (n + 1) but no more than 2 * log2 (n + 1). A red-black tree with height h has at least pow (2, h / 2) - 1 nodes but no more than pow (2, h) - 1.