Post

AVL Trees

Mastering AVL Trees with C

AVL Trees

Height Balance: Mastering AVL Trees with C

Binary Search Trees (BSTs) are fantastic data structures for organizing data, offering potentially fast search, insertion, and deletion operations (O(log n) on average). However, their performance heavily depends on the order of insertion. In the worst-case scenario (e.g., inserting sorted data), a BST can degenerate into a linked list, making these operations take O(n) time.

This is where self-balancing binary search trees come to the rescue! The AVL Tree, named after its inventors Adelson-Velsky and Landis, was the first such data structure. It guarantees O(log n) time complexity for fundamental operations by ensuring the tree remains “height-balanced” after every insertion and deletion.

This post will dive deep into:

  1. What height balance means in the context of AVL trees.
  2. Why AVL trees are necessary (the problem with unbalanced BSTs).
  3. The core mechanism: Tree Rotations (LL, RR, LR, RL).
  4. A complete implementation of AVL Trees in C, including insertion and traversal.
  5. Analysis of AVL tree performance.

Let’s get started!

What is Height Balance? The AVL Property

An AVL tree is a BST with an additional balance property. For every node in the tree, the heights of its left and right subtrees can differ by at most 1.

To enforce this, we define the Balance Factor for a node:

Balance Factor (Node) = height(Node -> left_subtree) - height(Node -> right_subtree)

  • Height of a node: The number of edges on the longest path from the node to a leaf. The height of a NULL subtree is typically defined as -1.
  • AVL Property: For every node in an AVL tree, its Balance Factor must be -1, 0, or 1.
    • -1: The left subtree is one level deeper than the right subtree.
    • 0: The left and right subtrees have the same height.
    • 1: The right subtree is one level deeper than the left subtree.

If, after an insertion or deletion, any node’s balance factor becomes less than -1 or greater than 1, the tree is considered unbalanced, and rebalancing operations (rotations) are performed.

Why AVL Trees? The Peril of Unbalanced BSTs

Imagine inserting keys 1, 2, 3, 4, 5 into a standard BST:

1
2
3
4
5
6
7
8
9
1
 \
  2
   \
    3
     \
      4
       \
        5

This tree has degenerated into a linked list. Searching for 5 requires traversing all 5 nodes (O(n)).

An AVL tree prevents this. By maintaining the height balance property, it ensures the tree’s height remains logarithmic with respect to the number of nodes (approximately 1.44 * log2(n)), guaranteeing O(log n) performance for search, insertion, and deletion in the worst case.

Restoring Balance: AVL Tree Rotations

When an insertion or deletion violates the AVL property (Balance Factor becomes < -1 or > 1), we need to restructure the tree locally to restore balance. This is achieved through rotations.

There are four main imbalance scenarios and corresponding rotation types:

Let z be the unbalanced node (Balance Factor < -1 or > 1). Let y be the child of z in the taller subtree. Let x be the child of y in the taller subtree (or the newly inserted node’s path).

  1. Left-Left (LL) Imbalance:
    • Cause: Insertion into the left subtree of the left child of z.
    • Balance Factors: balanceFactor(z) < -1, balanceFactor(y) <= 0.
    • Solution: Perform a Right Rotation at z.
    1
    2
    3
    4
    5
    6
    7
    
          z (-2)                   y (0 or -1)
         / \                     /   \
        y (-1 or 0)  T4  ==>    x     z (+1 or 0)
       / \                     / \   / \
      x   T3                 T1  T2 T3  T4
     / \
    T1  T2
    
  2. Right-Right (RR) Imbalance:
    • Cause: Insertion into the right subtree of the right child of z.
    • Balance Factors: balanceFactor(z) > 1, balanceFactor(y) >= 0.
    • Solution: Perform a Left Rotation at z.
    1
    2
    3
    4
    5
    6
    7
    
      z (+2)                       y (0 or +1)
     / \                         /   \
    T1  y (+1 or 0)   ==>       z     x (-1 or 0)
       / \                     / \   / \
      T2  x                  T1  T2 T3  T4
         / \
        T3  T4
    
  3. Left-Right (LR) Imbalance:
    • Cause: Insertion into the right subtree of the left child of z.
    • Balance Factors: balanceFactor(z) < -1, balanceFactor(y) > 0.
    • Solution: Perform a Left Rotation at y, followed by a Right Rotation at z.
    1
    2
    3
    4
    5
    6
    7
    
         z (-2)                  z (-2)                  x (0)
        / \                    / \                    /   \
       y (+1)  T4   ==>       x (+1 or 0) T4  ==>   y     z
      / \                    / \                    / \   / \
     T1  x                  y   T3                T1 T2a T2b T4
        / \                / \
       T2a T2b            T1 T2a (or T2b)
    
  4. Right-Left (RL) Imbalance:
    • Cause: Insertion into the left subtree of the right child of z.
    • Balance Factors: balanceFactor(z) > 1, balanceFactor(y) < 0.
    • Solution: Perform a Right Rotation at y, followed by a Left Rotation at z.
    1
    2
    3
    4
    5
    6
    7
    
       z (+2)                  z (+2)                     x (0)
      / \                    / \                       /   \
     T1  y (-1)    ==>      T1  x (-1 or 0)  ==>      z     y
        / \                    /  \                   / \   / \
       x   T4                 T2a  y                 T1 T2a T2b T4
      / \                         / \
     T2a T2b                     T2b T4
    

Key Points about Rotations:

  • They preserve the BST property (in-order traversal remains sorted).
  • They locally fix the height imbalance.
  • They take constant time, O(1).
  • Heights of the affected nodes must be recalculated after the rotation.

Implementation in C

Let’s implement an AVL Tree in C.

1. Node Structure:

We need a structure to represent a node in the AVL tree. It will store the key, pointers to the left and right children, and the node’s height.

1
2
3
4
5
6
7
8
9
10
#include <stdio.h>
#include <stdlib.h>

// Structure for an AVL tree node
typedef struct AVLNode {
    int key;
    struct AVLNode *left;
    struct AVLNode *right;
    int height; // Height of the node
} AVLNode;

2. Helper Functions:

These functions simplify the main logic.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
// Function to get the height of a node
// Height of NULL tree is -1, height of leaf node is 0
int height(AVLNode *node) {
    if (node == NULL)
        return -1;
    return node->height;
}

// Function to get the maximum of two integers
int max(int a, int b) {
    return (a > b) ? a : b;
}

// Function to create a new AVL node
AVLNode *newNode(int key) {
    AVLNode *node = (AVLNode *)malloc(sizeof(AVLNode));
    if (!node) {
        perror("Memory allocation failed");
        exit(EXIT_FAILURE);
    }
    node->key = key;
    node->left = NULL;
    node->right = NULL;
    node->height = 0; // New node is initially added at leaf (height 0)
    return node;
}

// Function to get the Balance Factor of a node
int getBalanceFactor(AVLNode *node) {
    if (node == NULL)
        return 0;
    // Convention: height(NULL) is -1
    return height(node->left) - height(node->right);
}

// Function to update the height of a node
// Must be called whenever the children of a node change
void updateHeight(AVLNode *node) {
    if (node != NULL) {
        node->height = 1 + max(height(node->left), height(node->right));
    }
}

3. Rotation Functions:

Implement the Right and Left rotations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Right rotate subtree rooted with y
// See diagram for LL imbalance
AVLNode *rightRotate(AVLNode *y) {
    AVLNode *x = y->left;
    AVLNode *T2 = x->right;

    // Perform rotation
    x->right = y;
    y->left = T2;

    // Update heights (Order matters: update y first, then x)
    updateHeight(y);
    updateHeight(x);

    // Return new root of the rotated subtree
    return x;
}

// Left rotate subtree rooted with x
// See diagram for RR imbalance
AVLNode *leftRotate(AVLNode *x) {
    AVLNode *y = x->right;
    AVLNode *T2 = y->left;

    // Perform rotation
    y->left = x;
    x->right = T2;

    // Update heights (Order matters: update x first, then y)
    updateHeight(x);
    updateHeight(y);


    // Return new root of the rotated subtree
    return y;
}

4. Insertion Function:

This function inserts a key like a standard BST, then checks for imbalance and performs rotations if necessary. It returns the new root of the modified subtree (which might change due to rotations).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// Insert a node into the AVL tree
AVLNode *insertNode(AVLNode *node, int key) {
    // 1. Perform standard BST insertion
    if (node == NULL)
        return newNode(key);

    if (key < node->key)
        node->left = insertNode(node->left, key);
    else if (key > node->key)
        node->right = insertNode(node->right, key);
    else // Duplicate keys are not allowed in this implementation
        return node;

    // 2. Update height of this ancestor node
    updateHeight(node);

    // 3. Get the balance factor of this ancestor node
    int balance = getBalanceFactor(node);

    // 4. If the node becomes unbalanced, then there are 4 cases

    // Left Left Case (LL)
    if (balance < -1 && key < node->left->key) {
        printf("Performing Right Rotation on node %d\n", node->key);
        return rightRotate(node);
    }

    // Right Right Case (RR)
    if (balance > 1 && key > node->right->key) {
         printf("Performing Left Rotation on node %d\n", node->key);
        return leftRotate(node);
    }

    // Left Right Case (LR)
    if (balance < -1 && key > node->left->key) {
        printf("Performing LR Rotation on node %d\n", node->key);
        node->left = leftRotate(node->left);
        return rightRotate(node);
    }

    // Right Left Case (RL)
    if (balance > 1 && key < node->right->key) {
         printf("Performing RL Rotation on node %d\n", node->key);
        node->right = rightRotate(node->right);
        return leftRotate(node);
    }

    // Return the (possibly updated) node pointer
    return node;
}

Self-correction: Changed balance > 1 to balance < -1 for LL/LR and balance < -1 to balance > 1 for RR/RL as per standard Balance Factor definition (Left Height - Right Height). Correction 2: The conditions for determining LL/RR vs LR/RL depend on the relationship between the *inserted key and the key in the child node (y in the diagrams). Corrected the conditions inside the if statements.*

5. Traversal Function (for testing):

An in-order traversal prints the keys in sorted order, confirming the BST property.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// In-order traversal of the AVL tree (prints sorted keys)
void inOrder(AVLNode *root) {
    if (root != NULL) {
        inOrder(root->left);
        printf("%d(h=%d, bf=%d) ", root->key, root->height, getBalanceFactor(root));
        inOrder(root->right);
    }
}

// Pre-order traversal (useful for visualizing tree structure)
void preOrder(AVLNode *root) {
     if (root != NULL) {
        printf("%d(h=%d, bf=%d) ", root->key, root->height, getBalanceFactor(root));
        preOrder(root->left);
        preOrder(root->right);
    }
}

6. Example Usage (main function):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
int main() {
    AVLNode *root = NULL;

    // Insert some nodes
    root = insertNode(root, 10);
    root = insertNode(root, 20);
    root = insertNode(root, 30); // Triggers RR imbalance -> Left Rotation at 10
    root = insertNode(root, 40);
    root = insertNode(root, 50); // Triggers RR imbalance -> Left Rotation at 30
    root = insertNode(root, 25); // Triggers RL imbalance -> Right rotation at 50, Left rotation at 40

    printf("In-order traversal of the constructed AVL tree:\n");
    inOrder(root);
    printf("\n");

    printf("Pre-order traversal of the constructed AVL tree:\n");
    preOrder(root);
    printf("\n");

    // Clean up memory (implement a freeTree function for production code)
    // freeTree(root);

    return 0;
}

// TODO: Implement a freeTree function to prevent memory leaks
// void freeTree(AVLNode* node) { ... }

Note on Deletion: Implementing deletion in AVL trees follows a similar pattern: perform standard BST deletion (handling 0, 1, or 2 children, potentially involving finding the in-order successor/predecessor), then trace back up the path, updating heights and performing rotations as needed to rebalance the tree. The logic for choosing rotations after deletion is slightly different but based on the same principles and balance factors.

Analysis of AVL Trees

  • Time Complexity:
    • Search: O(log n) - Guaranteed due to the height balance.
    • Insertion: O(log n) - Involves a standard BST insertion (O(log n)) followed by height updates and potentially a constant number of rotations (at most two for LR/RL) along the path back to the root (O(log n)).
    • Deletion: O(log n) - Similar reasoning to insertion; BST deletion plus O(log n) path traversal for balancing.
  • Space Complexity: O(n) - To store the n nodes. Each node requires a constant amount of extra space for the height information.

AVL Trees vs. Standard BSTs:

  • Pros: Guaranteed logarithmic performance, avoiding worst-case O(n) scenarios. Excellent for lookup-intensive applications.
  • Cons: More complex implementation than standard BSTs. Insertions and deletions are slightly slower due to the overhead of height checks and potential rotations (though still O(log n)). The balancing criteria are very strict, which might lead to more frequent rotations compared to other balanced trees like Red-Black trees.

Conclusion

AVL trees are a fundamental example of self-balancing binary search trees, demonstrating how maintaining a simple height-balance property can provide robust O(log n) performance guarantees. By understanding the balance factor and the four types of rotations (LL, RR, LR, RL), you can implement efficient AVL trees in C (or any other language). While other balanced trees like Red-Black trees might offer slightly better insertion/deletion performance in practice due to less frequent rotations, AVL trees remain a cornerstone concept in data structures and algorithms, providing a clear and effective way to conquer the worst-case behavior of standard BSTs.

This post is licensed under CC BY 4.0 by the author.