We may get confused with the terms Binary Tree, Binary Search Tree, B Tree and B+ Tree. B Tree is also called Balanced Search Tree and B+ tree is something like B tree having some other features and conditions applied to it. We first remove our confusion learning the difference between these terms.

■ What is Balanced Tree?

A balanced tree means that all searches for individual values require the same number of nodes to be read from the disc. The B+-Tree is called a balanced tree because every path from the root node to a leaf node is the same length.

■ Difference between Binary Tree and Binary Search Tree

Binary tree is a generic version of binary search tree and its not ordered.

While constructing, binary tree we follow a rule that every node should have at most two children.

Whereas in case of BST, along with at most two children rule, we follow below rules

  1. All left descendants should possess smaller values than root value.

  2. All right descendants should possess larger values than root value.

BST are effective only when we are dealing with data which resides in main memory (RAM).

■ Differences between a B tree and a B+ tree

B+ Trees are different from B Trees with following two properties:

  1. B+ trees dont store data pointer in interior nodes, they are ONLY stored in leaf nodes. This is not optional as in B-Tree. This means that interior nodes can fit more keys on block of memory.
  2. The leaf nodes of B+ trees are linked, so doing a linear scan of all keys will requires just one pass through all the leaf nodes. A B tree, on the other hand, would require a traversal of every level in the tree. This property can be utilized for efficient search as well since data is stored only in leafs.

■ Details On B+ Tree

A B+ tree is a data structure often used in the implementation of database indexes. Each node of the tree contains an ordered list of keys and pointers to lower level nodes in the tree. These pointers can be thought of as being between each of the keys. To search for or insert an element into the tree, one loads up the root node, finds the adjacent keys that the searched-for value is between, and follows the corresponding pointer to the next node in the tree. Recursing eventually leads to the desired value or the conclusion that the value is not present.

B+ trees use some clever balancing techniques to make sure that all of the leaves are always on the same level of the tree, that each node is always at least half full (rounded) of keys, and (therefore) that the height of the tree is always at most ceiling(log(k)) of base ceiling(n/2) where k is the number of values in the tree and n is the maximum number of pointers (= maximum number of nodes + 1) in each block. This means that only a small number of pointer traversals is necessary to search for a value if the number of keys in a node is large. This is crucial in a database because the B+ tree is on disk. Reading a single block takes just as much time as reading a partial block, and a block can hold a large number of keys/pointers (128-1024 or so).

Finally, the leaves of the B+ tree all contain a next sibling pointer for fast iteration through a contiguous block of values. This allows for extremely efficient range queries (find the block containing the first value and blast through siblings until the last value is found). Its especially efficient if the implementation guarantees that leaf blocks are contiguous on disk, though this is a challenge as far as I know.

B+ trees can also be used outside of the disk, but generally a balanced binary search tree or a skip list or something should provide better performance in memory, where pointer following is no more expensive than finding the right pointer to follow.

B trees,B+ trees comes handy when we have huge which cant fit in main memory and has to perform disk seek operations. These two self balancing trees will reduces no.of disk seeks.

B+ Tree Structure

Fig: B+ Tree Structure

A B+-Tree consists of one or more blocks of data, called nodes, linked together by pointers.

■ The B+ Tree tree structure

The tree has a single node at the top, called the root node. The root node points to two or more blocks, called child nodes. Each child nodes points to further child nodes and so on.

The B+-Tree consists of two types of (1) internal nodes and (2) leaf nodes

  • Internal nodes point to other nodes in the tree.
  • Leaf nodes point to data in the database using data pointers. Leaf nodes also contain an additional pointer, called the sibling pointer, which is used to improve the efficiency of certain types of search.
  • All the nodes in a B+-Tree must be at least half full except the root node which may contain a minimum of two entries. The algorithms that allow data to be inserted into and deleted from a B+-Tree guarantee that each node in the tree will be at least half full.
  • Searching for a value in the B+-Tree always starts at the root node and moves downwards until it reaches a leaf node.
  • Both internal and leaf nodes contain key values that are used to guide the search for entries in the index.
  • The B+ Tree is called a balanced tree because every path from the root node to a leaf node is the same length. A balanced tree means that all searches for individual values require the same number of nodes to be read from the disc.
  • An internal node in a B+-Tree consists of a set of key values and pointers. The set of keys and values are ordered so that a pointer is followed by a key value. The last key value is followed by one pointer.
  • Each pointer points to nodes containing values that are less than or equal to the value of the key immediately to its right. For instance, in the example, above, the first pointer of the first node from level 1 (containing only 3) points to a node containing key values that are less than or equal to 3.
  • The last pointer in an internal node is called the infinity pointer. The infinity pointer points to a node containing key values that are greater than the last key value in the node.
  • When an internal node is searched for a key value, the search begins at the leftmost key value and moves rightwards along the keys.

If the key value is less than the sought key then the pointer to the left of the key is known to point to a node containing keys less than the sought key.

If the key value is greater than or equal to the sought key then the pointer to the left of the key is known to point to a node containing keys between the the previous key value and the current key value.

Fig 2: Example B+ Tree

Fig 2: Example B+ Tree

  • A leaf node in a B+-Tree consists of a set of key values and data pointers. Each key value has one data pointer. The key values and data pointers are ordered by the key values.
  • The data pointer points to a record or block in the database that contains the record identified by the key value. For instance, in the example, above, the pointer attached to key value 7 points to the record identified by the value 7.
  • Searching a leaf node for a key value begins at the leftmost value and moves rightwards until a matching key is found.
  • The leaf node also has a pointer to its immediate sibling node in the tree. The sibling node is the node immediately to the right of the current node. Because of the order of keys in the B+-Tree the sibling pointer always points to a node that has key values that are greater than the key values in the current node.
  • Leaf nodes always appear at the bottom of the B+-Tree structure

■ Properties of a B+ Tree

The root node points to at least two nodes.

  1. All non-root nodes are at least half full.
  2. For a tree of order m, all internal nodes have *m-*1 keys and m pointers.
  3. A B+-Tree grows upwards.
  4. A B+-Tree is balanced.
  5. Sibling pointers allow sequential searching.

■ How B+ Tree Grows

  • The root node in a B+-Tree can have a minimum of one key and two pointers. This is because as the tree grows it is necessary to create a new root by splitting the current root. Splitting the current root creates two nodes which must be pointed to by the new root.
  • All non-root nodes in the B+-Tree, that is, internal and leaf nodes, can be at least half full. This is because when nodes are full they are split in half. When keys are deleted from the tree, nodes are re-organized to guarantee each node is half full.
  • The order of a B+-Tree is the maximum number of pointers in an internal node.
  • Because the B+-Tree always fills up from the leaf nodes it only possible to split the root node after all nodes between it and a leaf node have been split. Hence, the tree grows upwards.
  • As all paths from the root node to a leaf node are the same length, the tree is said to be balanced. A balanced tree guarantees that all searches for individual keys require the same number of nodes to be read because all the paths from the root to the leaf nodes are the same length.
  • The sibling pointers between leaf nodes allow sequential searches to be carried out efficiently. The sibling pointer always points to a node that contains keys greater than the keys in the current node.

■ B+ Tree Data Insertion

Algorithm

  1. If the bucket is not full (at most b 1 entries after the insertion), add the record.
  2. Otherwise, split the bucket.
    1. Allocate new leaf and move half the buckets elements to the new bucket.
    2. Insert the new leafs smallest key and address into the parent.
    3. If the parent is full, split it too.
      1. Add the middle key to the parent node.
    4. Repeat until a parent is found that need not split.
  3. If the root splits, create a new root which has one key and two pointers. (That is, the value that gets pushed to the new root gets removed from the original node)

■ Illustration of some random data insertion in a B+ tre

**** Insert 5 ****

[|5|]

**** Insert 9 ****

[|5|9|]

**** Insert 1 ****

[|1|5|9|]

**** Insert 3 ****

[|5|] [|1|3|] [|5|9|]

**** Insert 4 ****

[|5|] [|1|3|4|] [|5|9|]

**** Insert 59 ****

[|5|] [|1|3|4|] [|5|9|59|]

**** Insert 65 ****

[|5|59|] [|1|3|4|] [|5|9|] [|59|65|]

**** Insert 45 ****

[|5|59|] [|1|3|4|] [|5|9|45|] [|59|65|]

**** Insert 89 ****

[|5|59|] [|1|3|4|] [|5|9|45|] [|59|65|89|]

**** Insert 29 ****

[|5|29|59|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|89|]

**** Insert 68 ****

[|59|] [|5|29|] [|68|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|]

**** Insert 108 ****

[|59|] [|5|29|] [|68|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|108|] **** Insert 165 ****

[|59|] [|5|29|] [|68|108|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|]

**** Insert 298 ****

[|59|] [|5|29|] [|68|108|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|298|]

**** Insert 219 ****

[|59|] [|5|29|] [|68|108|219|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|] [|219|298|]

**** Insert 569 ****

[|59|] [|5|29|] [|68|108|219|] [|1|3|4|] [|5|9|] [|29|45|] [|59|65|] [|68|89|] [|108|165|] [|219|298|569|]

**** Insert 37 ****

[|59|] [|5|29|] [|68|108|219|] [|1|3|4|] [|5|9|] [|29|37|45|] [|59|65|] [|68|89|] [|108|165|] [|219|298|569|]

**** Insert 47 ****

[|59|] [|5|29|45|] [|68|108|219|] [|1|3|4|] [|5|9|] [|29|37|] [|45|47|] [|59|65|] [|68|89|] [|108|165|] [|219|298|569|]

■ B+ Tree Data Deletion

Algorithm:

  1. Start at the root and go up to leaf node containing the key K
  2. Find the node n on the path from the root to the leaf node containing K
    1. If n is root, remove K
      1. if root has mode than one keys, done

      2. if root has only K

        1. if any of its child node can lend a node
          1. Borrow key from the child and adjust child links
        2. Otherwise merge the children nodes it will be new root
      3. If n is a internal node, remove K

        1. If n has at lease ceil(m/2) keys, done!
        2. If n has less than ceil(m/2) keys,
          1. If a sibling can lend a key,

            1. Borrow key from the sibling and adjust keys in n and the parent node
            2. Adjust child links
          2. Else

            1. Merge n with its sibling
            2. Adjust child links
      4. If n is a leaf node, remove K

        1. If n has at least ceil(M/2) elements, done!
          1. In case the smallest key is deleted, push up the next key
        2. If n has less than ceil(m/2) elements
          1. If the sibling can lend a key

            1. Borrow key from a sibling and adjust keys in n and its parent node
          2. Else

            1. Merge n and its sibling
            2. Adjust keys in the parent node

Note: Sibling is one of the adjusting node with same parent as n

■ Illustration of some random data deletion from an existing B+ tree

Suppose we have the following B+ tree, and we want to delete some random data

[|45|] [|5|29|] [|51|59|165|] [|1|3|4|] [|5|9|] [|29|37|43|] [|45|47|49|] [|51|53|57|] [|59|61|65|] [|165|298|]

Let’s delete 298, 45, 1, 3, 47, 53, 37, 165, 57, 49 from the tree and see how the structure changes after each deletion.

---- Delete 298 ----

[|29|] [|5|] [|45|51|59|65|] [|1|3|4|] [|5|9|] [|29|37|43|] [|45|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 45 ----

[|29|] [|43|] [|43|51|59|65|] [|1|3|4|] [|5|9|] [|29|37|] [|43|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 1 ----

[|43|] [|43|29|] [|51|59|65|] [|3|4|] [|5|9|] [|29|37|] [|43|47|49|] [|51|53|57|] [|59|61|] [|65|165|] ---- Delete 3 ----

[|51|] [|29|43|] [|59|65|] [|4|5|9|] [|29|37|] [|43|47|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 47 ----

[|59|] [|29|43|51|] [|65|] [|4|5|9|] [|29|37|] [|43|49|] [|51|53|57|] [|59|61|] [|65|165|]

---- Delete 53 ----

[|59|] [|29|43|51|] [|65|] [|4|5|9|] [|29|37|] [|43|49|] [|51|57|] [|59|61|] [|65|165|] ---- Delete 37 ----

[|59|] [|9|43|51|] [|65|] [|4|5|] [|9|29|] [|43|49|] [|51|57|] [|59|61|] [|65|165|]

---- Delete 165 ----

[|51|] [|9|43|] [|59|] [|4|5|] [|9|29|] [|43|49|] [|51|57|] [|59|61|65|]

---- Delete 57 ----

[|43|] [|9|] [|51|61|] [|4|5|] [|9|29|] [|43|49|] [|51|59|] [|61|65|]

---- Delete 49 ----

[|9|43|61|] [|4|5|] [|9|29|] [|43|51|59|] [|61|65|]

Three would come many condition while implementing the B+ tree in code. But the idea is generalize by the algorithm written above. There typically comes three types of condition handling

  1. Handling root
  2. Handling merging, redistribution among leaf nodes
  3. Handling merging, redistribution among non leaf nodes

Handling values and pointers carefully would results in a perfect implementation of B+ tree. I implemented my own B+ tree. Here is my implemented B+ Tree code link. If there is any bug in the code, please notify me, I will try to fix the bug soon.

Resources:

  1. Wikipedia : B+ Tree
  2. B+ Tree Lecture
  3. http://loveforprogramming.quora.com/Memory-locality-the-magic-of-B-Trees
  4. https://www.quora.com/What-is-a-B+-Tree