Summary:
This paper proposed a new solution to support concurrent access on B-trees which reside on secondary storage. Basically a new link pointer is added to the B-tree data structure to chain all the pages in the same level into a linked list. The intent of doing this is that if an insertion split a node into two, other processes will still see a "logically single node" by following the link pointer, thus the consistency of the tree is not lost.
The the search for the B-tree will proceed as usual, without grabbing any locks. However if the search key exceed the highest value in a node (as indicated by the high key), it indicates that the tree structure has been changed, and we should follow the link pointer to get the split node. Insertion into the B-tree is divided into two phase: the search phase to locate where to inserte, which is identical to normal search; and the actual insertion phase, where we lock a node, split it as necessary, and add appropriate link pointers. An insertion to the parent node is also needed, if split happens, and it worked the same as leaf node. A stack of the rightmost node we examined at each level during the descent through the tree is maintained, to urge the case when we backtrack the tree.
An informal proof of the correctness of this solution is given: it ensures that each process sees a consistent state of the B-tree, and prevents deadlock. Livelock, on the other hand, is possible if a process keep following the link pointer other processes created. This solution is highly concurrent because an insertion process at most hold three locks simultaneously and a search process don't need to grab any lock (not even shared lock).
The whole model they developed is based on the assumption that the whole B-tree sits on disk, and every process has its own private memory cache. However, I don't think this is very pratical. A more common case would be that the first and second level of the B-tree sit in main memory and is accessed by all the process. How to correctly access this kind of B-tree correctly remains a problem.
没有评论:
发表评论