You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@directory.apache.org by el...@apache.org on 2013/09/21 16:54:58 UTC

svn commit: r1525247 [1/3] - in /directory/site/trunk/content/mavibot/user-guide: 2.1-file-format.mdtext images/nodeLeaf.graphml images/nodeLeaf.png images/nodeLeaf2.graphml images/nodeLeaf2.png

Author: elecharny
Date: Sat Sep 21 14:54:57 2013
New Revision: 1525247

URL: http://svn.apache.org/r1525247
Log:
added some content

Added:
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf2.graphml
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf2.png   (with props)
Modified:
    directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf.graphml
    directory/site/trunk/content/mavibot/user-guide/images/nodeLeaf.png

Modified: directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext
URL: http://svn.apache.org/viewvc/directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext?rev=1525247&r1=1525246&r2=1525247&view=diff
==============================================================================
--- directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext (original)
+++ directory/site/trunk/content/mavibot/user-guide/2.1-file-format.mdtext Sat Sep 21 14:54:57 2013
@@ -119,7 +119,7 @@ Note that each *BTreeHeader* has at leas
 
 #### The Nodes and Leaves
 
-Nodes and Leaves are logical *BTree* pages which are serialized on disk into one to many *PageIO*s. They have slightly different data structures, as *Node*s contains pointers to *Leaves*, and no data, while *Leaves* contains data. In any case, both contain the keys.
+Nodes and Leaves are logical *BTree* pages which are serialized on disk into one to many *PageIO*s. They have slightly different data structures, as *Node*s contains pointers to *Leaves*, and no data, while *Leaves* contains data. In any case, both contain the keys. The *Node* has one ore value than the *Leaf*, too.
 
 On disk, each *Node* and *Leaf* are stored in *PageIO*s, as we said. A *Node* will have pointers to some other logical pages, and on disk, those pointers will be offset of the first *PageIO* used to store the logical page it points to.
 
@@ -128,4 +128,20 @@ Here is the *Node* and *Leaf* data struc
 ![Node and Leaf](images/nodeLeaf.png)
 
 
+Note that we store the size of the serialized data : this is necessary as we have to know how many *PageIO*s will be needed to store the logical page.
+
+The *rootPage* is just a *Node* or a *Leaf*.
+
+#### Potential improvement
+
+We can get better performance by serializing the data differently. Instead of storing keys and values as byte arrays prefixed by their length, we could store an array of keys and values' offsets before the associated byte[]. Here is the resulting data structure, once serialized :
+
+![Node and Leaf, improved](images/nodeLeaf2.png)
+
+(The *Node* is not described, as it's basically the same data structure, but with one extra value).
+
+It does not need more space to serialize the data this way, as the offsets are ints, and in the previous version, those ints are used to store the length of the keys and values anyway.
+
+The gain is that we can have access to a given key and value without having to read all the previous keys and values. Also we can now read a leaf or a node without having to deserialize all the keys and values they contain.
+