You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Michael Zedeler <mi...@zedeler.dk> on 2010/11/30 13:03:14 UTC

A possible approach for constructing trees of documents optimized for fast querying

Hi everybody.

I have come up with a way to create a tree where each node is a document 
in CouchDB. The tree is ordered.

Maybe someone on the list tried something similar and could comment on 
the approach. For others, this could inspire a new way of using CouchDB.

The performance features are:

    * Fast reads.
    * Slow writes (updates, deletions, moving subtrees).

The basic notion is that every document gets a path attribute that 
specifies where in the tree the document is located.

The root path looks like this:

[0]

(JSON notation.)

A child of the root looks like this:

[0, 0]

Adding another child to the root provides this path:

[0, 32768]

(32768 equals 2**15 and it is somewhat arbitraryly chosen.)

The resulting documents could look like this:

{ '_id': 'root', 'path': [0] }
{ '_id': 'child_1', 'path': [0, 0] }
{ '_id': 'child_2', 'path': [0, 32768] }

Adding a child to child_2 yields the path [0, 32768, 0].

Now querying the whole tree is done using

startkey: [-2**16, 2**16]
endkey: [-2**16, 2**16, {}]

Querying the subtree rooted in child_1:

startkey: [0, 0]
endkey: [0, 0, {}]

I have based this on the collation rules found here:

http://wiki.apache.org/couchdb/View_collation#Complex_keys

The constants -2**16 and 2**16 are just endpoints where you sequentially 
subdivide the interval into smaller segments to make room for more 
siblings. Thus inserting a node between child_1 and child_2 will return 
in the path [0, 32768/2] == [0, 16384]. As far as I understand it, 
JavaScript uses 64 bits to store numbers (they are always floats), so 
there should be ample room for many siblings under each node. At some 
point, it may be needed to re-balance the tree, which will be an 
operation requiring an update of most nodes in the subtree being balanced.

Any comments?

Regards,

Michael.

P. s. demonstration written in perl available on request.


Re: A possible approach for constructing trees of documents optimized for fast querying

Posted by Keith Gable <zi...@gmail.com>.
This is kind of already on the wiki under "How to store hierarchical  
data". Reparenting massive amounts sucks, child creation doesn't. I  
use this method. Though I store IDs in my path array instead of  
integers. This makes it easy to find all items whose parent has ID x.  
Or the section of the tree with IDs x, y, z.





On Nov 30, 2010, at 6:03 AM, Michael Zedeler <mi...@zedeler.dk> wrote:

> Hi everybody.
>
> I have come up with a way to create a tree where each node is a  
> document in CouchDB. The tree is ordered.
>
> Maybe someone on the list tried something similar and could comment  
> on the approach. For others, this could inspire a new way of using  
> CouchDB.
>
> The performance features are:
>
>   * Fast reads.
>   * Slow writes (updates, deletions, moving subtrees).
>
> The basic notion is that every document gets a path attribute that  
> specifies where in the tree the document is located.
>
> The root path looks like this:
>
> [0]
>
> (JSON notation.)
>
> A child of the root looks like this:
>
> [0, 0]
>
> Adding another child to the root provides this path:
>
> [0, 32768]
>
> (32768 equals 2**15 and it is somewhat arbitraryly chosen.)
>
> The resulting documents could look like this:
>
> { '_id': 'root', 'path': [0] }
> { '_id': 'child_1', 'path': [0, 0] }
> { '_id': 'child_2', 'path': [0, 32768] }
>
> Adding a child to child_2 yields the path [0, 32768, 0].
>
> Now querying the whole tree is done using
>
> startkey: [-2**16, 2**16]
> endkey: [-2**16, 2**16, {}]
>
> Querying the subtree rooted in child_1:
>
> startkey: [0, 0]
> endkey: [0, 0, {}]
>
> I have based this on the collation rules found here:
>
> http://wiki.apache.org/couchdb/View_collation#Complex_keys
>
> The constants -2**16 and 2**16 are just endpoints where you  
> sequentially subdivide the interval into smaller segments to make  
> room for more siblings. Thus inserting a node between child_1 and  
> child_2 will return in the path [0, 32768/2] == [0, 16384]. As far  
> as I understand it, JavaScript uses 64 bits to store numbers (they  
> are always floats), so there should be ample room for many siblings  
> under each node. At some point, it may be needed to re-balance the  
> tree, which will be an operation requiring an update of most nodes  
> in the subtree being balanced.
>
> Any comments?
>
> Regards,
>
> Michael.
>
> P. s. demonstration written in perl available on request.
>