You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Jo-Erlend Schinstad <jo...@gmail.com> on 2011/12/23 11:47:38 UTC

Modeling a tree in couchdb.

For quite some time, I've been thinking about creating a Gtk TreeModel 
that uses CouchDB as a backend. I've just started investigating it and 
I've found some issues I don't know how to solve. I think this would be 
useful for all kinds of trees, though, so it wouldn't be specific to 
GTK. I hope someone can help.

The general idea is simple. The tree consists of nodes that can have 
zero or one parent and zero or more children. So I thought each node 
would be its own document with an ID like [model number, root node 
number, child node number, child...], for instance,
{ "_id":[0, 0, 1,2] } would be the third child of the second child of 
the first root node in the first model. This way, it would be easy to 
get a single node or a range of nodes (including children) from a 
specific model.

I have two problems with this; inserting, removing and moving nodes. If 
I insert a node somewhere in the middle, all the documents after it 
would need to get their IDs changed. I guess I could do this by looping 
backwards from the end, incrementing the id in order to create a free 
slot for the new node, (opposite for removals) but wouldn't this be 
extremely slow for large collections? It should also be possible to 
reorder rows, or just move something one step up or down. This would 
ideally mean simply swapping the IDs of two documents, but I don't know 
how to do this. I could give the first document a temporary value, then 
change the value for the second to the old value of the first, and then 
give the first the old value for the second. But the temporary value 
might itself cause problems.

Any ideas or thoughts? I appreciate your insight.

Jo-Erlend Schinstad

Re: Modeling a tree in couchdb.

Posted by Jo-Erlend Schinstad <jo...@gmail.com>.
Den 23. des. 2011 12:26, skrev CGS:
> An idea I got from the CouchDB documentation: use floating point 
> numbers instead of integers. That is, instead of 1, 2, 3, 4..., you 
> can use 0.1, 0.2, 0.3, 0.4... or 1.0, 2.0, 3.0, 4.0... That can help 
> you to insert a new node (say, 0.15 or 1.5) in between any two given 
> nodes (say, in between 0.1 and 0.2 or in between 1.0 and 2.0) without 
> any other modification.
>
> I hope it will be useful for your case.

Thank you! Yes, that helps a lot, I think. I'm not sure it solves the 
problem completely, but it sure gave me something to think about. It 
actually gave me a rather different perspective on the whole problem, 
which is very valuable itself. Thanks again :)

Jo-Erlend Schinstad

Re: Modeling a tree in couchdb.

Posted by Jason Smith <jh...@iriscouch.com>.
On Sat, Dec 24, 2011 at 4:03 AM, Jens Alfke <je...@couchbase.com> wrote:
>
> On Dec 23, 2011, at 3:26 AM, CGS wrote:
>
> An idea I got from the CouchDB documentation: use floating point numbers
> instead of integers. That is, instead of 1, 2, 3, 4..., you can use 0.1,
> 0.2, 0.3, 0.4... or 1.0, 2.0, 3.0, 4.0... That can help you to insert a
> new node (say, 0.15 or 1.5) in between any two given nodes (say, in
> between 0.1 and 0.2 or in between 1.0 and 2.0) without any other
> modification.
>
> This really only postpones the problem, though, since floating-point has finite precision.

That is a reason, among others, why I created bigdecimal.js.

https://github.com/iriscouch/bigdecimal.js

-- 
Iris Couch

Re: Modeling a tree in couchdb.

Posted by Jens Alfke <je...@couchbase.com>.
On Dec 23, 2011, at 3:26 AM, CGS wrote:

An idea I got from the CouchDB documentation: use floating point numbers
instead of integers. That is, instead of 1, 2, 3, 4..., you can use 0.1,
0.2, 0.3, 0.4... or 1.0, 2.0, 3.0, 4.0... That can help you to insert a
new node (say, 0.15 or 1.5) in between any two given nodes (say, in
between 0.1 and 0.2 or in between 1.0 and 2.0) without any other
modification.

This really only postpones the problem, though, since floating-point has finite precision. With standard doubles you’ll get about 50 levels of insertion before you end up with two numbers that you can’t fit another number in between. (You’ll get fewer levels if you start out with a large number of nodes because there will be less room left over for fractions.) “50 levels” may sound like a lot, but it isn’t really; think of first creating two nodes, then repeatedly inserting a new node after the first one, which is a likely-sounding scenario. After inserting 50-60 nodes that way you’re stuck.

Another issue is that you’ll get roundoff errors when the numbers are converted to/from decimal, i.e. when fetching documents and during replication. And that’s at the mercy of the JSON library being used, how many significant digits its implementor decided to use for floating point. It would be a lot safer to do the equivalent thing with integers — i.e. number the initial rows 0x100000000, 0x200000000…

Lena Herrmann’s thesis "Implementation of a distributed application using the document-oriented database CouchDB” goes into the problem of tree representations in some detail.

—Jens

Re: Modeling a tree in couchdb.

Posted by CGS <cg...@gmail.com>.
An idea I got from the CouchDB documentation: use floating point numbers 
instead of integers. That is, instead of 1, 2, 3, 4..., you can use 0.1, 
0.2, 0.3, 0.4... or 1.0, 2.0, 3.0, 4.0... That can help you to insert a 
new node (say, 0.15 or 1.5) in between any two given nodes (say, in 
between 0.1 and 0.2 or in between 1.0 and 2.0) without any other 
modification.

I hope it will be useful for your case.

CGS




On 12/23/2011 11:47 AM, Jo-Erlend Schinstad wrote:
> For quite some time, I've been thinking about creating a Gtk TreeModel 
> that uses CouchDB as a backend. I've just started investigating it and 
> I've found some issues I don't know how to solve. I think this would 
> be useful for all kinds of trees, though, so it wouldn't be specific 
> to GTK. I hope someone can help.
>
> The general idea is simple. The tree consists of nodes that can have 
> zero or one parent and zero or more children. So I thought each node 
> would be its own document with an ID like [model number, root node 
> number, child node number, child...], for instance,
> { "_id":[0, 0, 1,2] } would be the third child of the second child of 
> the first root node in the first model. This way, it would be easy to 
> get a single node or a range of nodes (including children) from a 
> specific model.
>
> I have two problems with this; inserting, removing and moving nodes. 
> If I insert a node somewhere in the middle, all the documents after it 
> would need to get their IDs changed. I guess I could do this by 
> looping backwards from the end, incrementing the id in order to create 
> a free slot for the new node, (opposite for removals) but wouldn't 
> this be extremely slow for large collections? It should also be 
> possible to reorder rows, or just move something one step up or down. 
> This would ideally mean simply swapping the IDs of two documents, but 
> I don't know how to do this. I could give the first document a 
> temporary value, then change the value for the second to the old value 
> of the first, and then give the first the old value for the second. 
> But the temporary value might itself cause problems.
>
> Any ideas or thoughts? I appreciate your insight.
>
> Jo-Erlend Schinstad


Re: Modeling a tree in couchdb.

Posted by Michael Zedeler <mi...@zedeler.dk>.
Hi Nils and Jo-Erlend.

On 2011-12-30 00:39, Nils Breunese wrote:
> I love CouchDB, but I'm not sure I would use it for tree data. Have you looked into graph databases, like Neo4j for instance?
I've come to the same conclusion. We have been running a solution where 
paths were represented as a list of ids, but it is very maintenance 
heavy and in some scenarios it doesn't perform well. To alleviate those 
problems, we have collapsed whole trees into single document, which - 
obviously - won't scale indefinately.
-- 
Michael Zedeler
70 25 19 99 | LinkedIn <http://dk.linkedin.com/in/mzedeler> | Twitter 
<http://twitter.com/#%21/mzedeler> | Github <https://github.com/mzedeler>

Re: Modeling a tree in couchdb.

Posted by Jo-Erlend Schinstad <jo...@gmail.com>.
Hello again. Thanks for your suggestion.

Den 03. jan. 2012 00:38, skrev Kevin R. Coombes:
> Your last few posts have confused me as to the actual requirements.  
> Suppose I have the following tree, where the numbers on the nodes are 
> arbitrary IDs that i have assigned.
>                      [1]
>                       |
>                  -----------
>                  |         |
>                 [2]       [3]
>                  |         |
>     -----------------   ---------
>     |        |      |   |       |
>    [4]      [X]    [5] [6]     [7]
>

But if I want to insert a branch between 2 and 3 and still keep it 
ordered? Wouldn't I then have to give a new path to 3, 6 and 7? One 
solution that has been suggested is to use decimal values, so that I 
could do 2+3/2 to get 2.5 and use that as its path. Then, if I wanted to 
add a node between 2.5 and 3, I would do 2.5+3/2=2,75 and the tree would 
look like this:

                                 [1]
                                  |
                         --------------------------------
                         |        |            |              |
                        [2]    [2,5]      [2,75]      [3]

The paths for 2 and 3, including all their child nodes would stay the 
same. If there are more siblings on that level, then they would also be 
intact with no change. However, this would be limited because there's a 
finite number of decimals. And there is another problem. If I now want 
to move 2.75 before 2.5, then I would do 2+2,5/2=2.25. The first level 
below the root would now be [2, 2.25, 2.5, 3], but since 2.75 is now 
called 2.25, I would also need to update the paths of all nodes under 
it. This would also mean that the tree would be in an inconsistent state 
between updating 2.75 itself and the updates to its child nodes. If 
there is a large number of elements, then this can take a considerable 
amount of time.

So, the current idea (that I got from Lena Herrman) is that each element 
only points to its direct parent and to its next sibling. That way, any 
operation would only require two writes, regardless of the size of the tree:

                                 [1]
                                   |
                         -----------------------------
                         |                                 |
                       [2]                              [3]
                         |                                 |
                 ---------------                ------------
                 |              |                  |             |
                [4]          [5]               [6]          [7]

(Yes, same tree, just repeated for your convenience:))

In this case, the JSON would look something like:
{"_id" : "uuid1", "label" : "1", "parent" : null, "next_sibling" : null, 
"root_node" : true }
{"_id" : "uuid2", "label" : "2", "parent" : "uuid1", "next_sibling" : 
"uuid3", "first_child" : true }
{"_id" : "uuid3", "label" : "3", "parent" : "uuid1", "next_sibling" : null }
{"_id" : "uuid4", "label" : "4", "parent" : "uuid2", "next_sibling" : 
"uuid5", "first_child" :true }
{"_id" : "uuid5", "label" : "5", "parent" : "uuid2", "next_sibling" : null }
{"_id" : "uuid6", "label" : "6", "parent" : "uuid3", "next_sibling" : 
"uuid7", "first_child" : true }
{"_id" : "uuid7", "label" : "7", "parent" : "uuid3", "next_sibling" : null }

So now, if I wanted to move 2 to 7, then I would do

{"_id" : "uuid2", "label" : "2", "parent" : "uuid7", "next_sibling" : 
null, "first_child" : true }
{"_id" : "uuid3", "label" : "3", "parent" : "uuid1", "next_sibling": 
null, "first_child": true }

If, instead, I wanted to move 5 before 4, then I would do

{"_id" : "uuid5", "label" : "5", "parent" : "uuid2", "next_sibling" : 
"uuid4", "first_child" : true }
{"_id" : "uuid4", "label" : "4", "parent" : "uuid2", "next_sibling": 
null, "first_child": false }

No other writes would be necessary, no matter what you do, or how large 
the changes are. There are also no problems with decimals. You can 
reorder the tree how many times you want. That is a huge improvement, 
but it would still mean that the tree is broken between the first write 
and the second. I think I can do this by adding a field that signifies 
that the first is the beginning of an operation and that the second 
write finishes it.

The problem is how to get the tree in an ordered state from the 
database. I currently sort by root_node, then parent, then first_child. 
This means I can always get the starting points quickly, but I'll have 
to rebuild the tree on the client. It's not really a big problem, but if 
I use the tree in one Python desktop app and one JavaScript browser app, 
then I'll have to do that work twice. I would much rather do it on the 
server so I could just loop through on the client. That would also 
reduce the amount of work that is necessary.

It feels like there should be a better way to do it, but I can't really 
get it to the front of my mind. :)


Jo-Erlend

Re: Modeling a tree in couchdb.

Posted by "Kevin R. Coombes" <ke...@gmail.com>.
Your last few posts have confused me as to the actual requirements.  
Suppose I have the following tree, where the numbers on the nodes are 
arbitrary IDs that i have assigned.
                      [1]
                       |
                  -----------
                  |         |
                 [2]       [3]
                  |         |
     -----------------   ---------
     |        |      |   |       |
    [4]      [X]    [5] [6]     [7]

Assume for the moment that the node labeled [X] does not yet exist.  
Then this tree would have something like the following JSON/Couch 
representation:
{ '_id': 1,  'ancestors': [] }
   { '_id': 2,  'ancestors': [1] }
   { '_id': 3,  'ancestors': [1] }
   { '_id': 4,  'ancestors': [2, 1] }
   { '_id': 5,  'ancestors': [2, 1] }
   { '_id': 6,  'ancestors': [3, 1] }
   { '_id': 7,  'ancestors': [3, 1] }

Each _id has been assigned manually rather than automatically.  The list 
of 'ancestors' for each item simply gives the ids that you trace through 
to march back up the tree from that node to the root.  If you want to 
add a new node at position X, then you decide which id to give it and 
simply create a new node that looks like
{ '_id': 'X',   'ancestors': [2, 1] }
Nothing else needs to change.  The hard part would only come if you 
wanted to, for example, insert a new node [Y] above node [3] in the 
tree. In that case, you would have to update all of the nodes in that 
branch, producing new items:
   { '_id': 'Y', 'ancestors': [1] }
   { '_id':  3,  'ancestors': ['Y', 1] }
   { '_id':  6,  'ancestors': [3, 'Y',  1] }
   { '_id':  7,  'ancestors': [3, 'Y', 1] }
Assuming fairly balanced branching, inserting a node above a branch at 
level N, you would expect to have to updated 1/2^N of the nodes, which 
should usually be a fairly small fraction.

Your answers suggest that you _also _want to keep track of the number of 
siblings of each node in some sequential order, which is a different 
requirement than keeping track of the hierarchical tree structure.  You 
can achieve this other goal by adding another item to the representation 
of each node. The simplest idea would be to add a 'children' property 
that lists (in implicit order) all of the children of a node.  In that 
case, for example, node [2] would originally be written as
   { '_id': 2,   'ancestors': [1],  'children': [4, 5] }
and would need to be updated to
   { '_id': 2,   'ancestors': [1], 'children;: [4, 'X', 5'] }
when you inserted the node [X].  Nothing else would have to change.

Of course, you might also want a complete list of all nodes in the order 
in which they were added to the tree. You could accomplish that goal by 
adding yet another property that allowed you to keep the items ordered, 
independent of the tree structure.

Other than replacing the root node by something above it, I cannot see a 
reason to ever have to update all of the nodes.  What am I missing?

     Kevin

On 1/2/2012 2:57 PM, Jo-Erlend Schinstad wrote:
> Den 02. jan. 2012 18:04, skrev Kevin R. Coombes:
>> Take this suggestion with a grain of salt, since I haven't actually 
>> tried it and am just making things up....
>>
>> If your structure is really a tree, then the location of every node 
>> is characterized by a unique path back to the root node, You could 
>> save the entire path in each object as a list:
>>     [parent, grandparent, great-grandparent, ..., rootnode]
>> One view would simply return this entire list for each object.
>> A second view would just give back the parent node.
>> Either view can be used (with appropriate logic in the client) to 
>> reconstruct the entire tree. However, you could also easily create 
>> auxiliary views (e.g., grandparent) depending on your needs.
>>
>> This organization makes querying the tree relatively easy.  However, 
>> it will have _terrible _performance if you do a lot of surgery on the 
>> tree, lopping off branches and reattaching them in different places.
>>
>
> That was my original thought. I  wanted to do something like key [0, 
> 5, 4,2] which would mean the sixth child of the first top-level node, 
> and the fifth child of that node and the third of it.
>
> This would work well, and I would get the tree in one go, organized 
> and nice. The problem is that the tree must be reorganizable. In that 
> model, if moved one node, then I would have to update all the 
> following documents. If there's a million rows in the tree, then I 
> would need something like 999.990+ http requests... :)
>
> Further, one of primary goals is replication. It could never work. The 
> internet janitor would hit me in the back of the head with a wrench. 
> However, for something like a family tree or a blog, where something 
> will forever be a response to something else, that would work nicely.
>
> So what I'm doing now, is that I'm just retrieving the data from the 
> database and organising it on the client. It's not a comfortable 
> solution, I think. It's not elegant, but if it's the only possible 
> solution, then it doesn't really matter if it's elegant or not. :)
>
> In other words, I'm still very much looking for a good way to model a 
> flexible tree in a couchdb. Any suggestions are highly welcome.
>
> Jo-Erlend Schinstad

Re: Modeling a tree in couchdb.

Posted by Keith Gable <zi...@ignition-project.com>.
That's true, but in my case moving big branches doesn't happen.

---
Keith Gable
A+ Certified Professional
Network+ Certified Professional
Web Developer



On Mon, Jan 2, 2012 at 4:07 PM, Jo-Erlend Schinstad <
joerlend.schinstad@gmail.com> wrote:

> Den 02. jan. 2012 22:53, skrev Keith Gable:
>
>  The method I use is to have a field called "path", which contains a list
>> of
>> IDs, and the last item in the list/array is the current document's ID. The
>> downside is that you have to assign the IDs yourself, but the upside is
>> that it's easy to query.
>>
>> ---
>>
>
> Right. But consider this list of elements: [1,2,3,4,5]. What happens if I
> insert something before 3? Then, 3 becomes 4, 4 becomes 5 and 5 becomes 6.
> If this list is a million entries long, then you would need to update
> 999.997 documents. Then all those would need to be sent to all the other
> databases, etc. That's the problem. For immutable trees, in the sense that
> the structure and order doesn't change, your solution works nicely.
>
> Jo-Erlend Schinstad
>
>
>

Re: Modeling a tree in couchdb.

Posted by Jo-Erlend Schinstad <jo...@gmail.com>.
Den 02. jan. 2012 22:53, skrev Keith Gable:
> The method I use is to have a field called "path", which contains a list of
> IDs, and the last item in the list/array is the current document's ID. The
> downside is that you have to assign the IDs yourself, but the upside is
> that it's easy to query.
>
> ---

Right. But consider this list of elements: [1,2,3,4,5]. What happens if 
I insert something before 3? Then, 3 becomes 4, 4 becomes 5 and 5 
becomes 6. If this list is a million entries long, then you would need 
to update 999.997 documents. Then all those would need to be sent to all 
the other databases, etc. That's the problem. For immutable trees, in 
the sense that the structure and order doesn't change, your solution 
works nicely.

Jo-Erlend Schinstad



Re: Modeling a tree in couchdb.

Posted by Keith Gable <zi...@ignition-project.com>.
The method I use is to have a field called "path", which contains a list of
IDs, and the last item in the list/array is the current document's ID. The
downside is that you have to assign the IDs yourself, but the upside is
that it's easy to query.

---
Keith Gable
A+ Certified Professional
Network+ Certified Professional
Web Developer



On Mon, Jan 2, 2012 at 2:57 PM, Jo-Erlend Schinstad <
joerlend.schinstad@gmail.com> wrote:

> Den 02. jan. 2012 18:04, skrev Kevin R. Coombes:
>
>  Take this suggestion with a grain of salt, since I haven't actually tried
>> it and am just making things up....
>>
>> If your structure is really a tree, then the location of every node is
>> characterized by a unique path back to the root node, You could save the
>> entire path in each object as a list:
>>    [parent, grandparent, great-grandparent, ..., rootnode]
>> One view would simply return this entire list for each object.
>> A second view would just give back the parent node.
>> Either view can be used (with appropriate logic in the client) to
>> reconstruct the entire tree. However, you could also easily create
>> auxiliary views (e.g., grandparent) depending on your needs.
>>
>> This organization makes querying the tree relatively easy.  However, it
>> will have _terrible _performance if you do a lot of surgery on the tree,
>> lopping off branches and reattaching them in different places.
>>
>>
> That was my original thought. I  wanted to do something like key [0, 5,
> 4,2] which would mean the sixth child of the first top-level node, and the
> fifth child of that node and the third of it.
>
> This would work well, and I would get the tree in one go, organized and
> nice. The problem is that the tree must be reorganizable. In that model, if
> moved one node, then I would have to update all the following documents. If
> there's a million rows in the tree, then I would need something like
> 999.990+ http requests... :)
>
> Further, one of primary goals is replication. It could never work. The
> internet janitor would hit me in the back of the head with a wrench.
> However, for something like a family tree or a blog, where something will
> forever be a response to something else, that would work nicely.
>
> So what I'm doing now, is that I'm just retrieving the data from the
> database and organising it on the client. It's not a comfortable solution,
> I think. It's not elegant, but if it's the only possible solution, then it
> doesn't really matter if it's elegant or not. :)
>
> In other words, I'm still very much looking for a good way to model a
> flexible tree in a couchdb. Any suggestions are highly welcome.
>
> Jo-Erlend Schinstad
>

Re: Modeling a tree in couchdb.

Posted by Jo-Erlend Schinstad <jo...@gmail.com>.
Den 02. jan. 2012 18:04, skrev Kevin R. Coombes:
> Take this suggestion with a grain of salt, since I haven't actually 
> tried it and am just making things up....
>
> If your structure is really a tree, then the location of every node is 
> characterized by a unique path back to the root node, You could save 
> the entire path in each object as a list:
>     [parent, grandparent, great-grandparent, ..., rootnode]
> One view would simply return this entire list for each object.
> A second view would just give back the parent node.
> Either view can be used (with appropriate logic in the client) to 
> reconstruct the entire tree. However, you could also easily create 
> auxiliary views (e.g., grandparent) depending on your needs.
>
> This organization makes querying the tree relatively easy.  However, 
> it will have _terrible _performance if you do a lot of surgery on the 
> tree, lopping off branches and reattaching them in different places.
>

That was my original thought. I  wanted to do something like key [0, 5, 
4,2] which would mean the sixth child of the first top-level node, and 
the fifth child of that node and the third of it.

This would work well, and I would get the tree in one go, organized and 
nice. The problem is that the tree must be reorganizable. In that model, 
if moved one node, then I would have to update all the following 
documents. If there's a million rows in the tree, then I would need 
something like 999.990+ http requests... :)

Further, one of primary goals is replication. It could never work. The 
internet janitor would hit me in the back of the head with a wrench. 
However, for something like a family tree or a blog, where something 
will forever be a response to something else, that would work nicely.

So what I'm doing now, is that I'm just retrieving the data from the 
database and organising it on the client. It's not a comfortable 
solution, I think. It's not elegant, but if it's the only possible 
solution, then it doesn't really matter if it's elegant or not. :)

In other words, I'm still very much looking for a good way to model a 
flexible tree in a couchdb. Any suggestions are highly welcome.

Jo-Erlend Schinstad

Re: Modeling a tree in couchdb.

Posted by "Kevin R. Coombes" <ke...@gmail.com>.
Take this suggestion with a grain of salt, since I haven't actually 
tried it and am just making things up....

If your structure is really a tree, then the location of every node is 
characterized by a unique path back to the root node, You could save the 
entire path in each object as a list:
     [parent, grandparent, great-grandparent, ..., rootnode]
One view would simply return this entire list for each object.
A second view would just give back the parent node.
Either view can be used (with appropriate logic in the client) to 
reconstruct the entire tree. However, you could also easily create 
auxiliary views (e.g., grandparent) depending on your needs.

This organization makes querying the tree relatively easy.  However, it 
will have _terrible _performance if you do a lot of surgery on the tree, 
lopping off branches and reattaching them in different places.

Best,
     Kevin

On 1/2/2012 2:33 AM, Walter Werner wrote:
> 2012/1/2 Jo-Erlend Schinstad<jo...@gmail.com>:
>> Den 02. jan. 2012 08:58, skrev Walter Werner:
>>
>>> How about if every document get a parent attribute?
>>>
>>> root document
>>> id: 123
>>> parent: undefined
>>>
>>> child document
>>> id: 768
>>> parent: 123
>>>
>>> child child document
>>> id: 991
>>> parent: 768
>>>
>>>
>>> etc.
>>>
>>> You need then a view with the parent as a key. With one request you
>>> can get all his children (only 1 level) of a document. Then you
>>> proceed with the children-documents and ask again whether they have
>>> children. Maybe it will be a performance Issue, if your 'object' has
>>> too many levels. The advantage is, that you don't have to think about
>>> how the id's of your documents should look like.
>>>
>>>
>> That is almost what I'm currently trying. I Have a topmost_parent attribute,
>> then a parent attribute and a next_sibling attribute. That way, I can get
>> all elements with one request. I still have to process on the client, but I
>> think it should work. The optimal solution would allow me to get it ordered
>> correctly from the database, but I see no way of achieving that.
>>
>> Jo-Erlend Schinstad
> Did you think about of using 2 keys in your view? One key for the
> parent attribute and the other for ordering
>
> [doc.parent, doc.timestamp]
>
> I take timestamp as an example.
>
> Walter

Re: Modeling a tree in couchdb.

Posted by Walter Werner <we...@googlemail.com>.
2012/1/2 Jo-Erlend Schinstad <jo...@gmail.com>:
> Den 02. jan. 2012 08:58, skrev Walter Werner:
>
>>
>> How about if every document get a parent attribute?
>>
>> root document
>> id: 123
>> parent: undefined
>>
>> child document
>> id: 768
>> parent: 123
>>
>> child child document
>> id: 991
>> parent: 768
>>
>>
>> etc.
>>
>> You need then a view with the parent as a key. With one request you
>> can get all his children (only 1 level) of a document. Then you
>> proceed with the children-documents and ask again whether they have
>> children. Maybe it will be a performance Issue, if your 'object' has
>> too many levels. The advantage is, that you don't have to think about
>> how the id's of your documents should look like.
>>
>>
>
> That is almost what I'm currently trying. I Have a topmost_parent attribute,
> then a parent attribute and a next_sibling attribute. That way, I can get
> all elements with one request. I still have to process on the client, but I
> think it should work. The optimal solution would allow me to get it ordered
> correctly from the database, but I see no way of achieving that.
>
> Jo-Erlend Schinstad

Did you think about of using 2 keys in your view? One key for the
parent attribute and the other for ordering

[doc.parent, doc.timestamp]

I take timestamp as an example.

Walter

Re: Modeling a tree in couchdb.

Posted by Jo-Erlend Schinstad <jo...@gmail.com>.
Den 02. jan. 2012 08:58, skrev Walter Werner:
>
> How about if every document get a parent attribute?
>
> root document
> id: 123
> parent: undefined
>
> child document
> id: 768
> parent: 123
>
> child child document
> id: 991
> parent: 768
>
>
> etc.
>
> You need then a view with the parent as a key. With one request you
> can get all his children (only 1 level) of a document. Then you
> proceed with the children-documents and ask again whether they have
> children. Maybe it will be a performance Issue, if your 'object' has
> too many levels. The advantage is, that you don't have to think about
> how the id's of your documents should look like.
>
>

That is almost what I'm currently trying. I Have a topmost_parent 
attribute, then a parent attribute and a next_sibling attribute. That 
way, I can get all elements with one request. I still have to process on 
the client, but I think it should work. The optimal solution would allow 
me to get it ordered correctly from the database, but I see no way of 
achieving that.

Jo-Erlend Schinstad

Re: Modeling a tree in couchdb.

Posted by Walter Werner <we...@googlemail.com>.
2011/12/30 Nils Breunese <N....@vpro.nl>:
> I love CouchDB, but I'm not sure I would use it for tree data. Have you looked into graph databases, like Neo4j for instance?
>
> Nils.
> ________________________________________
> Van: Jo-Erlend Schinstad [joerlend.schinstad@gmail.com]
> Verzonden: vrijdag 23 december 2011 11:47
> Aan: user@couchdb.apache.org
> Onderwerp: Modeling a tree in couchdb.
>
> For quite some time, I've been thinking about creating a Gtk TreeModel
> that uses CouchDB as a backend. I've just started investigating it and
> I've found some issues I don't know how to solve. I think this would be
> useful for all kinds of trees, though, so it wouldn't be specific to
> GTK. I hope someone can help.
>
> The general idea is simple. The tree consists of nodes that can have
> zero or one parent and zero or more children. So I thought each node
> would be its own document with an ID like [model number, root node
> number, child node number, child...], for instance,
> { "_id":[0, 0, 1,2] } would be the third child of the second child of
> the first root node in the first model. This way, it would be easy to
> get a single node or a range of nodes (including children) from a
> specific model.
>
> I have two problems with this; inserting, removing and moving nodes. If
> I insert a node somewhere in the middle, all the documents after it
> would need to get their IDs changed. I guess I could do this by looping
> backwards from the end, incrementing the id in order to create a free
> slot for the new node, (opposite for removals) but wouldn't this be
> extremely slow for large collections? It should also be possible to
> reorder rows, or just move something one step up or down. This would
> ideally mean simply swapping the IDs of two documents, but I don't know
> how to do this. I could give the first document a temporary value, then
> change the value for the second to the old value of the first, and then
> give the first the old value for the second. But the temporary value
> might itself cause problems.
>
> Any ideas or thoughts? I appreciate your insight.
>
> Jo-Erlend Schinstad
> ------------------------------------------------------------------------
>  VPRO   www.vpro.nl
> ------------------------------------------------------------------------

How about if every document get a parent attribute?

root document
id: 123
parent: undefined

child document
id: 768
parent: 123

child child document
id: 991
parent: 768


etc.

You need then a view with the parent as a key. With one request you
can get all his children (only 1 level) of a document. Then you
proceed with the children-documents and ask again whether they have
children. Maybe it will be a performance Issue, if your 'object' has
too many levels. The advantage is, that you don't have to think about
how the id's of your documents should look like.

Walter

RE: Modeling a tree in couchdb.

Posted by Nils Breunese <N....@vpro.nl>.
I love CouchDB, but I'm not sure I would use it for tree data. Have you looked into graph databases, like Neo4j for instance?

Nils.
________________________________________
Van: Jo-Erlend Schinstad [joerlend.schinstad@gmail.com]
Verzonden: vrijdag 23 december 2011 11:47
Aan: user@couchdb.apache.org
Onderwerp: Modeling a tree in couchdb.

For quite some time, I've been thinking about creating a Gtk TreeModel
that uses CouchDB as a backend. I've just started investigating it and
I've found some issues I don't know how to solve. I think this would be
useful for all kinds of trees, though, so it wouldn't be specific to
GTK. I hope someone can help.

The general idea is simple. The tree consists of nodes that can have
zero or one parent and zero or more children. So I thought each node
would be its own document with an ID like [model number, root node
number, child node number, child...], for instance,
{ "_id":[0, 0, 1,2] } would be the third child of the second child of
the first root node in the first model. This way, it would be easy to
get a single node or a range of nodes (including children) from a
specific model.

I have two problems with this; inserting, removing and moving nodes. If
I insert a node somewhere in the middle, all the documents after it
would need to get their IDs changed. I guess I could do this by looping
backwards from the end, incrementing the id in order to create a free
slot for the new node, (opposite for removals) but wouldn't this be
extremely slow for large collections? It should also be possible to
reorder rows, or just move something one step up or down. This would
ideally mean simply swapping the IDs of two documents, but I don't know
how to do this. I could give the first document a temporary value, then
change the value for the second to the old value of the first, and then
give the first the old value for the second. But the temporary value
might itself cause problems.

Any ideas or thoughts? I appreciate your insight.

Jo-Erlend Schinstad
------------------------------------------------------------------------
 VPRO   www.vpro.nl
------------------------------------------------------------------------

Re: Modeling a tree in couchdb.

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 28 Dec 2011 23:08:01 +0100 Jo-Erlend Schinstad <jo...@gmail.com> wrote: 

JS> Den 27. des. 2011 17:32, skrev Ted Zlatanov:
>> How big is this document?  If it's like the TreeModels I've used, maybe
>> you can simply keep the whole connectivity data (the edges of the graph)
>> in one document, and the nodes have one document each.  That way you can
>> load or update the graph in one shot; replication is simpler;
>> consistency is better (at worst, a node is missing).  Renumbering is
>> easier as well, since you just load a single document, rewrite it, and
>> save.
>> 
>> If this is a huge tree or you expect multiple clients to update the
>> connectivity graph this is not a great solution.
>> 
JS> Thanks for the suggestion. Interesting idea. The problem is that I
JS> want this to be a general solution, which means I don't know how big
JS> it will be or whether it'll be accessed simultaneously by different
JS> users.

One suggestion...  If you don't expect elements and subtrees to move
around much (if it's mostly append activity), you can break the
connectivity document into pieces when it gets big enough.  It can be
better than storing the full list of edges, which for a large tree will
be huge, or implicitly creating edges based on node child/parent
attributes, which is expensive.  But it's not general, of course:
rebalancing the tree will be painful and moving a node can require 2
rewrites instead of 1, so you could end up with the node in two
locations temporarily.  And you have to write code to do it :)

Ted


Re: Modeling a tree in couchdb.

Posted by Jo-Erlend Schinstad <jo...@gmail.com>.
Den 27. des. 2011 17:32, skrev Ted Zlatanov:
> How big is this document?  If it's like the TreeModels I've used, maybe
> you can simply keep the whole connectivity data (the edges of the graph)
> in one document, and the nodes have one document each.  That way you can
> load or update the graph in one shot; replication is simpler;
> consistency is better (at worst, a node is missing).  Renumbering is
> easier as well, since you just load a single document, rewrite it, and
> save.
>
> If this is a huge tree or you expect multiple clients to update the
> connectivity graph this is not a great solution.
>
> Ted
>
Thanks for the suggestion. Interesting idea. The problem is that I want 
this to be a general solution, which means I don't know how big it will 
be or whether it'll be accessed simultaneously by different users.

Jo-Erlend Schinstad

Re: Modeling a tree in couchdb.

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Fri, 23 Dec 2011 11:47:38 +0100 Jo-Erlend Schinstad <jo...@gmail.com> wrote: 

JS> For quite some time, I've been thinking about creating a Gtk TreeModel
JS> that uses CouchDB as a backend. I've just started investigating it and
JS> I've found some issues I don't know how to solve. I think this would
JS> be useful for all kinds of trees, though, so it wouldn't be specific
JS> to GTK. I hope someone can help.

JS> The general idea is simple. The tree consists of nodes that can have
JS> zero or one parent and zero or more children. So I thought each node
JS> would be its own document with an ID like [model number, root node
JS> number, child node number, child...], for instance,
JS> { "_id":[0, 0, 1,2] } would be the third child of the second child of
JS> the first root node in the first model. This way, it would be easy to
JS> get a single node or a range of nodes (including children) from a
JS> specific model.

How big is this document?  If it's like the TreeModels I've used, maybe
you can simply keep the whole connectivity data (the edges of the graph)
in one document, and the nodes have one document each.  That way you can
load or update the graph in one shot; replication is simpler;
consistency is better (at worst, a node is missing).  Renumbering is
easier as well, since you just load a single document, rewrite it, and
save.

If this is a huge tree or you expect multiple clients to update the
connectivity graph this is not a great solution.

Ted