
Posted to users@jackrabbit.apache.org by alartin <al...@gmail.com> on 2007/03/21 11:15:24 UTC

Node mapping question: when should I use subnode?

Hi all,
  I am trying to write a QnA (question and answer) demo of Jackrabbit and
have a few questions about Object and Content Mapping.
  Given three objects: question, answer, and comment. One question may have
many answers and comments; one answer may have many comments. Answers and
comments cannot exist alone.
  In OCM, I have two choices:
1. subnodes:
    1 root -- 1 my:qna -- * my:question
                              |__ * my:answer
                              |       |__ * my:comment
                              |__ * my:comment
2. same level (use references):
    1 root -- 1 my:qna -- 1 my:questions
                              |__ * my:question
                       -- 1 my:answers
                              |__ * my:answer
                       -- 1 my:comments
                              |__ * my:comment
   I can use a multi-valued property to store the references. For example,
one my:question has a multi-valued property named "answers", each value of
which is the UUID of one my:answer node, and each my:answer holds the UUID
of the question node.
My first question is: what's the difference between the two choices?
If I need to access or do calculations on answers or comments a lot, is it
better to choose the 2nd choice? That way, I do not need to iterate over all
questions to find all answers or comments.
The second question is: is it important to have a single my:questions
node/my:answers node/my:comments node?
If not, there will be many different nodes on one level. Will that be a big
problem in the future, e.g. for search performance?
Many thanks!
-- 
View this message in context: http://www.nabble.com/Node-mapping-question%3A-when-should-I-use-subnode--tf3439621.html#a9590761
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: Node mapping question: when should I use subnode?

Posted by Stefan Kurla <st...@gmail.com>.

Good detailed answer Jukka. Helps other beginners as well.

Thanks.

Re: Node mapping question: when should I use subnode?

Posted by alartin <al...@gmail.com>.

Hi Jukka,

Thanks for your patience and explanation.
In my case, the common features of question, answer and comment are:
1. date
2. content
3. author (only one)
A question must have a title, but answers and comments do not. And only a
question has tags and a state/status.
So, would the definition below be better?

[my:content] > nt:hierarchyNode  // inherit its jcr:created property
    - content (STRING) mandatory
    - author (REFERENCE) mandatory < my:user

[my:question] > my:content orderable
    - title (string) mandatory // case insensitive to type?
    - tags (REFERENCE) multiple mandatory < my:tag
    - status (REFERENCE) mandatory < my:state
    + * (my:answer)  // same as + my:answer multiple ?

[my:answer] > my:content
    - best (boolean) mandatory // whether this is the best answer
    - vote (long) mandatory // vote count from users; JCR has LONG but no INT type
 


Re: Node mapping question: when should I use subnode?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 3/22/07, Torgeir Veimo <to...@pobox.com> wrote:
> You'd probably want to store tags by their owners, e.g.
>
> /my:content/users/<user>/tags/<tag>
>
> otherwise you'd have to manage ownership by references, and you'd end
> up with a lot of child nodes under the tags node.

Good point. This really depends on whether you want global tags or
user-specific tags. In fact you could even have both with the proposed
content model, as the tags reference property doesn't really care
where or how the tag nodes are stored.

BR,

Jukka Zitting

Re: Node mapping question: when should I use subnode?

Posted by Torgeir Veimo <to...@pobox.com>.

On 22 Mar 2007, at 11:54, Jukka Zitting wrote:

>    /my:content
>    /my:content/tags
>    /my:content/tags/<tag>
>    /my:content/users
>    /my:content/users/<user>
>    /my:content/workflow
>    /my:content/workflow/<status>
>    /my:content/content
>    /my:content/content/<yyyy>/<mm>/<question>
>    /my:content/content/<yyyy>/<mm>/<question>/<answer>
>
> The /my:content root node is for a clean separation from /jcr:system
> and any other content applications you may want to store within the
> same workspace.
>
> The second level (tags, users, workflow, content) is for partitioning
> your content by type. In future you might want to generalize these
> parts into standalone /my:tags, etc. components that could be used
> with all sorts of content applications.
>
> Tags would be referenceable nodes with whatever metadata properties
> you may want to associate with them. If you want, you could also have
> a tag hierarchy. Question and answer nodes would have a multivalued
> "tags" reference property that points to the associated tags. The tag
> nodes would be named by the tag name.

You'd probably want to store tags by their owners, e.g.

/my:content/users/<user>/tags/<tag>

otherwise you'd have to manage ownership by references, and you'd end  
up with a lot of child nodes under the tags node.

-- 
Torgeir Veimo
torgeir@pobox.com

Re: Node mapping question: when should I use subnode?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 3/22/07, alartin <al...@gmail.com> wrote:
> Thanks for your reply. I wanna write a simple QnA(like yahoo answers, but
> much more simpler) demo of jackrabbit. Every question have tags and user can
> find questions by search box, by status(open,voting,close) or by tag. And
> also, newest questions, the hottest questions(ranked by the  number of
> answers),  the users with the highest scores are shown in the frontpage.
> There will be a timer to calculate all the infomration above.

OK, thanks for the details. It sounds like a date-based hierarchy
could work for you. It wouldn't map directly to the user interface,
but it would make administration easy and would give you a fast way
to build the list of most recent questions.

I would go for a main structure like this:

    /my:content
    /my:content/tags
    /my:content/tags/<tag>
    /my:content/users
    /my:content/users/<user>
    /my:content/workflow
    /my:content/workflow/<status>
    /my:content/content
    /my:content/content/<yyyy>/<mm>/<question>
    /my:content/content/<yyyy>/<mm>/<question>/<answer>

The /my:content root node is for a clean separation from /jcr:system
and any other content applications you may want to store within the
same workspace.

The second level (tags, users, workflow, content) is for partitioning
your content by type. In future you might want to generalize these
parts into standalone /my:tags, etc. components that could be used
with all sorts of content applications.

Tags would be referenceable nodes with whatever metadata properties
you may want to associate with them. If you want, you could also have
a tag hierarchy. Question and answer nodes would have a multivalued
"tags" reference property that points to the associated tags. The tag
nodes would be named by the tag name.

User nodes would contain whatever user information you want to store.
They would also be referenceable, and you'd have an "author" property
(or perhaps "authors") on the question and answer nodes for linking to
the associated user. User nodes would be named by the username.

The workflow nodes would be used just like tags for tracking the
status of a question. You'd have three referenceable workflow nodes
("open", "voting", "close") and a single-valued "status" property on
the question (and answer?) nodes. This gives you a quick way to list
all nodes in a given state, and also allows you to easily extend the
workflow model if needed. You can also attach all sorts of extra
metadata on the workflow nodes.
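
To illustrate why a reference property gives a quick listing by state, here
is a minimal in-memory sketch in Java. It is only a model of the idea and
not the JCR API: the ReferenceIndex class and its method names are
hypothetical, and the node paths are example values. In a real repository
the reverse lookup is what Node.getReferences() provides on a referenceable
node.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** In-memory model of reference properties; hypothetical, not the JCR API. */
class ReferenceIndex {
    // Maps a target node path to the paths of the nodes that reference it.
    private final Map<String, Set<String>> referrers = new TreeMap<>();

    /** Records that the node at sourcePath holds a reference to targetPath. */
    void addReference(String sourcePath, String targetPath) {
        referrers.computeIfAbsent(targetPath, k -> new LinkedHashSet<>())
                 .add(sourcePath);
    }

    /** Returns every node that references the target, in insertion order. */
    Set<String> referrersOf(String targetPath) {
        Set<String> found = referrers.get(targetPath);
        return found == null ? Collections.<String>emptySet() : found;
    }

    public static void main(String[] args) {
        ReferenceIndex index = new ReferenceIndex();
        // Each question's "status" property points at one workflow node.
        index.addReference("/my:content/content/2007/03/q1", "/my:content/workflow/open");
        index.addReference("/my:content/content/2007/03/q2", "/my:content/workflow/voting");
        index.addReference("/my:content/content/2007/03/q3", "/my:content/workflow/open");
        // Listing all open questions is a single lookup on the workflow node.
        System.out.println(index.referrersOf("/my:content/workflow/open"));
    }
}
```

The same lookup works for tag and user nodes, since they are referenced in
the same way.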

The actual content nodes, questions and answers, would be distributed
into a date-based <yyyy>/<mm> tree hierarchy to simplify
administration and to avoid making the content structure too flat.
These content nodes would have the above-mentioned reference
properties and of course any "title" and "content" properties and
extra metadata you need. If you want you could also allow binary
attachments. The content nodes would be named by the title
(potentially encoded) of the node for easy administration and URL
mapping.
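
As a rough sketch of how an application could compute such a path, the Java
snippet below derives /my:content/content/<yyyy>/<mm>/<question> from a
creation date and a title. The QuestionPaths class, its method names and the
title encoding (lowercase, with runs of other characters turned into "-")
are assumptions made for illustration; any encoding that yields valid node
names would do.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Sketch: derive the repository path of a question from date and title. */
class QuestionPaths {
    // Root of the date-based tree, taken from the structure proposed above.
    static final String CONTENT_ROOT = "/my:content/content";

    /** Hypothetical encoding: lowercase, non-alphanumeric runs become '-'. */
    static String encodeTitle(String title) {
        String slug = title.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-+|-+$", "");
        return slug.isEmpty() ? "untitled" : slug;
    }

    /** Builds /my:content/content/<yyyy>/<mm>/<question>. */
    static String questionPath(LocalDate created, String title) {
        return String.format(Locale.ROOT, "%s/%04d/%02d/%s",
                CONTENT_ROOT, created.getYear(), created.getMonthValue(),
                encodeTitle(title));
    }

    /** Answers are stored as child nodes of their question. */
    static String answerPath(String questionPath, String answerName) {
        return questionPath + "/" + answerName;
    }

    public static void main(String[] args) {
        String question = questionPath(LocalDate.of(2007, 3, 22),
                "When should I use subnodes?");
        System.out.println(question);
        System.out.println(answerPath(question, "answer-1"));
    }
}
```

With this encoding, a question titled "When should I use subnodes?" created
on 22 March 2007 maps to
/my:content/content/2007/03/when-should-i-use-subnodes. Two questions with
the same title in the same month would collide, so a real application would
need a uniqueness rule on top of this.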

A quick and dirty shot at node typing would be:

    [my:user] > mix:referenceable

    [my:tag] > mix:referenceable

    [my:state] > mix:referenceable

    [my:content]
    - title (STRING) mandatory
    - content (STRING) mandatory
    - tags (REFERENCE) multiple mandatory < my:tag
    - authors (REFERENCE) multiple mandatory < my:user
    - status (REFERENCE) mandatory < my:state

    [my:question] > my:content orderable
    + * (my:answer)

    [my:answer] > my:content

You might also want to consider making the types extend
nt:hierarchyNode in which case you could use nt:folder nodes for the
intermediate structure, and also achieve extra interoperability with
generic JCR clients.

> I am eager to hear advice from experts like you. Many thanks again.

You're welcome. It's great to have such a content modeling discussion;
I believe there are many people who are very interested in these
concepts.

BR,

Jukka Zitting

Re: Node mapping question: when should I use subnode?

Posted by alartin <al...@gmail.com>.

Hi Jukka,

Thanks for your reply. I wanna write a simple QnA (like Yahoo Answers, but
much simpler) demo of Jackrabbit. Every question has tags, and users can
find questions by search box, by status (open, voting, close) or by tag. Also,
the newest questions, the hottest questions (ranked by the number of
answers), and the users with the highest scores are shown on the front page.
There will be a timer to calculate all the information above.
I am eager to hear advice from experts like you. Many thanks again.


Re: Node mapping question: when should I use subnode?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 3/22/07, alartin <al...@gmail.com> wrote:
> Many thanks to Brian. I am not sure about "I'd try to come
> up with a way to group the my:questions nodes to avoid getting an overly
> large set of children from a single my:qna node.". There may be ten thousand
> or more question nodes under one qna node in future. What's the way to
> group those nodes for better performance?

What would be the primary way of presenting the information to a user?
You wouldn't just want to put out a linear list of all those ten
thousand questions. If you plan to categorize the questions, then
that's also a good candidate for the internal content model.
Alternatively, if you're targeting a more blog-like time-based
sequence of published questions and answers, then a year/month
structure would probably be good. Some approaches could also organize
the entries by author or some other characteristic piece of
information.

BR,

Jukka Zitting

Re: Node mapping question: when should I use subnode?

Posted by alartin <al...@gmail.com>.

Many thanks to Brian. I am not sure about "I'd try to come
up with a way to group the my:questions nodes to avoid getting an overly
large set of children from a single my:qna node." There may be ten thousand
or more question nodes under one qna node in the future. What's the way to
group those nodes for better performance?


Re: Node mapping question: when should I use subnode?

Posted by Brian Thompson <el...@gmail.com>.

Hi,

I'd suggest going with option 1 from your original email.  It's the most
natural mapping of application data to JCR nodes.  For getting all answers
or all comments, I'd use a search query (very simple in SQL syntax:  Just do
"select * from my:answer").

Given what I've read about Jackrabbit performance on very large, flat (>5000
nodes as first-level children of a single node) node trees, I'd try to come
up with a way to group the my:questions nodes to avoid getting an overly
large set of children from a single my:qna node.
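
For the search query mentioned above, a small Java sketch of building the
JCR SQL statements might look like this. The QnaQueries class and its
helper names are hypothetical, and the /my:qna path is only an example taken
from the original question. The statements are plain strings here; with a
live session they would be handed to
session.getWorkspace().getQueryManager().createQuery(statement, Query.SQL).

```java
/** Sketch: build JCR SQL statements for "all nodes of a given type". */
class QnaQueries {
    /** Selects all nodes of the given node type in the workspace. */
    static String selectAll(String nodeType) {
        return "SELECT * FROM " + nodeType;
    }

    /** Selects all nodes of the given type below a path (hypothetical helper). */
    static String selectAllBelow(String nodeType, String pathPrefix) {
        // Single quotes inside a SQL string literal are escaped by doubling.
        String escaped = pathPrefix.replace("'", "''");
        return selectAll(nodeType) + " WHERE jcr:path LIKE '" + escaped + "/%'";
    }

    public static void main(String[] args) {
        // All answers, regardless of which question they belong to.
        System.out.println(selectAll("my:answer"));
        // All comments stored anywhere below the my:qna node.
        System.out.println(selectAllBelow("my:comment", "/my:qna"));
    }
}
```

Because the query selects by node type, the answers and comments can stay
nested under their questions and still be listed without walking every
question node.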

It also seems to me that using references to keep track of which comments
are associated with a given answer would be simulating the organization of
RDBMS tables.  Better, IMO, to keep it with a simple, easily-understood node
structure.

Hope this helps,

-Brian

