Posted to solr-user@lucene.apache.org by samabhiK <qe...@gmail.com> on 2013/05/13 11:24:40 UTC

Best way to design a "story and comments" schema.

Hi, I would like to know how best to design a schema to store comments on
posted stories / articles.
I have a set of fields:
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content" type="text_general" indexed="false" stored="true"/>
Users can post comments on a post, and I should be able to retrieve these
comments and show them alongside the original post. I only need to show
the last 3 comments, plus a facet of the remaining comments which the user
can click to see the rest of the comments ( something like Facebook does ).
One alternative I could think of was adding a dynamic field for all
comments:
   <dynamicField name="comment_*" type="string" indexed="false" stored="true"/>
So, to store each comment, I would send text to Solr of the form:
   Field name: comment_n
   Value: [Commenter Name]:[Commenter ID]:[Actual Comment Text]
And to keep the count of those comments, I could use another field like so:
   <field name="comment_count" type="int" indexed="true" stored="true"/>
With this approach, I will have to do some calculation when a comment is
deleted by the user, but I can still manage to show the comments correctly.
My goal is to find the best solution for this scenario, one that is both
fast and simple.
Kindly suggest.
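
For illustration only, the dynamic-field layout described above could be
sketched on the client side roughly as follows. The helper names and the "id"
field are hypothetical (they are not part of the schema shown), and the
sketch assumes commenter names and ids never contain a colon, since the
colon is the separator in the proposed value format.

```python
from typing import Dict, List, Tuple

Comment = Tuple[str, str, str]  # (commenter name, commenter id, comment text)


def encode_comment(name: str, commenter_id: str, text: str) -> str:
    """Pack one comment as [Commenter Name]:[Commenter ID]:[Actual Comment Text]."""
    if ":" in name or ":" in commenter_id:
        # Assumption: only the comment text may contain the separator.
        raise ValueError("name and id must not contain ':'")
    return f"{name}:{commenter_id}:{text}"


def decode_comment(value: str) -> Comment:
    """Split on the first two colons only, so colons in the text survive."""
    name, commenter_id, text = value.split(":", 2)
    return name, commenter_id, text


def build_story_doc(story_id: str, subject: str, comments: List[Comment]) -> Dict:
    """Build one story document with comment_1..comment_n and comment_count."""
    doc = {"id": story_id, "subject": subject, "comment_count": len(comments)}
    for n, (name, commenter_id, text) in enumerate(comments, start=1):
        doc[f"comment_{n}"] = encode_comment(name, commenter_id, text)
    return doc


def last_comments(doc: Dict, k: int = 3) -> List[Comment]:
    """Return the last k comments, oldest first, using comment_count."""
    count = doc["comment_count"]
    first = max(1, count - k + 1)
    return [decode_comment(doc[f"comment_{n}"]) for n in range(first, count + 1)]
```

Deleting a comment in this layout means renumbering the later comment_n
fields and decrementing comment_count, which is the "calculation" mentioned
above.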



--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Best way to design a "story and comments" schema.

Posted by samabhiK <qe...@gmail.com>.
I think I got your point.

So, what I will create are three cores (or collections) - one for the users,
one for the stories and the last one for comments. 

When I need to find all the stories posted by a single user, I first need to
search the stories core with a unique userid in the filter and then run
another query to fetch the collection of comments. Correct?

Also, I have no requirement to search through the comments; it's mostly a
storage field for me. So, do you think I should move that into a DB and
query the comments from there? Or would it be too costly for Solr to simply
store that data in a core? Which would be the best option here?

Also, the idea of a custom search component sounds great. But as you said, I
will first try this out with the simplest possible setup and then go from
there.






Re: Best way to design a "story and comments" schema.

Posted by Jack Krupansky <ja...@basetechnology.com>.
There are no transactions in Solr. Delete the Story and then the comments.
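
As a rough sketch of "delete the story and then the comments", the two
requests could be built like this. The collection names "stories" and
"comments" and the "story_id" field are assumptions for illustration. The
JSON bodies use Solr's delete-by-id and delete-by-query update commands, and
each body would be POSTed to the update handler of the named collection. The
sketch assumes story ids contain no double quotes.

```python
import json
from typing import List, Tuple


def delete_story_requests(story_id: str) -> List[Tuple[str, str]]:
    """Return (collection, JSON body) pairs in the order they should be sent."""
    # 1. Remove the story document itself by its unique key.
    story_delete = json.dumps({"delete": {"id": story_id}})
    # 2. Remove every comment that points at the story (hypothetical field).
    comments_delete = json.dumps({"delete": {"query": f'story_id:"{story_id}"'}})
    return [("stories", story_delete), ("comments", comments_delete)]
```

Because the two deletes are independent requests, a failure between them
leaves orphaned comments; re-running the second delete is harmless, so a
retry is usually enough.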

"Core" is just the old Solr terminology. A "collection" is the data itself, 
like the data on the disk. And with SolrCloud, the collection terminology is 
required.

How much data will you have? I mean, a news article could have thousands of 
comments. Do you want to be able to search through them? Solr has no 
provision for searching across an arbitrary number of dynamic fields. I 
mean, if you want a query to search in a field, you need to name the field 
either in the query or in "qf" (even for dismax), which makes querying 
across arbitrary fields unworkable.
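
To make the problem concrete, here is a small sketch of what the client
would have to send if comments lived in comment_1..comment_n. The function
name is hypothetical. It also assumes the dynamic field were declared with
indexed="true", since the schema in the original post has indexed="false"
and could not be searched at all.

```python
from urllib.parse import urlencode


def comment_search_params(user_query: str, max_comments: int) -> str:
    """Build an edismax query string that names every comment_n field in qf."""
    # Every field must be listed explicitly; there is no "comment_*" shortcut
    # assumed here, so the list grows with the largest comment count.
    fields = [f"comment_{n}" for n in range(1, max_comments + 1)]
    return urlencode({
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(fields),
    })
```

With a story that has thousands of comments, the "qf" value alone would
contain thousands of field names.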

Multiple HTTP requests should not be a problem, especially if each of them 
is shorter. Are you running into some problem?

Technically, you could also do a custom search component that did a lot of 
the multi-query processing inside Solr, but once again, it is best to start 
with a simple design first.

-- Jack Krupansky



Re: Best way to design a "story and comments" schema.

Posted by samabhiK <qe...@gmail.com>.
Thanks for your reply.

I generally get confused by a collection and a core. But just FYI, I do have
two cores at the moment - one for the users and another for the Stories.
Initially I thought of adding an extra core for the Comments too but
realized that it would mean multiple HTTP calls to fetch both the story and
the comments. Also, when a story is deleted, so should be its comments.
Having that spread across two cores might cause issues with transactions when
I delete the story and try to delete the respective comments? Or when I
delete the User and all his stories and comments?

I really wish to understand how that works.

Sam



 




Re: Best way to design a "story and comments" schema.

Posted by Jack Krupansky <ja...@basetechnology.com>.
Multi-valued fields don't have the same full support as simple fields and 
documents (since they are effectively a sub-document). Although we do now 
have the ability to "add" to a multi-valued field with atomic update, we 
can't directly edit them, like delete/replace the kth item or insert 
before/after an item, sort them by various criteria, etc. And a query won't 
tell you which entry matched. And you can't narrow your query to search a 
subset of a multi-valued field.
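
A minimal sketch of the difference, assuming a hypothetical multi-valued
field named "comments" on the story document: appending uses the atomic
update "add" modifier, while dropping the kth item is done here by
rewriting the whole list on the client and sending it back with "set".

```python
import json
from typing import List


def add_comment_update(story_id: str, comment: str) -> str:
    """Atomic update body that appends one value to the multi-valued field."""
    return json.dumps([{"id": story_id, "comments": {"add": comment}}])


def drop_kth_comment_update(story_id: str, current: List[str], k: int) -> str:
    """Workaround sketch: rebuild the list without item k and replace it all."""
    # The caller must first fetch the current values; k is a 0-based index.
    remaining = current[:k] + current[k + 1:]
    return json.dumps([{"id": story_id, "comments": {"set": remaining}}])
```

The second helper has to read and resend every value to change one of
them, which is why this approach gets expensive for long lists.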

They do work well for "short" lists, but not "Big Data". Listing a "few" 
authors for a book is fine. But trying to do hundreds, thousands, and more, 
is quite problematic. There was a recent issue on the list about how 
multi-valued field values are sometimes handled inefficiently in Solr.

-- Jack Krupansky



Re: Best way to design a "story and comments" schema.

Posted by Jack Park <ja...@topicquests.org>.
Jack,

Why are multi-valued fields considered messy?
I think I am about to learn something..

Thanks
Another Jack


Re: Best way to design a "story and comments" schema.

Posted by Jack Krupansky <ja...@basetechnology.com>.
Try the simplest, cleanest design first (at least on paper), before you 
start resorting to dynamic fields, multi-valued fields, or other messy 
approaches. For example: one collection for stories, each with a story id; 
a second collection for comments, each with a comment id plus fields holding 
the associated story id and user id; and a third collection for users and 
their profiles. Identify the user to get their user id. Identify the story 
(maybe by keyword search) to get its story id. Then select the comments by 
story id, user id, and whatever other search criteria apply, and facet on 
that.
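
A sketch of the query against a separate comments collection, under stated
assumptions: the field names "story_id", "user_id" and "created_at" are
hypothetical, and the parameters shown (q, fq, sort, rows, facet,
facet.field) are standard Solr query parameters. The response's numFound
gives the total number of comments for the story, so "3 shown, N more" does
not need a separate counter field.

```python
from urllib.parse import urlencode


def latest_comments_params(story_id: str, rows: int = 3) -> str:
    """Query string for the newest comments on one story, faceted by user."""
    return urlencode({
        "q": "*:*",
        "fq": f'story_id:"{story_id}"',   # restrict to one story
        "sort": "created_at desc",        # newest first
        "rows": rows,                     # only the last few comments
        "facet": "true",
        "facet.field": "user_id",         # comment counts per user
    })
```

Fetching the remaining comments is then the same query with a larger
"rows" value or a "start" offset.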

-- Jack Krupansky
