Posted to solr-user@lucene.apache.org by Sumit Arora <su...@gmail.com> on 2010/08/26 11:54:09 UTC

How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

I have a set of articles and comments on them, so in the database I have two
major tables, one for articles and one for comments. Each article can have
many comments (one-to-many).


If one article has 20 comments, then on the DB-to-Solr index sync, Solr will
index 20 similar documents that differ only in the comment.


Use Case :

On search: if the keyword matches more than one comment, the search will
return duplicate documents.


One possible solution I thought of applying:

******************************************

I would index 20 similar documents that differ only in the comment.


When retrieving results from a query, I could use collapse.field set to the
article ID.


Am I following the right approach?
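
The collapsing idea described above can be illustrated with a small toy model.
This is only a sketch in plain Python, not Solr code: the hits, the scores, and
the field name article_id are made up for illustration, and the collapse()
helper is hypothetical. It keeps the best-scoring hit per article, which is
roughly what collapsing on the article ID is meant to achieve.

```python
# Toy illustration of collapsing search hits on a field; NOT Solr code.
# The hits, scores and the field name "article_id" are hypothetical.
hits = [
    {"article_id": "A1", "comment": "the web portal ...", "score": 1.2},
    {"article_id": "A1", "comment": "web hosting is ...", "score": 0.9},
    {"article_id": "A2", "comment": "a web crawler ...", "score": 0.7},
]


def collapse(hits, field):
    """Keep only the highest-scoring hit for each distinct value of `field`."""
    best = {}
    for hit in hits:
        key = hit[field]
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    # Return the surviving hits ordered by descending score.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


collapsed = collapse(hits, "article_id")
print([h["article_id"] for h in collapsed])  # ['A1', 'A2']
```

With one document per comment, article A1 appears twice in the raw hits and
once after collapsing.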

Re: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

Posted by Erick Erickson <er...@gmail.com>.
See below...

On Thu, Aug 26, 2010 at 4:31 AM, Sumit Arora <su...@gmail.com> wrote:

> Thanks Ephraim for your response.
>
> If I use a multiValued field for the comments, should I use the following
> logic when moving the data into Solr?
>
> /* Sample pseudocode */
>
> Get rows from the Article and Article-Comments tables;
>     // retrieves 1 article and 20 comments
>
> Begin;
>
> Copy the article's field values into the Solr fields defined in schema.xml;
>     // one article in this case, so this generates one document id for Solr
>
> Comments = 0;
>
> While (Comments != 20)
> {
>   Include this comment;
>   ++Comments;
> }
>
> End;
>
> Result: one article with multiple comments indexed as a multiValued field
> in Solr. Will Solr end up with only one document, or with multiple documents?
>
>
A multi-valued field is just what it says, a field within a single
document. So you'd have one document with 20 values for
your comment field.
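
As a minimal sketch of the above, assuming hypothetical field names (id, title,
comment) and in-memory rows standing in for the two database tables, the
one-article, 20-comment case becomes a single document whose comment field
holds a list of values. Nothing here talks to Solr; it only shows the shape of
the document.

```python
# Hypothetical rows standing in for the Article and Comments tables.
article = {"id": "article-1", "title": "Example article"}
comment_rows = [
    {"article_id": "article-1", "text": f"comment {n}"} for n in range(1, 21)
]


def build_solr_doc(article, comment_rows):
    """Return ONE document; each comment becomes a value of the multi-valued field."""
    return {
        "id": article["id"],
        "title": article["title"],
        # Collect every comment that belongs to this article into one list.
        "comment": [
            row["text"] for row in comment_rows
            if row["article_id"] == article["id"]
        ],
    }


doc = build_solr_doc(article, comment_rows)
print(len(doc["comment"]))  # 20 values, still a single document
```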

However, note that Solr doesn't have partial updates of a document;
it deletes and re-adds the document when you update. This is handled
automatically for you if you have a uniqueKey defined. That is, if
you add a new document with the SAME unique key as a previous
document, the previous one will be removed and the new one
will replace it (with a new internal document id).
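
The replace-on-same-key behaviour can be pictured with a toy model. This is a
sketch in plain Python, not Solr internals: the dictionary index, the add()
helper, and the counter used for internal ids are all assumptions made for
illustration. The point is that adding a comment means re-sending the whole
document with all of its values.

```python
import itertools

# Toy model of an index keyed by the unique key; NOT how Solr is implemented.
index = {}
internal_ids = itertools.count()


def add(doc):
    """Store `doc`; a previous document with the same "id" is dropped and replaced."""
    index[doc["id"]] = {"internal_id": next(internal_ids), "doc": doc}


add({"id": "article-1", "comment": ["first comment"]})
# To "append" a comment, the full document is sent again with every value.
add({"id": "article-1", "comment": ["first comment", "second comment"]})

print(len(index))                         # 1: still a single document
print(index["article-1"]["internal_id"])  # 1: the replacement got a new internal id
```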


> If I want to use highlighting in this case and the search keyword exists
> in more than one comment, how can I get the result below, where the
> keyword 'web' was found in two comments?
>
> ... 1.The *web* portal will connect a lot of people for some specific
> domain, and then people can post their interesting story, upload files
>
>  ... 2.1 accessing multiple sites will slow down the user experience - try
> not to do it. *web* hosting is not too expensive as compared to the other
> components ...
>
>
>
I believe this is controlled by the hl.fragsize parameter; see:
http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize
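
For illustration, here is a sketch of how the highlighting parameters might be
put on a query string. The field name comment and the values 2 and 100 are
assumptions. Note that hl.snippets is not mentioned in this thread: it is the
standard highlighting parameter that caps how many snippets are returned per
field, and it defaults to 1, so it likely needs raising to get a snippet from
each of two matching comments.

```python
from urllib.parse import urlencode

# Hypothetical query: the field name "comment" and the numeric values are assumptions.
params = {
    "q": "comment:web",
    "hl": "true",        # turn highlighting on
    "hl.fl": "comment",  # field(s) to highlight
    "hl.snippets": 2,    # maximum snippets per field (Solr's default is 1)
    "hl.fragsize": 100,  # approximate snippet size in characters
}

query_string = urlencode(params)
print(query_string)
```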

The other thing you should be aware of is the "increment gap". This
is useful if you want, say, phrase queries NOT to match across
two comments. For example:
comment 1: comments are very nice
comment 2: day in and day out

If you don't want the phrase query "nice day" to match the
enclosing document, you probably want to work with
positionIncrementGap. See:
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html
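
The effect of the gap can be shown with a simplified model of token positions.
This is a sketch in plain Python, not Lucene code: the whitespace tokenizer and
both helper functions are assumptions made for illustration. Each new value of
the multi-valued field starts `gap` positions after the previous value ended,
and a phrase only matches if its words sit at consecutive positions.

```python
def index_positions(values, gap):
    """Map each token to its positions; every value after the first is
    shifted by `gap` extra positions relative to the previous value."""
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # leave a hole between consecutive values
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(positions, phrase):
    """True if the phrase's words occur at consecutive positions."""
    first, *rest = phrase.lower().split()
    return any(
        all(start + offset in positions.get(token, [])
            for offset, token in enumerate(rest, 1))
        for start in positions.get(first, [])
    )


comments = ["comments are very nice", "day in and day out"]
print(phrase_matches(index_positions(comments, gap=0), "nice day"))    # True
print(phrase_matches(index_positions(comments, gap=100), "nice day"))  # False
print(phrase_matches(index_positions(comments, gap=100), "day out"))   # True
```

With no gap, "nice" and "day" are adjacent even though they come from
different comments; with a gap of 100 the cross-comment phrase no longer
matches, while phrases inside a single comment still do.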

Best
Erick



Re: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

Posted by Sumit Arora <su...@gmail.com>.
Thanks Ephraim for your response.

If I use a multiValued field for the comments, should I use the following
logic when moving the data into Solr?

/* Sample pseudocode */

Get rows from the Article and Article-Comments tables;
    // retrieves 1 article and 20 comments

Begin;

Copy the article's field values into the Solr fields defined in schema.xml;
    // one article in this case, so this generates one document id for Solr

Comments = 0;

While (Comments != 20)
{
   Include this comment;
   ++Comments;
}

End;

Result: one article with multiple comments indexed as a multiValued field
in Solr. Will Solr end up with only one document, or with multiple documents?

If I want to use highlighting in this case and the search keyword exists
in more than one comment, how can I get the result below, where the
keyword 'web' was found in two comments?

... 1.The *web* portal will connect a lot of people for some specific
domain, and then people can post their interesting story, upload files

 ... 2.1 accessing multiple sites will slow down the user experience - try
not to do it. *web* hosting is not too expensive as compared to the other
components ...





RE: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship

Posted by Ephraim Ofir <Ep...@icq.com>.
Why not define the comment field as multiValued? That way you only index
each document once and you don't need to collapse anything...

Ephraim Ofir

