Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2009/06/02 19:47:21 UTC

Re: Avoid duplicates in MoreLikeThis using field collapsing

But why does MLT return duplicates in the first place?  That seems strange to me.  If there are no duplicates in your index, how does MLT manage to return dupes?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Marc Sturlese <ma...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 29, 2009 7:05:15 AM
> Subject: Avoid duplicates in MoreLikeThis using field collapsing
> 
> 
> Hey there, 
> I am testing the MoreLikeThis feature (with both the MoreLikeThis component
> and the MoreLikeThis handler) and I have noticed that lots of the similar
> documents returned are duplicates. To avoid that I have tried to use the
> field collapsing patch, but it's not taking effect.
> 
> In the case of the MoreLikeThis handler I think that's expected, as I have
> seen that it extends RequestHandlerBase.java directly and not
> SearchHandler.java, which is the one whose handleRequestBody function
> deals with components:
> 
>       for( SearchComponent c : components ) {
>         rb.setTimer( subt.sub( c.getName() ) );
>         c.prepare(rb);
>         rb.getTimer().stop();
>       }
> 
> To sort it out I have embedded the collapseFilter in the getMoreLikeThis
> method of MoreLikeThisHandler.java.
> This is working all right, but I would like to know if there is a cleaner
> way to make MoreLikeThisHandler able to deal with components, I mean via
> solrconfig.xml or by plugging something in instead of hacking it.
> 
> Thanks in advance
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Avoid-duplicates-in-MoreLikeThis-using-field-collapsing-tp23778054p23778054.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Avoid duplicates in MoreLikeThis using field collapsing

Posted by Marc Sturlese <ma...@gmail.com>.
With the DeDuplication patch I create a signature field to control duplicates,
which is an MD5 of 3 different fields:
hashField = hash(fieldA + fieldB + fieldC)

With MoreLikeThis I want to show fieldA.
There are documents that DeDuplication will not consider duplicates because
fieldC was different for each. However, fieldA is exactly the same. These are
the duplicate documents that MoreLikeThis is showing me.
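
To make the situation concrete, here is a rough, standalone Java sketch (not
Solr code). It assumes the signature is an MD5 over the plain concatenation of
the three field values, as in the formula above; the class name, the method
names and the sample values are made up for illustration. It shows two
documents that get different signatures because only fieldC differs, and how
collapsing on fieldA alone would keep just one of them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: this is not the DeDuplication patch
// or the field collapsing patch, just the idea behind each.
public class SignatureVsCollapseSketch {

    // MD5 over the concatenated field values, returned as a hex string.
    // Assumption: plain concatenation, mirroring hash(fieldA + fieldB + fieldC).
    public static String signature(String... fieldValues) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String value : fieldValues) {
                md5.update(value.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Keeps the first document seen for each distinct value of the given field.
    public static List<Map<String, String>> collapseByField(
            List<Map<String, String>> docs, String field) {
        Set<String> seen = new HashSet<>();
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (seen.add(doc.get(field))) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample values are invented; only fieldC differs between the two.
        Map<String, String> doc1 =
            Map.of("fieldA", "same title", "fieldB", "body", "fieldC", "source-1");
        Map<String, String> doc2 =
            Map.of("fieldA", "same title", "fieldB", "body", "fieldC", "source-2");

        String sig1 = signature(doc1.get("fieldA"), doc1.get("fieldB"), doc1.get("fieldC"));
        String sig2 = signature(doc2.get("fieldA"), doc2.get("fieldB"), doc2.get("fieldC"));

        // Different signatures, so both documents stay in the index.
        System.out.println("same signature: " + sig1.equals(sig2));

        // Collapsing on fieldA alone leaves a single document.
        System.out.println("after collapse: "
            + collapseByField(List.of(doc1, doc2), "fieldA").size());
    }
}
```

So the signature does its job for the index, but a second step keyed on fieldA
alone is still needed for what MoreLikeThis shows.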

Hope I explained myself more or less ok...





-- 
View this message in context: http://www.nabble.com/Avoid-duplicates-in-MoreLikeThis-using-field-collapsing-tp23778054p23837785.html
Sent from the Solr - User mailing list archive at Nabble.com.