Posted to solr-user@lucene.apache.org by aldana <al...@gmx.de> on 2009/11/06 18:24:24 UTC

MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs

Hi,

I am using the MoreLikeThis handler to find similar ads. The index contains these docs:
-------
X, which is the base for the more-like-this query

A
B1
B2 <- identical to B1
B1' <- not identical, but very similar to B1
X' <- not identical, but very similar to X
------

In the query result I expect {A,B1} to be returned. The identical or very
similar docs {B2,B1',X'} should be discarded.

Looking at http://wiki.apache.org/solr/MoreLikeThis I cannot see any option
to achieve this. Or maybe there is a trick when configuring mlt.qf?

What I would expect in the configuration is:
- the possibility to pass a distance function for certain fields
- an upper similarity threshold for that distance function, so that docs which
  are too similar are excluded (a kind of 'negative' boost)
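
To make the wish concrete, here is a minimal client-side sketch of such a filter. It is not an existing MoreLikeThis option. It assumes the MLT hits have already been fetched as plain dicts, uses Python's difflib ratio as a stand-in for a string-distance function, and the names drop_near_duplicates, max_similarity, the "text" field and the sample ad texts are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def drop_near_duplicates(base, hits, field="text", max_similarity=0.9):
    """Keep hits that stay below max_similarity against the base doc
    and against every hit that was already kept (greedy, in rank order)."""
    kept = []
    for hit in hits:
        references = [base] + kept
        if all(similarity(hit[field], ref[field]) < max_similarity
               for ref in references):
            kept.append(hit)
    return kept


# Hypothetical ads mirroring the X / A / B1 / B2 / B1' / X' example above.
base = {"id": "X", "text": "red mountain bike, 21 gears, barely used"}
hits = [
    {"id": "A", "text": "leather sofa, three seats, dark brown"},
    {"id": "B1", "text": "wooden dining table with six chairs"},
    {"id": "B2", "text": "wooden dining table with six chairs"},
    {"id": "B1'", "text": "wooden dining table with 6 chairs"},
    {"id": "X'", "text": "red mountain bike, 21 gears, barely used!"},
]
kept_ids = [hit["id"] for hit in drop_near_duplicates(base, hits)]
print(kept_ids)
```

With these sample texts only A and B1 survive. Filtering on the client costs one pairwise comparison per kept hit, so it is only reasonable for the small result pages MLT typically returns.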

-----
manuel aldana
aldana((at))gmx.de
software-engineering blog: http://www.aldana-online.de
-- 
View this message in context: http://old.nabble.com/MoreLikeThis-SearchHandler%2C-offer-string-distance-to-avoid-duplicate-return-docs-tp26230839p26230839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

I'd start by looking at SOLR-236 (field collapsing) and look for a place where hit-to-hit similarity could be plugged in, instead of checking hit pairs for identical field values.
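
As a rough illustration of that idea (not the SOLR-236 patch itself, and not Solr code), the sketch below collapses hits with a pluggable comparison. The function names, the "text" field, the 0.9 cut-off and the sample texts are assumptions, and difflib's ratio merely stands in for whatever similarity measure would be plugged in.

```python
from difflib import SequenceMatcher


def collapse(hits, same_group):
    """Keep the first hit of each group; same_group(a, b) decides membership."""
    kept = []
    for hit in hits:
        if not any(same_group(hit, earlier) for earlier in kept):
            kept.append(hit)
    return kept


def identical_field(a, b):
    # Exact-match grouping: only catches hits whose field values are equal.
    return a["text"] == b["text"]


def similar_field(a, b, min_ratio=0.9):
    # Similarity grouping: also catches near-duplicates at or above the cut-off.
    return SequenceMatcher(None, a["text"], b["text"]).ratio() >= min_ratio


# Hypothetical hits: B2 duplicates B1 exactly, B1' differs by one token.
hits = [
    {"id": "B1", "text": "wooden dining table with six chairs"},
    {"id": "B2", "text": "wooden dining table with six chairs"},
    {"id": "B1'", "text": "wooden dining table with 6 chairs"},
]
exact_ids = [h["id"] for h in collapse(hits, identical_field)]
fuzzy_ids = [h["id"] for h in collapse(hits, similar_field)]
print(exact_ids, fuzzy_ids)
```

Exact-field collapsing still lets B1' through, while the similarity-based comparison keeps only B1.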

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


