Posted to solr-user@lucene.apache.org by aldana <al...@gmx.de> on 2009/11/06 18:24:24 UTC
MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs
Hi,
I am using the MoreLikeThis handler to query for similar ads. The index contains these docs:
-------
X, which is the base for more-like-this query
A
B1
B2 <- identical to B1
B1' <- not identical, but very similar to B1
X' <- not identical, but very similar to X
------
In the query result I expect {A,B1} to be returned. The identical or very
similar docs {B2,B1',X'} should be discarded.
Looking at http://wiki.apache.org/solr/MoreLikeThis, I cannot see any option
to achieve this. Or maybe there is a trick when configuring mlt.qf?
What I would expect in the configuration is:
- the possibility to pass a distance function for certain fields
- a threshold for that distance function, so that docs which are too similar
  are excluded (a kind of 'negative' boost)
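
To make the request concrete, here is a minimal client-side sketch of the behaviour described above. It is not a MoreLikeThis option: it post-filters results that have already been fetched. The function names, the "text" field, the 0.9 threshold and the sample ad texts are all assumptions made for illustration, and Python's difflib ratio stands in for whatever distance function one would really want.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def drop_near_duplicates(base, candidates, field="text", threshold=0.9):
    """Keep candidates that are not too similar to the base doc or to a kept doc.

    Candidates are expected in ranking order, so the first doc of each
    near-duplicate cluster is the one that survives.
    """
    kept = []
    for doc in candidates:
        references = [base] + kept
        if all(similarity(doc[field], ref[field]) < threshold for ref in references):
            kept.append(doc)
    return kept


# Hypothetical ad texts mirroring the docs X, A, B1, B2, B1', X' from the list above.
base = {"id": "X", "text": "red mountain bike for sale in berlin"}
candidates = [
    {"id": "A", "text": "apartment to rent near central station"},
    {"id": "B1", "text": "used road bike with carbon frame"},
    {"id": "B2", "text": "used road bike with carbon frame"},
    {"id": "B1'", "text": "used road bike with carbon frame!"},
    {"id": "X'", "text": "red mountain bike for sale in berlin."},
]

kept = drop_near_duplicates(base, candidates)
print([doc["id"] for doc in kept])  # ['A', 'B1']
```

Every candidate is compared with every kept doc, so this is only reasonable for the handful of rows a single more-like-this request returns.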
-----
manuel aldana
aldana((at))gmx.de
software-engineering blog: http://www.aldana-online.de
--
View this message in context: http://old.nabble.com/MoreLikeThis-SearchHandler%2C-offer-string-distance-to-avoid-duplicate-return-docs-tp26230839p26230839.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,
I'd start by looking at SOLR-236 (field collapsing) and look for a place where hit-to-hit similarity could be plugged in, in place of its check for pairs of hits with identical field values.
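
As a rough illustration of that idea, and not of the actual SOLR-236 patch, the sketch below keeps the collapsing loop fixed and makes the "same group" test pluggable. All names here (collapse, identical_field, similar_field, the "title" field, the sample hits) are hypothetical, and this is plain Python rather than any Solr API.

```python
from difflib import SequenceMatcher


def collapse(hits, same_group):
    """Greedy collapse: each hit joins the first group whose head it matches."""
    groups = []  # each group is a list of hits, head first
    for hit in hits:
        for group in groups:
            if same_group(group[0], hit):
                group.append(hit)
                break
        else:
            # No existing group accepted the hit, so it starts a new one.
            groups.append([hit])
    return groups


def identical_field(field):
    """Predicate in the spirit of field collapsing: exact field match."""
    return lambda a, b: a[field] == b[field]


def similar_field(field, threshold):
    """Assumed replacement predicate: hit-to-hit string similarity."""
    return lambda a, b: SequenceMatcher(None, a[field], b[field]).ratio() >= threshold


hits = [
    {"id": "A", "title": "apartment to rent near central station"},
    {"id": "B1", "title": "used road bike with carbon frame"},
    {"id": "B2", "title": "used road bike with carbon frame"},
    {"id": "B1'", "title": "used road bike with carbon frame!"},
]

exact = collapse(hits, identical_field("title"))
fuzzy = collapse(hits, similar_field("title", 0.9))
print([[h["id"] for h in g] for g in exact])
print([[h["id"] for h in g] for g in fuzzy])
```

With the exact predicate only B1 and B2 end up in one group, while the similarity predicate also pulls B1' into it. Returning the head of each group would then give the de-duplicated result list.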
Otis