Posted to solr-user@lucene.apache.org by ahammad <ah...@gmail.com> on 2010/11/03 19:41:35 UTC

Question about morelikethis and multiple fields

Hello,

I'm trying to implement a "Related Articles" feature within my search
application using the mlt handler.

To give you a little background information, my Solr index contains a single
core that is created by merging 10+ other cores. Within this core is my main
data item known as an "article"; however, there are other data items like
"technical documents", "tickets", etc.

When a user opens an article on my web application, I want to show "Related
Articles" based on 2 fields (title and body). I am using SolrJ as a back-end
for this.

The way I'm thinking of doing it is to search on the title of the existing
article, and hope that the first hit is that actual article. This works in
most of the cases, but occasionally it grabs either the wrong article or a
different type of data item altogether (the first hit may be a technical
document, which is totally unrelated to articles). The following is my
query:

?qt=%2Fmlt&mlt.match.include=true&mlt.mindf=1&mlt.mintf=1&mlt.fl=title,body&q=<search string>&fq=dataItem:article&debugQuery=true
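
For readers who want to reproduce this request from Java, here is a minimal
sketch that assembles the same parameters with URL encoding. It uses only the
JDK (Java 10 or later for this URLEncoder overload), not SolrJ; the class and
method names are made up for illustration, and the search string is a
placeholder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: not part of Solr or SolrJ.
final class MltRequestSketch {

    // Builds the query string from the post, with every name and value URL-encoded.
    static String buildQueryString(String searchString) {
        // LinkedHashMap keeps the parameters in the order they were written in the post.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "/mlt");
        params.put("mlt.match.include", "true");
        params.put("mlt.mindf", "1");
        params.put("mlt.mintf", "1");
        params.put("mlt.fl", "title,body");
        params.put("q", searchString);
        params.put("fq", "dataItem:article");
        params.put("debugQuery", "true");

        StringJoiner joined = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joined.toString();
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "solr mlt" stands in for the title of the article being viewed.
        System.out.println(buildQueryString("solr mlt"));
    }
}
```

Encoding each value separately avoids problems when the article title contains
characters such as "&" or ":".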

One thing I noticed is that this only seems to match on the "body" field
and not the "title" field. It may well be doing what it's supposed to, and
I'm just not fully grasping the idea of mlt.

So when it does the initial search to find the document against which it
will find related articles, which search handler does it use? Normally, my
queries are carried out using dismax with some boosting functionality
applied to them. When I use the standard query handler with the qt
parameter set to mlt, however, what happens for the initial search?

Also, if anybody can suggest an alternative implementation to this I would
greatly appreciate it. Like I said, it's entirely possible that I don't
fully understand mlt and it's causing me to implement stuff in a weird way.

Thanks,

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Question-about-morelikethis-and-multiple-fields-tp1836778p1836778.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Question about morelikethis and multiple fields

Posted by ahammad <ah...@gmail.com>.
I don't quite understand what you mean by that. Did you mean TermVector
Components?

Also, I did some more digging and I found some messages on this mailing list
about filtering. From what I understand, using the standard query handler
(solr/select/?q=...) with a qt parameter allows you to filter on the initial
response using the fq parameter. While this is not a perfect solution for my
application, it will greatly reduce any errors that I may get in the data.
However, when I tried fq, all it's doing is filtering on the result set from
the mlt handler, not the initial response. I need to filter on both the
initial response and the result set.
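
One possible way to constrain the initial match, sketched here under
assumptions rather than verified against a particular Solr version: put the
restriction into q itself, as required clauses, and leave fq to filter the
related results as observed above. The sketch assumes the application already
knows the unique key of the article being viewed and that the key field is
called "id"; both the field name and the helper names are hypothetical.

```java
// Hypothetical helper: builds a q value that can only match the intended article.
final class SourceDocQuerySketch {

    // Wraps a value in double quotes, escaping backslashes and embedded quotes
    // so it is treated as a single phrase in Lucene query syntax.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Both clauses are required (+), so a technical document or ticket with a
    // similar title cannot be picked as the source document.
    static String sourceDocQuery(String id, String dataItem) {
        return "+id:" + quote(id) + " +dataItem:" + quote(dataItem);
    }

    public static void main(String[] args) {
        // "article-42" is a made-up unique key.
        System.out.println(sourceDocQuery("article-42", "article"));
    }
}
```

The resulting string would replace the title search in the q parameter, which
also removes the need to hope that the first title hit is the right article.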
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Question-about-morelikethis-and-multiple-fields-tp1836778p1837351.html

Re: Question about morelikethis and multiple fields

Posted by da...@ontrenet.com.
Try adding TFVs (term frequency vectors) to the title field as well as
the body.
