Posted to solr-user@lucene.apache.org by Benjamin Murauer <b....@gmail.com> on 2012/06/10 13:32:54 UTC

x most similar documents

Hi there,
I have a Solr server running that contains tweets. My schema.xml contains
the following fields:

<fields>
 <field name="id" type="string" indexed="true" stored="true" required="true" />
 <field name="tweet" type="text_general" indexed="true" stored="true" termVectors="true"/>
 <field name="hashtags" type="text_general" indexed="true" stored="true" termVectors="true"/>
</fields>

My problem is actually quite simple: somewhere in my GUI the user types
text, and I want to retrieve the tweets that are most similar to it.
Therefore, I tried the "morelikethis" functionality. My problem is that
currently, MLT finds additional tweets for every tweet found by the
"select" handler. I'm not sure, however, whether the select handler finds the
most fitting tweet or just returns the first match. Currently, I am
using the following query:

http://localhost:8983/solr/select/?q=tweet:heaven&mlt=true&mlt.fl=tweet,hashtags&wt=json&indent=true

Am I missing something critical? Ultimately, I just want to retrieve the
x tweets with the most similar text, sorted by their similarity (cosine
of termVectors). Is MoreLikeThis the way to go?

Thanks in advance!

Re: x most similar documents

Posted by Jack Krupansky <ja...@basetechnology.com>.

Oops, I said "MLT will use the first search result from the original query", 
but that is for the MLT handler. For the MLT component you get a separate 
set of documents for each document in the results of the original query.
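
To make the difference concrete, here is a minimal Python sketch that builds both kinds of request. The base URL and field names come from the original query; the /mlt path is an assumption (the MLT request handler has to be registered under that name in solrconfig.xml), and the helper names are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # base URL taken from the original query


def component_url(query, fields, count=5):
    """/select with the MLT component: each hit of `query` gets its own
    list of up to `count` similar documents (mlt.count)."""
    params = {
        "q": query,
        "mlt": "true",
        "mlt.fl": ",".join(fields),
        "mlt.count": count,
        "wt": "json",
    }
    return f"{SOLR}/select?{urlencode(params)}"


def handler_url(query, fields, rows=10):
    """MLT request handler: one list of up to `rows` documents similar to
    the first match of `query`. ASSUMPTION: the handler is mapped to /mlt."""
    params = {
        "q": query,
        "mlt.fl": ",".join(fields),
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR}/mlt?{urlencode(params)}"


if __name__ == "__main__":
    print(component_url("tweet:heaven", ["tweet", "hashtags"]))
    print(handler_url("tweet:heaven", ["tweet", "hashtags"]))
```

With the handler form, the response holds a single ranked list, which is closer to the "x most similar tweets" that was asked for.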

-- Jack Krupansky

Re: x most similar documents

Posted by Jack Krupansky <ja...@basetechnology.com>.

Yes, it sounds like MLT is the way to go, but sometimes you have to get 
creative in figuring out how to set the numerous parameters. And sometimes 
you have to use the MLT request handler rather than /select with the MLT 
component.

You might also encounter issues related to the shortness of the text of 
tweets. Some of the MLT parameters might be optimized for much larger texts.
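
As a rough illustration of that point: a word rarely occurs twice in a single tweet, so a minimum term frequency above 1 would discard most candidate terms. The Python sketch below lowers the mlt.mintf and mlt.mindf thresholds to 1. The values are only a starting point to experiment with, not tuned recommendations; the helper name is hypothetical, and the defaults for your Solr version should be checked in its documentation.

```python
from urllib.parse import urlencode


def short_text_mlt_params(query, fields, rows=10):
    """Build MLT parameters with thresholds relaxed for very short texts.

    mlt.mintf: minimum times a term must occur in the source document.
    mlt.mindf: minimum number of documents a term must occur in.
    Both are set to 1 here as an ASSUMED starting point for tweets.
    """
    return {
        "q": query,
        "mlt.fl": ",".join(fields),
        "mlt.mintf": 1,
        "mlt.mindf": 1,
        "rows": rows,
        "wt": "json",
    }


if __name__ == "__main__":
    params = short_text_mlt_params("tweet:heaven", ["tweet", "hashtags"])
    # Append the encoded string to the URL of your MLT handler.
    print(urlencode(params))
```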

Can you give us an example of a (very brief) tweet that your query finds,
the tweet(s) that MLT returns, and what other tweet(s) you would have
expected?

MLT will use the first search result from the original query.

-- Jack Krupansky
