Posted to solr-user@lucene.apache.org by Raimon Bosch <ra...@gmail.com> on 2012/11/05 17:40:01 UTC

More Like this without a document?

Hi,

I'm designing a k-nearest neighbors classifier for Solr. I am taking
information from IMDb and creating a set of documents, each with the
description of a movie and the categories assigned to it.

To validate whether the classification is correct, I'm using cross-validation,
so I do not include in the index the documents whose categories I want to
predict.

If I want to use the MoreLikeThis algorithm, do I need to add these documents
to the index? Will MoreLikeThis work with soft commits? Is there a way to
run MoreLikeThis without adding the document to the index?

Thanks,
Raimon Bosch.

Re: More Like this without a document?

Posted by Walter Underwood <wu...@wunderwood.org>.
I wrote something like this for Ultraseek. After the document was parsed and analyzed, I took the top terms (by tf.idf) and did a search, then added fields with the categories.

You might be able to use the document analysis request handler for this. Analyze it, then choose terms, do the search, modify the doc, then submit it for indexing. It would get parsed twice, but that might not be a big deal.
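The "choose terms, do the search" step could look roughly like the sketch below. It is not the Ultraseek implementation: the function names, the smoothed idf formula, and the idea that you already have per-term document frequencies in a dict (for example, collected from the index beforehand) are all assumptions made for illustration.

```python
import math
from collections import Counter


def top_terms(tokens, doc_freq, num_docs, k=10):
    """Rank the analyzed tokens of one document by tf-idf and keep the top k.

    tokens   -- list of already-analyzed terms for the unindexed document
    doc_freq -- hypothetical dict: term -> number of indexed docs containing it
    num_docs -- total number of documents in the index
    """
    tf = Counter(tokens)
    scores = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 0)
        # Smoothed idf (an assumption; other tf-idf variants work as well).
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        scores[term] = count * idf
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def build_query(field, terms):
    """Build a simple OR query over the selected terms for one field."""
    return " OR ".join(f'{field}:"{t}"' for t in terms)


tokens = ["space", "space", "alien", "the", "the", "the"]
doc_freq = {"the": 1000, "space": 10, "alien": 5}
terms = top_terms(tokens, doc_freq, num_docs=1000, k=2)
query = build_query("description", terms)
```

Capping k keeps the query short, which matters given the load such queries can generate.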

Warning: this could put a big load on Solr. My implementation really pounded Ultraseek. The queries are long and they don't really match what is in the caches.

wunder

Re: More Like this without a document?

Posted by Chris Hostetter <ho...@fucit.org>.
: If I want to use MoreLikeThis algorithm I need to add this documents in the
: index? The MoreLikeThis will work with soft commits? Is there a solution to
: do a MoreLikeThis without adding the document in the index?

You can feed the MoreLikeThisHandler a ContentStream (i.e., POST data, a
file upload, or the "stream.body" request param) of text instead of sending
it a query, and it will use that raw text to find "more like this" documents.

http://wiki.apache.org/solr/MoreLikeThisHandler
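
A minimal sketch of the POST variant, assuming the handler is registered at "/mlt" in solrconfig.xml and that the core URL and field name below (both hypothetical) match your setup. It only builds the request; sending it is left commented out so the snippet runs without a Solr instance.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_mlt_request(base_url, text, fields, rows=10):
    """Build a POST request that sends raw text to a MoreLikeThisHandler.

    base_url -- core URL, e.g. "http://localhost:8983/solr/movies" (hypothetical)
    text     -- content of the document that is NOT in the index
    fields   -- fields to use for similarity (passed as mlt.fl)
    """
    params = {
        "mlt.fl": ",".join(fields),
        "mlt.mintf": 1,  # low thresholds, since the input may be short
        "mlt.mindf": 1,
        "mlt.interestingTerms": "list",
        "rows": rows,
        "wt": "json",
    }
    # The "/mlt" path is an assumption; use whatever name the handler has.
    url = f"{base_url}/mlt?{urlencode(params)}"
    return Request(
        url,
        data=text.encode("utf-8"),  # request body becomes the content stream
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


req = build_mlt_request(
    "http://localhost:8983/solr/movies",
    "A crew of astronauts discovers an alien ship.",
    ["description"],
    rows=5,
)
# To actually send it:
# from urllib.request import urlopen
# response_body = urlopen(req).read()
```

The returned documents are the nearest neighbors; their category fields can then be tallied to vote on a category for the held-out document.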

-Hoss