Posted to solr-user@lucene.apache.org by Mike Anderson <mi...@mit.edu> on 2009/12/09 17:27:54 UTC

atypical MLT use-case

This is a somewhat odd use case for MLT. Basically, I'm using it for
near-duplicate detection (I'm not using the built-in dup detection for a
variety of reasons). While this might sound like an okay idea, the problem
lies in the order in which things happen. Ideally, duplicate detection would
prevent me from adding a document to my index that is already there (or at
least partially there). However, MoreLikeThis only works on documents
that are *already* in the index. What I would like to be able to do is
post an XML document to Solr and receive an MLT response (the same kind of
MLT response I would receive had the document already been in Solr and
been queried with id=#{id}&mlt=true).

Is anybody aware of how I could achieve this functionality leveraging
existing handlers? If not I will bump over to solr-dev and see if this is a
tractable problem.

Thanks in advance,
Mike

Re: atypical MLT use-case

Posted by Andre Parodi <an...@gmail.com>.
It's "Solr 1.4 Enterprise Search Server".

It's listed in the left column of the Solr homepage.

http://www.packtpub.com/solr-1-4-enterprise-search-server?utm_source=http://lucene.apache.org/solr/&utm_medium=spons&utm_content=pod&utm_campaign=mdb_000275

On 09/12/09 19:14, Mike Anderson wrote:
> Wow! Exactly what I'm looking for. Which Solr 1.4 book is this?
>
> Thanks so much. If anybody knows the details of how to use this, I'd love to hear your tips, experiences, or comments.
>
> -mike
>
>
> On Dec 9, 2009, at 12:55 PM, Andre Parodi wrote:
>
>    
>> the solr 1.4  book says you can do this.
>>
>> usages of mlt:
>> "As a request handler with an external input document: What if you want similarity results based on something that isn't in the index? A final option that Solr supports is returning MLT results based on text data sent to the MLT handler (through HTTP POST). For example, if you were to send a text file to the handler, then Solr's MLT handler would return the documents in the index that are most similar to it. This is atypical but an interesting option nonetheless."
>>
>> not sure about the details of how though as i haven't used mlt myself.


Re: atypical MLT use-case

Posted by Andre Parodi <an...@gmail.com>.
The Solr 1.4 book says you can do this.

Usages of MLT:
"As a request handler with an external input document: What if you want 
similarity results based on something that isn't in the index? A final 
option that Solr supports is returning MLT results based on text data 
sent to the MLT handler (through HTTP POST). For example, if you were to 
send a text file to the handler, then Solr's MLT handler would return 
the documents in the index that are most similar to it. This is atypical 
but an interesting option nonetheless."

I'm not sure about the details, though, as I haven't used MLT myself.
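
For what it's worth, the approach the book describes (sending raw text to
the MLT handler over HTTP POST) might look roughly like the sketch below.
Treat it as a sketch under assumptions, not a tested recipe: it assumes a
solr.MoreLikeThisHandler is registered at /mlt in solrconfig.xml, that Solr
is reachable at localhost:8983, and that the index has fields named "title"
and "body". Those names, and the helper functions themselves, are
hypothetical. The mlt.* parameters are standard MLT options; the handler
reads the posted body as a content stream when no q parameter is given.

```python
import urllib.parse
import urllib.request


def build_mlt_request(base_url, text, fields, rows=5):
    """Build (but do not send) a POST request that carries raw text to an MLT handler.

    base_url and fields are assumptions about the local setup, e.g. a
    MoreLikeThisHandler mapped to /mlt and fields that exist in the schema.
    """
    params = {
        "mlt.fl": ",".join(fields),      # fields used to pick "interesting" terms
        "mlt.mintf": 1,                  # low thresholds so short inputs still match
        "mlt.mindf": 1,
        "mlt.interestingTerms": "list",  # ask Solr to report the terms it used
        "rows": rows,
        "fl": "id,score",
        "wt": "json",
    }
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),       # the not-yet-indexed document text
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body as a string."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Usage would be something like
send(build_mlt_request("http://localhost:8983/solr/mlt", doc_text, ["title", "body"])).
For near-duplicate detection, the caller would then look at the scores of
the returned documents and decide whether to skip the add; where to put
that threshold depends on the data.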
