Posted to solr-user@lucene.apache.org by Charlie Maroto <ch...@gmail.com> on 2012/04/21 00:38:25 UTC

Opposite to MoreLikeThis?

Hi all,

Is there a way to implement the opposite of MoreLikeThis (LessLikeThis, I
guess :)? The requirement we have is to remove all documents whose content
is like that of a given document ID or of text provided by the end user. In
our current index implementation (not using Solr), the user can narrow the
results by indicating which document(s) are not relevant to him and then
ask for any document whose content is like that of the selected document(s)
to be removed from the search results.

Our index has close to 100 million documents covering multiple topics that
are not related to one another. So a search for some broad terms may
retrieve documents about engineering, agriculture, communications, etc. As
the user tries to discover the relevant documents, he may select an
agriculture-related document to exclude it, and the documents like it, from
the result set; likewise for engineering-like content, and so on, until most
of the documents are about communications.

Of course, some exclusions may actually remove relevant content, but those
filters can be removed to go back to the previous set of results.
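In Solr terms, the "exclusions that can be removed" requirement might be modelled as below. This is a minimal sketch, assuming each exclusion becomes its own negative filter query (`fq`). Solr allows `fq` to be repeated, intersects the filters, and caches each one independently. The class name and the `category` field are hypothetical, not part of any Solr API.

```python
from urllib.parse import urlencode


class ExclusionFilters:
    """Hypothetical helper: keeps one negative fq per user exclusion."""

    def __init__(self, base_query):
        self.base_query = base_query
        self.filters = []  # negative filter queries, in the order they were added

    def add(self, fq):
        # Ignore duplicates so the same exclusion is not sent twice
        if fq not in self.filters:
            self.filters.append(fq)

    def remove(self, fq):
        # Dropping a filter brings back the documents it had excluded
        if fq in self.filters:
            self.filters.remove(fq)

    def params(self):
        # Repeated fq parameters are intersected (ANDed) by Solr
        return [("q", self.base_query)] + [("fq", f) for f in self.filters]

    def query_string(self):
        # urlencode keeps repeated keys when given a list of pairs
        return urlencode(self.params())


filters = ExclusionFilters("broad terms")
filters.add('-category:"agriculture"')
filters.add('-category:"engineering"')
filters.remove('-category:"agriculture"')  # the user changes their mind
print(filters.params())
```

Keeping exclusions as separate `fq` values, rather than one combined query, makes "go back to the previous set of results" a matter of dropping one parameter.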

Any ideas or suggestions from similar implementations are welcome!
Thanks,
Carlos

Re: Opposite to MoreLikeThis?

Posted by Lance Norskog <go...@gmail.com>.
Are these documents classified already? Sounds like it would be much
faster to suppress documents with the same tags as your target tags.
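If the documents already carry a classification, this suggestion reduces to a negative filter query on that field. A minimal sketch, assuming a hypothetical multi-valued `category` field. The helper names are illustrative only. Solr accepts a purely negative `fq` and treats it as "all documents except these".

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes for Lucene syntax."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def tag_exclusion_fq(field, tags):
    """Build a pure-negative fq such as -category:("a" OR "b"); None if nothing to exclude."""
    unique = sorted(set(tags))
    if not unique:
        return None
    return "-{}:({})".format(field, " OR ".join(quote_term(t) for t in unique))


# Tags taken from the documents the user marked as not relevant
print(tag_exclusion_fq("category", ["engineering", "agriculture", "engineering"]))
```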

On Fri, Apr 20, 2012 at 4:16 PM, Darren Govoni <da...@ontrenet.com> wrote:
> You could run MLT for the document in question, gather all the doc IDs
> in the MLT results, and negate those in a subsequent query. I'm not sure
> how well that would work with very large result sets, but it's something
> to try.
>
> Another approach would be to gather the "interesting terms" from the
> document in question and then negate those terms in subsequent queries.
> Perhaps, with many negated terms, Solr would rank documents that match
> fewer of the negated terms above those that match more, simulating a
> ranked "less like" effect.

-- 
Lance Norskog
goksron@gmail.com

Re: Opposite to MoreLikeThis?

Posted by Darren Govoni <da...@ontrenet.com>.
You could run MLT for the document in question, gather all the doc IDs
in the MLT results, and negate those in a subsequent query. I'm not sure
how well that would work with very large result sets, but it's something
to try.
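The first approach might look roughly like this, assuming the IDs come from an earlier MoreLikeThis request and the unique key field is named `id`. Each OR'ed ID becomes a boolean clause, and Lucene's default `maxBooleanClauses` limit is 1024. The sketch therefore splits long ID lists into several negative `fq` parameters. Because filter queries are intersected, excluding each chunk separately excludes all of them. The chunk size of 1000 is an assumption.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes for Lucene syntax."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mlt_exclusion_fqs(doc_ids, field="id", chunk_size=1000):
    """Turn MLT result IDs into one or more negative filter queries."""
    ids = list(dict.fromkeys(doc_ids))  # de-duplicate while keeping order
    fqs = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        fqs.append("-{}:({})".format(field, " OR ".join(quote_term(i) for i in chunk)))
    return fqs


# Hypothetical IDs returned by a previous MoreLikeThis request
similar_ids = ["doc17", "doc42", "doc17", "doc99"]
for fq in mlt_exclusion_fqs(similar_ids, chunk_size=2):
    print("fq=" + fq)
```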

Another approach would be to gather the "interesting terms" from the
document in question and then negate those terms in subsequent queries.
Perhaps, with many negated terms, Solr would rank documents that match
fewer of the negated terms above those that match more, simulating a
ranked "less like" effect.
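The second approach can be sketched as follows. It assumes the interesting terms were fetched with MoreLikeThis (for example with `mlt.interestingTerms=details`) and parsed into hypothetical `(field, term, boost)` tuples. A plain negated clause such as `-text:wheat` only filters and does not change scores. To get a ranked "less like" effect, a commonly cited trick is a dismax/edismax boost query of the form `(*:* -field:term)^weight`. It rewards documents that lack a term, so documents containing more of the negated terms collect fewer boosts and sink. Both variants are shown; using the MLT boosts as weights is an assumption.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes for Lucene syntax."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def hard_exclusion_fq(interesting_terms):
    """Filter out every document containing any of the terms (no ranking effect)."""
    clauses = ["{}:{}".format(f, quote_term(t)) for f, t, _ in interesting_terms]
    if not clauses:
        return None
    return "-(" + " OR ".join(clauses) + ")"


def soft_exclusion_bqs(interesting_terms, scale=1.0):
    """One bq per term: documents WITHOUT the term receive the boost."""
    return [
        "(*:* -{}:{})^{:g}".format(f, quote_term(t), boost * scale)
        for f, t, boost in interesting_terms
    ]


# Hypothetical interesting terms for an agriculture document
terms = [("text", "wheat", 2.0), ("text", "irrigation", 1.5)]
print("fq=" + hard_exclusion_fq(terms))
for bq in soft_exclusion_bqs(terms):
    print("bq=" + bq)
```

The `bq` parameter is only honoured by the dismax and edismax query parsers (e.g. with `defType=edismax`).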