Posted to solr-user@lucene.apache.org by Francisco Sanmartin <fr...@olx.com> on 2008/04/21 23:02:21 UTC

More Like This boost

Is it possible to boost the query that MoreLikeThis returns before 
sending it to Solr? I mean, technically it is possible, because you can 
add a factor to the whole query, but does it make sense? (Remember that 
MoreLikeThis can already boost each term inside the query.)

For example, this could be a result of MoreLikeThis (with native 
boosting enabled)

queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
morelikethis^0.67)

What I want to do is:

queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for the whole 
query)

Does Solr apply the boost with a "distributive" property (as in 
mathematics)? Does it really apply the outer boost, or does it ignore it 
because the terms inside have already been boosted?

Thanks in advance.

Pako

Re: More Like This boost

Posted by Jonathan Ariel <io...@gmail.com>.

Ok. Here it is.
https://issues.apache.org/jira/browse/LUCENE-1272




On Tue, Apr 22, 2008 at 2:24 PM, Francisco Sanmartin <fr...@olx.com>
wrote:

> Yep, it would be nice for  MLT to have this feature, that's why I am trying
> to do it from the querys before sending the query to Solr. These are the
> steps I'm following:
>
> 1. execute a mlt.like() with the text document_example.getTitle() against
> the field "Title" of all the other documents. This returns a query
> containing the most relevant words in the example_document and in the rest
> of documents in the Title. We will call this query "QueryTitle". For example
> QueryTitle = (words^0.4 in^0.3 the^0.56 title^0.65)
> 2. execute a mlt.like() with the text document_example.getDescription()
> against the field "Description" of all the other documents. This returns a
> query containin the most relevant words in the example_document and in the
> rest of documents in the Description. We will call this query
> "QueryDescription". For example QueryDescription = (other^0.66 words^0.7
> in^0.33 the^0.49 description^0.43)
>
> Up to here, everything is possible with the options that offers MLT.
>
> Now, with the info MLT gave me (QueryTitle and QueryDescription), i want to
> look in Solr for the documents (and more filters) to retrieve the best
> matches. But I want QueryTitle to be more important that QueryDescription,
> for example 70% and 30% respectively. This means that we should do
> QueryTitle^0.70 and QueryDescription^0.30. This means having a query for
> Solr like this:
> (words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (other^0.66 words^0.7 in^0.33
> the^0.49 description^0.43)^0.30
>
> The question is...is Solr able to "understand" a query boosted who has its
> terms boosted already? (Remember that MLT returns the "interesting terms"
> boosted). This does make sense? Will the words obtained from a mlt.like() on
> the title be 70% relevant while the words obtained from a mlt.like() on the
> description will be only 30% relevant?
>
> Of course it would be a nice feature to be able to boost these things
> natively and do only one call to MLT...Don't hesitate to contact me if you
> need any help on developing this feature.
>
>
> Thanks!
>
> Pako
>
> Erik Hatcher wrote:
>
>> No, the MLT feature does not have that kind of field-specific boosting
>> capability.  It sounds like it could be a useful enhancement though.  Of
>> course you do get boosts for "interesting terms" already, but maybe having
>> an additional field-specific boost would be a nice touch too.
>>
>>    Erik
>>
>> On Apr 22, 2008, at 9:13 AM, Francisco Sanmartin wrote:
>>
>>> I know that only one query of that type does not change anything. But
>>> when it's two or more with different boosts, i hope it does. Here is the
>>> situation:
>>> My docs have "Title" and "Description". What I want to do is to give more
>>> relevancy to the morelikethis on the title than on the description. So the
>>> query would be like this:
>>>
>>> query = (words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (words^0.7 in^0.33
>>> the^0.49 description^0.43)^0.30
>>>
>>> This way, the words in the title are more relevant than the words in the
>>> description, right?
>>>
>>> Thanks!
>>>
>>> Pako
>>>
>>>
>>> Erik Hatcher wrote:
>>>
>>>>
>>>> On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote:
>>>>
>>>>> Is it possible to boost the query that MoreLikeThis returns before
>>>>> sending it to Solr? I mean, technically is possible, because you can add a
>>>>> factor to the whole query but...does it make sense? (Remember that
>>>>> MoreLikeThis can already boosts each term inside the query).
>>>>>
>>>>> For example, this could be a result of MoreLikeThis (with native
>>>>> boosting enabled)
>>>>>
>>>>> queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29
>>>>> morelikethis^0.67)
>>>>>
>>>>> what I want to do is
>>>>>
>>>>> queryResulltMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29
>>>>> morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for the whole
>>>>> query)
>>>>>
>>>>
>>>> That last boost wouldn't change the doc ordering at all, so it'd be
>>>> kinda useless.
>>>>
>>>> What are you trying to accomplish?
>>>>
>>>>    Erik
>>>>
>>>>
>>>>
>>
>>
>

Re: More Like This boost

Posted by Francisco Sanmartin <fr...@olx.com>.

Yep, it would be nice for MLT to have this feature. That's why I am 
trying to do it by adjusting the queries before sending them to Solr. 
These are the steps I'm following:

1. Execute mlt.like() with the text document_example.getTitle() 
against the field "Title" of all the other documents. This returns a 
query containing the most relevant words in document_example and in 
the Title of the rest of the documents. We will call this query 
"QueryTitle". For example, QueryTitle = (words^0.4 in^0.3 the^0.56 
title^0.65)
2. Execute mlt.like() with the text document_example.getDescription() 
against the field "Description" of all the other documents. This returns 
a query containing the most relevant words in document_example and in 
the Description of the rest of the documents. We will call this query 
"QueryDescription". For example, QueryDescription = (other^0.66 
words^0.7 in^0.33 the^0.49 description^0.43)

Up to this point, everything is possible with the options that MLT offers.

Now, with the info MLT gave me (QueryTitle and QueryDescription), I want 
to search Solr (with additional filters) to retrieve the best matches. 
But I want QueryTitle to be more important than QueryDescription, for 
example 70% and 30% respectively. This means that we should use 
QueryTitle^0.70 and QueryDescription^0.30, which gives a query for Solr 
like this:
(words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (other^0.66 words^0.7 
in^0.33 the^0.49 description^0.43)^0.30
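
The string assembly described above can be sketched in a few lines of 
Python. This is only an illustration: the helper names boosted_group and 
combine are hypothetical, the term boosts are copied from the example 
rather than produced by MLT, and the terms are left without field 
prefixes exactly as in the message (a real Solr query would probably need 
something like Title:words so that each group searches its own field).

```python
def boosted_group(term_boosts, group_boost):
    """Render {term: boost} as '(t1^b1 t2^b2 ...)^group_boost'."""
    inner = " ".join(f"{term}^{boost:g}" for term, boost in term_boosts.items())
    return f"({inner})^{group_boost:.2f}"


def combine(groups):
    """Join several boosted groups into one query string, separated by spaces."""
    return " ".join(boosted_group(terms, boost) for terms, boost in groups)


# Term boosts copied from the example above; in practice they would come from MLT.
query_title = {"words": 0.4, "in": 0.3, "the": 0.56, "title": 0.65}
query_description = {
    "other": 0.66, "words": 0.7, "in": 0.33, "the": 0.49, "description": 0.43,
}

solr_query = combine([(query_title, 0.70), (query_description, 0.30)])
print(solr_query)
```

Keeping each group as a dictionary until the last moment makes it easy to 
change the 70/30 split without touching the per-term boosts.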

The question is: is Solr able to "understand" a boosted query whose 
terms are already boosted? (Remember that MLT returns the "interesting 
terms" boosted.) Does this make sense? Will the words obtained from a 
mlt.like() on the title carry 70% of the weight, while the words 
obtained from a mlt.like() on the description carry only 30%?

Of course it would be a nice feature to be able to boost these things 
natively and do only one call to MLT...Don't hesitate to contact me if 
you need any help on developing this feature.

Thanks!

Pako

Erik Hatcher wrote:
> No, the MLT feature does not have that kind of field-specific boosting 
> capability.  It sounds like it could be a useful enhancement though.  
> Of course you do get boosts for "interesting terms" already, but maybe 
> having an additional field-specific boost would be a nice touch too.
>
>     Erik
>
> On Apr 22, 2008, at 9:13 AM, Francisco Sanmartin wrote:
>> I know that only one query of that type does not change anything. But 
>> when it's two or more with different boosts, i hope it does. Here is 
>> the situation:
>> My docs have "Title" and "Description". What I want to do is to give 
>> more relevancy to the morelikethis on the title than on the 
>> description. So the query would be like this:
>>
>> query = (words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (words^0.7 
>> in^0.33 the^0.49 description^0.43)^0.30
>>
>> This way, the words in the title are more relevant than the words in 
>> the description, right?
>>
>> Thanks!
>>
>> Pako
>>
>>
>> Erik Hatcher wrote:
>>>
>>> On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote:
>>>> Is it possible to boost the query that MoreLikeThis returns before 
>>>> sending it to Solr? I mean, technically is possible, because you 
>>>> can add a factor to the whole query but...does it make sense? 
>>>> (Remember that MoreLikeThis can already boosts each term inside the 
>>>> query).
>>>>
>>>> For example, this could be a result of MoreLikeThis (with native 
>>>> boosting enabled)
>>>>
>>>> queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
>>>> morelikethis^0.67)
>>>>
>>>> what I want to do is
>>>>
>>>> queryResulltMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
>>>> morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for the 
>>>> whole query)
>>>
>>> That last boost wouldn't change the doc ordering at all, so it'd be 
>>> kinda useless.
>>>
>>> What are you trying to accomplish?
>>>
>>>     Erik
>>>
>>>
>
>

Re: More Like This boost

Posted by Walter Underwood <wu...@netflix.com>.

It should help to weight the terms with their frequency in the
original document. That will distinguish between two documents
with the same terms, but different focus.

wunder
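
One possible reading of this suggestion, sketched in Python: multiply 
each term's boost by the number of times the term occurs in the original 
document. The function name weight_by_source_frequency, the whitespace 
tokenization, and the sample boosts and text are all assumptions made for 
illustration; the thread does not say how the frequency should be 
folded in, and this is not how MoreLikeThis itself is implemented.

```python
from collections import Counter


def weight_by_source_frequency(term_boosts, source_text):
    """Multiply each term's boost by its raw count in the source text.

    Terms that never occur in the source text keep their original boost.
    """
    counts = Counter(source_text.lower().split())
    return {term: boost * max(counts[term], 1) for term, boost in term_boosts.items()}


# Hypothetical boosts and source text, for illustration only.
boosts = {"solr": 0.5, "boost": 0.25, "query": 0.75}
source = "Solr boost question: does a boost on a boost make sense"
weighted = weight_by_source_frequency(boosts, source)
print(weighted)
```

Under this scheme two documents that share the same vocabulary still 
produce different queries when they emphasize different terms.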

On 4/22/08 7:46 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:

> No, the MLT feature does not have that kind of field-specific
> boosting capability.  It sounds like it could be a useful enhancement
> though.  Of course you do get boosts for "interesting terms" already,
> but maybe having an additional field-specific boost would be a nice
> touch too.
> 
> Erik
> 
> On Apr 22, 2008, at 9:13 AM, Francisco Sanmartin wrote:
>> I know that only one query of that type does not change anything.
>> But when it's two or more with different boosts, i hope it does.
>> Here is the situation:
>> My docs have "Title" and "Description". What I want to do is to
>> give more relevancy to the morelikethis on the title than on the
>> description. So the query would be like this:
>> 
>> query = (words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (words^0.7
>> in^0.33 the^0.49 description^0.43)^0.30
>> 
>> This way, the words in the title are more relevant than the words
>> in the description, right?
>> 
>> Thanks!
>> 
>> Pako
>> 
>> 
>> Erik Hatcher wrote:
>>> 
>>> On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote:
>>>> Is it possible to boost the query that MoreLikeThis returns
>>>> before sending it to Solr? I mean, technically is possible,
>>>> because you can add a factor to the whole query but...does it
>>>> make sense? (Remember that MoreLikeThis can already boosts each
>>>> term inside the query).
>>>> 
>>>> For example, this could be a result of MoreLikeThis (with native
>>>> boosting enabled)
>>>> 
>>>> queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29
>>>> morelikethis^0.67)
>>>> 
>>>> what I want to do is
>>>> 
>>>> queryResulltMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29
>>>> morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for
>>>> the whole query)
>>> 
>>> That last boost wouldn't change the doc ordering at all, so it'd
>>> be kinda useless.
>>> 
>>> What are you trying to accomplish?
>>> 
>>>     Erik
>>> 
>>> 
>

Re: More Like This boost

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

No, the MLT feature does not have that kind of field-specific  
boosting capability.  It sounds like it could be a useful enhancement  
though.  Of course you do get boosts for "interesting terms" already,  
but maybe having an additional field-specific boost would be a nice  
touch too.

	Erik

On Apr 22, 2008, at 9:13 AM, Francisco Sanmartin wrote:
> I know that only one query of that type does not change anything.  
> But when it's two or more with different boosts, i hope it does.  
> Here is the situation:
> My docs have "Title" and "Description". What I want to do is to  
> give more relevancy to the morelikethis on the title than on the  
> description. So the query would be like this:
>
> query = (words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (words^0.7  
> in^0.33 the^0.49 description^0.43)^0.30
>
> This way, the words in the title are more relevant than the words  
> in the description, right?
>
> Thanks!
>
> Pako
>
>
> Erik Hatcher wrote:
>>
>> On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote:
>>> Is it possible to boost the query that MoreLikeThis returns  
>>> before sending it to Solr? I mean, technically is possible,  
>>> because you can add a factor to the whole query but...does it  
>>> make sense? (Remember that MoreLikeThis can already boosts each  
>>> term inside the query).
>>>
>>> For example, this could be a result of MoreLikeThis (with native  
>>> boosting enabled)
>>>
>>> queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29  
>>> morelikethis^0.67)
>>>
>>> what I want to do is
>>>
>>> queryResulltMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29  
>>> morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for  
>>> the whole query)
>>
>> That last boost wouldn't change the doc ordering at all, so it'd  
>> be kinda useless.
>>
>> What are you trying to accomplish?
>>
>>     Erik
>>
>>

Re: More Like This boost

Posted by Francisco Sanmartin <fr...@olx.com>.

I know that boosting a single query of that type does not change 
anything. But when there are two or more with different boosts, I hope 
it does. Here is the situation:
My docs have "Title" and "Description". What I want to do is to give 
more relevancy to the morelikethis on the title than on the description. 
So the query would be like this:

query = (words^0.4 in^0.3 the^0.56 title^0.65)^0.70 (words^0.7 in^0.33 
the^0.49 description^0.43)^0.30

This way, the words in the title are more relevant than the words in the 
description, right?
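
A toy model can make the intended effect concrete. The Python sketch 
below assumes that a group boost and a term boost simply multiply, and 
that each group only matches its own field; real Lucene scoring also 
involves tf-idf, query normalization, and other factors, so this 
illustrates the idea rather than Solr's actual behaviour. The documents 
doc_a and doc_b are hypothetical.

```python
def score(doc, groups):
    """Sum group_boost * term_boost over query terms found in the matching field."""
    total = 0.0
    for field, group_boost, term_boosts in groups:
        for term, term_boost in term_boosts.items():
            if term in doc.get(field, set()):
                total += group_boost * term_boost
    return total


title_terms = {"words": 0.4, "in": 0.3, "the": 0.56, "title": 0.65}
description_terms = {"words": 0.7, "in": 0.33, "the": 0.49, "description": 0.43}

# Hypothetical documents: each maps a field name to the terms it contains.
doc_a = {"Title": {"title"}, "Description": set()}
doc_b = {"Title": set(), "Description": {"words"}}

weighted = [("Title", 0.70, title_terms), ("Description", 0.30, description_terms)]
unweighted = [("Title", 1.0, title_terms), ("Description", 1.0, description_terms)]

for name, groups in (("70/30", weighted), ("no group boost", unweighted)):
    print(name, round(score(doc_a, groups), 3), round(score(doc_b, groups), 3))
```

In this model doc_a (a title match) outranks doc_b with the 70/30 group 
boosts, while doc_b comes first when both groups are weighted equally, so 
differing group boosts can change the ordering.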

Thanks!

Pako

Erik Hatcher wrote:
>
> On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote:
>> Is it possible to boost the query that MoreLikeThis returns before 
>> sending it to Solr? I mean, technically is possible, because you can 
>> add a factor to the whole query but...does it make sense? (Remember 
>> that MoreLikeThis can already boosts each term inside the query).
>>
>> For example, this could be a result of MoreLikeThis (with native 
>> boosting enabled)
>>
>> queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
>> morelikethis^0.67)
>>
>> what I want to do is
>>
>> queryResulltMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29 
>> morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for the 
>> whole query)
>
> That last boost wouldn't change the doc ordering at all, so it'd be 
> kinda useless.
>
> What are you trying to accomplish?
>
>     Erik
>
>

Re: More Like This boost

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote:
> Is it possible to boost the query that MoreLikeThis returns before  
> sending it to Solr? I mean, technically is possible, because you  
> can add a factor to the whole query but...does it make sense?  
> (Remember that MoreLikeThis can already boosts each term inside the  
> query).
>
> For example, this could be a result of MoreLikeThis (with native  
> boosting enabled)
>
> queryResultMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29  
> morelikethis^0.67)
>
> what I want to do is
>
> queryResulltMLT = (this^0.4 is^0.5 a^0.6 query^0.33 of^0.29  
> morelikethis^0.67)^0.60      <---(notice the boost of 0.60 for the  
> whole query)

That last boost wouldn't change the doc ordering at all, so it'd be  
kinda useless.
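
The reason can be seen in a simplified Python model, assuming the outer 
boost acts as a single multiplier on every document's score: scaling all 
scores by the same positive factor leaves their order unchanged. The 
documents below are hypothetical and the scoring ignores everything 
Lucene does beyond summing term boosts.

```python
def score(doc_terms, term_boosts, query_boost=1.0):
    """query_boost * sum of the boosts of the query terms present in the document."""
    return query_boost * sum(b for t, b in term_boosts.items() if t in doc_terms)


def ranking(docs, term_boosts, query_boost=1.0):
    """Document names ordered from highest to lowest score."""
    return sorted(
        docs,
        key=lambda name: score(docs[name], term_boosts, query_boost),
        reverse=True,
    )


mlt_terms = {"this": 0.4, "is": 0.5, "a": 0.6, "query": 0.33, "of": 0.29,
             "morelikethis": 0.67}

# Hypothetical documents, each represented by the set of terms it contains.
docs = {
    "doc1": {"query", "morelikethis"},
    "doc2": {"this", "is", "a"},
    "doc3": {"of"},
}

print(ranking(docs, mlt_terms))
print(ranking(docs, mlt_terms, query_boost=0.60))
```

Both calls print the same order, which is why a boost on the whole query 
only matters once there is a second clause with a different boost to 
compare against.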

What are you trying to accomplish?

	Erik