Posted to solr-user@lucene.apache.org by Tomasz Kępski <to...@kepski.pl> on 2009/11/23 14:01:51 UTC

Boost document base on field length

Hi,

I would like to boost documents with longer descriptions so that
documents with zero-length descriptions move down.
I'm wondering whether it is possible to boost a document based on
field length at search time, or whether the only way is to store the
field length as an int in a separate field while indexing?

Tom

Re: Boost document base on field length

Posted by Lance Norskog <go...@gmail.com>.
The Lucene norms, if set, default to 1/sqrt(number of terms in the field),
times any index-time boost.

I cannot find a function that makes norms available. Yo gurus- is this
impossible, a bad idea, or just an oversight?

On Tue, Nov 24, 2009 at 6:06 AM, Tomasz Kępski <to...@kepski.pl> wrote:
> Hi,
>
>> I think I'm reading the question differently than Grant -- his suggestion
>> applies when you are searching in the description field, and don't want
>> documents with shorter descriptions to score higher when the same terms
>> match the same number of times (the default behavior of lengthNorm)
>
>> my understanding is that you want documents that don't have a description
>> to score lower than documents that do -- and you might be querying against
>> completely different fields (description might not even be indexed)
>>
>> in that case there is no easy way to achieve this with just the
>> description field ... the easy thing to do is to index a boolean
>> "has_description" field and then incorporate that into your query (or as the
>> input to a function query)
>
> You get my point, Hoss. In my case a long description = good value. And your
> intuition is amazing ;-) I do have a field which is not used in search at
> all (image url), but for me docs with an image have greater value than docs
> without one.
>
> I will add two fields then (a boolean for photo and an int for description
> length), fill them in during indexing, and play with them during
> search.
>
> Thanks,
> Tom
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Boost document base on field length

Posted by Tomasz Kępski <to...@kepski.pl>.
Hi,

> I think I'm reading the question differently than Grant -- his suggestion 
> applies when you are searching in the description field, and don't want 
> documents with shorter descriptions to score higher when the same terms 
> match the same number of times (the default behavior of lengthNorm)

> my understanding is that you want documents that don't have a description 
> to score lower than documents that do -- and you might be querying against 
> completely different fields (description might not even be indexed)
> 
> in that case there is no easy way to achieve this with just the 
> description field ... the easy thing to do is to index a boolean 
> "has_description" field and then incorporate that into your query (or as 
> the input to a function query)

You get my point, Hoss. In my case a long description = good value. And 
your intuition is amazing ;-) I do have a field which is not used in 
search at all (image url), but for me docs with an image have greater 
value than docs without one.

I will add two fields then (a boolean for photo and an int for description 
length), fill them in during indexing, and play with them during 
search.

Thanks,
Tom
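
A minimal sketch of the index-time step described above, assuming the
documents are built as plain dictionaries before being sent to Solr. The
field names "description", "image_url", "has_photo" and
"description_length" are hypothetical, and counting whitespace-separated
words (rather than characters) is only one possible definition of
"length".

```python
def add_quality_fields(doc: dict) -> dict:
    """Return a copy of doc with two extra fields used only for boosting.

    has_photo           -- True when the (hypothetical) image_url field is non-empty
    description_length  -- number of whitespace-separated words in description
    """
    description = (doc.get("description") or "").strip()
    image_url = (doc.get("image_url") or "").strip()

    enriched = dict(doc)  # leave the caller's document untouched
    enriched["has_photo"] = bool(image_url)
    enriched["description_length"] = len(description.split())
    return enriched


with_photo = add_quality_fields(
    {"id": "1", "description": "red wool coat", "image_url": "http://example.com/a.jpg"}
)
no_description = add_quality_fields({"id": "2", "description": None})
```

Both new fields would also need to be declared in schema.xml with
suitable types before they can be used in a boost.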


Re: Boost document base on field length

Posted by Chris Hostetter <ho...@fucit.org>.
: > I would like to boost documents with longer descriptions so that documents with zero-length descriptions move down.
: > I'm wondering whether it is possible to boost a document based on field length at search time, or whether the only way is to store the field length as an int in a separate field while indexing?
: 
: Override the default Similarity (see the end of the schema.xml file) 
: with your own Similarity implementation and then in that class override 
: the lengthNorm() method.


I think I'm reading the question differently than Grant -- his suggestion 
applies when you are searching in the description field, and don't want 
documents with shorter descriptions to score higher when the same terms 
match the same number of times (the default behavior of lengthNorm)

my understanding is that you want documents that don't have a description 
to score lower than documents that do -- and you might be querying against 
completely different fields (description might not even be indexed)

in that case there is no easy way to achieve this with just the 
description field ... the easy thing to do is to index a boolean 
"has_description" field and then incorporate that into your query (or as 
the input to a function query)


-Hoss
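
One way the "incorporate that into your query" suggestion might look,
sketched as a helper that builds the request parameters. It assumes the
dismax query parser, whose bq (boost query) and bf (boost function)
parameters add to the score without restricting the result set. The
searched field "title", the int field "description_length", the weight
2.0, and the log/sum function are illustrative assumptions, not values
from this thread; check that the functions exist in your Solr version.

```python
from urllib.parse import parse_qs, urlencode


def build_boosted_query(user_query: str) -> str:
    """Build a Solr query string that favors documents with a description."""
    params = [
        ("defType", "dismax"),
        ("q", user_query),
        ("qf", "title"),  # hypothetical field actually searched
        # Boost query: documents flagged at index time get extra score.
        ("bq", "has_description:true^2.0"),
        # Boost function: grows slowly with the stored length;
        # the +1 keeps log() defined for empty descriptions.
        ("bf", "log(sum(description_length,1))"),
    ]
    return urlencode(params)


query_string = build_boosted_query("wool coat")
decoded = parse_qs(query_string)
```

The resulting string can be appended to the select handler URL. The
boost weights would need tuning against real queries.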


Re: Boost document base on field length

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 23, 2009, at 8:01 AM, Tomasz Kępski wrote:

> Hi,
> 
> I would like to boost documents with longer descriptions so that documents with zero-length descriptions move down.
> I'm wondering whether it is possible to boost a document based on field length at search time, or whether the only way is to store the field length as an int in a separate field while indexing?

Override the default Similarity (see the end of the schema.xml file) with your own Similarity implementation and then in that class override the lengthNorm() method.
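
To make the effect of such an override concrete, here is a small Python
model of the arithmetic only. The real override would be a Java subclass
of Lucene's Similarity. default_length_norm mirrors the default
1/sqrt(numTerms) behavior; longer_is_better_norm and its cap of 100 terms
are hypothetical, shown as one possible replacement that rewards longer
fields.

```python
import math


def default_length_norm(num_terms: int) -> float:
    """Model of the default length norm: shorter fields get a larger value.

    Assumes num_terms >= 1.
    """
    return 1.0 / math.sqrt(num_terms)


def longer_is_better_norm(num_terms: int, cap: int = 100) -> float:
    """Hypothetical replacement: the value grows with length, up to cap terms."""
    return math.sqrt(min(num_terms, cap) / cap)


for n in (1, 4, 100, 400):
    print(n, default_length_norm(n), longer_is_better_norm(n))
```

Two caveats apply to this approach. Norms are computed at index time, so
changing lengthNorm() requires reindexing. The norm also only influences
the score when the query actually matches in that field, so it does not
help when the description field is not searched.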