Posted to dev@lucene.apache.org by Neil Hooey <nh...@gmail.com> on 2011/05/05 17:45:52 UTC

Consolidation of boosts on multivalued fields

Currently, when you assign boosts to the values of a multivalued field
at index time, they are consolidated into a single boost, and the
individual boosts are lost.

There are some relevant cases where the individual boost values are
important, so I'd like to fix this behaviour.

I've created an issue here, which gives some examples:
https://issues.apache.org/jira/browse/SOLR-2499

Do you have any ideas about where to get started with this fix, or
how difficult it might be?

Thanks,

- Neil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Consolidation of boosts on multivalued fields

Posted by Neil Hooey <nh...@gmail.com>.
Does anyone have any insight on the right way to approach removing
consolidation of boosts on multivalue fields?

On Fri, May 6, 2011 at 10:51 AM, Neil Hooey <nh...@gmail.com> wrote:
> After looking for places in the code where boosts are consolidated for
> multivalued fields, I found this class:
> lucene/src/java/org/apache/lucene/index/FieldInvertState.java
>
> It has a "float boost" member variable. FieldInvertState seems to
> keep track of the index positions of several Fields with the same
> name.
>
> I'm considering changing that float to a "Vector<Float>" to keep track
> of each individual field's boost, but a lot of code calls
> FieldInvertState.getBoost(). The excerpts are listed at the end
> of this email.
>
> Does anyone have a good idea of how to get FieldInvertState to store
> boosts for each field, if that's even the right direction to go?
>
> Calls to FieldInvertState.getBoost():
> ----------------------------------------------------------------------
> $ ack --java -i 'state\.getBoost'
> lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
> 103:   * Implemented as <code> state.getBoost() *
> 117:    return state.getBoost() * computeLengthNorm(numTokens);
>
> lucene/contrib/misc/src/test/org/apache/lucene/index/TestFieldNormModifier.java
> 53:          return state.getBoost() * (discountOverlaps ? state.getLength() - state.getNumOverlap() : state.getLength());
>
> lucene/contrib/misc/src/test/org/apache/lucene/misc/TestLengthNormModifier.java
> 58:            return state.getBoost() * (discountOverlaps ? state.getLength() - state.getNumOverlap() : state.getLength());
> 179:          return state.getBoost() * (discountOverlaps ? state.getLength() - state.getNumOverlap() : state.getLength());
>
> lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java
> 26:   *  <code>state.getBoost()*lengthNorm(numTerms)</code>, where
> 40:    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
>
> lucene/src/test/org/apache/lucene/index/TestIndexReaderCloneNorms.java
> 52:          return state.getBoost();
>
> lucene/src/test/org/apache/lucene/index/TestNorms.java
> 51:          return state.getBoost();
>
> lucene/src/test/org/apache/lucene/index/TestOmitTf.java
> 44:        @Override public float computeNorm(FieldInvertState state) { return state.getBoost(); }
>
> lucene/src/test/org/apache/lucene/search/TestDisjunctionMaxQuery.java
> 67:      return state.getBoost();
>
> lucene/src/test/org/apache/lucene/search/TestSimilarity.java
> 47:        @Override public float computeNorm(FieldInvertState state) { return state.getBoost(); }
>
> lucene/src/test/org/apache/lucene/search/payloads/TestPayloadNearQuery.java
> 326:          return state.getBoost();
>
> lucene/src/test/org/apache/lucene/search/payloads/TestPayloadTermQuery.java
> 319:          return state.getBoost();



Re: Consolidation of boosts on multivalued fields

Posted by Neil Hooey <nh...@gmail.com>.
After looking for places in the code where boosts are consolidated for
multivalued fields, I found this class:
lucene/src/java/org/apache/lucene/index/FieldInvertState.java

It has a "float boost" member variable. FieldInvertState seems to
keep track of the index positions of several Fields with the same
name.

I'm considering changing that float to a "Vector<Float>" to keep track
of each individual field's boost, but a lot of code calls
FieldInvertState.getBoost(). The excerpts are listed at the end
of this email.

Does anyone have a good idea of how to get FieldInvertState to store
boosts for each field, if that's even the right direction to go?
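
To make the idea concrete, here is a minimal sketch of what a per-value
boost store could look like. It is not the real FieldInvertState: the
class PerValueInvertState and its addValue() and getValueBoosts()
methods are hypothetical names, and the sketch assumes that
consolidation is a running product of the individual boosts. It keeps
getBoost() returning a single float, so callers like the ones listed
below would not need to change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not org.apache.lucene.index.FieldInvertState. */
public class BoostSketch {

    /** Stand-in for FieldInvertState that also remembers each value's boost. */
    public static final class PerValueInvertState {
        private int length = 0;        // total tokens across all values of the field
        private float boost = 1.0f;    // consolidated boost (single float, as today)
        // Assumption: a new member holding one boost per field value.
        private final List<Float> valueBoosts = new ArrayList<>();

        /** Record one value of a multivalued field. */
        public void addValue(float valueBoost, int numTokens) {
            length += numTokens;
            boost *= valueBoost;       // assumption: consolidation is a product
            valueBoosts.add(valueBoost);
        }

        /** Unchanged accessor, so existing computeNorm() code keeps working. */
        public float getBoost() { return boost; }

        public int getLength() { return length; }

        /** New accessor exposing the individual boosts, in insertion order. */
        public List<Float> getValueBoosts() {
            return Collections.unmodifiableList(valueBoosts);
        }
    }

    /** Same formula as the DefaultSimilarity excerpt: boost * 1/sqrt(numTerms). */
    public static float computeNorm(PerValueInvertState state) {
        return state.getBoost() * ((float) (1.0 / Math.sqrt(state.getLength())));
    }

    public static void main(String[] args) {
        PerValueInvertState state = new PerValueInvertState();
        state.addValue(2.0f, 3);   // first value: boosted up, 3 tokens
        state.addValue(0.5f, 1);   // second value: boosted down, 1 token
        System.out.println("consolidated boost = " + state.getBoost());
        System.out.println("per-value boosts   = " + state.getValueBoosts());
        System.out.println("norm               = " + computeNorm(state));
    }
}
```

With boosts of 2.0 and 0.5, the consolidated value is 1.0, which is
indistinguishable from two unboosted values; the list still holds both.
Whether the individual boosts could then be carried through to scoring
is a separate question that this sketch does not address.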

Calls to FieldInvertState.getBoost():
----------------------------------------------------------------------
$ ack --java -i 'state\.getBoost'
lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
103:   * Implemented as <code> state.getBoost() *
117:    return state.getBoost() * computeLengthNorm(numTokens);

lucene/contrib/misc/src/test/org/apache/lucene/index/TestFieldNormModifier.java
53:          return state.getBoost() * (discountOverlaps ? state.getLength() - state.getNumOverlap() : state.getLength());

lucene/contrib/misc/src/test/org/apache/lucene/misc/TestLengthNormModifier.java
58:            return state.getBoost() * (discountOverlaps ? state.getLength() - state.getNumOverlap() : state.getLength());
179:          return state.getBoost() * (discountOverlaps ? state.getLength() - state.getNumOverlap() : state.getLength());

lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java
26:   *  <code>state.getBoost()*lengthNorm(numTerms)</code>, where
40:    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));

lucene/src/test/org/apache/lucene/index/TestIndexReaderCloneNorms.java
52:          return state.getBoost();

lucene/src/test/org/apache/lucene/index/TestNorms.java
51:          return state.getBoost();

lucene/src/test/org/apache/lucene/index/TestOmitTf.java
44:        @Override public float computeNorm(FieldInvertState state) { return state.getBoost(); }

lucene/src/test/org/apache/lucene/search/TestDisjunctionMaxQuery.java
67:      return state.getBoost();

lucene/src/test/org/apache/lucene/search/TestSimilarity.java
47:        @Override public float computeNorm(FieldInvertState state) { return state.getBoost(); }

lucene/src/test/org/apache/lucene/search/payloads/TestPayloadNearQuery.java
326:          return state.getBoost();

lucene/src/test/org/apache/lucene/search/payloads/TestPayloadTermQuery.java
319:          return state.getBoost();


