Posted to java-user@lucene.apache.org by Lahiru Samarakoon <la...@gmail.com> on 2011/06/13 08:31:15 UTC

Modifying Length Normalization calculation

Hi All,

I want to customize the length normalization calculation for my
application by changing the "*number of terms*" to suit my requirements.
The "*StandardTokenizer*" works perfectly for my application; however,
the *number of terms* calculated by the tokenizer is not the effective
number of terms for the application. I have a mechanism to calculate that
value, and I need to know how I can apply it in the length normalization
calculation.

Please advise.

Thank you,

Best Regards,
Lahiru.

Re: Modifying Length Normalization calculation

Posted by Lahiru Samarakoon <la...@gmail.com>.
Hi Ian,

The order is right and your method is working for me.

Thanks.

Lahiru

Re: Modifying Length Normalization calculation

Posted by Ian Lea <ia...@gmail.com>.
This is getting beyond my level of expertise, but I'll have a go at
your questions.  Hopefully someone better informed will step in with
corrections or confirmation.

> ...
> The application calls the *writer.addDocument(d);* method and in this
> process the *lengthNorm(String fieldName, int numTerms)* method is called.
> I can extend the *DefaultSimilarity* class and override the
> *lengthNorm* method, but how can I explicitly specify the
> *numTerms* value?

I don't know that you can, but you don't have to use the value passed in.

> ...
> Is the *computeNorm* method called for every field, or only for
> analyzed fields?

All indexed fields, at a guess.  Which can be analyzed or not.

> Is the order in which we call *addDocument* the same as the order in
> which the *computeNorm* method is called?

Probably.

> Is there a possibility that I can access the *Document* object inside the
> *Similarity* class?

Not that I know of via API calls. If you had your own Similarity
implementation, and methods are called in the order you expect, you
could add a setDoc(Document) method and/or a setCalcValue(n) method
and use them as you wished in your custom computeNorm() or
lengthNorm() code.
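
The setter idea above can be sketched roughly as follows. This is plain
Java with no Lucene dependency: AdjustableLengthNorm is a hypothetical
stand-in for a DefaultSimilarity subclass, and only the lengthNorm
signature mirrors the one quoted above. The ThreadLocal and the clamp to 1
are assumptions added for safety, not something the API requires, and the
whole approach assumes lengthNorm runs on the thread that calls
addDocument.

```java
// Hypothetical stand-in for a DefaultSimilarity subclass; no Lucene types are used.
final class AdjustableLengthNorm {
    // Per-thread value, so concurrent indexing threads cannot overwrite each other.
    // (Assumption: the norm is computed on the thread that calls addDocument.)
    private final ThreadLocal<Integer> calcValue = new ThreadLocal<Integer>() {
        @Override
        protected Integer initialValue() {
            return 0;
        }
    };

    // Call this just before adding the document the value belongs to.
    void setCalcValue(int n) {
        calcValue.set(n);
    }

    // Same shape as lengthNorm(String fieldName, int numTerms): the passed-in
    // count is not used as-is; the application's value is subtracted first.
    float lengthNorm(String fieldName, int numTerms) {
        // Clamp to 1 so the result stays finite (an assumption of this sketch).
        int effective = Math.max(1, numTerms - calcValue.get());
        return (float) (1.0 / Math.sqrt(effective));
    }
}
```

Usage would be setCalcValue(n) followed immediately by
writer.addDocument(d) on the same thread. As written, the adjustment
applies to every field of that document; checking fieldName inside
lengthNorm would restrict it to one field.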


--
Ian.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Modifying Length Normalization calculation

Posted by Lahiru Samarakoon <la...@gmail.com>.
Hi Ian,

Thank you very much for the reply.

The application calls the *writer.addDocument(d);* method and in this
process the *lengthNorm(String fieldName, int numTerms)* method is called.
I can extend the *DefaultSimilarity* class and override the
*lengthNorm* method, but how can I explicitly specify the
*numTerms* value?

In my application, numTerms = (analyzed length of the field content) -
(app-specific calculated value)

(analyzed length of the field content) = the original numTerms value
calculated in *computeNorm*, which is known.
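
As a worked illustration of the formula above, here is a small plain-Java
sketch. The class and method names are hypothetical, the clamp to 1 is an
assumption to keep the norm finite, and 1 / sqrt(numTerms) is the formula
DefaultSimilarity uses for lengthNorm.

```java
// Hypothetical helper; illustrates the arithmetic only, no Lucene types involved.
final class EffectiveTermCount {
    // numTerms = (analyzed length of the field content) - (app-specific value),
    // clamped to at least 1 (the clamp is an assumption of this sketch).
    static int effectiveNumTerms(int analyzedLength, int appSpecificValue) {
        return Math.max(1, analyzedLength - appSpecificValue);
    }

    // Length normalization as 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

For example, with 100 analyzed tokens and an app-specific value of 36, the
effective count is 64, so the norm rises from 0.1 to 0.125.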

Is the *computeNorm* method called for every field, or only for
analyzed fields?

Is the order in which we call *addDocument* the same as the order in
which the *computeNorm* method is called?

Is there a possibility that I can access the *Document* object inside the
*Similarity* class?

Regards,
Lahiru

Re: Modifying Length Normalization calculation

Posted by Ian Lea <ia...@gmail.com>.
org.apache.lucene.search.Similarity would be the place to look,
specifically computeNorm(String field, FieldInvertState state).  There
is comprehensive info in the javadocs.  Note that values are
calculated at indexing and stored in the index encoded, with some loss
of precision.
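
For orientation, the calculation behind computeNorm can be sketched in
plain Java as below. InvertStateSketch is a hypothetical stand-in for the
parts of FieldInvertState that matter here (length, overlap count, boost),
and the logic is only an approximation, from memory, of what the 3.x
DefaultSimilarity does; the javadocs remain the authority.

```java
// Hypothetical stand-in for the FieldInvertState values the norm depends on.
final class InvertStateSketch {
    final int length;      // number of tokens seen for the field
    final int numOverlap;  // tokens with position increment 0 (e.g. synonyms)
    final float boost;     // combined boost for the field

    InvertStateSketch(int length, int numOverlap, float boost) {
        this.length = length;
        this.numOverlap = numOverlap;
        this.boost = boost;
    }
}

final class NormSketch {
    // boost * 1 / sqrt(numTerms), optionally discounting overlapping tokens.
    // A custom implementation could substitute its own numTerms at this point.
    static float computeNorm(InvertStateSketch state, boolean discountOverlaps) {
        int numTerms = discountOverlaps
                ? state.length - state.numOverlap
                : state.length;
        return state.boost * (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

The value returned here is what gets encoded into a single byte per field
per document, which is where the loss of precision comes from.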


--
Ian.
