Posted to java-user@lucene.apache.org by Saurabh Gokhale <sa...@gmail.com> on 2011/06/19 07:06:12 UTC
Index result percentile variation based on Index time Norm and No-Norm
Hi All,
I have a question regarding the index-time parameters Index.ANALYZED_NO_NORMS
and Index.ANALYZED.
As per the definition, ANALYZED_NO_NORMS is no different from ANALYZED
except that it does not store norm (boost) information in the index.
So if I create two indexes, one with the ANALYZED option and another
with the ANALYZED_NO_NORMS option, then for the same indexed documents and the
same search query both should return the same results with the same matching
percentile.
But the percentile for the index created with NO_NORMS is higher than for the
one created with norms. Why is that so?
Attached is a Java example that creates the two indexes and searches the same
documents using a BooleanQuery.
The result indicates that although the matched documents are the same, their
percentiles differ. Can someone please help me understand norms better? Is it
because, in the index with norms, Lucene adds some boost for the indexed fields?
Result of running the program is:
--------------------------------------------------------------------------------------------
Adding Doc to NORM Index :: author : author1
---------------------------------------
Adding Doc to NO NORM Index :: author : author1
=======================================
Adding Doc to NORM Index :: author : author2
---------------------------------------
Adding Doc to NO NORM Index :: author : author2
=======================================
Adding Doc to NORM Index :: author : author3
---------------------------------------
Adding Doc to NO NORM Index :: author : author3
=======================================
Adding Doc to NORM Index :: author : author4
---------------------------------------
Adding Doc to NO NORM Index :: author : author4
=======================================
Search for the book with Author: author1
NORM:: WITH NORM Query :: (author:author1) (subject:book subject:first subject:my) -isbn:123
Match: 19.738173% || Doc Author: author2 || Doc subject: My next book || Doc ISBN: 333
Match: 6.168179% || Doc Author: author3 || Doc subject: this first text || Doc ISBN: 444
Search for the book with Author: author1
NORM:: WITHOUT NORM Query :: (author:author1) (subject:book subject:first subject:my) -isbn:123
Match: 39.476345% || Doc Author: author2 || Doc subject: My next book || Doc ISBN: 333
Match: 9.869086% || Doc Author: author3 || Doc subject: this first text || Doc ISBN: 444
Thanks
Saurabh
Re: Index result percentile variation based on Index time Norm and No-Norm
Posted by Simon Willnauer <si...@googlemail.com>.
On Sun, Jun 19, 2011 at 5:39 PM, Saurabh Gokhale
<sa...@gmail.com> wrote:
> Yes, it makes sense. So in the NO_NORMS case, I suppose all fields,
> small or large, are treated the same rather than according to their size and
> implicit boost factor.
Well, yes: they are scored with TF/IDF only; the additional length
normalization and boost are omitted.
Simon
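As a rough check of this explanation against the numbers in the original post, the sketch below divides each match percentage from the index with norms by the corresponding percentage from the index without norms. ScoreRatioSketch and ratio are hypothetical names used only for illustration; the four input values are copied from the program output, and nothing here calls Lucene.

```java
// Sketch only: plain Java arithmetic on the scores reported in the thread.
final class ScoreRatioSketch {

    // Ratio of the score from the index with norms to the score
    // from the index without norms, for the same document.
    static double ratio(double withNorms, double withoutNorms) {
        return withNorms / withoutNorms;
    }

    public static void main(String[] args) {
        // Match percentages copied from the program output in the original post.
        System.out.printf("author2: %.4f%n", ratio(19.738173, 39.476345));
        System.out.printf("author3: %.4f%n", ratio(6.168179, 9.869086));
    }
}
```

Both ratios come out below 1 (about 0.5 and 0.625), which is consistent with the index with norms multiplying in a factor smaller than 1 for multi-term fields. They are not exactly 1.0 / Math.sqrt(numTerms); Lucene 3.x stores each norm in a single byte, so some rounding is plausible, but the attached program is not shown in this thread, so that part is a guess.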
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Index result percentile variation based on Index time Norm and No-Norm
Posted by Saurabh Gokhale <sa...@gmail.com>.
Yes, it makes sense. So in the NO_NORMS case, I suppose all fields,
small or large, are treated the same rather than according to their size and
implicit boost factor.
Thanks for the explanation
Saurabh
Re: Index result percentile variation based on Index time Norm and No-Norm
Posted by Simon Willnauer <si...@googlemail.com>.
If you use norms, Lucene takes the boost of the field / document and
multiplies it by a length normalization factor, 1.0 /
Math.sqrt(numTerms), so your scores should be different. Does that
explain what you are seeing?
Simon
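To make the factor in this reply concrete, here is a minimal sketch that evaluates 1.0 / Math.sqrt(numTerms) for a few field lengths and multiplies it by a boost. LengthNormSketch, lengthNorm and norm are hypothetical names, the boost of 1.0 is an assumption (no explicit field or document boost), and the code does not call Lucene itself.

```java
// Sketch only: plain Java, no Lucene dependency.
final class LengthNormSketch {

    // Length normalization factor quoted in the reply: 1.0 / Math.sqrt(numTerms).
    static double lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return 1.0 / Math.sqrt(numTerms);
    }

    // Norm as described in the reply: the boost multiplied by the length normalization.
    static double norm(double boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double boost = 1.0; // assumption: no explicit field or document boost
        for (int numTerms = 1; numTerms <= 4; numTerms++) {
            System.out.printf("numTerms=%d norm=%.4f%n", numTerms, norm(boost, numTerms));
        }
    }
}
```

Under these assumptions a one-term field keeps a factor of 1.0 while longer fields get a smaller one, so a field indexed with norms can only score the same or lower than the same field indexed with ANALYZED_NO_NORMS.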