Posted to user@lucenenet.apache.org by Avi Levy <le...@wesee.com> on 2013/04/03 19:34:03 UTC

Retrieval Performance degradation when indexing a numeric field

Hello,

I have a Lucene.NET index created with version 2.9.4.1. I have re-indexed
the index from scratch and added a numeric field to the index representing a
date. The field is not stored. The numeric value represents a date in the
format of yyyyMMddhhmm. 

I noticed that queries against the new index take significantly longer.
The table below shows the average query time.
Each run consisted of 3000 different queries, executed one after the other
with a short sleep between them.


 

                  No dates indexed    Dates indexed
Not optimized     79.76 ms            312.63 ms
Optimized         75.26 ms            78.40 ms

I optimized by setting these parameters: UseCompoundFile = false,
RamBufferSize = 200, TermIndexInterval = 16.

Is this expected behavior?
Is there something I can do to improve the performance of a non-optimized
index?

 

Thanks,
Avi

 

 

 


RE: Retrieval Performance degradation when indexing a numeric field

Posted by Avi Levy <le...@wesee.com>.
Sorry, a correction:
When I add a clause with the dates to the Boolean query, it takes 600 ms (not
300 ms as I wrote below).


RE: Retrieval Performance degradation when indexing a numeric field

Posted by Avi Levy <le...@wesee.com>.
Thanks for the replies. I will try the suggestion for field precision, and
splitting the dates.

The table was not formatted properly.  Here is the data again:
No dates indexed & Not optimized - 79.76 ms
No dates indexed & Optimized - 75.26 ms
Dates indexed & Not optimized - 312.63 ms
Dates indexed & Optimized - 78.40 ms

Here are the field definitions:
// ID: stored and indexed as a single un-analyzed token, without norms
var idField = new Field( "ID", String.Empty, Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS );
document.Add( idField );

// ID2: stored only, not searchable
var id2Field = new Field( "ID2", String.Empty, Field.Store.YES,
Field.Index.NO );
document.Add( id2Field );

// Three analyzed text fields, indexed but not stored
var txtField = new Field( "txtField", String.Empty, Field.Store.NO,
Field.Index.ANALYZED );
document.Add( txtField );

var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO,
Field.Index.ANALYZED );
document.Add( txt2Field );

var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO,
Field.Index.ANALYZED );
document.Add( txt3Field );

// The new date field: numeric, indexed, not stored
var dateField = new NumericField( "Date", Field.Store.NO, true );
document.Add(dateField);

I set the values to the fields. For the new date field I set it like this:
Int64 dateInt = <some date>;
dateField.SetLongValue(dateInt);
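
As an illustration only, the `<some date>` placeholder above could be filled by packing a DateTime into a 12-digit number. The names `DateKey` and `ToMinuteKey` are hypothetical and not part of the original code, and a 24-hour clock is assumed. If such a key is produced with a .NET format string instead, note that `hh` is the 12-hour specifier and `HH` the 24-hour one, so a literal `yyyyMMddhhmm` pattern would map 01:05 and 13:05 to the same value.

```csharp
using System;

public static class DateKey
{
    // Hypothetical helper (not from the thread): packs a DateTime into a
    // 12-digit yyyyMMddHHmm number, using the 24-hour Hour property.
    public static long ToMinuteKey(DateTime dt)
    {
        return dt.Year * 100000000L   // yyyy shifted left by 8 decimal digits
             + dt.Month * 1000000L    // MM
             + dt.Day * 10000L        // dd
             + dt.Hour * 100L         // HH (0-23)
             + dt.Minute;             // mm
    }
}
```

For example, `DateKey.ToMinuteKey(new DateTime(2013, 4, 3, 19, 34, 0))` gives 201304031934, which could then be passed to `dateField.SetLongValue`.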

The query:
var fields = new String[3];
Dictionary<String, Single> boosts = new Dictionary<String, Single>();

fields[0]="txtField";
boosts.Add( fields[0],<Value>);
fields[1]="txt2Field";
boosts.Add( fields[1],<Value>);
fields[2]="txt3Field";
boosts.Add( fields[2],<Value>);
MultiFieldQueryParser parser = new MultiFieldQueryParser( Version.LUCENE_29,
fields, analyzer, boosts );
var boolQuery = new BooleanQuery();
Query simpleParsedQuery = parser.Parse( queryText );
boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );

Notice that I don't search by the date field.
Also, the Boolean query is more complex, but I did not include the other
parts in it for simplicity.

BTW, when I add another clause to the Boolean query with the date, I get
very bad results at around 300ms.
NumericRangeQuery datesQuery = NumericRangeQuery.NewLongRange(  "Date",
<Date>, Int64.MaxValue, true, true );
boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );




Re: Retrieval Performance degradation when indexing a numeric field

Posted by Itamar Syn-Hershko <it...@code972.com>.
Not sure that I'm following. Can you show an example of a Document and a
Query?



Re: Retrieval Performance degradation when indexing a numeric field

Posted by Maxim Terletsky <sx...@yahoo.com>.
Hi,
We deliberately left the queries the same for both indexes, the one with date field indexed and the one without. On both indexes the query didn't include the date field, only some string field.


 
Maxim



Re: Retrieval Performance degradation when indexing a numeric field

Posted by Itamar Syn-Hershko <it...@code972.com>.
What type of queries? This could happen, yes.

Try playing with field precision, and moving to filters where possible.
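
To make the "field precision" suggestion more concrete: a trie-encoded numeric field indexes each value at several precisions, one term per precision step, and the default step in Lucene 2.9 is 4. The sketch below only does that arithmetic for a 64-bit value. `PrecisionStepMath` and `TermsPerLongValue` are hypothetical names, not Lucene.NET API.

```csharp
using System;

public static class PrecisionStepMath
{
    // Number of terms indexed for one 64-bit value at a given precision step:
    // one term per shift of 0, step, 2*step, ... while the shift is below 64.
    public static int TermsPerLongValue(int precisionStep)
    {
        if (precisionStep < 1)
            throw new ArgumentOutOfRangeException("precisionStep");
        if (precisionStep >= 64)
            return 1; // only the full-precision term is indexed
        return (64 + precisionStep - 1) / precisionStep; // ceiling of 64 / step
    }
}
```

With a step of 4, each date adds 16 terms per document; with a step of 8, it adds 8. A larger step shrinks the term dictionary but makes range queries visit more terms, and the same step has to be used both when indexing and when building the range query or filter. Whether this accounts for the slowdown reported in this thread is not established here.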

RE: Retrieval Performance degradation when indexing a numeric field

Posted by Franklin Simmons <fs...@sccmediaserver.com>.
FWIW, in my experience it is best to split year/month/day and hour/minute/second into two fields, particularly for searches such as from the beginning of time to now, and for sorting. YMMV. Please note my experience is limited to Lucene.Net v2.3.
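
A minimal sketch of that split, assuming the yyyyMMddHHmm key from the original post (minute resolution, so there is no seconds component). `DateKeySplit`, `DatePart` and `TimePart` are hypothetical names used for illustration only.

```csharp
public static class DateKeySplit
{
    // Upper eight digits of a yyyyMMddHHmm key: the yyyyMMdd date part.
    public static int DatePart(long minuteKey)
    {
        return (int)(minuteKey / 10000L);
    }

    // Lower four digits of a yyyyMMddHHmm key: the HHmm time part.
    public static int TimePart(long minuteKey)
    {
        return (int)(minuteKey % 10000L);
    }
}
```

For the key 201304031934 this yields 20130403 and 1934. Both parts fit in a 32-bit integer, so each could go into its own numeric field, and a day-granularity range search would then only need the date field.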

