Posted to dev@lucene.apache.org by Sylvain Puccianti <sp...@yahoo.fr> on 2002/06/20 23:27:57 UTC

Datefiltering performance issues

Hi. I am experiencing some performance issues with
DateFilter. I'm searching an index of around 200,000
documents, with several threads sharing the same
IndexReader object. A single thread searching and date
filtering the index returns in about 300 ms. If 5
threads are searching simultaneously, date filtering
(mainly the creation of the bitset of documents
matching the date criteria I'm passing) takes around
8 s! With 10 threads, it degrades to 30 s per query!
My investigations led me to the get(Term term) method
of TermInfosReader. If I'm right (which I'm not sure
of at all...), this method is synchronized and each
thread has to call it for each date term within the
date bounds, so it looks like there is some contention
here. If I understand correctly, the method is
synchronized because there is only a single instance
of TermInfosReader per SegmentReader, so all threads
share the same TermEnum for date filtering.
Does anybody have any idea how I could make date
filtering faster?

Any help is welcome! Thanks,

Sylvain

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Datefiltering performance issues

Posted by Matthew King <ma...@gnik.com>.
In a former life (not with Lucene), I handled this range problem by 
indexing the dates in multiple pieces (YYYY, YYYYMM, YYYYMMDD) and then 
constructing multiple ranges at query time to cover what the user wanted:

So,
   [19990323 20020612]
becomes:
   [19990323 19990331] OR
   [199904 199912] OR
   [2000 2001] OR
   [200201 200205] OR
   [20020601 20020612]

(I may have my Lucene query syntax mussed up here, but hopefully my 
intention is clear)

This dramatically limits the number of terms that need to be evaluated, 
at the expense of a larger index.  The 3 term types also need to be in 
separate "fields" (or prefixed) so that each range only includes one 
type.
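
A minimal sketch of the decomposition described above, in plain Java
with no Lucene dependency. The class name DateRangeSplitter, the field
names "day", "month" and "year", and the output notation (copied from
the example in this message, not real Lucene query syntax) are all
assumptions made for illustration. It only shows how the sub-ranges
could be computed; turning them into actual queries is left out.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an inclusive date range into day, month and
// year sub-ranges. The field names "day", "month", "year" are assumptions.
class DateRangeSplitter {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    static List<String> split(LocalDate from, LocalDate to) {
        List<String> ranges = new ArrayList<>();
        if (from.isAfter(to)) {
            return ranges; // empty range, nothing to search
        }
        // First and last calendar months that lie entirely inside [from, to].
        YearMonth firstMonth = from.getDayOfMonth() == 1
                ? YearMonth.from(from) : YearMonth.from(from).plusMonths(1);
        YearMonth lastMonth = to.equals(YearMonth.from(to).atEndOfMonth())
                ? YearMonth.from(to) : YearMonth.from(to).minusMonths(1);
        if (firstMonth.isAfter(lastMonth)) {
            ranges.add(range("day", from.format(DAY), to.format(DAY)));
            return ranges; // no whole month inside: day terms only
        }
        if (from.isBefore(firstMonth.atDay(1))) { // leading partial month
            ranges.add(range("day", from.format(DAY),
                    firstMonth.atDay(1).minusDays(1).format(DAY)));
        }
        addMonthsAndYears(ranges, firstMonth, lastMonth);
        if (to.isAfter(lastMonth.atEndOfMonth())) { // trailing partial month
            ranges.add(range("day", lastMonth.plusMonths(1).atDay(1).format(DAY),
                    to.format(DAY)));
        }
        return ranges;
    }

    // Covers the whole months first..last, using year terms for whole years.
    private static void addMonthsAndYears(List<String> ranges,
                                          YearMonth first, YearMonth last) {
        int firstYear = first.getMonthValue() == 1 ? first.getYear() : first.getYear() + 1;
        int lastYear = last.getMonthValue() == 12 ? last.getYear() : last.getYear() - 1;
        if (firstYear > lastYear) {
            ranges.add(range("month", first.format(MONTH), last.format(MONTH)));
            return; // no whole year inside: month terms only
        }
        if (first.isBefore(YearMonth.of(firstYear, 1))) { // months before first whole year
            ranges.add(range("month", first.format(MONTH),
                    YearMonth.of(firstYear - 1, 12).format(MONTH)));
        }
        ranges.add(range("year", String.format("%04d", firstYear),
                String.format("%04d", lastYear)));
        if (last.isAfter(YearMonth.of(lastYear, 12))) { // months after last whole year
            ranges.add(range("month", YearMonth.of(lastYear + 1, 1).format(MONTH),
                    last.format(MONTH)));
        }
    }

    private static String range(String field, String low, String high) {
        return field + ":[" + low + " " + high + "]";
    }

    public static void main(String[] args) {
        // Prints the five sub-ranges from the example, one per line.
        for (String r : split(LocalDate.of(1999, 3, 23), LocalDate.of(2002, 6, 12))) {
            System.out.println(r);
        }
    }
}
```

For the example range, a day-only index could hold up to 1,178 distinct
day terms between the two bounds, while the five sub-ranges cover at
most 37 terms (9 days, 9 months, 2 years, 5 months, 12 days).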

The same trick can be played with non-dates by using a 2-letter 
prefix.  ("dog" gets indexed as "dog" and "do")   Obviously care should 
be taken as to which fields have this extra indexing done.  (probably 
just Keyword)

It's an idea anyway...

- matt

On Friday, June 21, 2002, at 01:35 PM, Sylvain Puccianti wrote:

> Thanks for the quick answer!
> I've just downloaded the 1.2 release jar, and my test
> gives me the same results. The more threads I've got,
> the slower date filtering gets (performance degradation
> is almost exponential).
> I tried to use RangeQuery, as advised by Scott
> Ganyo, but it does not work very well. RangeQuery
> creates a TermQuery for each term between the lower
> and upper terms. If my range is too wide, since I've
> got thousands of documents, it simply runs out of memory...
> Is there any way to avoid sharing the TermInfosReader
> between all threads when creating the bitset, or
> somehow avoid synchronizing the get method (if it is
> actually the bottleneck here)?
>
> Thanks,
>
> Sylvain
>
> --- Doug Cutting <cu...@lucene.com> wrote:
>> What version of Lucene are you using?  There was a
>> patch made in January
>> to address multi-threaded performance of DateFilter.
>>
>> Doug




Re: Datefiltering performance issues

Posted by Sylvain Puccianti <sp...@yahoo.fr>.
Thanks for the quick answer!
I've just downloaded the 1.2 release jar, and my test
gives me the same results. The more threads I've got,
the slower date filtering gets (performance degradation
is almost exponential).
I tried to use RangeQuery, as advised by Scott
Ganyo, but it does not work very well. RangeQuery
creates a TermQuery for each term between the lower
and upper terms. If my range is too wide, since I've
got thousands of documents, it simply runs out of memory...
Is there any way to avoid sharing the TermInfosReader
between all threads when creating the bitset, or
somehow avoid synchronizing the get method (if it is
actually the bottleneck here)?
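
For illustration only: one general way to avoid contending on a single
shared, mutable cursor is to give each thread its own copy, for example
through java.lang.ThreadLocal. The sketch below is plain Java and uses
no Lucene classes. SharedTermDictionary and Cursor are hypothetical
stand-ins, and nothing here describes how TermInfosReader is actually
implemented or what the January patch changed.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a term dictionary shared by many search threads.
class SharedTermDictionary {
    private final String[] sortedTerms; // read-only after construction
    private final AtomicInteger cursorsCreated = new AtomicInteger();

    // Each thread lazily gets its own cursor, so lookups need no lock.
    private final ThreadLocal<Cursor> cursor = ThreadLocal.withInitial(() -> {
        cursorsCreated.incrementAndGet();
        return new Cursor();
    });

    SharedTermDictionary(String... terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);
    }

    // Mutable state that would be unsafe to share between threads.
    private final class Cursor {
        private int lastPosition = -1;

        int seek(String term) {
            lastPosition = Arrays.binarySearch(sortedTerms, term);
            return lastPosition >= 0 ? lastPosition : -1;
        }
    }

    // Not synchronized: the only mutable state touched is thread-confined.
    int lookup(String term) {
        return cursor.get().seek(term);
    }

    int cursorsCreated() {
        return cursorsCreated.get();
    }

    // Starts threadCount threads that each look up every given term once,
    // waits for them, and returns how many lookups found their term.
    static int demo(SharedTermDictionary dict, int threadCount, String... terms) {
        AtomicInteger hits = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (String t : terms) {
                    if (dict.lookup(t) >= 0) {
                        hits.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.get();
    }
}
```

The trade-off in this sketch is memory for concurrency: every thread
that performs a lookup holds its own cursor, but no thread ever waits
on another one.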

Thanks,

Sylvain

--- Doug Cutting <cu...@lucene.com> wrote:
> What version of Lucene are you using?  There was a
> patch made in January
> to address multi-threaded performance of DateFilter.
> 
> Doug


Re: Datefiltering performance issues

Posted by Doug Cutting <cu...@lucene.com>.
What version of Lucene are you using?  There was a patch made in January 
to address multi-threaded performance of DateFilter.

Doug

