Posted to java-user@lucene.apache.org by "Kevin A. Burton" <bu...@newsmonster.org> on 2004/03/29 11:25:51 UTC

Is RangeQuery more efficient than DateFilter?

I have a 7G index.  A query for a random term comes back fast (300ms) 
when I'm not using a DateFilter but when I add the DateFilter it takes 
2.6 seconds.  Way too long.  I assume this is because the filter API 
does a post process so it has to read fields off disk.

Is it possible to do this with a RangeQuery?  For example, you could 
create a "days since January 1, 1970" field and do a range query 
between 5 and 10... and then add the original field as well.
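A rough sketch of the "days since January 1, 1970" idea, in plain Java with no Lucene dependency. The class name DayField, the 7-digit width, and the helper names are all hypothetical. The assumption is that terms in the index are compared as strings, so the day count is zero-padded to make lexicographic order agree with numeric order:

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: turns a timestamp into a
// fixed-width "days since 1970-01-01 UTC" term.
final class DayField {
    // Assumed width; 7 digits covers any realistic date.
    static final int WIDTH = 7;
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Whole days elapsed since the epoch (UTC) for a millisecond timestamp.
    static long daysSinceEpoch(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY);
    }

    // Zero-padded term so that string comparison matches numeric comparison.
    static String encode(long days) {
        if (days < 0 || days > 9_999_999L) {
            throw new IllegalArgumentException("day count out of range: " + days);
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", days);
    }

    // Inclusive range check on encoded terms, as a string-based range would see them.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }
}
```

Without the padding, "40" would sort before "5" and a string range from 5 to 10 would match the wrong terms. A day-granularity field also has far fewer distinct terms than a millisecond timestamp, which should make any range enumeration cheaper.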

I have to make some app changes so I figured I would ask here before 
moving forward.

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


Re: Is RangeQuery more efficient than DateFilter?

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
I've added some of the information from this thread to the wiki.

http://wiki.apache.org/jakarta-lucene/DateRangeQueries

If you wish to add more information, go right ahead, but since I added
this info, I believe it's ultimately my responsibility to maintain it.

sv



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Is RangeQuery more efficient than DateFilter?

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Erik Hatcher wrote:

>
> One more point... caching is done by the IndexReader used for the 
> search, so you will need to keep that instance (i.e. the 
> IndexSearcher) around to benefit from the caching.
>
Great... Damn... looked at the source of CachingWrapperFilter and it 
makes sense.  Thanks for the pointer.  The results were pretty amazing.  
Here are the results before and after. Times are in millis:

Before caching the Field:

Searching for Jakarta:
2238
1910
1899
1901
1904
1906

After caching the field:
2253
10
6
8
6
6

That's a HUGE difference :)

I'm very happy :)




Re: Is RangeQuery more efficient than DateFilter?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 29, 2004, at 8:41 AM, Erik Hatcher wrote:
> On Mar 29, 2004, at 4:25 AM, Kevin A. Burton wrote:
>> I have a 7G index.  A query for a random term comes back fast (300ms) 
>> when I'm not using a DateFilter but when I add the DateFilter it 
>> takes 2.6 seconds.  Way too long.  I assume this is because the 
>> filter API does a post process so it has to read fields off disk.
>>
>> Is it possible to do this with a RangeQuery?  For example, you 
>> could create a "days since January 1, 1970" field and do a range 
>> query between 5 and 10... and then add the original field as 
>> well.
>
> Are you keeping DateFilter around for more than one search?  The 
> drawback to pure DateFilter is that it does not cache, so each search 
> re-enumerates the terms in the range.  In fact, DateFilter by itself 
> is practically of no use, I think.
>
> If you have a set of canned date ranges, there are two approaches 
> worth considering:  DateFilter wrapped by a CachingWrapperFilter, or 
> a RangeQuery wrapped in a QueryFilter (which does cache).
>
> Performance-wise, I don't really think there is much (any?) difference 
> in these two approaches, so take your pick.  Once the bit sets are 
> cached in a filter, searches will be quite fast.

One more point... caching is done by the IndexReader used for the 
search, so you will need to keep that instance (i.e. the IndexSearcher) 
around to benefit from the caching.
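To illustrate why the same instance has to stay around, here is a minimal model of a bit set cache keyed by the reader object. This is not Lucene's code: PerReaderBitsCache, its type parameter R (standing in for the reader), and the computations() counter are hypothetical names for the sketch. It assumes the reader type uses identity equality, as a plain Object does:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical model of a caching filter wrapper; R stands in for the reader.
final class PerReaderBitsCache<R> {
    private final Function<R, BitSet> compute;
    // Weak keys: an entry can be collected once its reader is unreachable.
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    private int computations = 0;

    PerReaderBitsCache(Function<R, BitSet> compute) {
        this.compute = compute;
    }

    // Returns the cached bits for this reader, computing them on first use.
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = compute.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    // How many times the expensive computation actually ran.
    synchronized int computations() {
        return computations;
    }
}
```

In this model, asking twice with the same reader runs the expensive computation once, while a freshly opened reader is a new key and pays the full cost again. That is the reason for holding on to one searcher across requests.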

	Erik




Re: Is RangeQuery more efficient than DateFilter?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 29, 2004, at 4:25 AM, Kevin A. Burton wrote:
> I have a 7G index.  A query for a random term comes back fast (300ms) 
> when I'm not using a DateFilter but when I add the DateFilter it takes 
> 2.6 seconds.  Way too long.  I assume this is because the filter API 
> does a post process so it has to read fields off disk.
>
> Is it possible to do this with a RangeQuery?  For example, you could 
> create a "days since January 1, 1970" field and do a range query 
> between 5 and 10... and then add the original field as well.

Are you keeping DateFilter around for more than one search?  The 
drawback to pure DateFilter is that it does not cache, so each search 
re-enumerates the terms in the range.  In fact, DateFilter by itself is 
practically of no use, I think.
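As a toy illustration of what "re-enumerates the terms in the range" costs, the sketch below walks every term between two bounds in a sorted map and sets a bit for each matching document. The in-memory NavigableMap is only a stand-in for the term dictionary, and RangeBits and bitsForRange are hypothetical names, not Lucene API:

```java
import java.util.BitSet;
import java.util.NavigableMap;

// Hypothetical stand-in for an uncached range filter.
final class RangeBits {
    // postings maps each term to the ids of the documents containing it.
    // Every call walks all terms in [lower, upper] again; nothing is cached.
    // Throws IllegalArgumentException if lower sorts after upper.
    static BitSet bitsForRange(NavigableMap<String, int[]> postings,
                               String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

The work grows with the number of distinct terms in the range, so a fine-grained date field over a wide range is expensive on every search unless the resulting bit set is kept.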

If you have a set of canned date ranges, there are two approaches worth 
considering:  DateFilter wrapped by a CachingWrapperFilter, or a 
RangeQuery wrapped in a QueryFilter (which does cache).

Performance-wise, I don't really think there is much (any?) difference 
in these two approaches, so take your pick.  Once the bit sets are 
cached in a filter, searches will be quite fast.

	Erik

