Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2007/03/06 08:03:42 UTC

Caching of BitSets from filters and Query.equals()

Not sure if I'm going about this the right way, but I want to use Query 
instances as a key to a HashMap to cache BitSet instances from filtering 
operations.  They are all for the same reader.

That means equals() for any instance of the same generic Query would have to 
return true if the terms, boost and other factors of the Query would result in 
the same BitSet.  Most of the Query instances override equals and return true 
based on the Query.  At least BooleanQuery would not give true, because it does 
not base the equals on the encapsulated clauses, but in practice that is not a 
problem as BooleanQuery will not be used as a Filter.

It looks like it will work in the main, but I was wondering if there was any 
unwritten, but expected, contract for a Query's equals()?

Thanks
Antony


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Caching of BitSets from filters and Query.equals()

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 6, 2007, at 6:35 AM, Antony Bowesman wrote:
> Erik Hatcher wrote:
>> Have a look at the CachingWrapperFilter:
>>     <http://lucene.apache.org/java/docs/api/org/apache/lucene/search/CachingWrapperFilter.html>
>> It caches filters by IndexReader instance.
>
> Doesn't that still have the same issue in terms of equality of
> the conditions that created the filter?  If I have conditions that
> filter Term X, then the cached Filter is only valid for new
> requests for Term X.  Term equality is defined by the Javadocs as  
> having the same Field and Text, but to cache a Query, its equality  
> must be deterministic in a similar way, but it isn't.

A Query's equality is defined as having the same structure and order  
as another Query.

> I was hoping that Query.equals() would be defined so that equality  
> would be based on the results that Query generates for a given reader.

That is certainly not the case, as stated above.  query1.equals(query2)
holds when all the nested clauses also report that they are equal to
one another.  This is very important in our unit tests, where we
construct a query through the QueryParser and then through the API
and compare the two.
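
A minimal sketch of what "same structure and order" means for equals(), assuming two hypothetical stand-in classes (TermClause and CompoundClause are not Lucene classes and only illustrate the idea). The compound type delegates equality to its nested clauses, in order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a single-term query; not a Lucene class.
final class TermClause {
    final String field;
    final String text;

    TermClause(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TermClause)) return false;
        TermClause other = (TermClause) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text);
    }
}

// Hypothetical stand-in for a query built from nested clauses.
final class CompoundClause {
    final List<TermClause> clauses;

    CompoundClause(List<TermClause> clauses) {
        this.clauses = new ArrayList<>(clauses); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompoundClause)) return false;
        // List.equals compares element by element, in order.
        return clauses.equals(((CompoundClause) o).clauses);
    }

    @Override
    public int hashCode() {
        return clauses.hashCode();
    }
}

class StructuralEqualityDemo {
    public static void main(String[] args) {
        TermClause fooBar = new TermClause("foo", "bar");
        TermClause yakWak = new TermClause("yak", "wak");
        CompoundClause a = new CompoundClause(Arrays.asList(fooBar, yakWak));
        CompoundClause b = new CompoundClause(
                Arrays.asList(new TermClause("foo", "bar"), new TermClause("yak", "wak")));
        CompoundClause reversed = new CompoundClause(Arrays.asList(yakWak, fooBar));

        System.out.println(a.equals(b));        // same structure, same order
        System.out.println(a.equals(reversed)); // same clauses, different order
    }
}
```

In this sketch two separately built objects compare equal only when their nested clauses are equal and appear in the same order, which is the property a unit test comparing a parsed query with a hand-built one relies on.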

> I'm hosting an indexing framework, so I've no idea what searches or  
> filters a caller will want to perform.

Have a look at Solr's caching mechanisms for filters, queries, and  
documents.  Very slick and scalable stuff.

	Erik




Re: Caching of BitSets from filters and Query.equals()

Posted by Antony Bowesman <ad...@teamware.com>.
Chris Hostetter wrote:
> : equals to get q1.equals(q2).  The core Lucene Query implementations do override
> : equals() to satisfy that test, but some of the contrib Query implementations do
> : not override equals, so you would never see the same Query twice and caching
> : BitSets for those Query instances would be a waste of time.
> 
> filing bugs about those Query instances would be helpful .. bugs with
> patches that demonstrate the problem in unit tests and fix them would be
> even more helpful :)

OK, I'll put it on my todo list, but I've got to get the product out of the door 
this month...

> These classes may prove useful in submitting test cases...
> 
> http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/search/QueryUtils.java?view=log
> http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/search/CheckHits.java

Thanks for those pointers.
Antony





Re: Caching of BitSets from filters and Query.equals()

Posted by Chris Hostetter <ho...@fucit.org>.
: equals to get q1.equals(q2).  The core Lucene Query implementations do override
: equals() to satisfy that test, but some of the contrib Query implementations do
: not override equals, so you would never see the same Query twice and caching
: BitSets for those Query instances would be a waste of time.

filing bugs about those Query instances would be helpful .. bugs with
patches that demonstrate the problem in unit tests and fix them would be
even more helpful :)

These classes may prove useful in submitting test cases...

http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/search/QueryUtils.java?view=log
http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/search/CheckHits.java



-Hoss




Re: Caching of BitSets from filters and Query.equals()

Posted by Antony Bowesman <ad...@teamware.com>.
Chris Hostetter wrote:
> : I was hoping that Query.equals() would be defined so that equality would be
> : based on the results that Query generates for a given reader.
> 
> if query1.equals(query2) then the results of query1 on an
> indexreader should be identical to the results of query2 on the same
> indexreader 

Thanks Hoss and Erik.  This is the case I wanted, but re-reading my desire 
above, I see it looks more like the inverse.  Sorry for the confusion.

> ... but the inverse cannot be guaranteed: if query1 and
> query2 generate identical results when queried against an indexreader, that
> says absolutely nothing about whether query1.equals(query2).

Yes, that's not what I was after - As you say, it's not possible to implement.

> in general, what you describe really isn't needed for caching query result
> sets ... what matters is that if you've already seen the query before
> (which you can tell using q1.equals(q2)) then you don't need to execute it

Exactly, and to be sure of that you have to be able to rely on an overridden 
equals to get q1.equals(q2).  The core Lucene Query implementations do override 
equals() to satisfy that test, but some of the contrib Query implementations do 
not override equals, so you would never see the same Query twice and caching 
BitSets for those Query instances would be a waste of time.
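
A minimal sketch of why a missing equals() override defeats the cache, assuming two hypothetical query types (IdentityOnlyQuery and ValueQuery are illustrative names, not Lucene or contrib classes). A HashMap lookup with a freshly built but logically identical key only succeeds when equals() and hashCode() are value-based:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical query type that inherits Object.equals(): identity only.
class IdentityOnlyQuery {
    final String field;
    final String text;

    IdentityOnlyQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

// Hypothetical query type with value-based equals() and hashCode().
class ValueQuery {
    final String field;
    final String text;

    ValueQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValueQuery)) return false;
        ValueQuery other = (ValueQuery) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text);
    }
}

class EqualsOverrideDemo {
    // Stores a BitSet under `stored`, then reports whether `lookup` finds it.
    static boolean cacheHit(Object stored, Object lookup) {
        Map<Object, BitSet> cache = new HashMap<>();
        cache.put(stored, new BitSet());
        return cache.containsKey(lookup);
    }

    public static void main(String[] args) {
        // Without an override, a second instance never matches the first.
        System.out.println(cacheHit(new IdentityOnlyQuery("foo", "bar"),
                                    new IdentityOnlyQuery("foo", "bar"))); // false
        // With value-based equality, the second instance finds the entry.
        System.out.println(cacheHit(new ValueQuery("foo", "bar"),
                                    new ValueQuery("foo", "bar")));        // true
    }
}
```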

Antony







Re: Caching of BitSets from filters and Query.equals()

Posted by Chris Hostetter <ho...@fucit.org>.
: I was hoping that Query.equals() would be defined so that equality would be
: based on the results that Query generates for a given reader.

if query1.equals(query2) then the results of query1 on an
indexreader should be identical to the results of query2 on the same
indexreader ... but the inverse cannot be guaranteed: if query1 and
query2 generate identical results when queried against an indexreader, that
says absolutely nothing about whether query1.equals(query2).

if you think about it, there's no possible way it ever could, because a
critical piece of information isn't available when testing the
.equals()ness of those queries: the indexreader.  if i have a completely
empty index then the queries "foo:bar" and "yak:wak" will both have the
exact same results, but those same queries on an index with a single
document added might now generate different results -- so how could an
algorithm like you describe possibly be implemented in a Query.equals()
method when the IndexReader isn't known?
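
A minimal sketch of the empty-index argument, assuming a hypothetical in-memory ToyIndex (not a Lucene class) in which each document holds a single field/text pair. The same two queries give identical BitSets on an empty index and different ones after a single document is added:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index: each document is a map of field -> text.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String field, String text) {
        Map<String, String> doc = new HashMap<>();
        doc.put(field, text);
        docs.add(doc);
    }

    // Sets bit i when document i holds exactly `text` in `field`.
    BitSet match(String field, String text) {
        BitSet bits = new BitSet();
        for (int i = 0; i < docs.size(); i++) {
            if (text.equals(docs.get(i).get(field))) {
                bits.set(i);
            }
        }
        return bits;
    }
}

class IndexDependenceDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();

        // Empty index: both queries match nothing, so the results are equal.
        System.out.println(index.match("foo", "bar").equals(index.match("yak", "wak")));

        // One document later, the same two queries disagree.
        index.add("foo", "bar");
        System.out.println(index.match("foo", "bar").equals(index.match("yak", "wak")));
    }
}
```

Nothing in the two queries changed between the two comparisons, only the index did, which is why an equals() that sees only the queries cannot decide result equality.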

in general, what you describe really isn't needed for caching query result
sets ... what matters is that if you've already seen the query before
(which you can tell using q1.equals(q2)) then you don't need to execute it
.. whether or not it results in the same set of docs as a completely
unrelated query doesn't really tell you much (i suppose you could save
some space by reusing the same BitSet object ... but that can be done by
testing the equality of the resulting BitSet)
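
A minimal sketch of the caching approach described above, assuming a hypothetical BitSetCache class that serves a single reader (the class, its method names and the use of plain strings as query keys are illustrative only). A query that has been seen before, according to equals(), is not executed again, and unrelated queries with equal results share one BitSet object:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-reader cache; Q must implement equals() and hashCode().
class BitSetCache<Q> {
    private final Map<Q, BitSet> byQuery = new HashMap<>();
    // Maps each distinct result to one shared instance. Cached BitSets
    // must be treated as read-only, since they are used as map keys here.
    private final Map<BitSet, BitSet> canonical = new HashMap<>();
    private int executions = 0;

    BitSet get(Q query, Function<Q, BitSet> execute) {
        BitSet cached = byQuery.get(query);
        if (cached != null) {
            return cached; // seen before: no need to execute
        }
        executions++;
        BitSet result = execute.apply(query);
        // Reuse an existing BitSet when the result equals one already held.
        BitSet shared = canonical.putIfAbsent(result, result);
        if (shared == null) {
            shared = result;
        }
        byQuery.put(query, shared);
        return shared;
    }

    int executions() {
        return executions;
    }
}

class BitSetCacheDemo {
    public static void main(String[] args) {
        BitSetCache<String> cache = new BitSetCache<>();
        // Every query here matches nothing, as on an empty index.
        BitSet first = cache.get("foo:bar", q -> new BitSet());
        BitSet again = cache.get(new String("foo:bar"), q -> new BitSet());
        BitSet other = cache.get("yak:wak", q -> new BitSet());

        System.out.println(first == again);     // equal key: cached entry reused
        System.out.println(first == other);     // equal result: same object shared
        System.out.println(cache.executions()); // executions for two distinct queries
    }
}
```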




-Hoss




Re: Caching of BitSets from filters and Query.equals()

Posted by Antony Bowesman <ad...@teamware.com>.
Erik Hatcher wrote:
> Have a look at the CachingWrapperFilter:
> 
>     <http://lucene.apache.org/java/docs/api/org/apache/lucene/search/CachingWrapperFilter.html> 
> 
> 
> It caches filters by IndexReader instance.

Doesn't that still have the same issue in terms of equality of the conditions that
created the filter?  If I have conditions that filter Term X, then the cached
Filter is only valid for new requests for Term X.  Term equality is defined by 
the Javadocs as having the same Field and Text, but to cache a Query, its 
equality must be deterministic in a similar way, but it isn't.

I was hoping that Query.equals() would be defined so that equality would be 
based on the results that Query generates for a given reader.

I'm hosting an indexing framework, so I've no idea what searches or filters a 
caller will want to perform.

Antony





Re: Caching of BitSets from filters and Query.equals()

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Have a look at the CachingWrapperFilter:

	<http://lucene.apache.org/java/docs/api/org/apache/lucene/search/CachingWrapperFilter.html>

It caches filters by IndexReader instance.

	Erik


On Mar 6, 2007, at 2:03 AM, Antony Bowesman wrote:

> Not sure if I'm going about this the right way, but I want to use  
> Query instances as a key to a HashMap to cache BitSet instances  
> from filtering operations.  They are all for the same reader.
>
> That means equals() for any instance of the same generic Query  
> would have to return true if the terms, boost and other factors of  
> the Query would result in the same BitSet.  Most of the Query  
> instances override equals and return true based on the Query.  At  
> least BooleanQuery would not give true, because it does not base  
> the equals on the encapsulated clauses, but in practice that is not  
> a problem as BooleanQuery will not be used as a Filter.
>
> It looks like it will work in the main, but I was wondering if  
> there was any unwritten, but expected, contract for a Query's
> equals()?
>
> Thanks
> Antony

