Posted to java-user@lucene.apache.org by "Sharma, Siddharth" <Si...@Staples.com> on 2005/10/17 21:31:52 UTC

Too many clauses

Query:  caught a class org.apache.lucene.queryParser.ParseException
 with message: Too many boolean clauses

I realize why this is happening (the 1024 clauses limit for BooleanQuery).
My question is more design related.

During customer registration, the customer defines a set of skus/products
that we should never display to them. These products are part of our catalog
offering but we are forbidden to make them available to this customer. This
list is called the block list and can potentially be large (6 to 7
thousand).

When a customer logs in, this block list is identified and currently I am
using QueryParser to parse these skus to block/exclude. That is why I am
hitting against the 1024 upper bound.

To circumvent it, here are a few options that I have thought of:
1. Chunk it up: 
  a. Create a filter based on a query that has a maximum of 1024. 
  b. Get its bits.
  c. Get the next 1024 blocked skus and create a filter out of it and get   
     its bits.
  d. AND the two BitSets.
  e. Do this till all blocked skus and other filters are ANDed together for 
     the final BitSet.

2. Build the block list into the index somehow
  a. My index is based on SKUs, not on customer.
  b. I could add a field in each SKU document that contains the customer-ids
     who want this SKU blocked.
  c. But this field's value could be very large.

3. Some other obvious way that I am stupid enough not to be able to 
   visualize.
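
A minimal sketch of option 1, using plain java.util.BitSet. It assumes each
chunk's filter marks the documents that match none of that chunk's blocked
skus, which is why ANDing the chunks yields the documents allowed overall.
The postings map (sku -> document ids) and the names ChunkedBlockList,
allowedBits and allowedDocs are hypothetical stand-ins for running a real
query-based filter against the index, not Lucene API.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

final class ChunkedBlockList {
    // The BooleanQuery clause limit mentioned above.
    static final int MAX_CLAUSES = 1024;

    // Hypothetical stand-in for "create a filter from one query and get its bits":
    // a bit stays set for every document that matches none of the skus in the chunk.
    static BitSet allowedBits(Map<String, int[]> postings, List<String> chunk, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(0, maxDoc);
        for (String sku : chunk) {
            int[] docs = postings.get(sku);
            if (docs == null) {
                continue; // sku is not in the index
            }
            for (int doc : docs) {
                bits.clear(doc);
            }
        }
        return bits;
    }

    // Steps a-e: walk the block list in chunks of at most MAX_CLAUSES and AND the results.
    static BitSet allowedDocs(Map<String, int[]> postings, List<String> blocked, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (int start = 0; start < blocked.size(); start += MAX_CLAUSES) {
            int end = Math.min(start + MAX_CLAUSES, blocked.size());
            result.and(allowedBits(postings, blocked.subList(start, end), maxDoc));
        }
        return result;
    }
}
```

If the per-chunk filters instead marked the documents that *do* match a
blocked sku, the chunks would have to be ORed together and the result
inverted at the end.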

Thanks in advance
Sid





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Too many clauses

Posted by Chris Hostetter <ho...@fucit.org>.
:
: To circumvent it, here are a few options that I have thought of:
: 1. Chunk it up:
:   a. Create a filter based on a query that has a maximum of 1024.
:   b. Get its bits.
:   c. Get the next 1024 blocked skus and create a filter out of it and get
:      its bits.
:   d. AND the two BitSets.
:   e. Do this till all blocked skus and other filters are ANDed together for
:      the final BitSet.

Instead of building up your filter based on a query, why not build up your
filter directly? ... Using a QueryFilter requires that scoring happen --
but you don't care about the scoring, you just want to know if a doc
matches a keyword or not.  Take a look at the way RangeFilter is
implemented.  It should be able to serve as a good example of how you can
write a "SetFilter" that takes in a field name and a set of keywords, and
only "passes" documents where one of the keywords shows up as an indexed
value for that field.  Now you don't have to worry about the 1024 limit,
you don't have to "chunk" anything, and your searches will be faster because
you don't need to worry about the scoring aspects of the BooleanQueries.
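
A rough sketch of the core of such a "SetFilter", written against plain
JDK types so it runs on its own. The postings map (keyword -> document ids
for one field) is a hypothetical stand-in for the index; in a real Filter
subclass the loop would presumably live in bits(IndexReader) and walk
reader.termDocs(new Term(field, keyword)) instead. SetFilterSketch and
matchingBits are illustrative names, not existing Lucene classes.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;

final class SetFilterSketch {
    // Sets one bit per document that has any of the keywords as an indexed value.
    // No query is built and nothing is scored, so the clause limit never applies.
    static BitSet matchingBits(Map<String, int[]> postings, Set<String> keywords, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String keyword : keywords) {
            int[] docs = postings.get(keyword); // stand-in for the term's TermDocs
            if (docs == null) {
                continue; // keyword does not occur in this field
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

The cost is one postings lookup per keyword, so a block list of 6 to 7
thousand skus is handled in a single pass.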


Hint: you can sort the input Set, and then iterate over it, pulling out
the TermDocs for each, and marking each doc in each TermDocs in the
filter's BitSet.  Now your Filter indicates all the products that do match
those skus, and you'll want an "InverseFilter" to wrap it and indicate all
the products that *don't* match those skus.
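
The wrapping step could look roughly like this, again with plain JDK
types. It assumes the wrapped filter has already produced a BitSet of the
documents that match the blocked skus, and that maxDoc is the number of
document ids in the index (as IndexReader.maxDoc() would report).
InverseFilterSketch and invert are hypothetical names.

```java
import java.util.BitSet;

final class InverseFilterSketch {
    // Returns the complement of the wrapped filter's bits over [0, maxDoc).
    // The input is cloned so a cached BitSet from the wrapped filter is not modified.
    static BitSet invert(BitSet matching, int maxDoc) {
        BitSet inverted = (BitSet) matching.clone();
        inverted.flip(0, maxDoc);
        return inverted;
    }
}
```

Sorting the input Set first, as suggested in the hint, does not change the
resulting bits; it only makes the term lookups proceed in index order.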



-Hoss

