Posted to solr-dev@lucene.apache.org by "Luo, Jeff" <jl...@cas.org> on 2009/09/24 15:26:36 UTC

RE: [PMX:FAKE_SENDER] Re: large OR-boolean query

I think searching is the bottleneck. Solr/Lucene is slow when
maxBooleanClauses is large enough.

In my previous example, I should say the large query is broken into 100
smaller ones.

Since we still want facet counts with this large query, is there any way
one can accurately aggregate the facet counts coming back from multiple
threads as you suggested?

Thanks a lot for your reply,

Jeff

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
Seeley
Sent: Wednesday, September 23, 2009 4:39 PM
To: solr-dev@lucene.apache.org
Subject: [PMX:FAKE_SENDER] Re: large OR-boolean query

On Wed, Sep 23, 2009 at 4:26 PM, Luo, Jeff <jl...@cas.org> wrote:
> We are experimenting with a parallel approach to issuing a large
> OR-Boolean query, e.g., keywords:(1 OR 2 OR 3 OR ... OR 102400), against
> several Solr shards.
>
> The way we are trying is to break the large query into smaller ones,
> e.g., the example above can be broken into 10 small queries:
> keywords:(1 OR 2 OR 3 OR ... OR 1024),
> keywords:(1025 OR 1026 OR 1027 OR ... OR 2048), etc.
>
> Now each shard will get 10 requests and the master will merge the
> results coming back from each shard, similar to the regular distributed
> search.
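
A minimal sketch of the splitting step described in the quoted message,
assuming the terms are available as a plain list. The function name
split_or_query and the chunk size of 1024 are illustrative choices, not
part of Solr.

```python
def split_or_query(field, terms, chunk_size):
    """Split one large OR query into smaller OR queries.

    Each returned query string holds at most chunk_size terms.
    """
    queries = []
    for start in range(0, len(terms), chunk_size):
        chunk = terms[start:start + chunk_size]
        clause = " OR ".join(str(t) for t in chunk)
        queries.append(f"{field}:({clause})")
    return queries


# The example from the thread: terms 1..102400, 1024 terms per sub-query.
terms = list(range(1, 102401))
queries = split_or_query("keywords", terms, 1024)
```

With 102400 terms and 1024 terms per chunk this yields 100 sub-queries,
which each shard would then receive as separate requests.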

You're going to end up with a lot of custom code I think.
Where's the bottleneck... searching or faceting?

If faceting is the bottleneck, making an implementation that utilized
multiple threads would be one of the best ways.
If searching, you could develop a custom query type (QParserPlugin)
that handled your type of queries and split them across multiple
threads.

-Yonik
http://www.lucidimagination.com

Re: [PMX:FAKE_SENDER] Re: large OR-boolean query

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Sep 24, 2009 at 6:56 PM, Luo, Jeff <jl...@cas.org> wrote:

> I think searching is the bottleneck. Solr/Lucene is slow when
> maxBooleanClauses is large enough.
>
>
Your comment reminded me of this post:

http://invertedindex.blogspot.com/2009/07/making-booleanqueries-faster.html

-- 
Regards,
Shalin Shekhar Mangar.

Re: [PMX:FAKE_SENDER] Re: large OR-boolean query

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Sep 24, 2009 at 9:26 AM, Luo, Jeff <jl...@cas.org> wrote:
> I think searching is the bottleneck. Solr/Lucene is slow when
> maxBooleanClauses is large enough.

OK, then I'd go with the custom query.  You can reduce the message size
and get gains in query parsing speed too:

{!parallel_or}1,2,3,4,5,6,7,8,9,...2048

Sort all of the terms first before creating the lower level
disjunctions - this can speed up the term seeking (for example, seeking
to 1000, then 1001, then 1002 in the same thread will be faster than
seeking to 1000, 2000, 1).
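
The sort-then-split idea can be sketched as follows. This is only an
illustration of the structure, not a Solr QParserPlugin: the dictionary
standing in for the index, and the names parse_terms, sorted_chunks,
match_chunk and parallel_or, are all hypothetical. Python threads also
will not give a real speed-up for CPU-bound work; the point is how the
sorted terms are cut into contiguous runs, one per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_terms(body):
    """Parse the comma-separated body of a hypothetical {!parallel_or} query."""
    return [int(t) for t in body.split(",") if t.strip()]


def sorted_chunks(terms, n_chunks):
    """Sort terms first, then cut them into contiguous runs.

    Each run becomes one lower-level disjunction, so a worker only
    ever moves forward through the term dictionary.
    """
    ordered = sorted(set(terms))
    size = max(1, -(-len(ordered) // n_chunks))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def match_chunk(index, chunk):
    """Union the posting lists for one chunk of terms."""
    docs = set()
    for term in chunk:
        docs.update(index.get(term, ()))
    return docs


def parallel_or(index, body, n_threads=4):
    """Evaluate the OR of all terms, one chunk per worker thread."""
    chunks = sorted_chunks(parse_terms(body), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda c: match_chunk(index, c), chunks))
    result = set()
    for partial in partials:
        result |= partial
    return result


# Toy inverted index (an assumption for the example): term -> doc ids.
index = {1: [10, 11], 1000: [11, 12], 1001: [13], 2000: [10, 14]}
hits = parallel_or(index, "1000,2000,1,1001", n_threads=2)
```

Here the unsorted input 1000, 2000, 1, 1001 is reordered to
1, 1000, 1001, 2000 and handed out as two contiguous runs.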

If you don't need scoring, then it can be made even faster by using bitsets.
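
A rough sketch of the bitset idea, using a Python integer as the bitset
(bit d set means document d matches). The toy index and the helper names
to_bitset, or_terms and doc_ids are assumptions made for the example and
do not correspond to Lucene classes.

```python
def to_bitset(doc_id_list):
    """Turn a posting list into a bitset stored in a Python int."""
    bits = 0
    for d in doc_id_list:
        bits |= 1 << d
    return bits


def or_terms(index, terms):
    """OR together the bitsets of all terms; no scores are computed."""
    result = 0
    for term in sorted(terms):
        result |= to_bitset(index.get(term, ()))
    return result


def doc_ids(bits):
    """List the document ids whose bits are set, in increasing order."""
    ids = []
    d = 0
    while bits:
        if bits & 1:
            ids.append(d)
        bits >>= 1
        d += 1
    return ids


# Toy inverted index (an assumption for the example): term -> doc ids.
index = {1: [0, 3], 2: [3, 5], 3: [1]}
matches = or_terms(index, [3, 1, 2])
```

A document that matches several terms sets the same bit only once, so
the merged set contains no duplicates. Counting facets over such a
merged set, rather than summing per-sub-query counts, avoids counting
the same document twice.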

-Yonik
http://www.lucidimagination.com