Posted to dev@lucene.apache.org by Joel Bernstein <jo...@gmail.com> on 2015/08/27 18:22:11 UTC

Main query runs in both phases of distributed search

I've been working on some performance tuning for Alfresco and found that
the main query is being executed in the first and second phase of
distributed search when there are facet refinements.

The code where this happens is in line 347 of the QueryComponent (trunk).

This turns out to be pretty expensive in Alfresco's use case.

We already have this DocSet in the first phase but we currently don't cache
DocSets for facets.

Perhaps it's time to consider doing this.

Anybody have any thoughts or objections?

Re: Main query runs in both phases of distributed search

Posted by Joel Bernstein <jo...@gmail.com>.

Sounds like a good plan to me. I'll open a ticket for this.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 27, 2015 at 12:45 PM, Yonik Seeley <ys...@gmail.com> wrote:

> On Thu, Aug 27, 2015 at 12:22 PM, Joel Bernstein <jo...@gmail.com>
> wrote:
> >
> > I've been working on some performance tuning for Alfresco and found that
> > the main query is being executed in the first and second phase of
> > distributed search when there are facet refinements.
> >
> > The code where this happens is in line 347 of the QueryComponent (trunk).
> >
> > This turns out to be pretty expensive in Alfresco's use case.
> >
> > We already have this DocSet in the first phase but we currently don't cache
> > DocSets for facets.
> >
> > Perhaps it's time to consider doing this.
> >
> > Anybody have any thoughts or objections?
>
> Hmmm, so we cache the filters, but we don't cache the set for the main
> query.
> If we did, then there would just be the cost of intersecting those
> (which is very fast).
> Of course, that doesn't work for post filters.
>
> Easiest might be to wrap up the query+filters as a single query and
> reuse the existing filter cache if you know it's going to be needed in
> the second phase.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Main query runs in both phases of distributed search

Posted by Yonik Seeley <ys...@gmail.com>.

On Thu, Aug 27, 2015 at 12:22 PM, Joel Bernstein <jo...@gmail.com> wrote:
>
> I've been working on some performance tuning for Alfresco and found that the
> main query is being executed in the first and second phase of distributed
> search when there are facet refinements.
>
> The code where this happens is in line 347 of the QueryComponent (trunk).
>
> This turns out to be pretty expensive in Alfresco's use case.
>
> We already have this DocSet in the first phase but we currently don't cache
> DocSets for facets.
>
> Perhaps it's time to consider doing this.
>
> Anybody have any thoughts or objections?

Hmmm, so we cache the filters, but we don't cache the set for the main query.
If we did, then there would just be the cost of intersecting those
(which is very fast).
Of course, that doesn't work for post filters.

Easiest might be to wrap up the query+filters as a single query and
reuse the existing filter cache if you know it's going to be needed in
the second phase.
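
A minimal sketch of that idea in plain Java, assuming a simple map stands in
for the filter cache and java.util.BitSet stands in for a DocSet. The class
name CombinedDocSetCache, the executor callback, and the query strings are
all hypothetical; none of this is Solr's actual API. The main query and its
filters are folded into one cache key, so the second phase finds the
intersected set instead of running the main query again:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not Solr's SolrIndexSearcher or filterCache API.
class CombinedDocSetCache {
    // Stand-in for the filter cache: (main query + sorted filters) -> doc ids.
    private final Map<List<String>, BitSet> cache = new HashMap<>();
    private int computations = 0;

    // One key for the whole request; filters are sorted so their order is irrelevant.
    static List<String> combinedKey(String mainQuery, List<String> filters) {
        List<String> sortedFilters = new ArrayList<>(filters);
        Collections.sort(sortedFilters);
        List<String> key = new ArrayList<>();
        key.add(mainQuery);
        key.addAll(sortedFilters);
        return key;
    }

    // executor is an assumed callback that runs one query and returns its matches.
    BitSet getDocSet(String mainQuery, List<String> filters,
                     Function<String, BitSet> executor) {
        List<String> key = combinedKey(mainQuery, filters);
        BitSet cached = cache.get(key);
        if (cached == null) {
            // Cache miss (first phase): run the main query, intersect with each filter.
            cached = (BitSet) executor.apply(mainQuery).clone();
            for (String filter : filters) {
                cached.and(executor.apply(filter));
            }
            cache.put(key, cached);
            computations++;
        }
        // Return a copy so callers cannot mutate the cached set.
        return (BitSet) cached.clone();
    }

    int computations() {
        return computations;
    }

    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) {
            set.set(doc);
        }
        return set;
    }

    public static void main(String[] args) {
        // Toy "index": each query string maps to the docs it matches.
        Map<String, BitSet> index = new HashMap<>();
        index.put("text:solr", bits(0, 1, 2, 5));
        index.put("type:doc", bits(1, 2, 3, 5));

        CombinedDocSetCache cache = new CombinedDocSetCache();
        List<String> filters = Collections.singletonList("type:doc");
        BitSet firstPhase = cache.getDocSet("text:solr", filters, index::get);
        BitSet secondPhase = cache.getDocSet("text:solr", filters, index::get);
        System.out.println(firstPhase + " " + secondPhase
                + " computations=" + cache.computations());
    }
}
```

The sketch does not model post filters, which, as noted above, cannot be
handled by intersecting cached sets.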

-Yonik

Re: Main query runs in both phases of distributed search

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

On Thu, 2015-08-27 at 12:22 -0400, Joel Bernstein wrote:

> I've been working on some performance tuning for Alfresco and found
> that the main query is being executed in the first and second phase of
> distributed search when there are facet refinements.
> 
[...]

Related: Depending on result size and concrete facet request, that
second phase can take markedly longer than the first phase, resulting in
quite a peculiar response time pattern:
https://twitter.com/anjacks0n/status/509284768035262464

> We already have this DocSet in the first phase but we currently don't
> cache DocSets for facets. 
> 
> Perhaps it's time to consider doing this.

> Anybody have any thoughts or objections?

For String faceting, where a counter structure is used, the overhead of
the second phase can be brought way down if the counter structure is
cached from the first phase: Resolving a term count is just a matter of
resolving its ordinal, then doing a lookup in the counter structure.
Unfortunately that does not work for Numerics.
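
As a rough illustration of that lookup, here is a sketch in plain Java. It
assumes a single-valued String field, a sorted term array whose index is the
ordinal, and an int[] of counts kept from the first phase; the class
CachedFacetCounter and its method names are hypothetical and not taken from
Solr:

```java
import java.util.Arrays;

// Hypothetical sketch of a cached counter structure; not Solr's faceting code.
class CachedFacetCounter {
    private final String[] sortedTerms; // ordinal = index into this sorted array
    private final int[] counts;         // counts[ordinal], filled in the first phase

    // docOrdinals[doc] is the ordinal of the doc's (single) term; assumed layout.
    CachedFacetCounter(String[] sortedTerms, int[] docOrdinals, int[] matchingDocs) {
        this.sortedTerms = sortedTerms;
        this.counts = new int[sortedTerms.length];
        // First phase: one pass over the matching docs fills every counter.
        for (int doc : matchingDocs) {
            counts[docOrdinals[doc]]++;
        }
    }

    // Second phase: resolve the term's ordinal, then read the cached counter.
    int refine(String term) {
        int ordinal = Arrays.binarySearch(sortedTerms, term);
        return ordinal < 0 ? 0 : counts[ordinal];
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        int[] docOrdinals = {0, 1, 1, 2, 0}; // doc 0 -> apple, doc 1 -> banana, ...
        int[] matchingDocs = {0, 1, 2, 4};   // docs hit by the main query
        CachedFacetCounter counter =
                new CachedFacetCounter(terms, docOrdinals, matchingDocs);
        System.out.println("banana=" + counter.refine("banana"));
    }
}
```

With the counters kept around, a refinement request costs one binary search
and one array read per term, and the matching docs are not visited again.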

- Toke Eskildsen, State and University Library, Denmark