Posted to solr-user@lucene.apache.org by "Ronald K. Braun" <ro...@gmail.com> on 2017/02/10 18:29:20 UTC

Field collapsing, facets, and qtime: caching issue?

I'm experimenting with field collapsing in SolrCloud 6.2.1 and have this
set of request parameters against a collection:

/default?indent=on&q=*:*&wt=json&fq={!collapse+field=groupid}

My default handler is just defaults:

    <requestHandler name="/default" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
        </lst>
    </requestHandler>

The first query runs about 600ms, then subsequent repeats of the same query
are 0-5ms for qTime, which I interpret to mean that the query is cached
after the first hit.  All as expected.

However, if I enable facets without actually requesting a facet:

/default?indent=on&q=*:*&wt=json&fq={!collapse+field=groupid}&facet=true

then every submission of the query runs at ~600ms.  I interpret this to
mean that caching is somehow defeated when facet processing is set.  Facets
are empty as expected:

    "facet_counts": {
      "facet_queries": { },
      "facet_fields": { },
      "facet_ranges": { },
      "facet_intervals": { },
      "facet_heatmaps": { }
    }

If I remove the collapse directive

/default?indent=on&q=*:*&wt=json&facet=true

qTimes are back down to 0 after the initial query whether or not faceting
is requested.
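
For anyone wanting to reproduce the comparison, here is a minimal Python sketch that builds the three request variants above and reads QTime from the JSON response header. The base URL (host, port, and collection name) is a placeholder assumption, and build_url, qtime, and measure are illustrative helper names, not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint: host, port and collection name are assumptions.
BASE = "http://localhost:8983/solr/mycollection/default"


def build_url(collapse=True, facet=False, base=BASE):
    """Build one of the request variants discussed in this thread."""
    params = [("indent", "on"), ("q", "*:*"), ("wt", "json")]
    if collapse:
        # urlencode turns the space into '+', matching {!collapse+field=groupid}.
        params.append(("fq", "{!collapse field=groupid}"))
    if facet:
        params.append(("facet", "true"))
    return base + "?" + urlencode(params)


def qtime(response_text):
    """Extract QTime (milliseconds) from a wt=json response body."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def measure(url, repeats=3):
    """Send the same request several times and collect the reported QTimes."""
    times = []
    for _ in range(repeats):
        with urlopen(url) as resp:
            times.append(qtime(resp.read().decode("utf-8")))
    return times
```

Calling measure(build_url(collapse=True, facet=True)) against a live node returns the QTime of each repeat, which makes it easy to compare the cached and uncached cases side by side.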

Is this expected behaviour or am I missing some supporting configuration
for proper field collapsing?

Thanks!

Ron

Re: Field collapsing, facets, and qtime: caching issue?

Posted by Joel Bernstein <jo...@gmail.com>.
The additional work is done in the QueryComponent, I believe. There is a
flag that tells the QueryComponent whether the DocSet is needed. If that
flag is true and the DocSet is not available, it will build the DocSet.

We ran into the facet refinement issue I mentioned at Alfresco and I
created this ticket: https://issues.apache.org/jira/browse/SOLR-8092.

Fixing this problem would likely resolve your scenario as well.

I haven't broken ground on it yet though.

Joel Bernstein
http://joelsolr.blogspot.com/

Re: Field collapsing, facets, and qtime: caching issue?

Posted by ronbraun <ro...@gmail.com>.
Thanks for the explanation, Joel.  When you say the query/collapse needs to
be re-run, is this the facet component that needs to do this?  The confusing
part is that the debug suggests the time is being spent in the query
component when faceting is enabled.  My naive reading of your response would
give me the expectation that by enabling facets with facet=true, the facet
component would need to do additional work and so the qTime cost would be
paid by that component.  Here is the debug I get for repeated hits against
/default?indent=on&q=*:*&wt=json&fq={!collapse+field=groupid}&facet=true&debugQuery=on:

    "process": {
        "time": 200.0,
        "query": { "time": 200.0 },
        "facet": { "time": 0.0 },
        "facet_module": { "time": 0.0 },
        "mlt": { "time": 0.0 },
        "highlight": { "time": 0.0 },
        "stats": { "time": 0.0 },
        "expand": { "time": 0.0 },
        "terms": { "time": 0.0 },
        "debug": { "time": 0.0 }
    }

Or perhaps the facet component uses the query component to rerun the query
and the time is billed to that component?
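
To make the timing breakdown easier to read across many requests, the "process" block can be ranked by component. The following is only a sketch: slowest_components is a hypothetical helper name, and SAMPLE simply copies the values from the debug output above.

```python
# Values copied from the debug output quoted in this message.
SAMPLE = {
    "time": 200.0,
    "query": {"time": 200.0},
    "facet": {"time": 0.0},
    "facet_module": {"time": 0.0},
    "mlt": {"time": 0.0},
    "highlight": {"time": 0.0},
    "stats": {"time": 0.0},
    "expand": {"time": 0.0},
    "terms": {"time": 0.0},
    "debug": {"time": 0.0},
}


def slowest_components(process, top=3):
    """Return (component, time) pairs sorted by time, slowest first.

    The top-level "time" entry is a plain number (the total), so only the
    nested per-component dictionaries are considered.
    """
    per_component = {
        name: entry["time"]
        for name, entry in process.items()
        if isinstance(entry, dict)
    }
    ranked = sorted(per_component.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


print(slowest_components(SAMPLE, top=1))
```

For the sample above, the only component with non-zero time is "query", which is what prompted the question.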

Regardless, is the lack of caching a known and ticketed issue?  The
consensus across various other solr tickets regarding grouped search seems
to be to prefer the collapse/expand approach to grouping.  I'm using
non-grouped search now but would like to switch to grouped and
collapse/expand could work for my use case, but the effective defeat of
query caching for any faceted application seems pretty problematic and I'd
be hesitant to switch over if I'm effectively losing query caching by doing
so.  My query cache hit rate is reasonably high.

Thanks!

Ron

Re: Field collapsing, facets, and qtime: caching issue?

Posted by Joel Bernstein <jo...@gmail.com>.
It's been a little while since I looked at this section of the code. But
what I believe is going on is that the queryResultCache has kicked in which
will give you the DocList (the top N docs that match query/filters/sort)
back immediately. But faceting requires a DocSet which is a bitset of all
docs that match the query. The DocSet is not cached in this scenario, so it
needs to be regenerated, which means re-running the query/collapse. So I
believe your instincts are correct. This same issue gets worse if you have
facets that need refinement. In this scenario the DocSet is needed on both
the first and second pass and is not cached, so the query/collapse needs to
be run twice for facets.

The fix for this would be to start caching the DocSets needed for faceting.
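
As a rough illustration of the behaviour described above, here is a toy Python model. It is not Solr code: ToySearcher, its caches, and the cache_docsets switch are all hypothetical names, and the "collapse" is just a first-document-per-group pass. It only shows why a cached DocList does not help when a DocSet is also required, and how caching the DocSet would avoid the re-run.

```python
class ToySearcher:
    """Toy model of the DocList/DocSet caching behaviour; not Solr's code."""

    def __init__(self, docs, cache_docsets=False):
        self.docs = docs                  # list of (doc_id, groupid) pairs
        self.cache_docsets = cache_docsets
        self.doclist_cache = {}           # stands in for the queryResultCache
        self.docset_cache = {}            # the hypothetical fix
        self.executions = 0               # how often the collapse actually ran

    def _run_collapse(self):
        """Expensive step: keep the first document seen for each group."""
        self.executions += 1
        seen, docset = set(), []
        for doc_id, group in self.docs:
            if group not in seen:
                seen.add(group)
                docset.append(doc_id)
        return docset

    def search(self, key, rows=10, need_docset=False):
        doclist = self.doclist_cache.get(key)
        docset = self.docset_cache.get(key) if self.cache_docsets else None
        # A cached DocList is enough unless the full DocSet is also required.
        if doclist is None or (need_docset and docset is None):
            docset = self._run_collapse()
            doclist = docset[:rows]
            self.doclist_cache[key] = doclist
            if self.cache_docsets:
                self.docset_cache[key] = docset
        return doclist, (docset if need_docset else None)


docs = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "b")]

plain = ToySearcher(docs)
plain.search("q")
plain.search("q")
print(plain.executions)       # repeat served from the DocList cache

faceted = ToySearcher(docs)
faceted.search("q", need_docset=True)
faceted.search("q", need_docset=True)
print(faceted.executions)     # collapse re-run on every request

fixed = ToySearcher(docs, cache_docsets=True)
fixed.search("q", need_docset=True)
fixed.search("q", need_docset=True)
print(fixed.executions)       # DocSet cached, so no re-run
```

In this model, need_docset plays the role of facet=true: the repeat request finds the DocList in the cache but still has to rebuild the DocSet unless that is cached too.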




Joel Bernstein
http://joelsolr.blogspot.com/
