Posted to dev@lucene.apache.org by "Alessandro Benedetti (JIRA)" <ji...@apache.org> on 2016/05/25 10:46:12 UTC

[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

    [ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298469#comment-15298469 ] 

Alessandro Benedetti edited comment on SOLR-8096 at 5/25/16 10:45 AM:
----------------------------------------------------------------------

Just adding some additional information, as I just ran into this issue with Solr 6.0:
Static index, around 50 * 10^6 docs, 20 fields to facet, 1 of them with high cardinality, on top of grouping.
Grouping did not affect the timings at all.

All the symptoms are there: Solr 4.10.2 takes around 70 ms (enum) to 150 ms (fcs), and Solr 6.0 around 550 ms.
The 'fieldValueCache' seems to be unused (no inserts or lookups) in Solr 6.0.
In Solr 4.10 the 'fieldValueCache' is in heavy use, with a cumulative_hitratio of 0.96.
Switching facet.method among enum, fc, fcs and uif did not change much.
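
To make the cache comparison above concrete, here is a minimal sketch of how the two situations can be told apart from the cache counters. The counter values are hypothetical, chosen only to reproduce the 0.96 ratio quoted above; the key names follow the cumulative_* statistics Solr reports for its caches, and the helper functions are illustrative, not part of Solr.

```python
def hit_ratio(hits: int, lookups: int) -> float:
    """Return hits / lookups, or 0.0 when the cache was never consulted."""
    if lookups == 0:
        return 0.0
    return hits / lookups


def cache_unused(stats: dict) -> bool:
    """True when the cache saw neither lookups nor inserts."""
    return stats["cumulative_lookups"] == 0 and stats["cumulative_inserts"] == 0


# Hypothetical counters (assumption): not measured values from the report.
solr4_stats = {"cumulative_lookups": 1000, "cumulative_hits": 960, "cumulative_inserts": 40}
solr6_stats = {"cumulative_lookups": 0, "cumulative_hits": 0, "cumulative_inserts": 0}

for name, stats in (("Solr 4.10", solr4_stats), ("Solr 6.0", solr6_stats)):
    ratio = hit_ratio(stats["cumulative_hits"], stats["cumulative_lookups"])
    print(name, "unused" if cache_unused(stats) else f"hit ratio {ratio:.2f}")
```

A cache with zero lookups and zero inserts is not merely cold: the faceting code path never touches it, which matches what I see in 6.0.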

Moving to DocValues did not improve the situation much (but I was on an optimized index, so I still need to try a multi-segment one, following [~mkhludnev]'s contribution in Solr 5.4.0).

Moving to field collapsing brought the query down to 110-120 ms (but this is expected: we were faceting on 260 / 1 million original docs).
Adding facet.threads=NCores brought the query time down to 100 ms; combined with field collapsing we reached 80-90 ms when warmed.
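
For anyone who wants to repeat the comparison, the following is a rough sketch of how the request parameters for each variant could be generated. facet.method and facet.threads are the standard faceting parameters; the field names, the q=*:* query and the helper function are assumptions for illustration, not the actual queries I ran.

```python
from urllib.parse import urlencode

# The facet.method values tried in the comparison above.
FACET_METHODS = ("enum", "fc", "fcs", "uif")


def facet_params(fields, method="fc", threads=None):
    """Build a query string for a faceting request (illustrative helper)."""
    if method not in FACET_METHODS:
        raise ValueError(f"unknown facet.method: {method}")
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("facet.method", method)]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", field) for field in fields]
    if threads is not None:
        # facet.threads lets the facet fields be computed in parallel.
        params.append(("facet.threads", str(threads)))
    return urlencode(params)


# Hypothetical field names (assumption).
fields = ["category", "brand"]
for method in FACET_METHODS:
    print(facet_params(fields, method=method, threads=4))
```

Each printed string can be appended to the select handler URL of the collection under test; timing should only be compared after warming, as noted above.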

What is the plan for the future regarding this?
Do we want to deprecate the legacy facets implementation and move everything to JSON facets (as happened with UIF)?
That is, backward compatible but with a different implementation?

Cheers

 


> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields over relatively static indexes was removed as part of LUCENE-5666, causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, with each field having between 0 and 5 values per document.  *Higher numbers represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time (columns: percent of index being faceted)
> || num_unique_values || 10% || 50% || 90% ||
> | 10 | 351.17% | 1587.08% | 3057.28% |
> | 100 | 158.10% | 203.61% | 1421.93% |
> | 1000 | 143.78% | 168.01% | 1325.87% |
> | 10000 | 137.98% | 175.31% | 1233.97% |
> | 100000 | 142.98% | 159.42% | 1252.45% |
> | 1000000 | 255.15% | 165.17% | 1236.75% |
> For example, for a field with 1000 unique values in the whole index, faceting with 5x took 143% of the 4x time when ~10% of the docs in the index were faceted.
> One user who brought the performance problem to our attention: http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request
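
The percentages in the benchmark table of the issue description can be read as slowdown factors (5x time divided by 4x time). The sketch below only re-expresses the published numbers; the dictionary layout and function names are illustrative assumptions.

```python
# Percentages copied from the benchmark table in the issue description.
COLUMNS = ("10%", "50%", "90%")  # percent of index being faceted
TABLE = {
    10: (351.17, 1587.08, 3057.28),
    100: (158.10, 203.61, 1421.93),
    1000: (143.78, 168.01, 1325.87),
    10000: (137.98, 175.31, 1233.97),
    100000: (142.98, 159.42, 1252.45),
    1000000: (255.15, 165.17, 1236.75),
}


def slowdown(percent: float) -> float:
    """Convert 'percent of the 4x time' into a slowdown factor."""
    return percent / 100.0


def worst_case():
    """Return (num_unique_values, column, factor) for the largest slowdown."""
    return max(
        ((n, col, slowdown(p)) for n, row in TABLE.items() for col, p in zip(COLUMNS, row)),
        key=lambda item: item[2],
    )


for n, row in TABLE.items():
    print(n, [f"{slowdown(p):.1f}x" for p in row])
print("worst case:", worst_case())
```

Read this way, the 1000-unique-values example from the description is roughly a 1.4x slowdown at 10% of the index, while every row exceeds 12x at 90%.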


