Posted to solr-user@lucene.apache.org by Neil Ireson <n....@sheffield.ac.uk> on 2014/11/13 14:49:13 UTC

Pivot performance

Hi all,

I was running an experiment which involved counting terms by day, so I was using pivot facets to get the counts. However, as the number of time and term values increased, the performance became very poor. So I knocked up a quick test, using a collection of 1 million documents with varying numbers of random values, to compare different ways of getting the counts.

1) Combined = combining the time and term in a single field.
2) Facet = for each term, set the query to the term and then get the time facet.
3) Pivot = get the pivot facet.
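
As a rough sketch of what these three approaches might look like as Solr request parameters (this is an illustration, not the test code used for the timings): the field names `term`, `day` and `day_term` are hypothetical, and `rows=0` and `facet.limit=-1` are assumed settings that the post does not mention.

```python
from urllib.parse import urlencode

# Hypothetical field names; the post does not say what the fields are called.
TERM_FIELD = "term"
TIME_FIELD = "day"
COMBINED_FIELD = "day_term"  # time and term indexed together in one field

# Assumed common settings: return no documents, only facet counts,
# and do not cap the number of facet values returned.
BASE = {"rows": 0, "facet": "true", "facet.limit": -1, "wt": "json"}


def combined_params():
    """1) Combined: one field facet over the combined time+term field."""
    return {**BASE, "q": "*:*", "facet.field": COMBINED_FIELD}


def facet_params(term):
    """2) Facet: one request per term, faceting on the time field."""
    return {**BASE, "q": f'{TERM_FIELD}:"{term}"', "facet.field": TIME_FIELD}


def pivot_params():
    """3) Pivot: a single pivot facet over term, then time."""
    return {**BASE, "q": "*:*", "facet.pivot": f"{TERM_FIELD},{TIME_FIELD}"}


def to_query_string(params):
    """Encode a parameter dict as a URL query string."""
    return urlencode(params)


if __name__ == "__main__":
    print(to_query_string(combined_params()))
    print(to_query_string(facet_params("solr")))
    print(to_query_string(pivot_params()))
```

Note that approach 2 issues one request per term, whereas approaches 1 and 3 need only a single request each.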

The results show that, as the number of values (i.e. number of terms * number of times) increases, everything is fine until around 100,000 values and then it goes pear-shaped for pivots, taking nearly 4 minutes for 1 million values. The facet-based approach produces much more robust performance.

          |      Processing time in ms     |
Values    |  Combined|     Facet|     Pivot|
9         |       144|       391|        62|
100       |       170|       501|        52|
961       |       789|      1014|       153|
10000     |       907|      1966|       940|
99856     |      1647|      3832|      1960|
499849    |      5445|      7573|    136423|
999867    |      9926|      8690|    233725|


In the end I used the facet rather than the pivot approach, but I’d like to know why pivots have such a catastrophic performance crash. Is this expected behaviour for pivots, or am I doing something wrong?

N

Re: Pivot performance

Posted by Neil Ireson <n....@sheffield.ac.uk>.

For completeness, I tried to find which version change caused the issue. In fact the performance was fine up to and including 4.9.0, so the problem seems to have appeared only in the latest version.

N



Re: Pivot performance

Posted by Neil Ireson <n....@sheffield.ac.uk>.

I found a post (http://lucene.472066.n3.nabble.com/Solr-4-3-Pivot-Performance-Issue-td4074617.html) commenting that the pivot performance issue appeared after version 4.0.0. So I ran my test on version 4.0.0 and found that pivoting did not suffer the performance crash, and generally produced much better results.

          |      Processing time in ms     |
Values    |  Combined|     Facet|     Pivot|
9         |       180|       300|        34|
100       |       163|       521|        30|
961       |       729|       666|        72|
10000     |       709|      1006|       659|
99856     |      1896|      2214|       719|
499849    |      2989|      4863|      1719|
999872    |      5552|      8113|      3856|
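
To put the two sets of pivot timings side by side, here is a small sketch that computes the slowdown of the newer release relative to 4.0.0 for each row. The numbers are copied from the two tables in this thread. Rows are paired by position, which is an assumption, since the largest run had slightly different value counts (999867 vs 999872).

```python
# Pivot processing times in ms, copied from the tables in this thread.
# Each entry is (number of values, time in ms).
pivot_newer = [
    (9, 62), (100, 52), (961, 153), (10000, 940),
    (99856, 1960), (499849, 136423), (999867, 233725),
]
pivot_4_0_0 = [
    (9, 34), (100, 30), (961, 72), (10000, 659),
    (99856, 719), (499849, 1719), (999872, 3856),
]

# Pair the rows by position and compute how many times slower
# the newer release is than 4.0.0 for each run.
slowdowns = [
    newer_ms / old_ms
    for (_, newer_ms), (_, old_ms) in zip(pivot_newer, pivot_4_0_0)
]

if __name__ == "__main__":
    for (values, _), factor in zip(pivot_newer, slowdowns):
        print(f"{values:>7} values: {factor:6.1f}x slower than 4.0.0")
```

Up to about 100,000 values the difference stays below a factor of 3; for the two largest runs it is well over a factor of 50.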

Therefore I think something has definitely gone awry.

N

