Posted to solr-user@lucene.apache.org by Solr User <so...@gmail.com> on 2016/09/26 21:59:30 UTC

Re: Faceting and Grouping Performance Degradation in Solr 5

Thanks again for your work on honoring the facet.method.  I have an
observation that I would like to share and get your feedback on if possible.

I performance-tested Solr 5.5.2 with various facet queries, and the only way
I get results comparable to Solr 4.8.1 is after running expungeDeletes.  Is
it possible that Solr 5 ignores deleted documents less efficiently than
Solr 4?  Here are the details.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
5.5.2 (without deletes): 125 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields differ from those in Scenario #1 and perform much
better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
5.5.2 (without deletes): 42 ms
5.5.2 (1 segment without deletes): 34 ms
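For anyone reproducing the two scenarios: facet.method is an ordinary request parameter that applies to every facet.field in the request. The sketch below only builds a query string. The core name "products" and the field names are hypothetical placeholders, not the fields used in these tests. Per the SimpleFacets excerpt quoted in this thread, the legacy facet code switches enum to fc on docValues fields, so enum only takes effect on fields without docValues.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Sketch only: builds the query string for a legacy (non-JSON) facet request.
// The core name "products" and the field names are hypothetical placeholders.
final class FacetQueryBuilder {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // facet.method applies to every facet.field in the request;
    // f.<field>.facet.method could override it for a single field.
    static String facetQuery(String q, List<String> fields, String method) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + encode(q));
        params.add("rows=0"); // only the facet counts are of interest here
        params.add("facet=true");
        params.add("facet.method=" + encode(method));
        for (String field : fields) {
            params.add("facet.field=" + encode(field));
        }
        return params.toString();
    }

    public static void main(String[] args) {
        String select = "http://localhost:8983/solr/products/select?";
        // Scenario #1 style: several multi-valued fields faceted with uif
        System.out.println(select + facetQuery("*:*", Arrays.asList("category", "tags"), "uif"));
        // Scenario #2 style: a different set of fields faceted with enum
        System.out.println(select + facetQuery("*:*", Arrays.asList("brand", "color"), "enum"));
    }
}
```

The same builder can be pointed at either Solr version, so only facet.method changes between otherwise identical requests.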



On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
abenedetti@apache.org> wrote:

> Interesting developments:
>
> https://issues.apache.org/jira/browse/SOLR-9176
>
> I think we found why term enum seems slower in recent Solr!
> In our case it is likely to be related to the commit I mention in the Jira.
> Have a look, Joel!
>
> On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> abenedetti@apache.org> wrote:
>
> > I am investigating this scenario right now.
> > I can confirm that the enum slowness is present in Solr 6.0 as well.
> > And I agree with Joel, it seems to be unrelated to the famous faceting
> > regression :(
> >
> > Furthermore, with the legacy facet approach, if you enable docValues on
> > the field, you are no longer able to use the enum approach:
> >
> > org/apache/solr/request/SimpleFacets.java:448
> >
> > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> >   // only fc can handle docvalues types
> >   method = FacetMethod.FC;
> > }
> >
> >
> > I got really horrible regressions simply using term enum, comparing Solr 4
> > and Solr 6.
> >
> > And even the most optimized fcs approach with docValues and
> > facet.threads=nCore does not perform as well as the simple enum in Solr 4.
> >
> > For example, for some sample queries I have 40 ms vs 160 ms, and similar...
> > I think we should open an issue if we can confirm it is not related to
> > the other one.
> > A lot of people will continue using the legacy approach for a while...
> >
> > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <jo...@gmail.com>
> > wrote:
> >
> >> The enum slowness is interesting. It would appear on the surface not to
> >> be related to the FieldCache issue. I don't think the main emphasis of the
> >> JSON facet API has been the enum approach. You may find that using the JSON
> >> facet API and eliminating the use of enum meets your performance needs.
> >>
> >> With the CollapsingQParserPlugin, top_fc is definitely faster during
> >> queries. The tradeoff is slower warming times and increased memory usage
> >> if the collapse fields are used in faceting, as faceting will load the
> >> field into a different cache.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Wed, May 18, 2016 at 5:28 PM, Solr User <so...@gmail.com> wrote:
> >>
> >> > Joel,
> >> >
> >> > Thank you for taking the time to respond to my question.  I tried the
> >> > JSON Facet API for one query that uses facet.method=enum (since this one
> >> > has a ton of unique values and performed better with enum), but it was
> >> > way slower than even the slower Solr 5 times.  I have not tried the new
> >> > API with the non-enum queries yet, so I will give that a go.  It looks
> >> > like Solr 5.5.1 also has a facet.method=uif, which will be interesting
> >> > to try.
> >> >
> >> > If these do not prove helpful, it looks like I will need to wait for
> >> > SOLR-8096 to be resolved before upgrading.
> >> >
> >> > Thanks also for your comment on top_fc for the CollapsingQParser.  I use
> >> > collapse/expand for some queries but traditional grouping for others due
> >> > to performance.  It will be interesting to see if those grouping queries
> >> > perform better now using CollapsingQParser with top_fc.
> >> >
> >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <jo...@gmail.com>
> >> > wrote:
> >> >
> >> > > Yes, SOLR-8096 is the issue here.
> >> > >
> >> > > I don't believe indexing with docValues is going to help much with
> >> > > this. The enum slowness may not be related, but I'm not positive about
> >> > > that.
> >> > >
> >> > > The major slowdowns are likely due to the removal of the top-level
> >> > > FieldCache from general use and the removal of the FieldValuesCache,
> >> > > which was used for multi-valued field faceting.
> >> > >
> >> > > The JSON facet API covers all the functionality in the traditional
> >> > > faceting, and it has been developed to be very performant.
> >> > >
> >> > > You may also want to see if Collapse/Expand can meet your application's
> >> > > needs rather than Grouping. It allows you to specify a top-level
> >> > > FieldCache if performance is a blocker without it.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > Joel Bernstein
> >> > > http://joelsolr.blogspot.com/
> >> > >
> >> > > On Wed, May 18, 2016 at 10:42 AM, Solr User <so...@gmail.com> wrote:
> >> > >
> >> > > > Does anyone know the answer to this?
> >> > > >
> >> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User <so...@gmail.com> wrote:
> >> > > >
> >> > > > > I recently attempted to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> >> > > > > had to abort because average response times degraded in a baseline
> >> > > > > volume performance test.  The affected queries involved faceting
> >> > > > > (both the enum method and the default) and grouping.  There is a
> >> > > > > critical bug, https://issues.apache.org/jira/browse/SOLR-8096,
> >> > > > > currently open, which I gather is the cause of the slower response
> >> > > > > times.  One concern I have is that discussions around the issue
> >> > > > > suggest indexing with docValues, which alleviated the problem in at
> >> > > > > least that one reported case.  However, indexing with docValues did
> >> > > > > not improve the performance in my case.
> >> > > > >
> >> > > > > Can someone please confirm or correct my understanding that this
> >> > > > > issue has no path forward at this time, and specifically that it is
> >> > > > > already known that docValues does not necessarily solve this?
> >> > > > >
> >> > > > > Thanks in advance!
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
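Joel's two suggestions in this thread, Collapse/Expand with a top-level FieldCache (hint=top_fc) and the JSON Facet API, are also expressed purely as request parameters. The sketch below builds those parameters next to traditional grouping for comparison. Field names such as "family_id" and "category" are hypothetical, and the values still need URL encoding before they are sent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: request parameters for the alternatives suggested in this
// thread. Field names such as "family_id" and "category" are hypothetical.
final class AlternativeRequests {

    // Traditional result grouping, for comparison.
    static Map<String, String> grouping(String groupField) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", groupField);
        return p;
    }

    // Collapse/Expand. hint=top_fc asks the collapse parser for a top-level
    // FieldCache: faster queries, at the cost of slower warming and more memory.
    static Map<String, String> collapse(String groupField, boolean topLevelFieldCache) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("fq", "{!collapse field=" + groupField + (topLevelFieldCache ? " hint=top_fc" : "") + "}");
        p.put("expand", "true"); // also return the documents that were collapsed away
        return p;
    }

    // JSON Facet API: one terms facet per field, keyed by the field name.
    static Map<String, String> jsonFacet(String... fields) {
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append(fields[i]).append(":{type:terms,field:").append(fields[i]).append('}');
        }
        json.append('}');
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("rows", "0");
        p.put("json.facet", json.toString());
        return p;
    }

    public static void main(String[] args) {
        System.out.println(grouping("family_id"));
        System.out.println(collapse("family_id", true));
        System.out.println(jsonFacet("category", "tags"));
    }
}
```

Keeping the grouping and collapse variants side by side makes it easy to replay the same load test against both and compare them, as discussed above.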

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Solr User <so...@gmail.com>.
I am pleased to report that we are in production on Solr 5.5.3, with
performance comparable to Solr 4.8.1, by leveraging facet.method=uif as
well as https://issues.apache.org/jira/browse/SOLR-9176.  Thanks to
everyone who worked on these!

On Mon, Oct 3, 2016 at 3:55 PM, Solr User <so...@gmail.com> wrote:

> Below is some further testing.  This was done in an environment that had
> no other queries or updates during testing.  [...]

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Solr User <so...@gmail.com>.
Below is some further testing.  This was done in an environment that had no
other queries or updates during testing.  We ran through several scenarios;
the results are tabulated below, one row per test run.  The times are
average times in milliseconds.  Same test methodology as above, except
there was a 5-minute warmup and a 15-minute test.

Note that the segment and deletion counts were recorded from only one of the
two shards, so we cannot extrapolate a function between them and the
outcome.  In other words, just view them as "non-optimized" versus
"optimized" and "has deletions" versus "no deletions".  The only exceptions
are that the 0-delete counts held for both shards, and the 1-segment and
8-segment counts held for both shards.  A few of the tests were repeated as
well.

The only conclusion that I could draw is that the number of segments and
the number of deletes appear to greatly influence the response times, at
least more than any difference in Solr version.  There also appears to be
some external contributor to variance, maybe the network.

Thoughts?


Date       Solr   Deleted docs  Segments  facet.method=uif  Scenario #1  Scenario #2
9/29/2016  5.5.2         57873        34  YES                       198           92
9/29/2016  5.5.2         57873        34  YES                       210           88
9/29/2016  4.8.1        176958        18  N/A                       145           59
9/30/2016  4.8.1        593694        27  N/A                       186           62
9/30/2016  4.8.1        593694        27  N/A                       190           58
9/30/2016  5.5.2         57873        34  YES                       208           72
9/30/2016  5.5.2         57873        34  YES                       209           70
9/30/2016  5.5.2         57873        34  NO                        210           77
9/30/2016  5.5.2         57873        34  NO                        206           74
9/30/2016  5.5.2             0         8  NO                        109           68
9/30/2016  5.5.2             0         8  YES                       142           73
9/30/2016  5.5.2             0         1  YES                        73           63
9/30/2016  5.5.2             0         1  NO                         70           61
10/3/2016  4.8.1             0         8  N/A                       160           66
10/3/2016  4.8.1             0         8  N/A                       109           54
10/3/2016  4.8.1             0         1  N/A                        83           52
10/3/2016  4.8.1             0         1  N/A                        85           51
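One rough way to read the table above is to average the runs by Solr version and segment count. The Java sketch below does only that. The latency and segment values are copied from the table, while dates, deleted-document counts and the uif flag are deliberately ignored, so runs with and without uif are pooled. It is an illustration of the grouping, not a statistical test.

```java
import java.util.Map;
import java.util.TreeMap;

// Averages the per-run results from the table, grouped by Solr version and
// segment count. Values are copied from the table; dates, deleted-doc counts
// and the uif flag are ignored, so uif and non-uif runs are pooled.
final class RunSummary {

    static final String[] VERSION = {
        "5.5.2", "5.5.2", "4.8.1", "4.8.1", "4.8.1", "5.5.2", "5.5.2", "5.5.2", "5.5.2",
        "5.5.2", "5.5.2", "5.5.2", "5.5.2", "4.8.1", "4.8.1", "4.8.1", "4.8.1"};
    static final int[] SEGMENTS = {34, 34, 18, 27, 27, 34, 34, 34, 34, 8, 8, 1, 1, 8, 8, 1, 1};
    static final int[] SCENARIO1_MS =
        {198, 210, 145, 186, 190, 208, 209, 210, 206, 109, 142, 73, 70, 160, 109, 83, 85};
    static final int[] SCENARIO2_MS =
        {92, 88, 59, 62, 58, 72, 70, 77, 74, 68, 73, 63, 61, 66, 54, 52, 51};

    // Mean latency per "version/segments" key, e.g. "5.5.2/1".
    static Map<String, Double> meanBy(int[] latencies) {
        Map<String, double[]> acc = new TreeMap<>(); // key -> {sum, count}
        for (int i = 0; i < latencies.length; i++) {
            String key = VERSION[i] + "/" + SEGMENTS[i];
            double[] sumAndCount = acc.computeIfAbsent(key, k -> new double[2]);
            sumAndCount[0] += latencies[i];
            sumAndCount[1] += 1;
        }
        Map<String, Double> means = new TreeMap<>();
        acc.forEach((key, sc) -> means.put(key, sc[0] / sc[1]));
        return means;
    }

    public static void main(String[] args) {
        System.out.println("Scenario #1: " + meanBy(SCENARIO1_MS));
        System.out.println("Scenario #2: " + meanBy(SCENARIO2_MS));
    }
}
```

With these numbers, the one-segment runs average 71.5 ms (5.5.2) versus 84 ms (4.8.1) for Scenario #1, and 62 ms versus 51.5 ms for Scenario #2. The 34-segment 5.5.2 runs average about 206.8 ms for Scenario #1. Only the 8- and 1-segment groups exist for both versions, so they are the only like-for-like comparisons.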




On Wed, Sep 28, 2016 at 4:44 PM, Solr User <so...@gmail.com> wrote:

> I plan to re-test this in a separate environment that I have more control
> over and will share the results when I can.

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Solr User <so...@gmail.com>.
I plan to re-test this in a separate environment that I have more control
over and will share the results when I can.

On Wed, Sep 28, 2016 at 3:37 PM, Solr User <so...@gmail.com> wrote:

> Certainly.  And I would of course welcome anyone else to test this for
> themselves, especially with facet.method=uif, to see if that has indeed
> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
> testing turned out to be invalid due to variance, a problem in the process,
> etc.  One thing I was pondering is whether I should force merge the index to
> a certain number of segments, because indexing yields a random number of
> segments and deletions.  The only thing that stopped me short of doing that
> was having observed longer Solr 4 times even with more deletions and a
> similar number of segments.

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Solr User <so...@gmail.com>.
Certainly.  And I would of course welcome anyone else to test this for
themselves, especially with facet.method=uif, to see if that has indeed
bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
testing turned out to be invalid due to variance, a problem in the process,
etc.  One thing I was pondering is whether I should force merge the index to
a certain number of segments, because indexing yields a random number of
segments and deletions.  The only thing that stopped me short of doing that
was having observed longer Solr 4 times even with more deletions and a
similar number of segments.
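Both the force merge being considered here and the expungeDeletes used earlier in the thread go through Solr's update handler. The sketch below only builds the request URLs and does not send them. The core URL is a placeholder, and the choice of 8 segments in main just mirrors one of the segment counts in the test table.

```java
// Sketch only: update-handler requests for the two index clean-ups discussed
// in this thread. The core URL is a hypothetical placeholder.
final class IndexMaintenance {

    // Commit and merge away deleted documents, without forcing a segment count.
    static String expungeDeletes(String coreUrl) {
        return coreUrl + "/update?commit=true&expungeDeletes=true";
    }

    // Force-merge ("optimize") down to at most maxSegments segments, so that
    // each test run starts from a comparable segment count.
    static String forceMerge(String coreUrl, int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be >= 1");
        }
        return coreUrl + "/update?optimize=true&maxSegments=" + maxSegments;
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/products";
        System.out.println(expungeDeletes(core));
        System.out.println(forceMerge(core, 8));
    }
}
```

Running the force merge before every test run would remove the random segment and deletion counts as a variable between runs.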

We use Soasta as our testing tool.  Before testing, load is sent for 10-15
minutes to make sure any Solr caches have stabilized.  Then the test is run
for 30 minutes of steady volume, with Scenario #1 tested at 15 req/sec and
Scenario #2 tested at 100 req/sec.  Each request is different, with input
pulled from data files.  The requests are the same from test to test.

The numbers posted above are average response times as reported by Soasta.
The relative time differences are also supported by Splunk, which indexes
the Solr logs, and by Dynatrace, which is instrumented on one of the JVMs.

Both versions are deployed to the same machines, each overlaying the
previous installation.  When going from Solr 4 to Solr 5, full indexing is
run with the same input data.  Since we run in SolrCloud mode, full
indexing consists of indexing all documents and then deleting any that were
not touched.  When going from Solr 5 back to Solr 4, the snapshot is
restored, since Solr 4 will not load a Solr 5 index.  Testing Solr 4 after
reverting yields the same results as the previous Solr 4 test.


On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> Sanity check: Could you describe how you test?
> [...]

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> Further testing indicates that any performance difference is not due
> to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> deletes.

Sanity check: Could you describe how you test?

* How many queries do you issue for each test?
* Is each query a new one, or do you re-use the same query?
* Do you discard the first X calls?
* Are the numbers averages, medians or something else?
* What do you do about the disk cache?
* Are both Solrs on the same machine?
* Do they use the same index?
* Do you alternate which of 4.8.1 and 5.5.2 you test first?

- Toke Eskildsen, State and University Library, Denmark
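Several of Toke's questions come down to how the raw latencies are summarized. The sketch below drops a warmup prefix, then reports the mean, a nearest-rank median and the 95th percentile. It assumes latencies in milliseconds in arrival order. The sample numbers in main are invented for illustration and are not from these tests. Soasta's own statistics may be computed differently.

```java
import java.util.Arrays;

// Minimal sketch: drop the first `warmup` samples, then report mean,
// median and 95th percentile. Latencies are assumed to be in milliseconds,
// in arrival order.
final class LatencyStats {

    static double[] afterWarmup(double[] samples, int warmup) {
        if (warmup < 0 || warmup >= samples.length) {
            throw new IllegalArgumentException("warmup must leave at least one sample");
        }
        return Arrays.copyOfRange(samples, warmup, samples.length);
    }

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElseThrow(IllegalArgumentException::new);
    }

    // Nearest-rank percentile on a sorted copy (p in (0, 100]); for an
    // even-sized sample the median is therefore the lower middle value.
    static double percentile(double[] xs, double p) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a slow cold start, then steady state
        // with one outlier.
        double[] raw = {900, 400, 120, 115, 118, 300, 117, 116, 119, 121};
        double[] steady = afterWarmup(raw, 2);
        System.out.printf("mean=%.2f median=%.1f p95=%.1f%n",
                mean(steady), percentile(steady, 50), percentile(steady, 95));
    }
}
```

With the illustrative samples, the mean is 140.75 ms while the median is 118 ms, because a single 300 ms outlier pulls the mean up. For scale, the 30-minute runs described in this thread (15 req/sec for Scenario #1, 100 req/sec for Scenario #2) produce about 27,000 and 180,000 samples respectively.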

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Solr User <so...@gmail.com>.
Further testing indicates that any performance difference is not due to
deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes.
The times appear to converge on an optimized index.  Below are the
details.  I am not sure what else to make of this at this point, other than
to move forward with the upgrade using an optimized index wherever possible.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
4.8.1 (without deletes): 104 ms
5.5.2 (without deletes): 125 ms
4.8.1 (1 segment without deletes): 55 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields differ from those in Scenario #1 and perform much
better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
4.8.1 (without deletes): 35 ms
5.5.2 (without deletes): 42 ms
4.8.1 (1 segment without deletes): 28 ms
5.5.2 (1 segment without deletes): 34 ms
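A small worked calculation helps to check the claim that the times converge. It gives the percentage change of 5.5.2 relative to 4.8.1 for each index state. The inputs are the six pairs of averages listed above.

```java
// Worked arithmetic on the numbers above: percentage change of Solr 5.5.2
// relative to Solr 4.8.1 for each index state.
final class VersionDelta {

    // (newer - older) / older * 100; a positive value means 5.5.2 was slower.
    static double percentChange(double olderMs, double newerMs) {
        return (newerMs - olderMs) / olderMs * 100.0;
    }

    public static void main(String[] args) {
        String[] state = {"with deletes", "without deletes", "1 segment without deletes"};
        double[][] scenario1 = {{115, 155}, {104, 125}, {55, 44}}; // {4.8.1, 5.5.2} in ms
        double[][] scenario2 = {{38, 49}, {35, 42}, {28, 34}};
        for (int i = 0; i < state.length; i++) {
            System.out.printf("%-26s #1: %+.1f%%  #2: %+.1f%%%n", state[i],
                    percentChange(scenario1[i][0], scenario1[i][1]),
                    percentChange(scenario2[i][0], scenario2[i][1]));
        }
    }
}
```

Scenario #1 goes from about +34.8% (with deletes) to -20.0% at one segment, so 5.5.2 overtakes 4.8.1 there. Scenario #2 stays between +20.0% and +28.9% in every state, so for that scenario the gap narrows but does not close.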

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti <abenedetti@apache.org>
wrote:

> Hi!
> At the time we didn't investigate the deletion implications at all.
> This could be interesting.
> If you continue your investigation and discover what changed in the
> handling of deletions, I would be more than happy to help!
>
> Cheers
>
> On Mon, Sep 26, 2016 at 10:59 PM, Solr User <so...@gmail.com> wrote:
>
> > Thanks again for your work on honoring the facet.method.  I have an
> > observation that I would like to share and get your feedback on if
> > possible.
> >
> > I performance-tested Solr 5.5.2 with various facet queries, and the only
> > way I get results comparable to Solr 4.8.1 is after running
> > expungeDeletes.  Is it possible that Solr 5 ignores deleted documents less
> > efficiently than Solr 4?
> > Here are the details.
> >
> > Scenario #1:  Using facet.method=uif with faceting on several multi-valued
> > fields.
> > 4.8.1 (with deletes): 115 ms
> > 5.5.2 (with deletes): 155 ms
> > 5.5.2 (without deletes): 125 ms
> > 5.5.2 (1 segment without deletes): 44 ms
> >
> > Scenario #2:  Using facet.method=enum with faceting on several multi-valued
> > fields.  These fields differ from those in Scenario #1 and perform much
> > better with enum, hence that method is used instead.
> > 4.8.1 (with deletes): 38 ms
> > 5.5.2 (with deletes): 49 ms
> > 5.5.2 (without deletes): 42 ms
> > 5.5.2 (1 segment without deletes): 34 ms
> >
> >
> >
> > On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
> > abenedetti@apache.org> wrote:
> >
> > > Interesting developments:
> > >
> > > https://issues.apache.org/jira/browse/SOLR-9176
> > >
> > > I think we found why term enum seems slower in recent Solr!
> > > In our case it is likely to be related to the commit I mention in the
> > > Jira.
> > > Have a look, Joel!
> > >
> > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> > > abenedetti@apache.org> wrote:
> > >
> > > > I am investigating this scenario right now.
> > > > I can confirm that the enum slowness is present in Solr 6.0 as well.
> > > > And I agree with Joel, it seems to be unrelated to the famous faceting
> > > > regression :(
> > > >
> > > > Furthermore, with the legacy facet approach, if you enable docValues on
> > > > the field, you are no longer able to use the enum approach:
> > > >
> > > > org/apache/solr/request/SimpleFacets.java:448
> > > >
> > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> > > >   // only fc can handle docvalues types
> > > >   method = FacetMethod.FC;
> > > > }
> > > >
> > > >
> > > > I got really horrible regressions simply using term enum, comparing
> > > > Solr 4 and Solr 6.
> > > >
> > > > And even the most optimized fcs approach with docValues and
> > > > facet.threads=nCore does not perform as well as the simple enum in Solr 4.
> > > >
> > > > For example, for some sample queries I have 40 ms vs 160 ms, and
> > > > similar...
> > > > I think we should open an issue if we can confirm it is not related to
> > > > the other one.
> > > > A lot of people will continue using the legacy approach for a while...
> > > >
> > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joelsolr@gmail.com>
> > > > wrote:
> > > >
> > > >> The enum slowness is interesting. It would appear on the surface not to
> > > >> be related to the FieldCache issue. I don't think the main emphasis of
> > > >> the JSON facet API has been the enum approach. You may find that using
> > > >> the JSON facet API and eliminating the use of enum meets your
> > > >> performance needs.
> > > >>
> > > >> With the CollapsingQParserPlugin, top_fc is definitely faster during
> > > >> queries. The tradeoff is slower warming times and increased memory
> > > >> usage if the collapse fields are used in faceting, as faceting will
> > > >> load the field into a different cache.
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >> On Wed, May 18, 2016 at 5:28 PM, Solr User <so...@gmail.com> wrote:
> > > >>
> > > >> > Joel,
> > > >> >
> > > >> > Thank you for taking the time to respond to my question.  I tried
> > the
> > > >> JSON
> > > >> > Facet API for one query that uses facet.method=enum (since this
> one
> > > has
> > > >> a
> > > >> > ton of unique values and performed better with enum) but this was
> > way
> > > >> > slower than even the slower Solr 5 times.  I did not try the new
> API
> > > >> with
> > > >> > the non-enum queries though so I will give that a go.  It looks
> like
> > > >> Solr
> > > >> > 5.5.1 also has a facet.method=uif which will be interesting to
> try.
> > > >> >
> > > >> > If these do not prove helpful, it looks like I will need to wait for
> > > >> > SOLR-8096 to be resolved before upgrading.
> > > >> >
> > > >> > Thanks also for your comment on top_fc for the CollapsingQParser.  I
> > > >> > use collapse/expand for some queries but traditional grouping for
> > > >> > others due to performance.  It will be interesting to see if those
> > > >> > grouping queries perform better now using CollapsingQParser with
> > > >> > top_fc.
> > > >> >
> > > >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joelsolr@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Yes, SOLR-8096 is the issue here.
> > > >> > >
> > > >> > > I don't believe indexing with docValues is going to help too much
> > > >> > > with this. The enum slowness may not be related, but I'm not
> > > >> > > positive about that.
> > > >> > >
> > > >> > > The major slowdowns are likely due to the removal of the top-level
> > > >> > > FieldCache from general use and the removal of the FieldValuesCache,
> > > >> > > which was used for multi-valued field faceting.
> > > >> > >
> > > >> > > The JSON facet API covers all the functionality in the traditional
> > > >> > > faceting, and it has been developed to be very performant.
> > > >> > >
> > > >> > > You may also want to see if Collapse/Expand can meet your
> > > >> > > application's needs rather than Grouping. It allows you to specify
> > > >> > > using a top-level FieldCache if performance is a blocker without it.
> > > >> > >
> > > >> > > Joel Bernstein
> > > >> > > http://joelsolr.blogspot.com/
> > > >> > >
> > > >> > > On Wed, May 18, 2016 at 10:42 AM, Solr User <so...@gmail.com> wrote:
> > > >> > >
> > > >> > > > Does anyone know the answer to this?
> > > >> > > >
> > > >> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User <so...@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > I recently attempted to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> > > >> > > > > had to abort because average response times degraded compared
> > > >> > > > > with a baseline volume performance test.  The affected queries
> > > >> > > > > involved faceting (both the enum method and the default) and
> > > >> > > > > grouping.  There is a critical bug,
> > > >> > > > > https://issues.apache.org/jira/browse/SOLR-8096, currently open,
> > > >> > > > > which I gather is the cause of the slower response times.  One
> > > >> > > > > concern I have is that discussions around the issue suggest
> > > >> > > > > indexing with docValues, which alleviated the problem in at least
> > > >> > > > > that one reported case.  However, indexing with docValues did not
> > > >> > > > > improve the performance in my case.
> > > >> > > > >
> > > >> > > > > Can someone please confirm or correct my understanding that this
> > > >> > > > > issue has no path forward at this time, and specifically that it
> > > >> > > > > is already known that docValues does not necessarily solve this?
> > > >> > > > >
> > > >> > > > > Thanks in advance!
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > > >
> > >
> > >
> > >
> >
>
>
>
>
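
For comparing the faceting approaches discussed in this thread, here is a minimal Java sketch that builds `/select` query strings for legacy faceting with an explicit `facet.method` (optionally with `facet.threads`) and for the JSON Facet API over the same fields. It only assembles URL-encoded parameters and does not contact Solr. The class name, the field names (`category_ss`, `tags_ss`) and the match-all query with `rows=0` are illustrative assumptions, not taken from the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that builds /select query strings so the legacy faceting
// methods and the JSON Facet API can be benchmarked against the same fields.
final class FacetRequestSketch {

    // URL-encodes ordered key/value pairs; repeated keys such as facet.field are kept.
    static String encode(List<String[]> pairs) {
        StringJoiner joiner = new StringJoiner("&");
        for (String[] kv : pairs) {
            joiner.add(URLEncoder.encode(kv[0], StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(kv[1], StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // Legacy faceting with an explicit facet.method (e.g. "enum", "fc", "fcs", "uif").
    // threads > 0 adds facet.threads so the facet fields are computed in parallel.
    // Per the SimpleFacets snippet quoted in the thread, enum on a docValues field
    // is switched to fc.
    static String legacyFacets(List<String> fields, String method, int threads) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"q", "*:*"});
        pairs.add(new String[] {"rows", "0"});
        pairs.add(new String[] {"facet", "true"});
        for (String field : fields) {
            pairs.add(new String[] {"facet.field", field});
        }
        pairs.add(new String[] {"facet.method", method});
        if (threads > 0) {
            pairs.add(new String[] {"facet.threads", Integer.toString(threads)});
        }
        return encode(pairs);
    }

    // JSON Facet API: one terms facet per field, passed as the json.facet parameter.
    static String jsonFacets(List<String> fields) {
        StringJoiner body = new StringJoiner(",", "{", "}");
        for (String field : fields) {
            body.add("\"" + field + "\":{\"type\":\"terms\",\"field\":\"" + field + "\"}");
        }
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"q", "*:*"});
        pairs.add(new String[] {"rows", "0"});
        pairs.add(new String[] {"json.facet", body.toString()});
        return encode(pairs);
    }

    public static void main(String[] args) {
        // Placeholder multi-valued field names; substitute your own schema fields.
        List<String> fields = List.of("category_ss", "tags_ss");
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("/select?" + legacyFacets(fields, "uif", 0));
        System.out.println("/select?" + legacyFacets(fields, "fcs", cores));
        System.out.println("/select?" + jsonFacets(fields));
    }
}
```

Sending the printed strings to the same core under identical conditions helps attribute a difference to the faceting method rather than to the index state; the Solr 5.5.2 timings reported in this thread also vary with deleted documents and segment count.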

Re: Faceting and Grouping Performance Degradation in Solr 5

Posted by Alessandro Benedetti <ab...@apache.org>.
Hi!
At the time, we didn't investigate the implications of deletions at all.
This could be interesting.
If you proceed with your investigation and discover what changed in the
handling of deletions, I would be more than happy to help!
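
To put the deletion effect into numbers, the following minimal Java sketch computes the percentage change of each Solr 5.5.2 timing from the 2016-09-26 message relative to its Solr 4.8.1 "with deletes" baseline. The two `/update` request strings use the standard commit-with-`expungeDeletes` and optimize-to-one-segment parameters; they are an assumption about how the "without deletes" and "1 segment" states were produced, since the thread does not say.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Worked arithmetic on the timings from the 2016-09-26 message: percentage change
// of each Solr 5.5.2 run relative to the Solr 4.8.1 "with deletes" baseline.
final class DeleteOverheadSketch {

    // Assumed /update requests for the "without deletes" and "1 segment" states.
    static final String EXPUNGE_DELETES = "/update?commit=true&expungeDeletes=true";
    static final String SINGLE_SEGMENT = "/update?optimize=true&maxSegments=1";

    // Positive result = slower than the baseline, negative = faster.
    static double percentChange(double baselineMs, double measuredMs) {
        return (measuredMs - baselineMs) / baselineMs * 100.0;
    }

    static Map<String, Double> report(double baselineMs, double withDeletesMs,
                                      double withoutDeletesMs, double oneSegmentMs) {
        Map<String, Double> rows = new LinkedHashMap<>();
        rows.put("5.5.2 with deletes", percentChange(baselineMs, withDeletesMs));
        rows.put("5.5.2 without deletes", percentChange(baselineMs, withoutDeletesMs));
        rows.put("5.5.2 1 segment", percentChange(baselineMs, oneSegmentMs));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("expunge deletes: " + EXPUNGE_DELETES);
        System.out.println("single segment:  " + SINGLE_SEGMENT);
        Map<String, Map<String, Double>> scenarios = new LinkedHashMap<>();
        scenarios.put("Scenario #1 (uif)", report(115, 155, 125, 44));
        scenarios.put("Scenario #2 (enum)", report(38, 49, 42, 34));
        scenarios.forEach((name, rows) -> {
            System.out.println(name + " vs 4.8.1 with deletes:");
            rows.forEach((label, pct) -> System.out.println(
                    String.format(Locale.ROOT, "  %-22s %+6.1f%%", label, pct)));
        });
    }
}
```

For Scenario #1, the 5.5.2 run with deletes is about 35% slower than the 4.8.1 baseline, the run without deletes about 9% slower, and the single-segment index about 62% faster; for Scenario #2 the figures are roughly +29%, +11% and -11%.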

Cheers

-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
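
Joel's suggestion earlier in the thread, trying Collapse/Expand with a top-level FieldCache instead of traditional grouping, amounts to a different set of request parameters. The Java sketch below assembles both forms so they can be benchmarked side by side. The class name and the field `group_s` are hypothetical, and the collapse request only roughly corresponds to grouping: it returns one head document per group, with the other members in the expanded section, so response handling differs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper comparing traditional grouping with Collapse/Expand requests.
final class CollapseVsGroupingSketch {

    // URL-encodes parameters in insertion order.
    static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        params.forEach((key, value) -> joiner.add(
                URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return joiner.toString();
    }

    // Traditional result grouping on one field.
    static Map<String, String> grouping(String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", field);
        return p;
    }

    // Collapse/Expand: the collapse filter keeps one head document per group and
    // expand=true returns the remaining group members in a separate section.
    // hint=top_fc asks the CollapsingQParserPlugin to use a top-level FieldCache.
    static Map<String, String> collapseExpand(String field, boolean topFc) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("fq", "{!collapse field=" + field + (topFc ? " hint=top_fc" : "") + "}");
        p.put("expand", "true");
        return p;
    }

    public static void main(String[] args) {
        String field = "group_s"; // placeholder single-valued grouping field
        System.out.println("/select?" + encode(grouping(field)));
        System.out.println("/select?" + encode(collapseExpand(field, false)));
        System.out.println("/select?" + encode(collapseExpand(field, true)));
    }
}
```

Per Joel's comments, `hint=top_fc` trades faster queries for slower warming and higher memory use, so it is worth measuring warm-up time and heap usage as well as query latency.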