Posted to dev@solr.apache.org by Ishan Chattopadhyaya <ic...@gmail.com> on 2022/11/09 06:39:11 UTC

Continuous performance testing and regressions

I'm working on automating performance testing, details in
https://issues.apache.org/jira/browse/SOLR-16525.

Even before I could complete the automation, I observed a massive slowdown in
restart performance, now attributable to
https://issues.apache.org/jira/browse/SOLR-16414. This affected 9.1 release
candidate RC1, but it is now fixed in the 9.1 and 9x branches.

However, while performance was back to original levels on the 9.1 branch,
there was an 80-100% slowdown on the 9x branch even after this fix.
Please see: http://mostly.cool/cluster-test.json.html
The test is here:
https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json

In order to investigate the slowdown, I retroactively applied the patch
that fixed the performance problem in SOLR-16414 (removing use of
parallelStream) to the intermediate commits and plotted the graph:
http://mostly.cool/cluster-test-with-patch.html

With the patch applied, two more commits with potential slowdowns show up.
Here are the JIRA issues I've opened for them:
https://issues.apache.org/jira/browse/SOLR-16530
https://issues.apache.org/jira/browse/SOLR-16531
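The investigation above amounts to benchmarking each intermediate commit (with the SOLR-16414 fix applied) and looking for the commit where the restart time jumps. A minimal Python sketch of that comparison step follows. The function names, the input shape (an ordered list of `(commit, seconds)` pairs) and the 20% threshold are all hypothetical and are not taken from solr-bench; the slowdown is expressed as (new - old) / old, assuming that is also how the "80-100%" figure is meant.

```python
from typing import List, Tuple


def percent_slowdown(baseline: float, current: float) -> float:
    """Relative slowdown in percent: 100.0 means `current` took twice as long."""
    if baseline <= 0:
        raise ValueError("baseline time must be positive")
    return (current - baseline) / baseline * 100.0


def find_regressions(
    timings: List[Tuple[str, float]], threshold_pct: float = 20.0
) -> List[Tuple[str, float]]:
    """Flag commits that are more than `threshold_pct` percent slower than
    the commit immediately before them.

    timings: (commit id, total restart time) pairs, oldest commit first.
    threshold_pct: hypothetical cut-off; in practice it should sit well above
    the run-to-run noise of the benchmark.
    """
    flagged = []
    # Pair each commit with its predecessor so that a step change is
    # attributed to exactly one commit.
    for (_, prev_time), (commit, cur_time) in zip(timings, timings[1:]):
        pct = percent_slowdown(prev_time, cur_time)
        if pct > threshold_pct:
            flagged.append((commit, pct))
    return flagged
```

Comparing each commit only with its immediate predecessor is what localizes a regression to a single commit. A real setup would probably also need several runs per commit, since a single noisy run can cross any fixed threshold.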

In a week of working on this automation, I was able to catch 3 slowdowns in
the first thing I automated. It might be good to keep this running and test
other aspects. Going forward, I'll be automating more performance suites
and opening blocker JIRA issues whenever I observe significant performance
degradation. I'll make it easy for all of us to add suites to the
framework and have our personal branches/PRs tested through it.

Please let me know about any thoughts / concerns / suggestions.

Thanks,
Ishan

Re: Continuous performance testing and regressions

Posted by Noble Paul <no...@gmail.com>.
Hi,
I have reverted the changes that caused the perf degradations:

   1. SOLR-16530 <https://issues.apache.org/jira/browse/SOLR-16530> Performance
   degradation due to eliminating noggit Writeable.
   I'll investigate further, test, and reintroduce this later.

Thanks @Ishan Chattopadhyaya <ic...@gmail.com> for
discovering and reporting this.

On Thu, Nov 10, 2022 at 7:31 AM David Smiley <ds...@apache.org> wrote:

> A big THANK YOU for working on this, Ishan!
>
> Can we add regular benchmarks based on /solr/benchmark that Mark
> contributed not too long ago?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley


-- 
-----------------------------------------------------
Noble Paul

Re: Continuous performance testing and regressions

Posted by David Smiley <ds...@apache.org>.
A big THANK YOU for working on this, Ishan!

Can we add regular benchmarks based on /solr/benchmark that Mark
contributed not too long ago?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Continuous performance testing and regressions

Posted by Noble Paul <no...@gmail.com>.
I'm glad that the performance regressions were reported before they made it
into a release, so they did not cause our users to suffer.

Thanks



-- 
-----------------------------------------------------
Noble Paul

Re: Continuous performance testing and regressions

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
Hi Jan,

> Can you explain what "restart performance" means, and what the y axis
> number is?

Yes, that refers to the time it took to restart 7 nodes, 2 at a time, while
waiting for all replicas on those nodes to become active before proceeding
to the next batch (of 2).
https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json#L33-L47
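To make "7 nodes, 2 at a time" concrete, here is a small Python timing model (it does not talk to Solr; the function names are hypothetical). It contrasts two readings of the procedure: strict batches, where a pair starts only after both nodes of the previous pair are done, and a rolling window, where the next node starts as soon as either in-flight restart finishes. The durations are the per-node `total-time` values of task2 in the sample results in this message. With them, the rolling model reproduces the sample's start and end times to within a few milliseconds (for example, the third restart starts at 348.324, before the second ends at 351.571), which suggests two restarts are kept in flight; that is an inference from the numbers only.

```python
import heapq
from typing import List, Tuple

# Per-node durations copied from task2's "total-time" values in the sample.
DURATIONS = [109.801, 113.048, 122.317, 130.111, 106.068, 107.827, 89.179]

Span = Tuple[float, float]  # (start, end) of one node restart


def rolling_schedule(durations: List[float], parallel: int = 2,
                     t0: float = 0.0) -> List[Span]:
    """Keep up to `parallel` restarts in flight; the next node starts as soon
    as any in-flight restart (including waiting for replicas) finishes."""
    free_at = [t0] * parallel      # min-heap: times at which a slot frees up
    heapq.heapify(free_at)
    spans = []
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        spans.append((start, end))
        heapq.heappush(free_at, end)
    return spans


def batch_schedule(durations: List[float], batch: int = 2,
                   t0: float = 0.0) -> List[Span]:
    """Restart nodes in fixed batches; a batch starts only after every node
    of the previous batch is done."""
    spans = []
    t = t0
    for i in range(0, len(durations), batch):
        group = durations[i:i + batch]
        spans.extend((t, t + d) for d in group)
        t += max(group)
    return spans


if __name__ == "__main__":
    t0 = 238.523  # first task2 start-time in the sample
    for name, spans in (("rolling", rolling_schedule(DURATIONS, 2, t0)),
                        ("batched", batch_schedule(DURATIONS, 2, t0))):
        print(f"{name}: last node done at {max(e for _, e in spans):.3f}")
```

Under these durations the strict-batch reading would finish roughly 13 units later than the rolling one, because each pair waits for its slower node.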

The number on the y-axis is the total time taken by the entire operation
(restarting all 7 nodes).
https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/createGraph.py#L25-L31
Here's a sample results file from a run:
{task1=[{start-time=0.175, total-time=238334, end-time=238.509}],
task2=[{start-time=238.523, total-time=109.801, end-time=348.324},
{start-time=238.523, total-time=113.048, end-time=351.571},
{start-time=348.324, total-time=122.317, end-time=470.641},
{start-time=351.572, total-time=130.111, end-time=481.683},
{start-time=470.642, total-time=106.068, end-time=576.71},
{start-time=481.683, total-time=107.827, end-time=589.51},
{start-time=576.711, total-time=89.179, end-time=665.89}]}
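Since the graph plots one number per run, it may help to see how such a number could be derived from a results file like the one above. The Python sketch below parses the `{taskN=[{start-time=..., total-time=..., end-time=...}, ...]}` text with regular expressions and computes two candidates for task2, whose seven entries appear to correspond to the seven node restarts: the wall-clock span (last `end-time` minus first `start-time`) and the sum of the per-node `total-time` values. Which of these createGraph.py actually plots is not stated here, so treat that choice as an assumption; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

SAMPLE = """{task1=[{start-time=0.175, total-time=238334, end-time=238.509}],
task2=[{start-time=238.523, total-time=109.801, end-time=348.324},
{start-time=238.523, total-time=113.048, end-time=351.571},
{start-time=348.324, total-time=122.317, end-time=470.641},
{start-time=351.572, total-time=130.111, end-time=481.683},
{start-time=470.642, total-time=106.068, end-time=576.71},
{start-time=481.683, total-time=107.827, end-time=589.51},
{start-time=576.711, total-time=89.179, end-time=665.89}]}"""

# "taskN=[ ... ]" blocks; the lazy match stops at the first closing bracket.
TASK_RE = re.compile(r"(\w+)=\[(.*?)\]", re.S)
ENTRY_RE = re.compile(
    r"start-time=([\d.]+), total-time=([\d.]+), end-time=([\d.]+)")

Entry = Tuple[float, ...]  # (start, total, end)


def parse_results(text: str) -> Dict[str, List[Entry]]:
    """Map each task name to its list of (start, total, end) entries."""
    return {name: [tuple(map(float, m)) for m in ENTRY_RE.findall(body)]
            for name, body in TASK_RE.findall(text)}


def wall_clock(entries: List[Entry]) -> float:
    """Elapsed time from the earliest start to the latest end."""
    return max(e[2] for e in entries) - min(e[0] for e in entries)


def summed(entries: List[Entry]) -> float:
    """Sum of per-entry total-time values (double-counts overlapping work)."""
    return sum(e[1] for e in entries)


if __name__ == "__main__":
    task2 = parse_results(SAMPLE)["task2"]
    print(f"wall-clock span: {wall_clock(task2):.3f}")
    print(f"sum of total-time: {summed(task2):.3f}")
```

For the sample above this gives a wall-clock span of 427.367 and a sum of 778.351. The two differ because two restarts run concurrently, so summing per-node times counts overlapping intervals twice.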

Thanks,
Ishan

On Wed, Nov 9, 2022 at 6:27 PM Jan Høydahl <ja...@cominvent.com> wrote:

> Thanks for putting this together, Ishan,
>
> Can you explain what "restart performance" means, and what the y axis
> number is?
>
> Jan
>

Re: Continuous performance testing and regressions

Posted by Jan Høydahl <ja...@cominvent.com>.
Thanks for putting this together, Ishan,

Can you explain what "restart performance" means, and what the y axis number is?

Jan



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org