Posted to issues@solr.apache.org by "Ishan Chattopadhyaya (Jira)" <ji...@apache.org> on 2022/11/09 18:25:00 UTC
[jira] [Comment Edited] (SOLR-16531) Performance degradation due to introduction of JAX-RS

    [ https://issues.apache.org/jira/browse/SOLR-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631201#comment-17631201 ] 

Ishan Chattopadhyaya edited comment on SOLR-16531 at 11/9/22 6:24 PM:
----------------------------------------------------------------------

bq.    So the screenshot and associated webpage show that the red line jumps from a little over 325-ish on the prior commit (3ceae7 - "Pin OS of docker image...") to a little over 350-ish with the JAX-RS commit. (Do you have access to the specific numbers there, Ishan?). So the delta of whatever this test is doing is ~25s.

Yes, roughly. I've attached the raw results above.

bq.    The performance test involves two tasks. According to the cluster-test.json file linked above, the first task involves collection creation (solely), and the second task is a restart of each node after all the collection creation is done.

Correct.

bq.    Again, going from cluster-test.json, it looks like task 1 creates 1000 collections, but doesn't specify how many shards or replicas each collection has? Does that mean 1s, 1r, or are there other defaults?

There's a file that contains all the collections: https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json#L5

bq.    Going from cluster-test.json, the cluster either has 8 or 7 nodes (not sure how to understand/reconcile the properties here and here.).

There are 8 nodes, but the test restarts just 7 of them (to avoid restarting the overseer node).

bq.    Assuming 8 nodes total going forward, each node would host roughly 1000 * replicasPerShard * shardsPerCollection / 8 cores. Or 125 * replicasPerShard * shardsPerCollection.

I remember there being around 750 cores/node in this test. It can be calculated from that cluster state file I linked to above.
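
To make the two estimates comparable, here is a rough sketch of the arithmetic in Python. It assumes replicas are spread evenly across the 8 nodes; the function names are only illustrative, and the 750 cores/node figure is the recollection above, not a measured value.

```python
def cores_per_node(collections, shards_per_collection, replicas_per_shard, nodes):
    """Cores hosted per node, assuming an even spread across all nodes."""
    return collections * shards_per_collection * replicas_per_shard / nodes


def avg_replicas_per_collection(cores_on_each_node, nodes, collections):
    """Average number of cores per collection implied by a per-node core count."""
    return cores_on_each_node * nodes / collections


# The quoted formula with 1 shard and 1 replica per collection.
print(cores_per_node(1000, 1, 1, 8))

# What ~750 cores/node would imply per collection across 1000 collections.
print(avg_replicas_per_collection(750, 8, 1000))
```

Under those assumptions, ~750 cores/node corresponds to an average of about 6 cores per collection, so replicasPerShard * shardsPerCollection would average roughly 6 rather than 1.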

bq.    Now, "task 2" itself involves restarting these loaded nodes 2 at a time and waiting for everything to be healthy between batches of restarts. If each node is restarted once, that means "task 2" would kick off restarts 4 times (again, doing 2 in parallel each time).

Right.

bq.    So the "before" performance of ~325s translates to a restart of a node with 125 * replicasPerShard * shardsPerCollection total cores taking about 81s...

I'm not sure I follow. Here's how I think about it: 750 replicas/node and a total time of 325s, so approximately 81s to restart a node with ~750 replicas (assuming one other node is being restarted at the same time).
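
As a sketch of that calculation (assuming the whole task time is spent in restart batches of equal duration, which the benchmark does not guarantee; the helper names are hypothetical):

```python
import math


def restart_batches(nodes_to_restart, parallelism):
    """Number of restart rounds when restarting `parallelism` nodes at a time."""
    return math.ceil(nodes_to_restart / parallelism)


def seconds_per_batch(total_seconds, batches):
    """Average time per restart round, assuming rounds take equal time."""
    return total_seconds / batches


# 7 nodes restarted 2 at a time gives 4 rounds (2 + 2 + 2 + 1).
batches = restart_batches(7, 2)

before = seconds_per_batch(325, batches)  # approximate total before the JAX-RS commit
after = seconds_per_batch(350, batches)   # approximate total after the JAX-RS commit
print(batches, before, after)
```

Restarting all 8 nodes 2 at a time would also give 4 rounds, so the per-round figure is the same either way.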

bq.    And the "after" performance of ~350s translates to a similar restart now taking about 87s

Yes.

bq.    So, ultimately, this perf test is telling us that JAX-RS makes restarts of heavily loaded nodes take ~7-8% longer i.e. (87.5 - 81.25)/81.25

Assuming the 325s and 350s figures are correct, this seems right to me.
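
For reference, a minimal sketch of the percentage calculation, using the approximate figures from this thread rather than the exact values in the attached raw results:

```python
def relative_slowdown(before_seconds, after_seconds):
    """Fractional increase in time going from `before_seconds` to `after_seconds`."""
    return (after_seconds - before_seconds) / before_seconds


# Per-round figures from the discussion above.
per_round = relative_slowdown(81.25, 87.5)

# The same ratio computed from the task totals.
per_task = relative_slowdown(325, 350)

print(f"{per_round:.1%} {per_task:.1%}")
```

Both ratios come out the same, since dividing both totals by the number of rounds does not change the relative difference.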


> Performance degradation due to introduction of JAX-RS
> -----------------------------------------------------
>
>                 Key: SOLR-16531
>                 URL: https://issues.apache.org/jira/browse/SOLR-16531
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 9.2
>
>         Attachments: Screenshot from 2022-11-09 11-20-44.png, results-with-patch.tar.gz
>
>
> During performance benchmarking on branch_9x, I observed a slowdown in restart performance since commits in SOLR-16347. See attached screenshot.
> CC [~gerlowskija].
> http://mostly.cool/cluster-test-with-patch.html
> The benchmark is here: https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json. This suite was run after retroactively applying the parallelStream patch from SOLR-16414: https://github.com/apache/solr/commit/b33161d0cdd976fc0c3dc78c4afafceb4db671cf.diff
> Effort to automate these benchmarks is WIP and tracked here: SOLR-16525.


