Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2018/09/07 02:49:00 UTC

[jira] [Commented] (SOLR-12642) SolrCmdDistributor should send updates in batch when use Http2SolrClient?

    [ https://issues.apache.org/jira/browse/SOLR-12642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606648#comment-16606648 ] 

Cao Manh Dat commented on SOLR-12642:
-------------------------------------

Hi guys, thanks to [~shalinmangar]'s work on [https://github.com/shalinmangar/solr-perf-tools], I was able to compare the performance of the jira/http2 branch and the master branch. The log results are attached, but I will summarize them here.

There are 4 tests, all of which test indexing performance. Only the last test shows a difference between the branches, since it is the only test that uses a SolrCloud setup.

The 4th test uses CloudSolrClient to index 33M wiki documents into a collection with one shard, which has 1 leader and 1 NRT replica.

 
|Documents indexed: 33332620|
|Bytes indexed: 32244883917.0|
| |*jira/http2 branch*|*master branch*|
|Time taken (total) in sec|1,572.90|2,415.1|
|Garbage generated by replica node (in MB)|266,847.40|1,131,187.50|
|Garbage generated by leader node (in MB)|1,006,244.00|1,351,830.70|
|Time in GC for replica (ms)|13.3|90.9|
|Time in GC for leader (ms)|88.2|99|
|Average System Load|10.157|13.525|
|Average CPU Time of replica node (800 total)|78.812|332.467|
|Average CPU Time of leader node (800 total)|513.968|369.281|
|Average CPU Load of replica node (%)|10.657|41.28|
|Average CPU Load of leader node (%)|64.048|46.359|

Note: 800 in CPU time means the total capacity of 8 threads per second.

As we can see, there is a significant improvement on the jira/http2 branch. The only downside is that CPU time on the leader node appears to increase by about 40%. I think that solving this issue will decrease CPU time on the leader node, but I'm not sure by how much. The CPU increase may be because documents are indexed at a much faster rate on the jira/http2 branch. Furthermore, I tried to implement this issue, but it is quite complex and can introduce hidden errors.
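
For reference, the relative changes can be recomputed directly from the table. The following is only a minimal sketch: the class and method names are made up for illustration and are not part of solr-perf-tools; the numbers are copied from the table above.

```java
public class BranchComparison {
    /** Percent change going from the master value to the jira/http2 value. */
    static double percentChange(double master, double http2) {
        return (http2 - master) / master * 100.0;
    }

    /** Indexing throughput in documents per second. */
    static double docsPerSecond(long docs, double seconds) {
        return docs / seconds;
    }

    public static void main(String[] args) {
        long docs = 33_332_620L; // "Documents indexed" from the table

        // Rows from the table: (master value, jira/http2 value)
        System.out.printf("Total time change:       %+.1f%%%n", percentChange(2415.1, 1572.90));
        System.out.printf("Leader CPU time change:  %+.1f%%%n", percentChange(369.281, 513.968));
        System.out.printf("Replica CPU time change: %+.1f%%%n", percentChange(332.467, 78.812));

        // Throughput on each branch, derived from total time
        System.out.printf("jira/http2 docs/sec:     %.0f%n", docsPerSecond(docs, 1572.90));
        System.out.printf("master docs/sec:         %.0f%n", docsPerSecond(docs, 2415.1));
    }
}
```

The leader CPU time change computed this way is roughly +39%, which is where the "about 40%" figure comes from.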

*Therefore I think that this issue is not a must for merging jira/http2 into the master branch.*

> SolrCmdDistributor should send updates in batch when use Http2SolrClient?
> -------------------------------------------------------------------------
>
>                 Key: SOLR-12642
>                 URL: https://issues.apache.org/jira/browse/SOLR-12642
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Priority: Major
>
> In the past, batch updates were sent in a single stream from the leader, and the replica created a single thread to parse all the updates. For the simplicity of {{SOLR-12605}}, the leader now sends individual updates to replicas, which therefore parse updates in different threads, increasing memory and CPU usage.
> In the past, this was an unacceptable approach because every update required a separate connection to the replica. But with HTTP/2 support, all updates are sent over a single connection from the leader to a replica, so the cost is not as high as it used to be.
> On the other hand, sending individual updates will improve indexing performance and give better error handling when a single update in a batch fails.
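
The error-handling difference described above can be illustrated with a toy model. This is a sketch only and does not use any Solr class: {{UpdateDispatchSketch}}, both method names, and the {{replicaAccepts}} predicate are hypothetical, and the sketch assumes that a batch request is reported as failed as a whole when any update in it fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class UpdateDispatchSketch {
    /**
     * Batch mode (assumption): one request carries every update, so a single
     * rejected update causes the whole batch to be reported as failed.
     */
    static List<String> failedInBatchMode(List<String> updates, Predicate<String> replicaAccepts) {
        for (String update : updates) {
            if (!replicaAccepts.test(update)) {
                return new ArrayList<>(updates); // cannot tell which update was the bad one
            }
        }
        return new ArrayList<>();
    }

    /**
     * Individual mode: each update is its own request, so the leader knows
     * exactly which updates failed.
     */
    static List<String> failedInIndividualMode(List<String> updates, Predicate<String> replicaAccepts) {
        List<String> failed = new ArrayList<>();
        for (String update : updates) {
            if (!replicaAccepts.test(update)) {
                failed.add(update);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> updates = List.of("doc1", "bad-doc", "doc3");
        Predicate<String> replicaAccepts = u -> !u.startsWith("bad");

        System.out.println("batch mode failures:      " + failedInBatchMode(updates, replicaAccepts));
        System.out.println("individual mode failures: " + failedInIndividualMode(updates, replicaAccepts));
    }
}
```

In this model, individual mode narrows the failure down to the single rejected update, while batch mode can only report the whole batch.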


