Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/12/02 08:48:04 UTC

[GitHub] [kafka] Montyleo opened a new pull request #9675: fix Replica leader election is too slow in the case of too many parti…

Montyleo opened a new pull request #9675:
URL: https://github.com/apache/kafka/pull/9675


   There are more than 6000 topics and 300 brokers in my Kafka cluster, and we frequently run kafka-preferred-replica-election.sh to rebalance the cluster. But the rebalance process takes too much time and CPU, as shown in the picture below.
   
   We found that the function `controllerContext.allPartitions` is invoked too many times.
   The Jira link is https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-10794?filter=allissues
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] chia7712 commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
chia7712 commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-738140790


   @lqjack good question!
   
   > The only difference I can find is that controllerContext.allPartitions is invoked once instead of once per partition.
   
   ```controllerContext.allPartitions``` does not return a constant value. It creates a new collection on every call, and the overhead can be high when there are many partitions. This PR calls ```controllerContext.allPartitions``` only once, which reduces the cost of getting "all partitions".
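   
   To illustrate the point, here is a minimal Java sketch. It is not Kafka's actual controller code (which is Scala): `ToyContext`, `knownPerElement`, and `knownHoisted` are hypothetical names, and partitions are modeled as plain strings. The only thing taken from this thread is the assumption that the accessor builds a new collection on every call.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeMap;

   class AllPartitionsSketch {
       // Hypothetical stand-in for the controller context; not Kafka's real class.
       static class ToyContext {
           private final Map<String, Integer> partitionCounts = new TreeMap<>();
           int rebuilds = 0; // number of times the full partition set was rebuilt

           void addTopic(String topic, int partitions) {
               partitionCounts.put(topic, partitions);
           }

           // Builds a fresh set on every call, so each call costs O(total partitions).
           Set<String> allPartitions() {
               rebuilds++;
               Set<String> all = new HashSet<>();
               for (Map.Entry<String, Integer> e : partitionCounts.entrySet()) {
                   for (int p = 0; p < e.getValue(); p++) {
                       all.add(e.getKey() + "-" + p);
                   }
               }
               return all;
           }
       }

       // Pattern being avoided: the set is rebuilt once per requested partition.
       static List<String> knownPerElement(ToyContext ctx, List<String> requested) {
           List<String> known = new ArrayList<>();
           for (String tp : requested) {
               if (ctx.allPartitions().contains(tp)) {
                   known.add(tp);
               }
           }
           return known;
       }

       // Hoisted pattern: the set is built once and reused for every lookup.
       static List<String> knownHoisted(ToyContext ctx, List<String> requested) {
           Set<String> all = ctx.allPartitions();
           List<String> known = new ArrayList<>();
           for (String tp : requested) {
               if (all.contains(tp)) {
                   known.add(tp);
               }
           }
           return known;
       }

       public static void main(String[] args) {
           List<String> requested = Arrays.asList("a-0", "a-2", "b-0");
           ToyContext before = new ToyContext();
           before.addTopic("a", 3);
           ToyContext after = new ToyContext();
           after.addTopic("a", 3);
           System.out.println(knownPerElement(before, requested) + " rebuilds=" + before.rebuilds);
           System.out.println(knownHoisted(after, requested) + " rebuilds=" + after.rebuilds);
       }
   }
   ```
   
   Both variants return the same result. For three requested partitions, the first rebuilds the set three times and the second only once. With P partitions in the cluster and N requested, that is roughly N times P work against N plus P.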
   
   > Can the patch resolve the issue?
   
   @Montyleo The optimization in this PR seems good enough to me. However, it would be better to show the improvement this patch brings in your environment.
   





[GitHub] [kafka] chia7712 commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
chia7712 commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737244340


   > I'll find the reason.
   
   Is there an existing ticket? If not, could you file a Jira ticket to track it? You can also assign the ticket to yourself (I have given you the permission) if you have free cycles to trace it.
   
   I will merge this PR tomorrow if there are no objections.





[GitHub] [kafka] chia7712 commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
chia7712 commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737189029


   @Montyleo Is the failed test related to this PR?





[GitHub] [kafka] Montyleo commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
Montyleo commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737187007


   @huxihx Please help review the code, thanks.





[GitHub] [kafka] Montyleo commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
Montyleo commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737185013


   > @Montyleo nice finding. LGTM
   
   Thanks





[GitHub] [kafka] chia7712 merged pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
chia7712 merged pull request #9675:
URL: https://github.com/apache/kafka/pull/9675


   





[GitHub] [kafka] Montyleo commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
Montyleo commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-747942928


   > @chia7712 Can the patch resolve the issue? The only difference I can find is that controllerContext.allPartitions is invoked once instead of once per partition. Please correct me if I am wrong. Thanks.
   
   Hi, lqjack
   
   Thanks for your question.
   There is a saying: quantitative change leads to qualitative change.
   When the function controllerContext.allPartitions is called too many times, the rebalance becomes too slow.
   Here is the effect of the PR: 1.3 ms vs. 35541 ms.
   
   ![clipboard_image_1608279734070](https://user-images.githubusercontent.com/8037560/102591724-723fc080-414d-11eb-84ed-c3e1ca704f8c.png)
   
   ![image](https://user-images.githubusercontent.com/8037560/102591822-9ac7ba80-414d-11eb-9a15-e01b6b261892.png)
   
   





[GitHub] [kafka] Montyleo commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
Montyleo commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737283188


   > > I'll find the reason.
   > 
   > Is there an existing ticket? If not, could you file a Jira ticket to track it? You can also assign the ticket to yourself (I have given you the permission) if you have free cycles to trace it.
   > 
   > I will merge this PR tomorrow if there are no objections.
   
   OK, I have no objection. I have created a Jira ticket to track it: [https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-10797?filter=allissues]. I'll trace it in my local environment.





[GitHub] [kafka] lqjack commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
lqjack commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737874013


   @chia7712 Can the patch resolve the issue? The only difference I can find is that controllerContext.allPartitions is invoked once instead of once per partition. Please correct me if I am wrong. Thanks.





[GitHub] [kafka] Montyleo commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
Montyleo commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737199358


   > @Montyleo Is the failed test related to this PR?
   
   Hi, chia7712
   
   Thanks for your reply. The failed test is about SaslAuthenticator. It is not related to this PR, nor even to the KafkaController component. It seems that the JDK 8 version is too low. I'll find the reason.





[GitHub] [kafka] chia7712 commented on pull request #9675: KAFKA-10794 Replica leader election is too slow in the case of too many partitions

Posted by GitBox <gi...@apache.org>.
chia7712 commented on pull request #9675:
URL: https://github.com/apache/kafka/pull/9675#issuecomment-737624669


   @Montyleo Thanks for your contribution!

