Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/09/06 00:51:01 UTC

[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

    [ https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154646#comment-16154646 ] 

Todd Lipcon commented on KUDU-2086:
-----------------------------------

[~sailesh] and I chatted about this a bit this afternoon by IM. I don't think it's an issue with the hash code -- even with a "perfect" hash code (i.e. one that is uniformly random) we are likely to see skew.

The reason here is that we are defining skew as max(# connections in a reactor) / average(# connections in a reactor). With n connections hashed uniformly across r reactors, the "# connections in a reactor" variable has a binomial distribution, Binomial(n, 1/r). If you sample a bunch of times from a binomial distribution and take the max over those samples, that max is likely to be much higher than the mean (see "order statistics" on Wikipedia for more details).
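
As a rough analytic check of the order-statistics argument, the sketch below approximates the expected max/mean ratio for 100 connections over 24 reactors. It assumes the per-reactor counts are independent Binomial(n, 1/r) variables, which is only an approximation (the real counts are slightly negatively correlated because they sum to n). The function names are illustrative, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def approx_expected_max(num_nodes, num_reactors):
    """Approximate E[max reactor load], treating reactors as independent."""
    p = 1.0 / num_reactors
    expected = 0.0
    for k in range(num_nodes):
        # E[M] = sum over k >= 0 of P(M > k), and under the independence
        # assumption P(M <= k) is roughly CDF(k) ** num_reactors.
        expected += 1.0 - binom_cdf(k, num_nodes, p) ** num_reactors
    return expected

num_nodes, num_reactors = 100, 24
mean_load = num_nodes / num_reactors
approx_skew = approx_expected_max(num_nodes, num_reactors) / mean_load
print("approximate expected skew: %.2f" % approx_skew)
```

Under these assumptions the ratio comes out near 2, even though every reactor is equally likely to be chosen.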

I ran a simple Python simulation as well:

{code}
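import random

import numpy as np
import pandas as pd

num_reactors = 24
num_nodes = 100
num_trials = 5000

trial_results = []
for trial in range(num_trials):
  # randrange excludes the upper bound, giving reactor ids 0..num_reactors-1.
  assignments = [random.randrange(num_reactors) for x in range(num_nodes)]
  # minlength makes sure reactors with zero connections are counted too,
  # so the average is taken over all reactors.
  reactor_counts = np.bincount(assignments, minlength=num_reactors)
  worst_to_avg = reactor_counts.max() / reactor_counts.mean()
  trial_results.append(worst_to_avg)

# Histogram of the observed skew (max/mean) across all trials.
pd.Series(trial_results).hist(bins=40)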
{code}

which runs a lot of simulated trials with a perfect hash function and plots the distribution of observed skew (max/mean). The resulting distribution looks like:

!https://ibin.co/3ZOmzYwLIzeq.png!

i.e. most of the time, we expect to see a skew of around 2x, which more or less matches what we see experimentally in the Impala use case.

So, if we want to reduce skew, we need to do explicit assignment/balancing rather than random stateless assignment using hashes.
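
To illustrate the difference, here is a minimal sketch comparing random stateless assignment with one possible explicit policy, "pick the reactor with the fewest connections". This is not Kudu's implementation: the function names are hypothetical, and the sketch assumes that every connection carries similar load and that connections are never torn down.

```python
import random

def assign_random(num_nodes, num_reactors, rng):
    """Stateless: each connection picks a reactor uniformly at random."""
    counts = [0] * num_reactors
    for _ in range(num_nodes):
        counts[rng.randrange(num_reactors)] += 1
    return counts

def assign_least_loaded(num_nodes, num_reactors):
    """Stateful: each connection goes to the currently least-loaded reactor."""
    counts = [0] * num_reactors
    for _ in range(num_nodes):
        target = min(range(num_reactors), key=counts.__getitem__)
        counts[target] += 1
    return counts

def skew(counts):
    """max(# connections in a reactor) / average(# connections in a reactor)."""
    return max(counts) / (sum(counts) / len(counts))

rng = random.Random(0)  # fixed seed so the comparison is repeatable
random_skew = skew(assign_random(100, 24, rng))
balanced_skew = skew(assign_least_loaded(100, 24))
print("random: %.2f  least-loaded: %.2f" % (random_skew, balanced_skew))
```

With 100 connections and 24 reactors, the least-loaded policy never puts more than 5 connections on one reactor, so its skew is 5 / (100/24) = 1.2. A real balancer would also need to account for connections closing and for uneven traffic per connection.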

> Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput
> -----------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2086
>                 URL: https://issues.apache.org/jira/browse/KUDU-2086
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc
>    Affects Versions: 1.4.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Michael Ho
>
> Uneven assignment of connections to Reactor threads causes a couple of reactor threads to run at 100% CPU, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some threads still run much hotter than others.
> Snapshot below is from a 20 node cluster
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?        00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?        00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?        00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?        00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?        00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?        00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?        00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?        00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?        00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?        00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?        00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?        00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?        00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?        00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?        00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?        00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?        00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?        00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?        00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?        00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?        00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?        00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?        00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?        00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?        00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?        00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?        00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?        00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?        00:17:43 rpc reactor-695
> {code}


