Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/09/06 00:51:01 UTC
[jira] [Commented] (KUDU-2086) Uneven assignment of connections to
Reactor threads creates skew and limits transfer throughput
[ https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154646#comment-16154646 ]
Todd Lipcon commented on KUDU-2086:
-----------------------------------
[~sailesh] and I chatted about this a bit this afternoon by IM. I don't think it's an issue with the hash code -- even with a "perfect" hash code (i.e., one that assigns connections uniformly at random) we are likely to see skew.
The reason is that we are defining skew as max(# connections in a reactor) / average(# connections in a reactor). The "# connections in a reactor" variable has a binomial distribution. If you sample a bunch of times from a binomial distribution (once per reactor) and take the max over those samples, that max is likely to be much higher than the mean (see "order statistics" on Wikipedia for more details).
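As a rough analytic cross-check of this argument, here is a minimal sketch that estimates how likely the busiest reactor is to hold at least k connections. The helper names (binom_tail, approx_prob_max_at_least) are made up for this illustration, and the calculation treats the per-reactor counts as independent, which is only an approximation because the counts have to sum to the total number of connections. It assumes Python 3.8+ for math.comb.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def approx_prob_max_at_least(n, reactors, k):
    """Approximate P(max reactor count >= k).

    Assumption: reactors are treated as independent draws, which ignores the
    (slightly negative) correlation between per-reactor counts.
    """
    tail = binom_tail(n, 1.0 / reactors, k)
    return 1 - (1 - tail) ** reactors


num_reactors = 24
num_nodes = 100
mean = num_nodes / num_reactors

# For each candidate max count k, print the approximate probability that some
# reactor reaches it, along with the skew (k / mean) that count would imply.
for k in range(5, 13):
    p = approx_prob_max_at_least(num_nodes, num_reactors, k)
    print("P(max >= %2d) ~ %.3f   (skew >= %.2fx)" % (k, p, k / mean))
```

The point of the sketch is only that a small per-reactor tail probability gets amplified when you take the max over many reactors.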
I ran a simple Python simulation as well:
{code}
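import random
from collections import Counter

import pandas as pd

num_reactors = 24
num_nodes = 100
num_trials = 5000
# Average connections per reactor, counting reactors that received none.
avg_per_reactor = float(num_nodes) / num_reactors

trial_results = []
for trial in range(num_trials):
    # randrange excludes its upper bound, so reactors are numbered
    # 0..num_reactors-1 (randint(0, num_reactors) would simulate 25 reactors).
    assignments = [random.randrange(num_reactors) for _ in range(num_nodes)]
    reactor_counts = Counter(assignments).values()
    worst_to_avg = max(reactor_counts) / avg_per_reactor
    trial_results.append(worst_to_avg)

# Histogram of the observed skew (max/mean) across all trials.
pd.Series(trial_results).hist(bins=40)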
{code}
which runs a lot of simulated trials with a perfect hash function and plots the distribution of observed skew (max/mean). The resulting distribution looks like:
!https://ibin.co/3ZOmzYwLIzeq.png!
That is, most of the time we expect to see a skew of around 2x, which more or less matches what we see experimentally in the Impala use case.
So, if we want to reduce skew, we need to do explicit assignment/balancing rather than random stateless assignment using hashes.
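To illustrate the difference, here is a minimal sketch comparing stateless random assignment against one possible stateful policy (always pick the reactor with the fewest connections). This is not Kudu's implementation and not a concrete proposal: the function names are hypothetical, connections are never closed, and every connection is assumed to carry the same load.

```python
import random


def skew(counts):
    """max / mean connections per reactor, counting idle reactors."""
    return max(counts) / (sum(counts) / float(len(counts)))


def assign_random(num_nodes, num_reactors, rng):
    """Stateless: each connection picks a uniformly random reactor."""
    counts = [0] * num_reactors
    for _ in range(num_nodes):
        counts[rng.randrange(num_reactors)] += 1
    return counts


def assign_least_loaded(num_nodes, num_reactors):
    """Stateful: each connection goes to the currently least-loaded reactor."""
    counts = [0] * num_reactors
    for _ in range(num_nodes):
        target = counts.index(min(counts))  # first reactor with the fewest connections
        counts[target] += 1
    return counts


rng = random.Random(0)  # fixed seed so repeated runs are comparable
random_skews = [skew(assign_random(100, 24, rng)) for _ in range(1000)]
print("random assignment, mean skew: %.2f" % (sum(random_skews) / len(random_skews)))
print("least-loaded assignment skew: %.2f" % skew(assign_least_loaded(100, 24)))
```

With 100 connections over 24 reactors, the least-loaded policy can never leave two reactors more than one connection apart, so its skew is bounded by the rounding of 100/24. The cost is that the assignment now needs shared state about per-reactor connection counts.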
> Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput
> -----------------------------------------------------------------------------------------------
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
> Issue Type: Bug
> Components: rpc
> Affects Versions: 1.4.0
> Reporter: Mostafa Mokhtar
> Assignee: Michael Ho
>
> Uneven assignment of connections to Reactor threads causes a couple of reactor threads to run at 100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some threads still run much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc | grep -v "00:00" | awk '{print $4,$0}' | sort
> 00:03:17 69387 69596 ? 00:03:17 rpc reactor-695
> 00:03:20 69387 69632 ? 00:03:20 rpc reactor-696
> 00:03:21 69387 69607 ? 00:03:21 rpc reactor-696
> 00:03:25 69387 69629 ? 00:03:25 rpc reactor-696
> 00:03:26 69387 69594 ? 00:03:26 rpc reactor-695
> 00:03:34 69387 69595 ? 00:03:34 rpc reactor-695
> 00:03:35 69387 69625 ? 00:03:35 rpc reactor-696
> 00:03:38 69387 69570 ? 00:03:38 rpc reactor-695
> 00:03:38 69387 69620 ? 00:03:38 rpc reactor-696
> 00:03:47 69387 69639 ? 00:03:47 rpc reactor-696
> 00:03:48 69387 69593 ? 00:03:48 rpc reactor-695
> 00:03:49 69387 69591 ? 00:03:49 rpc reactor-695
> 00:04:04 69387 69600 ? 00:04:04 rpc reactor-696
> 00:07:16 69387 69640 ? 00:07:16 rpc reactor-696
> 00:07:39 69387 69616 ? 00:07:39 rpc reactor-696
> 00:07:54 69387 69572 ? 00:07:54 rpc reactor-695
> 00:09:10 69387 69613 ? 00:09:10 rpc reactor-696
> 00:09:28 69387 69567 ? 00:09:28 rpc reactor-695
> 00:09:39 69387 69603 ? 00:09:39 rpc reactor-696
> 00:09:42 69387 69641 ? 00:09:42 rpc reactor-696
> 00:09:59 69387 69604 ? 00:09:59 rpc reactor-696
> 00:10:06 69387 69623 ? 00:10:06 rpc reactor-696
> 00:10:43 69387 69636 ? 00:10:43 rpc reactor-696
> 00:10:59 69387 69642 ? 00:10:59 rpc reactor-696
> 00:11:28 69387 69585 ? 00:11:28 rpc reactor-695
> 00:12:43 69387 69598 ? 00:12:43 rpc reactor-695
> 00:15:42 69387 69578 ? 00:15:42 rpc reactor-695
> 00:16:10 69387 69614 ? 00:16:10 rpc reactor-696
> 00:17:43 69387 69575 ? 00:17:43 rpc reactor-695
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)