Posted to dev@storm.apache.org by Sean Zhong <cl...@gmail.com> on 2014/03/26 12:27:50 UTC

Storm Netty Performance

When running the benchmark developed by Bobby (
http://yahooeng.tumblr.com/post/64758709722/making-storm-fly-with-netty),
I found that neither CPU, memory, nor network can be saturated when the
message size is small (10 bytes - 100 bytes).


message size (bytes)    spout throughput (MB/s)
10                      3207
40                      16.25
80                      32.88
100                     43.13
200                     80.13
400                     138
800                     186.38
1000                    196.38
10000                   234.75

I have 4 nodes, and each node has a very powerful CPU, an E5-2680 (32 cores).
The throughput reaches its peak when only 30% of each machine's CPU is used
and only 1/6 of the network bandwidth is used.

So I guess this may be related to Netty performance.

My questions:
1. It seems we are transferring messages synchronously in the Netty client
worker: we send the next message only after we receive the response to the
last message request from the Netty server. Can this hurt performance? (See
the sketch after these questions.)
2. Although we batch the messages when sending them through Netty
channel.send, the batch size varies. In my test, I found the batch size
varies from tens of bytes to a few KB. Would a bigger and constant batch
size help here?
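
To make question 1 concrete, here is a minimal sketch of the stop-and-wait
pattern I am describing. The Transport interface and class names below are
only for illustration; this is not Storm's actual Netty Client code.

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Toy model of the send loop in question 1: a batch is written and the
    // sender blocks until the peer's response arrives before the next batch
    // may be written. Names are illustrative, not Storm's real API.
    public class StopAndWaitSender {
        /** Stand-in for the Netty channel plus request/response handshake. */
        interface Transport {
            void writeBatch(List<byte[]> batch);              // asynchronous write
            void awaitResponse() throws InterruptedException; // block until the peer acks
        }

        private final Transport transport;
        private final BlockingQueue<List<byte[]>> pending =
                new ArrayBlockingQueue<>(1024);

        StopAndWaitSender(Transport transport) {
            this.transport = transport;
        }

        void enqueue(List<byte[]> batch) throws InterruptedException {
            pending.put(batch);
        }

        /** One batch in flight at a time: throughput is capped by the round trip. */
        void run() throws InterruptedException {
            while (true) {
                List<byte[]> batch = pending.take();
                transport.writeBatch(batch);
                transport.awaitResponse(); // nothing else is sent while we wait
            }
        }
    }

With small messages the round trip dominates, which is why I suspect this
pattern caps throughput well before the CPU or the network is saturated.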


The following are the steps I tried to troubleshoot the problem.
----------------------------------------------------------------
1. Considering that the CPU is not fully used, I tried to scale out by
adding more workers or increasing the parallelism, but the throughput
doesn't improve.

2. Using a profiling tool like VisualVM, I found the spout/bolt threads
spend 60% - 70% of their time waiting, blocked on the disruptor queue; the
spout spends 70% of its time sleeping and the acker spends 40% of its time
waiting, while the Netty boss and worker threads and the ZooKeeper threads
are busy.

3. I have tried to tune all possible combinations of spout.max.pending,
transfer.size, receiver.size, executor.input.size, and executor.output.size,
but it doesn't work out (a sketch of these settings follows below).
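
For reference, assuming the shorthand names in step 3 correspond to the
standard Storm 0.9.x configuration keys (that is my assumption), the
settings I was sweeping look roughly like this; the values are only
examples:

    import backtype.storm.Config;

    // Sketch of the tuning knobs from step 3, assuming the shorthand names
    // map to the standard Storm 0.9.x configuration keys. Values are only
    // examples; the Disruptor-backed buffer sizes should be powers of two.
    public class TuningSketch {
        public static Config buildConf() {
            Config conf = new Config();
            conf.setMaxSpoutPending(5000);                                  // spout.max.pending
            conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024);           // transfer.size
            conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);              // receiver.size
            conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);  // executor.input.size
            conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);     // executor.output.size
            return conf;
        }
    }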


Sean

Re: Storm Netty Performance

Posted by Bobby Evans <ev...@yahoo-inc.com>.
Also I honestly did not try to optimize the throughput/latency at all.  My
goal was to profile it in comparison to the default zeromq implementation
to be sure that there was no regression.

—Bobby


Re: Storm Netty Performance

Posted by Sean Zhong <cl...@gmail.com>.
I did some experiments and was able to double the max throughput for small
messages (100 bytes) by changing the Netty glue code. But even after that,
there is still a scalability problem: the resources cannot be fully used.
I realized that in your test you only have an 8-core CPU, so for you the
CPU is the bottleneck, while I have a 32-core (64 virtual cores) CPU, so
this problem is exposed.

You are right, it is a "latency vs throughput" problem. Especially if we
buffer too much, it is possible that the buffer absorbs all of
spout.max.pending, there is no traffic, and the topology just waits for
nothing. I am still experimenting to find a better balance between latency
and throughput.
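
As an illustration of the kind of balance I mean: flush a batch on whichever
comes first, a size threshold or a small time limit, so buffered messages
can never sit behind spout.max.pending for long. A minimal sketch of that
idea (my illustration only, not the actual change I tested):

    import java.util.ArrayList;
    import java.util.List;

    // Size-or-time flush policy: a batch is emitted either when it reaches
    // targetBytes or when maxDelayMillis has passed since its first message,
    // so buffering can never stall the topology for long.
    public class SizeOrTimeBatcher {
        private final int targetBytes;
        private final long maxDelayMillis;
        private final List<byte[]> buffer = new ArrayList<>();
        private int bufferedBytes = 0;
        private long firstMessageAt = 0;

        public SizeOrTimeBatcher(int targetBytes, long maxDelayMillis) {
            this.targetBytes = targetBytes;
            this.maxDelayMillis = maxDelayMillis;
        }

        /** Returns a batch to flush, or null to keep buffering. */
        public synchronized List<byte[]> add(byte[] message) {
            if (buffer.isEmpty()) {
                firstMessageAt = System.currentTimeMillis();
            }
            buffer.add(message);
            bufferedBytes += message.length;
            return shouldFlush() ? drain() : null;
        }

        /** Called periodically (e.g. by a timer) to bound partial-batch latency. */
        public synchronized List<byte[]> flushIfStale() {
            return (!buffer.isEmpty() && shouldFlush()) ? drain() : null;
        }

        private boolean shouldFlush() {
            return bufferedBytes >= targetBytes
                    || System.currentTimeMillis() - firstMessageAt >= maxDelayMillis;
        }

        private List<byte[]> drain() {
            List<byte[]> batch = new ArrayList<>(buffer);
            buffer.clear();
            bufferedBytes = 0;
            return batch;
        }
    }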


Sean



Re: Storm Netty Performance

Posted by Bobby Evans <ev...@yahoo-inc.com>.
You are correct that we do not send a new batch of messages until the
current batch has been acked.  It should not be too difficult to switch to
pipelining the messages so more than one batch is in flight at any point
in time, but we wanted to get accuracy before digging more deeply into
performance.
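
For illustration, pipelining could look roughly like the sketch below, with
a semaphore capping the number of unacknowledged batches instead of waiting
on each one. The names are made up for the example; this is not the real
Client code.

    import java.util.List;
    import java.util.concurrent.Semaphore;

    // Rough sketch of pipelined sends: up to maxInFlight batches may be
    // written before any response comes back; each response releases a permit.
    public class PipelinedSender {
        interface Transport {
            void writeBatch(List<byte[]> batch); // asynchronous write
        }

        private final Transport transport;
        private final Semaphore inFlight;

        public PipelinedSender(Transport transport, int maxInFlight) {
            this.transport = transport;
            this.inFlight = new Semaphore(maxInFlight);
        }

        /** Blocks only when maxInFlight batches are already awaiting responses. */
        public void send(List<byte[]> batch) throws InterruptedException {
            inFlight.acquire();
            transport.writeBatch(batch);
        }

        /** Invoked by the response handler when the server acknowledges a batch. */
        public void onResponse() {
            inFlight.release();
        }
    }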

As for the fixed batch size, that is a latency vs throughput question, and
the answer is likely to vary depending on the use case you have.

The bigger problem that I have seen is with the number of threads that
Netty is using for larger topologies.  I think we have a fix for that, but
Andy and I have not had the time to put together a patch for the community
yet.  I will try to get to it this week.
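
In the meantime, the Netty worker-thread counts are already exposed as
configuration; assuming the 0.9.x key names, setting them looks like the
snippet below. This is only an illustration and does not by itself fix the
thread growth with larger topologies.

    import backtype.storm.Config;

    // Illustration only: the Netty thread-count settings from the 0.9.x
    // defaults, assuming these key names are current.
    public class NettyThreadSketch {
        public static Config buildConf() {
            Config conf = new Config();
            conf.put("storm.messaging.netty.server_worker_threads", 1);
            conf.put("storm.messaging.netty.client_worker_threads", 1);
            return conf;
        }
    }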

—Bobby
