Posted to user@storm.apache.org by "yangxun@bupt.edu.cn" <ya...@bupt.edu.cn> on 2015/05/12 03:44:33 UTC

How much is the overhead of time to deploy a system on Storm ?

Hi, and thanks.

I'm working on a parallel algorithm that counts massive numbers of items in data streams. Previous research on parallelizing this algorithm focused on multi-core CPUs; however, I want to take advantage of Storm.

Processing latency is extremely important for this algorithm, so I did some evaluation of its performance.

First, I implemented the algorithm in plain Java (one thread, no parallelism): it could process 3 million items per second.

Second, I wrapped this implementation into Storm (just one Spout doing the processing): it could process only 0.75 million items per second. I changed my implementation a little to fit Storm's structure, but in the end the performance is still not good.

P.S. I didn't take network overhead into consideration, because I only run the program in a single Spout on one node, so there is no emit or transfer (I don't care how Storm passes messages between nodes for now). The Spout is doing the same thing as the standalone program; I just copied the code into the nextTuple() method with some necessary changes.
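
A minimal sketch of the setup described above (not the original code; the HashMap-based count and the generated items are only stand-ins for the real counting algorithm and the real stream, assuming the backtype.storm API of that era):

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;

    import java.util.HashMap;
    import java.util.Map;

    // All of the counting work happens inside nextTuple(); nothing is emitted,
    // so there is no inter-node transfer to measure.
    public class CountingSpout extends BaseRichSpout {
        private Map<String, Long> counts;
        private long seq;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            counts = new HashMap<String, Long>();
            seq = 0;
        }

        @Override
        public void nextTuple() {
            String item = "item-" + (seq++ % 1000);   // stand-in for reading the next stream item
            Long c = counts.get(item);
            counts.put(item, c == null ? 1L : c + 1);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // nothing is declared or emitted in this benchmark
        }
    }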

1. Is the degradation (to 1/4 of the speed) inevitable?
2. What caused the degradation?
3. How can I reduce the degradation?

Thank you all.




yangxun@bupt.edu.cn

RE: Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Nathan Leung <nc...@gmail.com>.
Maybe I will make an analogy. Think of spout executors as people wrapping
presents. Think of spout tasks as tables where people can wrap presents.

If you have 10 tasks and 1 executor, then you have 10 tasks and 1 person.
The person will wrap a present at one table, then go to the next, wrap a
present, etc. If you have 10 tasks and 10 executors then you have 1 person
at each table.

Adding spout tasks to handle I/O blocking will not help unless you use
asynchronous I/O from multiple sources. Personally, I find it easier to
reason about more executors that block synchronously.
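
In topology-building terms, the analogy maps roughly onto the parallelism hint (executors / people) and setNumTasks (tasks / tables). A sketch, reusing the CountingSpout sketched in the first message of the thread:

    import backtype.storm.topology.TopologyBuilder;

    public class ParallelismSketch {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();

            // 1 person, 10 tables: one executor (thread) cycling over ten tasks.
            builder.setSpout("one-wrapper", new CountingSpout(), 1).setNumTasks(10);

            // 10 people, 10 tables: ten executors, each owning one task.
            builder.setSpout("ten-wrappers", new CountingSpout(), 10).setNumTasks(10);
        }
    }

In both cases there are ten tasks; only the number of threads doing the work differs.
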
On May 14, 2015 9:25 AM, <Ra...@dellteam.com> wrote:


RE: Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Ra...@DellTeam.com.
Nathan,

Can you explain in a little more detail what you mean by “When you have more tasks than executors, the spout thread does the same logic, it just does it for more tasks during its main loop.”  I thought the spout thread emits tuples based on the max spout pending and how quickly the downstream bolts are processing the incoming tuples.
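
For reference, max spout pending is a topology-level cap on the number of unacked tuples in flight per spout task, and it only takes effect when tuples are anchored and acked. A minimal sketch of where it is set, with the value chosen purely for illustration:

    import backtype.storm.Config;

    Config conf = new Config();
    // At most 1000 tuples in flight (emitted but not yet acked or failed) per
    // spout task before Storm stops calling nextTuple() on that spout.
    conf.setMaxSpoutPending(1000);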

+1 for setting the number of tasks of a bolt to a higher number so that you can rebalance later based on need.

The other time I would consider having more than one task per executor thread is when the task is I/O intensive and you spend most of the time waiting for responses rather than being CPU bound.



Re: Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Nathan Leung <nc...@gmail.com>.
I would expect that it depends on how many executors you have. In Storm, an
executor corresponds to an OS thread, while a task is more of a logical unit
of work. The only situation where I would personally use more tasks than
executors is if I wanted to over-provision the tasks so that I can
rebalance to use more executors in the future (you cannot change the number
of tasks in a rebalance).

When you have more tasks than executors, the spout thread does the same
logic, it just does it for more tasks during its main loop. I'm not sure
why that would increase your per thread throughput.
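
A sketch of that over-provisioning pattern (the component name is made up, and it reuses the CountingSpout sketched earlier in the thread): declare more tasks than executors at submit time, so the executor count can be raised later without resubmitting.

    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // 4 executors now, but 16 tasks reserved; the task count is fixed for the
    // lifetime of the topology, while the executor count can be changed.
    builder.setSpout("my-spout", new CountingSpout(), 4).setNumTasks(16);

Later, a command along the lines of "storm rebalance my-topology -e my-spout=16" can raise the executor count up to, but not beyond, the 16 tasks declared up front.
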
On May 13, 2015 10:13 PM, "yangxun@bupt.edu.cn" <ya...@bupt.edu.cn> wrote:

>  Hi, Nathan.
>  Actually, I tried several ways to make my program fit Storm:
>  1: a 'while(true)' loop inside nextTuple()
> 2: executing n times in one nextTuple() call.
>  I don't need to batch messages, because what I really care about is the
> processing speed (the emit phase is not the bottleneck).
>  I want to mention this: I only created one single spout task on one
> machine node.
> I also read some papers on Storm evaluation, and they used some
> parallelism. So I tried adding some parallelism (10 tasks per executor
> per node), and I got a pretty good result (the same throughput as the
> Java program).
>  I wonder if this is the design pattern we should pick in Storm?
>
> ------------------------------
>  yangxun@bupt.edu.cn


Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Nathan Leung <nc...@gmail.com>.
Actually that figure is from a Nathan Marz tweet, but he also cites the
million mark here: http://nathanmarz.com/blog/storms-1st-birthday.html

When I saw this kind of throughput, it was with a canned example that I
created solely for testing throughput. Also, it was run on pretty beefy
hardware, so YMMV.
On May 13, 2015 9:24 AM, "Jeffery Maass" <ma...@gmail.com> wrote:


Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Jeffery Maass <ma...@gmail.com>.
Nathan:

Where can I find this?
"See for example published single machine benchmarks"

Thank you for your time!

+++++++++++++++++++++
Jeff Maass <ma...@gmail.com>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++


On Tue, May 12, 2015 at 7:57 AM, Nathan Leung <nc...@gmail.com> wrote:


Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Nathan Leung <nc...@gmail.com>.
I'm not very surprised. See, for example, published single-machine
benchmarks (IIRC 1.6 million tuples/s is the official figure from Nathan
Marz, though that figure is a little old). This is spout to bolt, and it
matches my observations for trivial cases. With some processing logic and
only one spout, I can see how it's lower.

You can reduce the overhead by batching your work differently, e.g. by
doing more work in each call to nextTuple.
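
One hedged reading of "more work in each call to nextTuple": handle a batch of items per call so that Storm's per-call overhead is amortized. For example, the nextTuple() of a counting spout like the one sketched near the top of the thread might become the following, where the batch size and the nextItem()/processItem() helpers are illustrative only:

    // Amortize framework overhead over a batch of items per nextTuple() call.
    private static final int BATCH_SIZE = 1000;

    @Override
    public void nextTuple() {
        for (int i = 0; i < BATCH_SIZE; i++) {
            String item = nextItem();   // illustrative helper: fetch the next stream item
            if (item == null) {
                return;                 // nothing available right now
            }
            processItem(item);          // illustrative helper: one step of the counting algorithm
        }
    }
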
On May 12, 2015 4:56 AM, "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
wrote:


Re: How much is the overhead of time to deploy a system on Storm ?

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Can you share your code?

Do you process a single tuple each time nextTuple() is called? If a
spout does not emit anything, Storm applies a waiting-penalty to avoid
busy waiting. That might slow down your code.

You can configure the waiting strategy:
https://storm.apache.org/2012/09/06/storm081-released.html

-Matthias
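
For reference, the knobs from that 0.8.1 post are the spout wait strategy class and the default sleep strategy's wait time. A minimal sketch, assuming the 0.8.x-era config constants and with the 1 ms value chosen only for illustration:

    import backtype.storm.Config;

    Config conf = new Config();
    // Strategy invoked when a call to nextTuple() emits nothing
    // (or when max spout pending is reached).
    conf.put(Config.TOPOLOGY_SPOUT_WAIT_STRATEGY,
             "backtype.storm.spout.SleepSpoutWaitStrategy");
    // How long the default sleep strategy waits each time, in milliseconds.
    conf.put(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS, 1);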


On 05/12/2015 09:31 AM, Daniel Compton wrote:


Re: How much is the overhead of time to deploy a system on Storm ?

Posted by Daniel Compton <da...@gmail.com>.
I'm also interested in the answers to this question, but to add to the
discussion, take a look at
http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html.
I suspect Storm is still introducing coordination overhead even when running
on a single machine.
On Tue, 12 May 2015 at 1:39 pm yangxun@bupt.edu.cn <ya...@bupt.edu.cn>
wrote:
