Posted to user@storm.apache.org by Wilson Akio Higashino <vi...@gmail.com> on 2015/02/24 21:37:03 UTC

Spouts tuple generation rate

Dear all,

I have a simple topology composed of a spout followed by three bolts, and I
want to measure the processing latency as a function of the tuple incoming
rate.

To execute this test, I created a Spout that periodically creates a new
tuple and emits it to the topology. To control the generation rate, I
simply sleep for a configurable period. The code follows the general idea
present in some of the "storm-starter" topologies:

   public void nextTuple() {
        Utils.sleep(SLEEP_TIME);

        // Create test tuple and emit
   }


For "slow" rates the spout can generate tuples with good accuracy. For
example, if I sleep for 10 milliseconds, the rate should be around 100
tuples/second - and I get around 92 tuples/second.
However, if I increase the rate, the error becomes very large (for example,
for 1 millisecond sleep, I get only 650 tuples/second instead of the
theoretical 1000 tuples/second).

In addition:

- Everything is running on a single Worker.

- Generally, there are no tuples waiting on any of the receiving / sending
queues.

- The code generating the tuple is not a bottleneck, because when I remove
the Utils.sleep line I get a generation rate of over 10,000 tuples /
second. This result also shows me that the topology can handle larger rates
without problems.


I understand that the way I am programming the "nextTuple" method only
guarantees an upper bound on the generation rate, but I would like to have
better control over it.

My questions are:

- Is there anything in Storm's internals that explains this behaviour? I
thought it could be related to the "SpoutWaitStrategy" associated with the
Spout, but switching to other strategies didn't have any effect.

- Any ideas / thoughts on how I could better control the tuple generation
rate other than using this sleep / awake pattern?


I appreciate your help.

Regards,

Wilson

Re: Spouts tuple generation rate

Posted by Wilson Akio Higashino <vi...@gmail.com>.

Hi Nathan, thanks for your response.

Right after I hit the Send button, I realized I should have looked at the sleep precision.

Anyway, my Linux system already had high-resolution timers enabled.
I fixed my problem by using a ScheduledExecutorService instead of Thread.sleep. Now I am getting 1 ms precision!
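
For readers wondering what such a setup could look like, here is a minimal sketch. It is not the code actually used (that was not posted), only one possible arrangement: a ScheduledExecutorService releases one permit per period, and the spout's nextTuple() would emit only when a permit is available. The class name PacedEmitter, the use of a Semaphore, and the 1000-microsecond period are all illustrative assumptions, and the Storm-specific emit call is left out so the example runs on its own.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: hands out one "emit permit" per period.
final class PacedEmitter implements AutoCloseable {
    private final ScheduledExecutorService ticker =
            Executors.newSingleThreadScheduledExecutor();
    private final Semaphore permits = new Semaphore(0);

    PacedEmitter(long periodMicros) {
        // scheduleAtFixedRate plans ticks against the initial start time,
        // so one late tick does not push all later ticks back.
        ticker.scheduleAtFixedRate(
                permits::release, periodMicros, periodMicros, TimeUnit.MICROSECONDS);
    }

    // Non-blocking: a spout's nextTuple() could call this and emit only on true.
    boolean tryTakePermit() {
        return permits.tryAcquire();
    }

    @Override
    public void close() {
        ticker.shutdownNow();
    }

    public static void main(String[] args) {
        try (PacedEmitter emitter = new PacedEmitter(1000)) { // target: ~1000/s
            long emitted = 0;
            long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
            while (System.nanoTime() < end) {
                if (emitter.tryTakePermit()) {
                    emitted++; // a real spout would create and emit a tuple here
                }
            }
            System.out.println("emitted in 1 s: " + emitted);
        }
    }
}
```

One design consequence of this sketch: if nextTuple() is not called for a while, permits accumulate and the backlog is emitted in a burst afterwards, so the average rate is held rather than the spacing between individual tuples.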

Regards,

Wilson


> On Feb 24, 2015, at 4:13 PM, Nathan Leung <nc...@gmail.com> wrote:
> 
> Firstly, sleep is imprecise: "sleep(1)" means "sleep for at least 1 millisecond".
> 
> Next, I would check whether high-resolution timers are supported and enabled on your system (see for example http://linux.die.net/man/7/time).
> 
> If you are running Linux and don't have high-resolution timers enabled, your sleep resolution is limited to the duration of a "jiffy", which depends on the kernel's HZ setting and is 1 ms on many systems. With a 1 ms jiffy, sleep(1) will sleep 1.5 ms on average, which caps the rate at about 667 tuples/s (1000 / 1.5), roughly matching your observation.

Re: Spouts tuple generation rate

Posted by Nathan Leung <nc...@gmail.com>.

Firstly, sleep is imprecise: "sleep(1)" means "sleep for at least 1
millisecond".

Next, I would check whether high-resolution timers are supported and
enabled on your system (see for example http://linux.die.net/man/7/time).
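
A quick way to see how imprecise sleep is on a given machine is to time it directly. The sketch below is an illustration only: the class name SleepOvershoot and the iteration count are made up, and it calls Thread.sleep directly on the assumption that Storm's Utils.sleep ends up there in a normal (non-simulated) run.

```java
// Hypothetical measurement helper, not part of Storm.
final class SleepOvershoot {

    // Mean wall-clock duration, in milliseconds, of Thread.sleep(requestedMs).
    static double meanSleepMillis(long requestedMs, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                Thread.sleep(requestedMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while sleeping", e);
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        double mean = meanSleepMillis(1, 1000);
        System.out.println("mean ms per sleep(1): " + mean);
        // Upper bound on the rate of a loop that sleeps once per tuple.
        System.out.println("implied max tuples/s: " + 1000.0 / mean);
    }
}
```

If the reported mean is well above 1 ms, the sleep call alone accounts for a generation rate below 1000 tuples/s, before any Storm overhead is considered.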

If you are running Linux and don't have high-resolution timers enabled, your
sleep resolution is limited to the duration of a "jiffy", which depends on
the kernel's HZ setting and is 1 ms on many systems. With a 1 ms jiffy,
sleep(1) will sleep 1.5 ms on average, which caps the rate at about 667
tuples/s (1000 / 1.5), roughly matching your observation.
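
The arithmetic behind that estimate can be written out as a small sketch. It assumes a simplified model in which every sleep overshoots by half a jiffy on average and nothing else takes time; the class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope model, not a measurement.
final class SleepRateEstimate {

    // Expected tuples per second when each sleep lasts, on average,
    // the requested time plus half a jiffy.
    static double expectedRate(double requestedSleepMs, double jiffyMs) {
        double averageSleepMs = requestedSleepMs + jiffyMs / 2.0;
        return 1000.0 / averageSleepMs;
    }

    public static void main(String[] args) {
        System.out.println("sleep(1):  " + expectedRate(1.0, 1.0) + " tuples/s");
        System.out.println("sleep(10): " + expectedRate(10.0, 1.0) + " tuples/s");
    }
}
```

With a 1 ms jiffy this gives about 667 tuples/s for sleep(1) and about 95 tuples/s for sleep(10), which is in the same range as the reported 650 and 92.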

