Posted to user@storm.apache.org by Tian Guo <ti...@epfl.ch> on 2014/03/05 21:28:40 UTC

Questions about the nextTuple method in the Spout of Storm stream processing

I am developing some data analysis algorithms on top of Storm and have some
questions about the internal design of Storm.

I want to simulate sensor data generation and processing in Storm, so I use a
Spout to push sensor data to the downstream bolts at a constant time interval
by calling sleep inside the Spout's nextTuple method. But the experiment
results show that the spout did not push data at the specified rate. There
was no bottleneck bolt in the system.

My question is therefore whether the nextTuple method is called only when the
previous tuples have been fully processed and acked in the ack method.

If so, does that mean I cannot set a fixed time interval for emitting data?

Thx a lot!
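
A note on the fixed-interval approach described above: if nextTuple sleeps for a fixed amount on every call, the real period becomes the sleep plus whatever time the call itself and the surrounding loop take, so the rate can drift. A minimal Java sketch of pacing against absolute deadlines instead is below. FixedRatePacer and its methods are hypothetical names for illustration, not part of Storm, and the sketch addresses only drift from fixed sleeps, not any throttling done by Storm itself.

```java
// Hypothetical helper, not part of Storm: paces emissions against absolute deadlines
// so that time spent outside the wait does not accumulate as drift.
final class FixedRatePacer {
    private final long periodMillis;
    private long nextDeadlineMillis;

    FixedRatePacer(long periodMillis, long startMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        this.periodMillis = periodMillis;
        this.nextDeadlineMillis = startMillis + periodMillis;
    }

    /** Returns how long to wait before the next emission is due; 0 if it is already due. */
    long millisUntilDue(long nowMillis) {
        return Math.max(0L, nextDeadlineMillis - nowMillis);
    }

    /** Call once per emission; advances the deadline by exactly one period. */
    void markEmitted() {
        nextDeadlineMillis += periodMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        FixedRatePacer pacer = new FixedRatePacer(100, System.currentTimeMillis());
        for (int i = 0; i < 3; i++) {
            Thread.sleep(pacer.millisUntilDue(System.currentTimeMillis()));
            System.out.println("emit reading " + i); // a spout would emit its tuple here
            pacer.markEmitted();
        }
    }
}
```

In a spout, nextTuple could return immediately while millisUntilDue(now) is positive, then emit and call markEmitted() once it reaches zero. That keeps nextTuple non-blocking, which the ISpout Javadoc recommends.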

Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by James Xu <xu...@gmail.com>.
Determined by these ifs (excerpt from daemon/executor.clj):

    (if (and (.isEmpty overflow-buffer)
             (or (not max-spout-pending)
                 (< (.size pending) max-spout-pending)))
      (if active?
        (do
          (when-not @last-active
            (reset! last-active true)
            (log-message "Activating spout " component-id ":" (keys task-datas))
            (fast-list-iter [^ISpout spout spouts] (.activate spout)))

          (fast-list-iter [^ISpout spout spouts] (.nextTuple spout)))
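
Restated in Java, the guard in the excerpt above could be sketched as follows. SpoutLoopGuard and its parameter names are hypothetical, not Storm classes. The sketch assumes that pending holds tuples that were emitted but not yet acked or failed, and that max-spout-pending corresponds to the topology.max.spout.pending setting. Under those assumptions, acks matter only when that cap is set and has been reached.

```java
// Hypothetical restatement of the quoted Clojure guard; not a Storm class.
final class SpoutLoopGuard {

    /**
     * @param overflowBufferEmpty true when the executor's overflow buffer is empty
     * @param maxSpoutPending     cap on pending tuples, or null when no cap is configured
     * @param pendingCount        tuples assumed to be emitted but not yet acked or failed
     * @param active              true when the spout is active
     * @return true when the loop would go on to call nextTuple
     */
    static boolean shouldCallNextTuple(boolean overflowBufferEmpty,
                                       Integer maxSpoutPending,
                                       int pendingCount,
                                       boolean active) {
        // Mirrors (or (not max-spout-pending) (< (.size pending) max-spout-pending))
        boolean underPendingCap = maxSpoutPending == null || pendingCount < maxSpoutPending;
        return overflowBufferEmpty && underPendingCap && active;
    }

    public static void main(String[] args) {
        // No cap configured: outstanding tuples do not block the call.
        System.out.println(shouldCallNextTuple(true, null, 1000, true));
        // Cap of 10 with 10 pending: the call is skipped.
        System.out.println(shouldCallNextTuple(true, 10, 10, true));
    }
}
```

So with no cap configured, nextTuple keeps being called regardless of outstanding acks; with a cap of 10 and 10 tuples pending, it is skipped until the pending count drops.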

On March 6, 2014, at 9:06 PM, Tian Guo <ti...@epfl.ch> wrote:

> But what factors determine when the nextTuple() should be invoked?
> 
> Thx!

Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by Tian Guo <ti...@epfl.ch>.
But what factors determine when the nextTuple() should be invoked?

Thx!


2014-03-06 13:36 GMT+01:00 James Xu <xu...@gmail.com>:

> The acking of previous tuples has nothing to do with the invocation of
> nextTuple().

Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by James Xu <xu...@gmail.com>.
The acking of previous tuples has nothing to do with the invocation of nextTuple().

On March 6, 2014, at 8:19 PM, Tian Guo <ti...@epfl.ch> wrote:

> 
> Thanks for your advice!
> 
> But my doubt still remains. Is the nextTuple method called only when the previous tuples are acked in the ack method? Does anyone know the internal strategy?
> 
> Thx!
> 
> Best,


Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by Tian Guo <ti...@epfl.ch>.
Thanks for your advice!

But my doubt still remains. Is the nextTuple method called only when the
previous tuples are acked in the ack method? Does anyone know the internal
strategy?

Thx!

Best,

2014-03-06 8:14 GMT+01:00 James Xu <xu...@gmail.com>:

> use Tick Tuple.

Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by James Xu <xu...@gmail.com>.
Use Tick Tuple.

On March 6, 2014, at 4:28 AM, Tian Guo <ti...@epfl.ch> wrote:

> I am developing some data analysis algorithms on top of Storm and have some questions about the internal design of Storm.
> 
> I want to simulate sensor data generation and processing in Storm, so I use a Spout to push sensor data to the downstream bolts at a constant time interval by calling sleep inside the Spout's nextTuple method. But the experiment results show that the spout did not push data at the specified rate. There was no bottleneck bolt in the system.
> 
> My question is therefore whether the nextTuple method is called only when the previous tuples have been fully processed and acked in the ack method.
> 
> If so, does that mean I cannot set a fixed time interval for emitting data?
> 
> Thx a lot!


Reg: groupby

Posted by "Jahagirdar, Madhu" <ma...@philips.com>.
While debugging, I accidentally commented out the groupBy and got some interesting output, so I thought I would mail the user group to understand things better.

With groupBy commented out, the following statements do not count the words individually (e.g. [test, 2], [healthcare, 3]). Instead they do a global aggregation under the global key, in this case [$REDIS-MAP-STATE-GLOBAL, 100]. The count per word is not done, and I am not sure why.

stream
    .each(new Fields("tweet"), new Split(), new Fields("words"))
    //.groupBy(new Fields("words"))
    .persistentAggregate(redisStateFactory, new Fields("words"), new CustomCount(), new Fields("invindex"))
    .newValuesStream()
    .partitionPersist(redisPublisher, new RedisPublisher("WordCount"));

If I uncomment the groupBy, it all works. So does groupBy replace the global key and produce one key per word?

Regards,
Madhu Jahagirdar
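
The difference described above can be pictured with a plain-Java analogy. This is not Trident code: WordCountModes, GLOBAL_KEY and the "$GLOBAL" value are hypothetical stand-ins. It assumes that aggregating an ungrouped stream folds every tuple into a single value kept under one fixed key, while aggregating after groupBy keeps one value per distinct grouping key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java analogy only; none of these names exist in Storm or Trident.
final class WordCountModes {
    static final String GLOBAL_KEY = "$GLOBAL";

    /** No grouping: every word increments the same entry, stored under one fixed key. */
    static Map<String, Long> countUngrouped(List<String> words) {
        Map<String, Long> state = new TreeMap<>();
        for (String word : words) {
            state.merge(GLOBAL_KEY, 1L, Long::sum);
        }
        return state;
    }

    /** Grouped by word: each distinct word gets its own entry. */
    static Map<String, Long> countGrouped(List<String> words) {
        Map<String, Long> state = new TreeMap<>();
        for (String word : words) {
            state.merge(word, 1L, Long::sum);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("test", "healthcare", "test", "healthcare", "healthcare");
        System.out.println(countUngrouped(words)); // one entry under the global key
        System.out.println(countGrouped(words));   // one entry per word
    }
}
```

The only change between the two methods is the key passed to merge, which mirrors how the grouping fields would become the state keys once groupBy is present.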



Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by Tian Guo <ti...@epfl.ch>.
Thank you very much for the above replies! They are very helpful.


2014-03-08 15:30 GMT+01:00 Sean Allen <se...@monkeysnatchbanana.com>:

> You can run into trouble trying to get a fixed time interval.
>
> If you haven't processed enough of the previous tuples, you can bring your
> topology to a halt.
> Proceed with caution.

Re: Questions about the nextTuple method in the Spout of Storm stream processing

Posted by Sean Allen <se...@monkeysnatchbanana.com>.
You can run into trouble trying to get a fixed time interval.

If you haven't processed enough of the previous tuples, you can bring your
topology to a halt.
Proceed with caution.


On Wed, Mar 5, 2014 at 2:28 PM, Tian Guo <ti...@epfl.ch> wrote:

> I am developing some data analysis algorithms on top of Storm and have
> some questions about the internal design of Storm.
>
> I want to simulate sensor data generation and processing in Storm, so I use a
> Spout to push sensor data to the downstream bolts at a constant time interval
> by calling sleep inside the Spout's nextTuple method. But the experiment
> results show that the spout did not push data at the specified rate. There
> was no bottleneck bolt in the system.
>
> My question is therefore whether the nextTuple method is called only when the
> previous tuples have been fully processed and acked in the ack method.
>
> If so, does that mean I cannot set a fixed time interval for emitting data?
>
> Thx a lot!
>



-- 

This is not a signature