Posted to user@storm.apache.org by "Nick R. Katsipoulakis" <ni...@gmail.com> on 2015/07/19 19:40:21 UTC

Is Storm doing any micro-batching of tuples?

Hello all,

I have a topology in which a Spout (A) emits tuples to a Bolt (B) and in
turn, B emits tuples to a Bolt (C).

In order to perform some measurements in my topology, I have Spout A send
two types of tuples: normal data tuples and latency-measure tuples.

After sending a user-defined number of data tuples, A starts emitting
latency-measure tuples, spaced 1 second apart. That is, after sending the
first latency-measure tuple, A keeps sending data tuples until 1 second has
passed, and then sends the next latency-measure tuple. So, the input stream
of B would look something like the following:

DDDDD(L1)DDD--for 1 second--DDD(L2)DDDD....
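The emission pattern above can be sketched in plain Java, independent of Storm (the class and method names here are illustrative, not from the actual topology; a real spout would call this from nextTuple()):

```java
// Decides, for every emission opportunity, whether it is time to
// interleave a latency-measure tuple among the data tuples. The clock
// is injected so the logic can be tested deterministically.
import java.util.function.LongSupplier;

class LatencyMarkerScheduler {
    private final long intervalMs;
    private final LongSupplier clock;   // returns current time in ms
    private long lastMarkerAt;

    LatencyMarkerScheduler(long intervalMs, LongSupplier clock) {
        this.intervalMs = intervalMs;
        this.clock = clock;
        this.lastMarkerAt = clock.getAsLong();
    }

    /** True when at least intervalMs has elapsed since the last marker. */
    boolean shouldEmitMarker() {
        long now = clock.getAsLong();
        if (now - lastMarkerAt >= intervalMs) {
            lastMarkerAt = now;
            return true;
        }
        return false;
    }
}
```

Whenever shouldEmitMarker() returns true, the spout would emit a latency-measure tuple; otherwise it emits the next data tuple.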

The strange thing I see in Bolt B is that the time difference between the
arrival times of L1 and L2 is not >= 1 second, which is the gap that I
expect to see.
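For reference, the gap measurement inside B can be kept entirely within one JVM (so clock skew between machines does not enter into it). A minimal sketch in plain Java; in a real bolt this would be driven from execute() whenever a latency-measure tuple arrives:

```java
// Records inter-arrival times of latency-measure tuples as seen by a
// single bolt instance.
import java.util.ArrayList;
import java.util.List;

class InterArrivalRecorder {
    private final List<Long> gapsMs = new ArrayList<>();
    private long previousArrival = -1;

    /** Call with the local arrival timestamp (ms) of each marker tuple. */
    void onMarker(long arrivalMs) {
        if (previousArrival >= 0) {
            gapsMs.add(arrivalMs - previousArrival);
        }
        previousArrival = arrivalMs;
    }

    List<Long> gaps() {
        return gapsMs;
    }
}
```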

Why is the above happening? Does Storm do some kind of micro-batching, so
that the two tuples L1 and L2 arrive at B with a time difference of less
than 1 second?

Thanks,
Nikos

Re: Is Storm doing any micro-batching of tuples?

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Hello Niels,

I am pretty sure I ack every tuple, since I can see in the Storm UI that
the number of emitted tuples equals the number of acked tuples for my
bolts/spouts. I will look further into the settings you mentioned.

However, I believe the micro-batching that is happening is done by Netty,
not by Storm itself.
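For what it's worth, the Netty transport in Storm 0.9.x does buffer and batch outgoing messages between workers. The relevant knobs live in storm.yaml; the key names below are from that era's defaults.yaml and the values are only illustrative, so they should be checked against the exact Storm version in use:

```yaml
# Use the Netty transport (the 0.9.x default).
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
# Netty send/receive buffer size in bytes; larger buffers favor
# throughput over per-tuple latency.
storm.messaging.netty.buffer_size: 5242880
# Upper bound on the size of a batched transfer, in bytes.
storm.messaging.netty.transfer.batch.size: 262144
```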

Anyway, thank you very much for your time.

Sincerely,
Nikos

2015-07-24 10:29 GMT-04:00 Niels Basjes <Ni...@basjes.nl>:

> As far as my knowledge goes this means storm is doing "immediate"
> processing.
> Something you must remember is that if you have the tuples acknowledged
> then there are settings that have to do with timeouts and maximum number of
> tuples "in flight".
> Set these "wrong" and you may see the effects you have.
> Or perhaps you forget to ack all tuples?
>
> Niels Basjes


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Is Storm doing any micro-batching of tuples?

Posted by Niels Basjes <Ni...@basjes.nl>.
As far as my knowledge goes, this means Storm is doing "immediate"
processing.
Something you must remember is that if you have the tuples acknowledged,
then there are settings that control timeouts and the maximum number of
tuples "in flight".
Set these "wrong" and you may see the effects you describe.
Or perhaps you forgot to ack all tuples?
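As a concrete illustration, the per-topology keys involved are the standard Storm ones below; the values are only examples:

```yaml
# Replay an unacked tuple after this many seconds.
topology.message.timeout.secs: 30
# Cap on unacked tuples in flight per spout task (unset = unlimited).
topology.max.spout.pending: 1000
```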

Niels Basjes

On Sun, 19 Jul 2015 22:46 Nick R. Katsipoulakis <ni...@gmail.com>
wrote:

> Hello,
>
> No, I am not. Also, I am using direct-grouping for sending tuples between
> the spout and the bolts.
>
> Nikos

Re: Is Storm doing any micro-batching of tuples?

Posted by "Nick R. Katsipoulakis" <ni...@gmail.com>.
Hello,

No, I am not. Also, I am using direct grouping to send tuples between the
spout and the bolts.

Nikos

2015-07-19 14:40 GMT-04:00 Niels Basjes <Ni...@basjes.nl>:

> Do you use Trident or the more low level API?
>
> Niels



-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate

Re: Is Storm doing any micro-batching of tuples?

Posted by Niels Basjes <Ni...@basjes.nl>.
Do you use Trident or the more low-level API?

Niels

On Sun, Jul 19, 2015 at 7:40 PM, Nick R. Katsipoulakis
<ni...@gmail.com> wrote:
> Hello all,
>
> I have a topology in which a Spout (A) emits tuples to a Bolt (B) and in
> turn, B emits tuples to a Bolt (C).
>
> In order to perform some measurements in my topology I have Spout A send
> some two types of tuples: normal data tuples and latency-measure tuples.
>
> After sending a user-defined number of data tuples, A initiates a sequence
> by sending a latency-tuple, with a 1 second time difference between them.
> So, after sending the first latency-measure tuple, it sends data tuples
> until one 1 second has passed, and then sends the next latency-measure
> tuple. So, the input stream of B would look something like the following:
>
> DDDDD(L1)DDD--for 1 second--DDD(L2)DDDD....
>
> The strange thing I see in Bolt B is that the time difference between the
> arrival times of L1 and L2 are not >= 1 second, which is the time gap that I
> expect to see.
>
> Why is the above happening? Does Storm do some kind of micro-batching so
> that the two tuples L1 and L2 appear in B with time difference less than 1
> second?
>
> Thanks,
> Nikos
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes