Posted to user@storm.apache.org by Kashyap Mhaisekar <ka...@gmail.com> on 2015/07/29 00:24:53 UTC

Max Spout Pending - Question

Hi,
Does the Max Spout Pending limit apply to tuples emitted out of bolts too?
For example:
1. MAX_SPOUT_PENDING value is 1000
2. My Spout calls a bolt which emits 1000 tuples

Does this mean there can be 1000 x 1000 tuples in the topology? Or does it
mean that only one tuple is emitted from Spout because each bolt emits 1000
tuples?

Thanks
kashyap

Re: Max Spout Pending - Question

Posted by Kashyap Mhaisekar <ka...@gmail.com>.
Thanks Nathan.
Will go with the assumption that if the latency at the spout is much higher
than the combined latencies of the rest of the bolts, then the additional
latency at the spout is due to messages piling up on the outbound queue.

Regards
Kashyap
On Jul 29, 2015 1:49 PM, "Nathan Leung" <nc...@gmail.com> wrote:

> Roughly speaking (disregarding any network latencies and any benefits from
> having multiple threads servicing the output queues vs having 1), having 1
> spout with high max pending should be similar to many spouts with low max
> pending.  Your total number of tuples pending in the topology is the same
> either way.
>
> Spout latency includes processing time for the entire tuple tree.
> https://storm.apache.org/documentation/Guaranteeing-message-processing.html
>
> Also I would review
> https://storm.apache.org/documentation/Acking-framework-implementation.html
> with regards to your timeout question.

Re: Max Spout Pending - Question

Posted by Nathan Leung <nc...@gmail.com>.
Roughly speaking (disregarding any network latencies and any benefits from
having multiple threads servicing the output queues vs having 1), having 1
spout with high max pending should be similar to many spouts with low max
pending.  Your total number of tuples pending in the topology is the same
either way.

Spout latency includes processing time for the entire tuple tree.
https://storm.apache.org/documentation/Guaranteeing-message-processing.html

Also I would review
https://storm.apache.org/documentation/Acking-framework-implementation.html
with regards to your timeout question.

On Wed, Jul 29, 2015 at 2:16 PM, Kashyap Mhaisekar <ka...@gmail.com>
wrote:

> Nathan,
> So is the following true?
> Spout latency = (time spent in the spout's output queue *[A]*) + (time
> from emit in nextTuple() until the ack is received *[B]*)
>
> So does it mean that if the complete latency at the spout level is high but
> the bolts have very low latencies, then instead of increasing max spout
> pending, we can keep max spout pending at a low number but increase the
> parallelism of the spout, so that the overall number of messages in the
> topology can be higher while the load on an individual spout instance stays
> low?
>
> I was using a RedisSpout that gets messages from a Redis publish and then
> populates an in-memory queue. nextTuple() feeds off this queue. I am
> constrained to use only one instance of the spout, as multiple instances
> would all listen to the same message and emit duplicates into the
> topology.
>
> Thanks
> Kashyap

Re: Max Spout Pending - Question

Posted by Kashyap Mhaisekar <ka...@gmail.com>.
Nathan,
So is the following true?
Spout latency = (time spent in the spout's output queue *[A]*) + (time
from emit in nextTuple() until the ack is received *[B]*)

So does it mean that if the complete latency at the spout level is high but
the bolts have very low latencies, then instead of increasing max spout
pending, we can keep max spout pending at a low number but increase the
parallelism of the spout, so that the overall number of messages in the
topology can be higher while the load on an individual spout instance stays
low?

I was using a RedisSpout that gets messages from a Redis publish and then
populates an in-memory queue. nextTuple() feeds off this queue. I am
constrained to use only one instance of the spout, as multiple instances
would all listen to the same message and emit duplicates into the topology.

Thanks
Kashyap
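For reference, the pattern described above is a subscriber thread feeding a bounded in-memory queue that nextTuple() drains. The sketch below is a self-contained illustration only: the class and method names are hypothetical, and the Redis subscriber is simulated with a plain thread rather than a real Jedis/Storm call.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueFedSpoutSketch {
    // Bounded buffer between the subscriber callback and nextTuple().
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);

    // Called from the (simulated) Redis subscriber thread per published message.
    public void onMessage(String msg) {
        buffer.offer(msg); // drops when full; a real spout might block or count drops
    }

    // Called by the spout's executor thread; must return quickly, never block.
    public String nextTuple() {
        return buffer.poll(); // null means "nothing to emit right now"
    }

    public static void main(String[] args) throws InterruptedException {
        QueueFedSpoutSketch spout = new QueueFedSpoutSketch();
        Thread subscriber = new Thread(() -> spout.onMessage("hello"));
        subscriber.start();
        subscriber.join();
        System.out.println(spout.nextTuple()); // hello
        System.out.println(spout.nextTuple()); // null
    }
}
```

The single-instance constraint comes from pub/sub semantics: Redis delivers each published message to every subscriber. Pushing onto a Redis list and consuming it with BRPOP instead gives competing-consumer semantics, which would allow multiple spout instances without duplicates.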

On Wed, Jul 29, 2015 at 11:59 AM, Nathan Leung <nc...@gmail.com> wrote:

> 1 second is too short.  Spout latency includes time spent in the output
> queue from the spout (increasing max spout pending potentially increases
> your end-to-end latency, depending on whether you have anything buffered in
> the spout output queues).

Re: Max Spout Pending - Question

Posted by Nathan Leung <nc...@gmail.com>.
1 second is too short.  Spout latency includes time spent in the output
queue from the spout (increasing max spout pending potentially increases
your end-to-end latency, depending on whether you have anything buffered in
the spout output queues).
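One way to quantify that queueing effect is Little's law: with N tuples held in flight and the topology completing X tuples per second, the average complete latency is roughly N / X, no matter how small each bolt's execute latency is. A self-contained sketch (the numbers are illustrative, not measurements from this thread's topology):

```java
public class LittlesLaw {
    // Little's law: average time in system W = L / lambda,
    // where L = tuples in flight and lambda = throughput (tuples/sec).
    static double completeLatencySecs(int tuplesInFlight, double throughputPerSec) {
        return tuplesInFlight / throughputPerSec;
    }

    public static void main(String[] args) {
        // Even with ~5 ms of bolt work, holding 1000 tuples pending at
        // ~666 tuples/sec gives roughly 1.5 s of complete latency.
        System.out.printf("%.2f s%n", completeLatencySecs(1000, 666.0));
        // A lower max spout pending shrinks the queue, and with it the latency.
        System.out.printf("%.2f s%n", completeLatencySecs(100, 666.0));
    }
}
```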

On Wed, Jul 29, 2015 at 12:40 PM, Kashyap Mhaisekar <ka...@gmail.com>
wrote:

> Thanks Nathan. But in this case how should the spout latency be
> interpreted? In the same example you quoted above -
> spout a -> bolt b (emits 10 tuples per msg) -> bolt c
>
> I see the process latency and execute latencies under 5 ms for both B and
> C, while the spout is at 1500 ms. The bolts don't do anything much other
> than appending to an existing string. From what I understand, the complete
> latency at the spout level is the time from nextTuple() to the time
> ack() is called (if successful) and does not include the time the message
> spends waiting because of the max spout pending property. To add to the
> mystery, I set the message timeout at 1 sec. I don't see any failures
> (fail() is not called), but the spout latency is at 1.5 seconds.
>
> Regards,
> Kashyap

Re: Max Spout Pending - Question

Posted by Kashyap Mhaisekar <ka...@gmail.com>.
Thanks Nathan. But in this case how should the spout latency be
interpreted? In the same example you quoted above -
spout a -> bolt b (emits 10 tuples per msg) -> bolt c

I see the process latency and execute latencies under 5 ms for both B and
C, while the spout is at 1500 ms. The bolts don't do anything much other
than appending to an existing string. From what I understand, the complete
latency at the spout level is the time from nextTuple() to the time
ack() is called (if successful) and does not include the time the message
spends waiting because of the max spout pending property. To add to the
mystery, I set the message timeout at 1 sec. I don't see any failures
(fail() is not called), but the spout latency is at 1.5 seconds.

Regards,
Kashyap
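On the timeout point: a 1-second timeout sits below the observed 1.5 s complete latency, so failures would be expected; note also that Storm's timeout check is coarse (the pending map rotates on a timer, so a tuple can live noticeably longer than the nominal timeout before fail() is called), which may explain the absence of failures. A sketch of the relevant knobs, assuming the Storm 0.9.x-era Config API; the values are illustrative:

```java
// Storm topology configuration sketch (needs storm-core on the classpath;
// not runnable stand-alone).
backtype.storm.Config conf = new backtype.storm.Config();
conf.setMaxSpoutPending(1000);   // cap on un-acked tuples, per spout task
conf.setMessageTimeoutSecs(30);  // keep this comfortably above the complete latency
conf.setNumAckers(2);            // add acker tasks if acking becomes the bottleneck
```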

On Wed, Jul 29, 2015 at 10:35 AM, Nathan Leung <nc...@gmail.com> wrote:

> No.  You need to consider your system more carefully.  As a trivial
> example, imagine you have spout a -> bolt b -> bolt c, with bolt b
> splitting each tuple into 10 tuples.  Each component has 1 task.  If each
> component takes 1ms, your latency will not be the sum of the per bolt
> latency because of your fan out.

Re: Max Spout Pending - Question

Posted by Nathan Leung <nc...@gmail.com>.
No.  You need to consider your system more carefully.  As a trivial
example, imagine you have spout a -> bolt b -> bolt c, with bolt b
splitting each tuple into 10 tuples.  Each component has 1 task.  If each
component takes 1ms, your latency will not be the sum of the per bolt
latency because of your fan out.
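Concretely, ignoring queueing and transfer time: with a single bolt-c task, the 10 fanned-out tuples are processed one after another, so the tuple tree completes well after the naive sum of per-component latencies. A self-contained sketch of the arithmetic for the example above:

```java
public class FanOutLatency {
    // Naive estimate: the sum of per-component execute latencies.
    static int naiveLatencyMs(int perExecMs) {
        return perExecMs + perExecMs; // bolt b + bolt c, once each
    }

    // With bolt b emitting fanOut tuples and a single bolt-c task, the tree
    // only completes after bolt b's one execution plus fanOut serialized
    // executions at bolt c.
    static int fanOutLatencyMs(int perExecMs, int fanOut) {
        return perExecMs + fanOut * perExecMs;
    }

    public static void main(String[] args) {
        System.out.println(naiveLatencyMs(1));      // 2
        System.out.println(fanOutLatencyMs(1, 10)); // 11
    }
}
```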
On Jul 29, 2015 11:25 AM, "Kashyap Mhaisekar" <ka...@gmail.com> wrote:

> Thanks Nathan.
>
> If the complete latency at the spout is greater than the process
> latencies of all the bolts put together, does it mean that the ackers are a
> bottleneck and need to be increased?
>
> thanks
> kashyap

Re: Max Spout Pending - Question

Posted by Kashyap Mhaisekar <ka...@gmail.com>.
Thanks Nathan.

If the complete latency at the spout is greater than the process
latencies of all the bolts put together, does it mean that the ackers are a
bottleneck and need to be increased?

thanks
kashyap

On Tue, Jul 28, 2015 at 7:30 PM, Nathan Leung <nc...@gmail.com> wrote:

> The count is tracked from each spout task and does not include bolt fan
> out. If the setting is 100 and you have 8 spout tasks you can have 800
> tuples from the spout in your system.

Re: Max Spout Pending - Question

Posted by Nathan Leung <nc...@gmail.com>.
The count is tracked from each spout task and does not include bolt fan
out. If the setting is 100 and you have 8 spout tasks you can have 800
tuples from the spout in your system.
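In other words, max spout pending caps un-acked spout tuples per spout task, and downstream fan-out is not charged against it. A self-contained sketch covering both the 8-task example and the original 1000 x 1000 question:

```java
public class SpoutPendingCap {
    // Max un-acked spout tuples in flight: the setting applies per spout task.
    static int maxInFlightSpoutTuples(int maxSpoutPending, int spoutTasks) {
        return maxSpoutPending * spoutTasks;
    }

    // Tuples in the topology can still exceed that cap through bolt fan-out:
    // each pending spout tuple may expand into many bolt-emitted tuples.
    static int maxTuplesWithFanOut(int maxSpoutPending, int spoutTasks, int fanOut) {
        return maxInFlightSpoutTuples(maxSpoutPending, spoutTasks) * fanOut;
    }

    public static void main(String[] args) {
        System.out.println(maxInFlightSpoutTuples(100, 8));     // 800
        // Pending counts the 1000 spout tuples, not the 1000000 bolt emissions.
        System.out.println(maxTuplesWithFanOut(1000, 1, 1000)); // 1000000
    }
}
```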
On Jul 28, 2015 6:25 PM, "Kashyap Mhaisekar" <ka...@gmail.com> wrote:

> Hi,
> Does the Max Spout Pending limit apply to tuples emitted out of bolts
> too?
> For example:
> 1. MAX_SPOUT_PENDING value is 1000
> 2. My Spout calls a bolt which emits 1000 tuples
>
> Does this mean there can be 1000 x 1000 tuples in the topology? Or does it
> mean that only one tuple is emitted from Spout because each bolt emits 1000
> tuples?
>
> Thanks
> kashyap
>