Posted to user@storm.apache.org by "Le, Binh T." <bi...@accenture.com> on 2021/11/08 19:31:38 UTC

Storm 2 Spout Not Acking, Failing Tuples

Hi,

We are upgrading Storm from 1.2.1 to 2.2.0 and are experiencing an issue similar to this one <https://www.mail-archive.com/user@storm.apache.org/msg10013.html>: all bolts are acking, but the spout is not, so latency is high and keeps increasing. FYI, we do anchor tuples. We can see that tuples consistently time out, fail, and get retried over and over until they are eventually dropped after exceeding the configured maximum retries.
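
For reference, our bolts follow the usual anchored-emit-then-ack pattern, roughly like the sketch below (the class, stream, and field names are illustrative only, not our actual code):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Illustrative bolt showing how we anchor emits and ack every input tuple.
    public class ExampleAnchoringBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Emit anchored to the input tuple so the ack tree stays tied to the spout tuple,
            collector.emit(input, new Values(input.getStringByField("record")));
            // then ack the input so the tracker can eventually ack back to the spout.
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("record"));
        }
    }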

In Storm 1.2.1, the same set of Storm configs works fine; it is only after upgrading that we see this behavior. We have tried a number of things, none of which helped. They include, but are not limited to, the following (a rough code sketch of these settings follows the list):

  1.  Increasing the topology message timeout
  2.  Increasing max spout pending
  3.  Increasing number of workers
  4.  Increasing executor send and transfer buffer size
  5.  Extending the back-pressure check interval (it can no longer be disabled entirely, as it could be in Storm 1)
  6.  Disabling load aware messaging
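
For reference, here is roughly how we apply those settings when building the topology config. The values are placeholders, and a couple of the key names may differ between Storm 1.x and 2.x, so please treat this as a sketch to be checked against defaults.yaml rather than as our exact configuration:

    import org.apache.storm.Config;

    // Inside the main method that builds and submits the topology; values below are placeholders.
    Config conf = new Config();
    conf.setMessageTimeoutSecs(120);          // 1. topology.message.timeout.secs
    conf.setMaxSpoutPending(5000);            // 2. topology.max.spout.pending
    conf.setNumWorkers(8);                    // 3. topology.workers
    // 4. Buffer sizes. The send-buffer key below is the 1.x-era name; Storm 2 reworked the
    //    internal queues, so check defaults.yaml for the current equivalents before relying on it.
    conf.put("topology.executor.send.buffer.size", 32768);
    conf.put("topology.transfer.buffer.size", 50000);
    // 5. Back-pressure check interval (key name as we understand it; verify against defaults.yaml).
    conf.put("topology.backpressure.check.millis", 100);
    // 6. Disable load-aware messaging.
    conf.put("topology.disable.loadaware.messaging", true);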

Can you please let us know how we can troubleshoot this issue, find the root cause / bottleneck, and possibly fix it? In case it matters, our Storm topologies read from AWS Kinesis Data Streams.

Thanks,
Binh



Re: Storm 2 Spout Not Acking, Failing Tuples

Posted by Bipin Prasad <bi...@yahoo.com>.
If you have raised a Jira, can you also attach your screenshot to it?

RE: Storm 2 Spout Not Acking, Failing Tuples

Posted by "Le, Binh T." <bi...@accenture.com>.
While troubleshooting this issue, we've also noticed two strange behaviors in the bolt executor metrics shown in Storm UI.


  1.  The Executed column shows zero, but the Acked column is greater than zero. How is that possible, given that acking is done within execute()?
  2.  The Executed column shows zero, but the Emitted/Transferred columns are greater than zero.

Can anyone help explain this behavior?


Re: Storm 2 Spout Not Acking, Failing Tuples

Posted by PiPo Sweet Baby <ng...@gmail.com>.
unsubscribe


RE: Storm 2 Spout Not Acking, Failing Tuples

Posted by "Le, Binh T." <bi...@accenture.com>.
To follow up, we've looked at Storm UI, and it shows that the tuples are not evenly distributed. We know this has to do with the grouping. We use shuffle grouping (and even disabled load-aware messaging), but the tuples are still not evenly distributed.
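
For clarity, the wiring in question is essentially the sketch below (spout/bolt class names and parallelism values are placeholders, not our real topology):

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    // Inside the main method that builds the topology; names and numbers are placeholders.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kinesis-spout", new MyKinesisSpout(), 4);     // placeholder for our Kinesis spout
    builder.setBolt("worker-bolt", new MyWorkerBolt(), 200)         // placeholder bolt
           .shuffleGrouping("kinesis-spout");                       // should spread tuples evenly across tasks

    Config conf = new Config();
    conf.put("topology.disable.loadaware.messaging", true);         // we also tried this, with no change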

It seems others have had the same experience when upgrading from Storm 1.2 to Storm 2.2. Here are a few of those posts. We sense a theme with this issue and are surprised there's no documentation or response addressing it.


  1.  https://www.mail-archive.com/user@storm.apache.org/msg10070.html
  2.  https://www.mail-archive.com/user@storm.apache.org/msg10013.html
  3.  https://www.mail-archive.com/user@storm.apache.org/msg10114.html

Also, in our case, when we add other bolts to the topology, we've even seen tuples not reaching those bolts at all, at least according to Storm UI (no tuples executed or acked, and bolt capacity always zero).

Please help advise us on what should be done to troubleshoot/fix this. Any help is greatly appreciated.

Thanks!


RE: [External] Re: Storm 2 Spout Not Acking, Failing Tuples

Posted by "Le, Binh T." <bi...@accenture.com>.
Thanks for the response. We understand the stream grouping concepts. Regarding the latter point about one of our bolts: we are using direct grouping specifically for that bolt. However, what we don't understand is why this has been running perfectly fine in Storm 1.2.1 and is not working now in Storm 2.2.0.

We called out that one specific bolt and what we've seen in Storm UI to confirm whether or not it is the root cause of the spout not acking and the retries. From what you've seen, is that the case here? It seems something in Storm 2 is causing the behavior we're seeing. Maybe it's a config we haven't tried yet, but we've tried a lot already, and at this point it's coming down to trial and error.
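
For reference, the producer side of that direct-grouped stream looks roughly like the sketch below (class, stream, component, and field names are illustrative, not our actual code). Our question is essentially whether the task id chosen at this point is what ends up funneling everything to a single executor:

    import java.util.List;
    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Illustrative producer emitting on a direct stream; "target-bolt" and field names are made up.
    public class DirectEmittingBolt extends BaseRichBolt {
        private OutputCollector collector;
        private List<Integer> consumerTasks;

        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // Task ids of the downstream bolt that subscribes with direct grouping.
            this.consumerTasks = context.getComponentTasks("target-bolt");
        }

        @Override
        public void execute(Tuple input) {
            // The producer chooses the consumer task. If this calculation always resolves to the
            // same task id, every tuple lands on one executor, which matches what Storm UI shows us.
            String key = input.getStringByField("key");
            int task = consumerTasks.get(Math.floorMod(key.hashCode(), consumerTasks.size()));
            collector.emitDirect(task, "direct-stream", input, new Values(key));
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // The stream must be declared as a direct stream for emitDirect / direct grouping.
            declarer.declareStream("direct-stream", true, new Fields("key"));
        }
    }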


Re: Storm 2 Spout Not Acking, Failing Tuples

Posted by Rui Abreu <ru...@gmail.com>.
Internal tuple sharding depends on the type of stream grouping you are using.

https://storm.apache.org/releases/current/Concepts.html

There are eight built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the CustomStreamGrouping <https://storm.apache.org/releases/current/javadocs/org/apache/storm/grouping/CustomStreamGrouping.html> interface:

   1. Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
   2. Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.
   3. Partial Key grouping: The stream is partitioned by the fields specified in the grouping, like the Fields grouping, but are load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed. This paper <https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf> provides a good explanation of how it works and the advantages it provides.
   4. All grouping: The stream is replicated across all the bolt's tasks. Use this grouping with care.
   5. Global grouping: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
   6. None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
   7. Direct grouping: This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the emitDirect <https://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html#emitDirect-int-java.util.Collection-java.util.List-> methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext <https://storm.apache.org/releases/current/javadocs/org/apache/storm/task/TopologyContext.html> or by keeping track of the output of the emit method in OutputCollector <https://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html> (which returns the task ids that the tuple was sent to).
   8. Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.
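
As a quick illustration, here is roughly how the most common of these groupings are declared on a TopologyBuilder (the spout/bolt classes, component ids, and field names below are placeholders):

    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    // Sketch only; MySpout and BoltA..BoltF stand in for real components.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new MySpout(), 4);

    builder.setBolt("bolt-a", new BoltA(), 8)
           .shuffleGrouping("spout");                              // 1. shuffle
    builder.setBolt("bolt-b", new BoltB(), 8)
           .fieldsGrouping("bolt-a", new Fields("user-id"));       // 2. fields
    builder.setBolt("bolt-c", new BoltC(), 8)
           .partialKeyGrouping("bolt-a", new Fields("user-id"));   // 3. partial key
    builder.setBolt("bolt-d", new BoltD(), 1)
           .globalGrouping("bolt-a");                              // 5. global (lowest task id)
    // 7. direct: the producer picks the task, and its stream must be declared as direct.
    builder.setBolt("bolt-e", new BoltE(), 8)
           .directGrouping("bolt-a");
    builder.setBolt("bolt-f", new BoltF(), 8)
           .localOrShuffleGrouping("bolt-a");                      // 8. local or shuffle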




RE: Storm 2 Spout Not Acking, Failing Tuples

Posted by "Le, Binh T." <bi...@accenture.com>.
Re: acking, a look at the code indicates that all bolts are indeed acking. However, in Storm UI we noticed that in one of the bolts, which has 200+ executors, only one executor is processing any tuples (under the Executed column), while the others all show zero.
[Storm UI screenshot of the bolt's executor table omitted]

Could this be the problem? If so, what would cause this and how to fix it?
