Posted to user@storm.apache.org by "Le, Binh T." <bi...@accenture.com> on 2021/11/10 13:23:20 UTC
RE: [External] Re: Storm 2 Spout Not Acking, Failing Tuples

Thanks for the response. We understand the stream grouping concept. Regarding your point about one of our bolts: we are using direct grouping specifically for that bolt. What we don't understand is why this has been running perfectly fine in Storm 1.2.1 but is not working now in Storm 2.2.0.

We called out that specific bolt, and what we saw in Storm UI, to find out whether it is the root cause of the spout not acking and retrying. From what you've seen, is that the case here? It seems that something in Storm 2 is causing the behavior we're seeing. Maybe it's a config we haven't tried, but we've already tried a lot, and at this point it is coming down to trial and error.

From: Rui Abreu <ru...@gmail.com>
Sent: Tuesday, November 9, 2021 5:25 PM
To: user@storm.apache.org
Subject: [External] Re: Storm 2 Spout Not Acking, Failing Tuples


Internal tuple sharding depends on the type of stream grouping you are using.

https://storm.apache.org/releases/current/Concepts.html


There are eight built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the CustomStreamGrouping<https://storm.apache.org/releases/current/javadocs/org/apache/storm/grouping/CustomStreamGrouping.html> interface:

  1.  Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
  2.  Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.
  3.  Partial Key grouping: The stream is partitioned by the fields specified in the grouping, like the Fields grouping, but are load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed. This paper<https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf> provides a good explanation of how it works and the advantages it provides.
  4.  All grouping: The stream is replicated across all the bolt's tasks. Use this grouping with care.
  5.  Global grouping: The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
  6.  None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
  7.  Direct grouping: This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the emitDirect<https://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html#emitDirect-int-java.util.Collection-java.util.List-> methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext<https://storm.apache.org/releases/current/javadocs/org/apache/storm/task/TopologyContext.html> or by keeping track of the output of the emit method in OutputCollector<https://storm.apache.org/releases/current/javadocs/org/apache/storm/task/OutputCollector.html> (which returns the task ids that the tuple was sent to).
  8.  Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.
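To make the task-selection rules above concrete, here is a small, self-contained Java model of three of them. It is a simplified sketch, not Storm's implementation: the class and method names are hypothetical, Storm's real fields-grouping hash and its load-aware shuffle differ in detail, and the task ids are made up. What it illustrates is that each grouping is just a rule for mapping a tuple onto one of the consumer's task ids.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of how some built-in groupings pick a target task.
// Not Storm's implementation: Storm's hashing and load-aware shuffling differ in detail.
final class GroupingModel {

    // Fields grouping: equal keys always map to the same task.
    static int fieldsGroupingTask(List<Integer> targetTasks, Object key) {
        return targetTasks.get(Math.floorMod(key.hashCode(), targetTasks.size()));
    }

    // Global grouping: every tuple goes to the task with the lowest id.
    static int globalGroupingTask(List<Integer> targetTasks) {
        return Collections.min(targetTasks);
    }

    // Shuffle grouping, approximated here as round-robin so that tasks receive equal counts.
    static int shuffleGroupingTask(List<Integer> targetTasks, long sequenceNumber) {
        return targetTasks.get((int) Math.floorMod(sequenceNumber, (long) targetTasks.size()));
    }

    public static void main(String[] args) {
        List<Integer> tasks = List.of(7, 8, 9, 10); // hypothetical consumer task ids
        // Same key, same task: prints true
        System.out.println(fieldsGroupingTask(tasks, "user-42") == fieldsGroupingTask(tasks, "user-42"));
        // Lowest task id: prints 7
        System.out.println(globalGroupingTask(tasks));
        // Round-robin over all tasks: prints 7 8 9 10
        for (long seq = 0; seq < 4; seq++) {
            System.out.print(shuffleGroupingTask(tasks, seq) + " ");
        }
        System.out.println();
    }
}
```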


On Tue, 9 Nov 2021 at 19:35, Le, Binh T. <bi...@accenture.com> wrote:
Re: acking, a look at the code indicates that all bolts are indeed acking. However, in Storm UI we noticed that for one of the bolts, which has 200+ executors, only one executor is processing any tuples (per the Executed column), while all the others show zero.
[Screenshot: Storm UI executor statistics for this bolt]

Could this be the problem? If so, what would cause this and how to fix it?
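With direct grouping (item 7 in the grouping list above), Storm delivers each tuple to exactly the task id the producer passes to emitDirect, so one busy executor out of 200+ is what one would expect if the producing bolt keeps choosing the same task id. That is a hypothesis to check in the producer's code, not a confirmed diagnosis. The sketch below is plain Java with no Storm dependency, and the class and method names are hypothetical. In a real bolt the task list would typically come from TopologyContext.getComponentTasks(...) in prepare(), and the chosen id would be passed to OutputCollector.emitDirect(taskId, anchor, values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical model of how a producer picks the target task for emitDirect.
// In a real bolt: consumerTasks = context.getComponentTasks("consumer-bolt") in prepare(),
// then collector.emitDirect(chosenTask, input, values) in execute().
final class DirectTaskChoice {

    // Pattern that reproduces the symptom: every tuple goes to the same task id.
    static int alwaysFirstTask(List<Integer> consumerTasks, Object key) {
        return consumerTasks.get(0);
    }

    // One way to spread load: map each key onto one of the consumer task ids.
    static int taskForKey(List<Integer> consumerTasks, Object key) {
        return consumerTasks.get(Math.floorMod(key.hashCode(), consumerTasks.size()));
    }

    // Counts how many tuples each consumer task would receive for keys 0..tupleCount-1.
    static Map<Integer, Integer> tuplesPerTask(List<Integer> consumerTasks, int tupleCount,
                                               BiFunction<List<Integer>, Object, Integer> chooser) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tupleCount; i++) {
            counts.merge(chooser.apply(consumerTasks, i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> consumerTasks = new ArrayList<>();
        for (int taskId = 100; taskId < 300; taskId++) { // 200 hypothetical task ids
            consumerTasks.add(taskId);
        }
        // Number of tasks that receive any tuples: prints 1, then 200
        System.out.println(tuplesPerTask(consumerTasks, 1000, DirectTaskChoice::alwaysFirstTask).size());
        System.out.println(tuplesPerTask(consumerTasks, 1000, DirectTaskChoice::taskForKey).size());
    }
}
```

If the producer does not actually need to control placement, a fields grouping on the same key may give the same per-key stickiness without hand-rolled task selection.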

From: Le, Binh T.
Sent: Monday, November 8, 2021 2:32 PM
To: user@storm.apache.org
Subject: Storm 2 Spout Not Acking, Failing Tuples

Hi,

We are upgrading Storm from 1.2.1 to 2.2.0 and are experiencing an issue similar to this<https://www.mail-archive.com/user@storm.apache.org/msg10013.html>, where all bolts are acking but the spout is not, causing latency to be high and to keep increasing. FYI, we anchor tuples. We can see that tuples are consistently timing out, which causes them to fail and be retried over and over until they are eventually "dropped" after exceeding the configured maximum number of retries.
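Because the tuples are anchored, the spout's ack() for a message only fires once every tuple in that message's tree has been acked. Storm's acker tracks this with a 64-bit XOR of tuple ids (described on the "Guaranteeing Message Processing" page of the Storm docs): anchored emits XOR new ids in, acks XOR them out again, and the tree is complete when the value returns to zero. The sketch below is a simplified, standalone model of that bookkeeping, not Storm's code; the class name and ids are hypothetical. It shows why a single bolt that receives anchored tuples but never acks them (or whose acks never reach the acker) is enough to make the spout time out and replay, even if every other bolt acks correctly.

```java
// Simplified model of Storm's XOR-based tuple-tree tracking (see the Storm docs,
// "Guaranteeing Message Processing"). Not Storm's implementation.
final class AckTreeModel {
    private long ackVal;

    // The spout emits a root tuple; the acker starts tracking its id.
    AckTreeModel(long rootTupleId) {
        this.ackVal = rootTupleId;
    }

    // A bolt acks its input after emitting children anchored to it:
    // the input id and the new child ids are all XORed into the tracked value.
    void ack(long inputTupleId, long... anchoredChildIds) {
        long update = inputTupleId;
        for (long childId : anchoredChildIds) {
            update ^= childId;
        }
        ackVal ^= update;
    }

    // True when every tuple in the tree has been acked; the spout's ack() would fire now.
    boolean treeComplete() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        long root = 0x5A5AL, child = 0x1234L; // hypothetical ids (Storm uses random 64-bit ids)

        AckTreeModel healthy = new AckTreeModel(root);
        healthy.ack(root, child); // bolt A: emits one anchored child, acks the root
        healthy.ack(child);       // bolt B: acks the child
        System.out.println(healthy.treeComplete()); // true

        AckTreeModel stuck = new AckTreeModel(root);
        stuck.ack(root, child);   // bolt A acks, but bolt B never acks the child
        System.out.println(stuck.treeComplete());   // false: would time out after topology.message.timeout.secs
    }
}
```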

In Storm 1.2.1, the same set of Storm configs works fine; it's only after upgrading that we're seeing this behavior. We have tried a number of things, none of which helped. They include, but are not limited to, the following:

  1.  Increasing the topology message timeout
  2.  Increasing max spout pending
  3.  Increasing number of workers
  4.  Increasing executor send and transfer buffer size
  5.  Extending the backpressure check interval (since backpressure can't be disabled in Storm 2, as it could be in Storm 1)
  6.  Disabling load aware messaging
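For anyone trying to reproduce the attempts listed above, the sketch below shows which topology config keys items 1, 2, 3 and 6 correspond to. Storm's Config class extends HashMap<String, Object>, so the same entries can be set with conf.put(...) or with typed setters such as setMessageTimeoutSecs, setMaxSpoutPending and setNumWorkers; a plain HashMap is used here so the example has no Storm dependency. All values are placeholders, not recommendations. Items 4 and 5 are left out because Storm 2.0 replaced the messaging and backpressure internals, so the relevant keys should be taken from the defaults.yaml of the exact version in use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tuning sketch: a plain Map stands in for org.apache.storm.Config
// (which extends HashMap<String, Object>). All values are placeholders.
final class TuningAttemptsConf {

    static Map<String, Object> conf() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.message.timeout.secs", 120);         // 1. Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS
        conf.put("topology.max.spout.pending", 1000);           // 2. Config.TOPOLOGY_MAX_SPOUT_PENDING
        conf.put("topology.workers", 8);                        // 3. Config.TOPOLOGY_WORKERS
        conf.put("topology.disable.loadaware.messaging", true); // 6. Config.TOPOLOGY_DISABLE_LOADAWARE_MESSAGING
        return conf;
    }

    public static void main(String[] args) {
        // Print each key/value pair (HashMap iteration order is unspecified).
        conf().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```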

Can you please let us know how we can go about troubleshooting this issue, finding the root cause or bottleneck, and possibly fixing it? In case it matters, our Storm topologies read from AWS Kinesis Data Streams.

Thanks,
Binh

