Posted to user@storm.apache.org by "Le, Binh T." <bi...@accenture.com> on 2022/04/18 17:17:04 UTC

Storm 2 Topology Stopped Processing

Hi,

We are upgrading our Storm 1 cluster to 2.2.0 and noticed a strange behavior. The topology was processing millions of messages fine, but then, all of a sudden, processing stopped. The capacity of every bolt dropped to zero and has stayed at zero. Does anyone know what could be going on here and how we can troubleshoot this issue?

Thanks,
Binh

________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy. Your privacy is important to us. Accenture uses your personal data only in compliance with data protection laws. For further information on how Accenture processes your personal data, please see our privacy statement at https://www.accenture.com/us-en/privacy-policy.
______________________________________________________________________________________

www.accenture.com
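For context on the symptom in the original question: the Storm UI tooltip describes a bolt's capacity (last 10 minutes) as (number executed * average execute latency) / measurement time. The Java sketch below is not Storm code; it only reproduces that arithmetic, with a hypothetical method name and made-up sample numbers, to show why a reading of zero suggests bolts that are executing no tuples at all rather than bolts that are overloaded.

```java
import java.util.concurrent.TimeUnit;

public class CapacitySketch {
    /**
     * Mirrors the arithmetic the Storm UI describes for bolt capacity:
     * (tuples executed * average execute latency) / measurement window.
     * Near 1.0 means the executor was busy for the whole window;
     * 0.0 means it executed no tuples during the window.
     */
    static double capacity(long executedCount, double avgExecuteLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return (executedCount * avgExecuteLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        long tenMinutesMs = TimeUnit.MINUTES.toMillis(10);
        // Hypothetical busy bolt: 300,000 tuples at 1.5 ms each in 10 minutes.
        System.out.println(capacity(300_000, 1.5, tenMinutesMs));
        // Stalled bolt: nothing executed, so capacity is 0 whatever the latency.
        System.out.println(capacity(0, 1.5, tenMinutesMs));
    }
}
```

A zero reading across all bolts is therefore consistent with tuples no longer reaching the bolts, which fits the later observation in this thread that CPU, disk and network activity all dropped at the same time.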

RE: Storm 2 Topology Stopped Processing

Posted by "Le, Binh T." <bi...@accenture.com>.
Hi,

Does anyone have any thoughts on this issue? It has been very frustrating to troubleshoot; it's basically trial and error for us. None of the following config changes have helped:


  *   Spout/bolt parallelism
  *   Increasing the executor receive queue size
  *   Increasing the worker transfer queue size
  *   Increasing the backpressure check interval (millis)
  *   Increasing the Netty buffer high watermark

We even restored the state to a point when the topology was processing continuously, but processing is still stopped. So it seems that something else is going on, and that it is not our topology code or configs.

Any help here is greatly appreciated.

Thanks,
Binh
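To make tuning attempts like the ones listed above reproducible (Bipin later asks for a storm.yaml-based example), it can help to record the exact overrides. The sketch below maps the listed settings to the configuration keys we believe Storm 2.x uses: topology.executor.receive.buffer.size, topology.transfer.buffer.size, topology.backpressure.check.millis and storm.messaging.netty.buffer.high.watermark. Treat the key names as assumptions to verify against the org.apache.storm.Config javadoc for your release, and the values as placeholders, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TuningOverridesSketch {
    // Key names assumed to match Storm 2.x org.apache.storm.Config constants;
    // verify them against the Config javadoc for the exact version you run.
    static final String EXECUTOR_RECEIVE_BUFFER = "topology.executor.receive.buffer.size";
    static final String WORKER_TRANSFER_BUFFER = "topology.transfer.buffer.size";
    static final String BACKPRESSURE_CHECK_MILLIS = "topology.backpressure.check.millis";
    static final String NETTY_HIGH_WATERMARK = "storm.messaging.netty.buffer.high.watermark";

    /** Builds overrides for the four queue/backpressure settings. Values are placeholders. */
    static Map<String, Object> overrides() {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put(EXECUTOR_RECEIVE_BUFFER, 65536);   // hypothetical value
        conf.put(WORKER_TRANSFER_BUFFER, 2048);     // hypothetical value
        conf.put(BACKPRESSURE_CHECK_MILLIS, 100L);  // hypothetical value
        conf.put(NETTY_HIGH_WATERMARK, 33554432);   // hypothetical value (32 MiB)
        return conf;
    }

    public static void main(String[] args) {
        // In a real topology these entries would go on an org.apache.storm.Config
        // (which is a Map<String, Object>) passed to StormSubmitter.submitTopology.
        overrides().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Printing the effective overrides next to each run makes it easier to attach them to a Jira or to compare a stalled run against a healthy one.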


RE: [External] Re: Storm 2 Topology Stopped Processing

Posted by "Le, Binh T." <bi...@accenture.com>.
Thanks, Bipin, for the referenced Jiras. It's odd that the issue is marked as fixed but we're still seeing it.

Yes, Storm UI shows 2.2.0 as the version.

Let me see if we can upgrade to 2.4.0 and go from there.

-----Original Message-----
From: Bipin Prasad <bi...@apache.org> 
Sent: Wednesday, April 20, 2022 1:17 PM
To: user@storm.apache.org
Subject: RE: [External] Re: Storm 2 Topology Stopped Processing

Hello Binh,
   I searched for this issue in older Jiras at issues.apache.org and see a few mentions (STORM-3751, STORM-3141 and STORM-3510). However, their pull requests are marked as merged.
So this may be a new issue. Just to confirm: when you go to the Storm UI page, does it show version 2.2.0?

   If you encounter problems with the 2.4.0 upgrade, please raise a Jira. You are likely to get more support for that version.

Thanks
--Bipin

On 2022/04/20 16:04:07 "Le, Binh T." wrote:
> Hi Bipin,
> 
> Thanks for your response. Unfortunately, this problem is not easily reproducible. It only manifests when we run our topologies at production volume; so far it works fine at low volume. So it would be difficult for me to provide an example topology that reproduces it.
> 
> Before Storm 2.4.0 was released, we did try upgrading to 2.3.0, but unfortunately we hit many errors we had not seen with 2.2.0, so we reverted to 2.2.0. At this point, I'm not sure whether we can upgrade to 2.4.0. We'll see.
> 
> Thanks,
> Binh
> 
> -----Original Message-----
> From: Bipin Prasad <bi...@apache.org> 
> Sent: Wednesday, April 20, 2022 10:32 AM
> To: user@storm.apache.org
> Subject: [External] Re: Storm 2 Topology Stopped Processing
> 
> Hello Binh,
> 
> Can you provide a small example that reproduces the issue, for example a Storm config file (storm.yaml) with defaults, running WordCountTopology?
> 
> Please check the CLASSPATH and Java version, as well as any exception in the log files (before the NPE).
> 
> For debugging locally, you can use set_log_level (storm.py) to change the log level of a specific class for a specified duration, and/or start in Java debug mode and attach a debugger.
> 
> From your earlier description of the stack trace, it seems there is a Storm-internal error when the message is retrieved from the queue.
> 
> Is it possible for you to switch to the latest released version of Storm (2.4.0)? You might get better support for it. If the problem still persists, please raise a Storm Jira at https://issues.apache.org/jira/projects/STORM/issues
> 
> Thanks
> --Bipin
> 

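Bipin's advice to look for exceptions logged before the NPE can be partly automated. A minimal Java sketch, assuming the worker log is a plain-text file whose path you pass in (the class name and matching rules are illustrative, not part of Storm): it prints every line mentioning an Exception or Error that appears before the first NullPointerException. Stack traces span several lines, so treat the output as pointers into the log rather than a complete picture.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ExceptionsBeforeNpe {
    /**
     * Collects lines that mention "Exception" or "Error" and occur before the
     * first line mentioning NullPointerException. If the log has no NPE,
     * every matching line is returned.
     */
    static List<String> exceptionsBeforeFirstNpe(List<String> lines) {
        List<String> earlier = new ArrayList<>();
        for (String line : lines) {
            if (line.contains("NullPointerException")) {
                break; // stop at the first NPE; only what preceded it is of interest
            }
            if (line.contains("Exception") || line.contains("Error")) {
                earlier.add(line);
            }
        }
        return earlier;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ExceptionsBeforeNpe <path-to-worker-log>");
            return;
        }
        // Files.readAllLines decodes as UTF-8 and throws if the file contains malformed input.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        exceptionsBeforeFirstNpe(lines).forEach(System.out::println);
    }
}
```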
RE: [External] Re: Storm 2 Topology Stopped Processing

Posted by "Le, Binh T." <bi...@accenture.com>.
Yes. When this happens, everything (CPU, memory, disk utilization and I/O, network I/O and packets, etc.) drops. There was no outlier that I could see.
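Rui's question about OS- and JVM-level resource consumption can be checked with the standard java.lang.management MXBeans. The following is a minimal sketch (the class and field names are hypothetical) that samples heap usage, live thread count, cumulative GC count and time, and the system load average for the JVM it runs in. A running Storm worker exposes the same MXBeans, so a JMX client such as jconsole can typically read the same figures from the worker itself.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmSnapshot {
    long heapUsedBytes;
    int liveThreads;
    long gcCount;
    long gcTimeMs;
    double systemLoadAverage;

    static JvmSnapshot take() {
        JvmSnapshot s = new JvmSnapshot();
        s.heapUsedBytes = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        s.liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if a collector does not report it; treat that as 0.
            s.gcCount += Math.max(0, gc.getCollectionCount());
            s.gcTimeMs += Math.max(0, gc.getCollectionTime());
        }
        // Negative on platforms where the load average is unavailable.
        s.systemLoadAverage = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return s;
    }

    @Override
    public String toString() {
        return String.format("heapUsed=%dB threads=%d gcCount=%d gcTime=%dms load=%.2f",
                heapUsedBytes, liveThreads, gcCount, gcTimeMs, systemLoadAverage);
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples one second apart.
        for (int i = 0; i < 3; i++) {
            System.out.println(take());
            Thread.sleep(1000);
        }
    }
}
```

Comparing snapshots taken before and during a stall shows whether heap, GC activity, or thread count moved when throughput dropped; the sample count and interval in main are arbitrary.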

From: Rui Abreu <ru...@gmail.com>
Sent: Wednesday, April 20, 2022 12:43 PM
To: user@storm.apache.org
Subject: Re: [External] Re: Storm 2 Topology Stopped Processing

Have you checked your OS- and JVM-level resource consumption when the failure happens? Is there any kind of outlier?

On Wed, 20 Apr 2022 at 17:20, Le, Binh T. <bi...@accenture.com> wrote:
Yes, I have tried both. I stated that in my other post; maybe you missed it.

Also, to be clear, I don't think there is a way to disable backpressure. You can only extend the backpressure check time, which is what I did.

From: Rui Abreu <ru...@gmail.com>
Sent: Wednesday, April 20, 2022 12:13 PM
To: user@storm.apache.org
Subject: Re: [External] Re: Storm 2 Topology Stopped Processing

As Bipin said, it could be a number of things. Have you tried disabling the backpressure mechanism and limiting the in-flight messages with topology.max.spout.pending?

https://storm.apache.org/releases/current/Performance.html
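To illustrate what Rui's topology.max.spout.pending suggestion does: per the Storm Config documentation, it caps the number of tuples pending on each spout task, where a pending tuple is one that has been emitted but not yet acked or failed, and it has no effect on unreliable spouts that emit without a message id. The toy model below is not Storm code and its class and method names are hypothetical; it only shows that counting rule.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of a per-task pending cap: the "spout" only emits while the number
 * of emitted but not yet acked/failed message ids is below the limit.
 */
public class MaxSpoutPendingSketch {
    private final int maxPending;
    private final Set<Long> pending = new HashSet<>();
    private long nextId = 0;

    MaxSpoutPendingSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Emits one tuple if under the cap; returns its message id, or -1 if throttled. */
    long tryEmit() {
        if (pending.size() >= maxPending) {
            return -1;
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    /** An ack and a fail both remove the tuple from the pending set. */
    void complete(long id) {
        pending.remove(id);
    }

    int inFlight() {
        return pending.size();
    }

    public static void main(String[] args) {
        MaxSpoutPendingSketch spout = new MaxSpoutPendingSketch(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("emit -> " + spout.tryEmit());
        }
        spout.complete(0);
        System.out.println("after ack: emit -> " + spout.tryEmit());
    }
}
```

In a real topology the limit would be set with Config.setMaxSpoutPending(int) or the topology.max.spout.pending key; a suitable value depends on the topology's latency and throughput, and none is implied here.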

RE: [External] Re: Storm 2 Topology Stopped Processing

Posted by "Le, Binh T." <bi...@accenture.com>.
Hi Bipin,

Thanks for your response. Unfortunately, this problem is not easily reproducible. It only manifests itself when we're running our topologies with production volume. It works fine, so far, with low volume. So it would be difficult for me to provide an example topology and all that, to reproduce this.

Before storm 2.4.0 was released. We did try upgrading to 2.3.0, but unfortunately encountered tons of errors we've not seen before with 2.2.0. So we reverted back to 2.2.0. At this point, I'm not sure if we can upgrade to 2.4.0. We'll see.

Thanks,
Binh

-----Original Message-----
From: Bipin Prasad <bi...@apache.org> 
Sent: Wednesday, April 20, 2022 10:32 AM
To: user@storm.apache.org
Subject: [External] Re: Storm 2 Topology Stopped Processing

This message is from an EXTERNAL SENDER - be CAUTIOUS, particularly with links and attachments.

Hello Binh,

Can you can provide a small example that can be duplicated - for example with a storm config file (storm.yaml) and default, and running WordCountTopology (for instance).

Please check the CLASSPATH and java version, and any Exception in the log files as well (before the NPE).

For debugging locally, you can set_log_level for a specific class for a specified duration (using storm.py), and/or start in java debug mode and attach a debugger to it.

From your earlier description of the stack trace, it seems there is a storm internal error when the message is retrieved from the queue.

Is it possible for you to switch to the latest released version of Storm (2.4.0)? You might get better support for it. If the problem still persists, please raise a Storm JIRA at https://issues.apache.org/jira/projects/STORM/issues

Thanks
--Bipin

On 2022/04/18 17:17:04 "Le, Binh T." wrote:
> Hi,
> 
> We are upgrading our Storm 1 cluster to 2.2.0 and noticed a strange behavior. The topology is processing millions of messages fine. However, all of a sudden, processing just stopped. Capacity for all bolts got reduced to zero and stays at zero. Does anyone know what could possibly be going on here and how we can troubleshoot this issue?
> 
> Thanks,
> Binh

Re: Storm 2 Topology Stopped Processing

Posted by Bipin Prasad <bi...@apache.org>.
Hello Binh,

Can you provide a small example that reproduces the problem, for example using a storm config file (storm.yaml) with default settings and running WordCountTopology?

Please check the CLASSPATH and Java version, and look for any exceptions in the log files as well (before the NPE).
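
As a rough sketch of the CLASSPATH and Java version check suggested above, the following self-contained class (the name ClasspathCheck is hypothetical, not part of Storm) prints the JVM version and flags any artifact that appears on the classpath with more than one version, for example old and new storm-client jars side by side after an upgrade. It assumes jars follow the usual artifact-version.jar naming and ignores anything else. To check what a worker actually sees, pass the worker's classpath as the first argument (for example, the output of `storm classpath`); otherwise it inspects its own classpath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClasspathCheck {
    // Splits "storm-client-2.2.0.jar" into artifact "storm-client" and version "2.2.0".
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d[\\w.\\-]*)\\.jar$");

    /** Returns each artifact that appears on the classpath with two or more distinct versions. */
    static Map<String, List<String>> conflictingJars(String classpath) {
        Map<String, List<String>> versions = new TreeMap<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            Matcher m = JAR.matcher(new File(entry).getName());
            if (m.matches()) {
                versions.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        // The same jar listed twice with the same version is not a conflict.
        versions.values().removeIf(v -> v.stream().distinct().count() < 2);
        return versions;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        Map<String, List<String>> conflicts = conflictingJars(cp);
        if (conflicts.isEmpty()) {
            System.out.println("No artifact found with multiple versions.");
        } else {
            conflicts.forEach((artifact, vs) -> System.out.println("CONFLICT " + artifact + " " + vs));
        }
    }
}
```

A name-based check like this will not catch shaded or renamed copies of the same classes, so it is a first pass rather than proof of a clean classpath.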

For debugging locally, you can use set_log_level (via storm.py) to change the log level of a specific class for a specified duration, and/or start the JVM in Java debug mode and attach a debugger to it.
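
For the Java debug mode option, a minimal sketch, assuming you want to attach a remote debugger to a worker JVM: the standard JDWP agent option can be appended to the worker's JVM options, for instance through the topology.worker.childopts setting. The class DebugChildopts, its helpers, the existing "-Xmx768m" option, and port 8000 are all hypothetical examples. A fixed port also assumes only one worker per host, since several workers on one host would contend for the same debug port.

```java
public class DebugChildopts {
    /**
     * Builds the standard JDWP agent option that lets a debugger attach over TCP.
     * suspend=n lets the JVM start without waiting for a debugger to connect.
     */
    static String jdwpOption(String address, boolean suspend) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + address;
    }

    /** Appends an option to an existing (possibly null or empty) JVM options string. */
    static String appendOpt(String existing, String opt) {
        if (existing == null || existing.trim().isEmpty()) {
            return opt;
        }
        return existing.trim() + " " + opt;
    }

    public static void main(String[] args) {
        // "*:8000" listens on all interfaces (JDK 9+ syntax); on JDK 8 use "8000".
        String debug = jdwpOption("*:8000", false);
        // Hypothetical existing worker options; in a topology the combined string
        // would go under the "topology.worker.childopts" config key.
        String childopts = appendOpt("-Xmx768m", debug);
        System.out.println("topology.worker.childopts: " + childopts);
    }
}
```

suspend=n is chosen so the worker starts and processes tuples normally; a debugger can then be attached once the topology appears stuck.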

From your earlier description of the stack trace, it seems there is a storm internal error when the message is retrieved from the queue.

Is it possible for you to switch to the latest released version of Storm (2.4.0)? You might get better support for it. If the problem still persists, please raise a Storm JIRA at https://issues.apache.org/jira/projects/STORM/issues

Thanks
--Bipin

On 2022/04/18 17:17:04 "Le, Binh T." wrote:
> Hi,
> 
> We are upgrading our Storm 1 cluster to 2.2.0 and noticed a strange behavior. The topology is processing millions of messages fine. However, all of a sudden, processing just stopped. Capacity for all bolts got reduced to zero and stays at zero. Does anyone know what could possibly be going on here and how we can troubleshoot this issue?
> 
> Thanks,
> Binh