Posted to user@storm.apache.org by Abhishek Raj <ab...@yahoo.com> on 2020/04/24 17:38:32 UTC

Backpressure with fieldsgrouping

Hi,

We have a topology that consumes data generated by multiple applications
via Kafka. The data for each application is aggregated in a single bolt task
using fieldsGrouping. The applications push data at different rates, so some
executors of the bolt are busier (or overloaded) than others and the capacity
distribution across executors is non-uniform.
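
For illustration, here is a rough sketch of the kind of wiring I mean
(component names, parallelism hints, the topic, and the grouping field are
simplified placeholders, and AggregationBolt stands in for our actual bolt):

    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    TopologyBuilder builder = new TopologyBuilder();

    KafkaSpoutConfig<String, String> spoutConfig =
        KafkaSpoutConfig.builder("kafka-broker:9092", "events-topic").build();

    // A single spout consumes the data produced by all applications.
    builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 4);

    // fieldsGrouping on the application id sends every tuple of a given
    // application to the same bolt task, so a traffic spike from one
    // application lands entirely on one executor.
    builder.setBolt("aggregation-bolt", new AggregationBolt(), 8)
           .fieldsGrouping("kafka-spout", new Fields("application_id"));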

The problem we're facing now is that when there's a spike in the data produced
by one (or more) applications, capacity goes up for the corresponding executor,
we see frequent GC pauses, and eventually that JVM crashes, causing worker
restarts.

Ideally, we would like to slow down only the application(s) causing the spike.
We cannot use the built-in backpressure here because it acts at the spout
level and slows down the entire pipeline.

What are your thoughts on this? How can we fix this?

Thanks

Re: Backpressure with fieldsgrouping

Posted by Tarun Chabarwal <ta...@paytm.com>.
You can try adjusting the *maxUncommittedOffsets* and *offsetCommitPeriodMs*
settings according to your load and the number of bolts. Together these two
parameters bound how many tuples can be in flight in your topology at once,
which gives you fairly direct control over capacity.
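
For example, something along these lines on the KafkaSpoutConfig builder
(the broker, topic, and numbers are placeholders; tune them to your load):

    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    KafkaSpoutConfig<String, String> spoutConfig =
        KafkaSpoutConfig.builder("kafka-broker:9092", "events-topic")
            // Limit how many emitted-but-not-yet-committed records the
            // spout allows before it stops polling Kafka.
            .setMaxUncommittedOffsets(10000)
            // Commit processed offsets back to Kafka every 30 seconds.
            .setOffsetCommitPeriodMs(30_000)
            .build();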


Re: Backpressure with fieldsgrouping

Posted by Rui Abreu <ru...@gmail.com>.
Hi Abhishek,

Have a look at KafkaSpoutConfig (and KafkaSpoutConfig.Builder),
particularly setMaxUncommittedOffsets
and ConsumerConfig.MAX_POLL_RECORDS_CONFIG
(set via kConfigBuilder.setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, i))

https://storm.apache.org/releases/current/javadocs/org/apache/storm/kafka/spout/KafkaSpoutConfig.Builder.html
https://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/clients/consumer/ConsumerConfig.html

This will allow you to have more control at the Spout level.
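
For example (broker, topic, and values are just illustrative):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    KafkaSpoutConfig.Builder<String, String> kConfigBuilder =
        KafkaSpoutConfig.builder("kafka-broker:9092", "events-topic");

    // Cap how many records each consumer poll() hands to the spout.
    kConfigBuilder.setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);

    // Cap how many records may be emitted but not yet committed, so the
    // spout backs off when the bolts fall behind.
    kConfigBuilder.setMaxUncommittedOffsets(10000);

    KafkaSpoutConfig<String, String> spoutConfig = kConfigBuilder.build();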

Also, have a look at the Storm 2.x documentation; the backpressure
mechanism has changed:
http://storm.apache.org/releases/current/Performance.html

Backpressure model for Storm 2.0:
https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit#heading=h.w07mdxni7moh

https://issues.apache.org/jira/browse/STORM-2306

