Posted to user@storm.apache.org by Simon Cooper <si...@featurespace.co.uk> on 2020/05/13 16:57:02 UTC

Backpressure causing deadlock with recursive topology structure

Hi,

We've encountered a problem with the new backpressure system introduced in Storm 2. We have two mutually recursive bolts in our topology (BoltA sends tuples to BoltB, which sends tuples back to BoltA). This worked fine in Storm 1, but causes the topology to deadlock at random on Storm 2.
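
To illustrate, the cycle is wired up roughly like this (a simplified sketch rather than our real code; the spout and bolt class names are just placeholders):

    import org.apache.storm.topology.TopologyBuilder;

    // Simplified wiring of the mutually recursive structure
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new MySpout());
    builder.setBolt("boltA", new BoltA())
           .shuffleGrouping("spout")    // new work from the spout
           .shuffleGrouping("boltB");   // tuples coming back from BoltB
    builder.setBolt("boltB", new BoltB())
           .shuffleGrouping("boltA");   // tuples sent over from BoltA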

When a BoltA task starts to bottleneck and sets its backpressure flag, a signal is sent to the worker running the corresponding BoltB task, which stops that task from sending any more tuples until the flag is cleared. Because it can't send, the BoltB task also can't process any new tuples, so its input queue fills up and it in turn sets the backpressure flag back towards the BoltA task. We end up in a situation where neither bolt can send tuples to the other, and the whole topology grinds to a halt. Failing tuples doesn't fix it, as the tuples aren't removed from the input or output queues.

Are there any suggestions for settings which might alleviate this? Ideally something to turn the backpressure system off, or render it moot? If we can't find a workaround, we'll probably be forced to go back to Storm 1.

A possible fix might be to allow a task to keep consuming items from its input queue, as long as it doesn't try to send any tuples to a backpressured task?

Many thanks,
Simon Cooper

Re: Backpressure causing deadlock with recursive topology structure

Posted by Ethan Li <et...@gmail.com>.
This is an interesting use case, but my understanding is that Storm only supports a DAG of components.

To mitigate this issue, though, I think there are a few things that can be tried (a rough sketch of the relevant settings follows the list):

1. Enable acking, and anchor tuples before emitting, then set topology.max.spout.pending (https://github.com/apache/storm/blob/master/conf/defaults.yaml#L265) so the number of tuples in flight is kept within a bounded range.

2. Increase the executor receive queue size (https://github.com/apache/storm/blob/master/conf/defaults.yaml#L317) so that backpressure is less likely to kick in.

3. Increase the parallelism, CPU, and memory for your components so they can process faster.
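
As a rough sketch only (the spout/bolt class names and all the numbers are placeholders you'd tune for your own topology, and the CPU/memory hints only take effect with the Resource Aware Scheduler), 1-3 could be applied like this when building and submitting the topology:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new MySpout(), 2);

    // (3) more parallelism, plus CPU/memory hints for the Resource Aware Scheduler
    builder.setBolt("boltA", new BoltA(), 8)       // parallelism hint
           .setCPULoad(50.0)                       // % of a core per executor
           .setMemoryLoad(512.0)                   // on-heap MB per executor
           .shuffleGrouping("spout")
           .shuffleGrouping("boltB");
    builder.setBolt("boltB", new BoltB(), 8)
           .setCPULoad(50.0)
           .setMemoryLoad(512.0)
           .shuffleGrouping("boltA");

    Config conf = new Config();
    // (1) acking plus a cap on tuples in flight per spout task
    conf.setNumAckers(2);
    conf.setMaxSpoutPending(500);                  // topology.max.spout.pending
    // (2) bigger executor receive queue (value must be a power of 2)
    conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);

    StormSubmitter.submitTopology("recursive-topology", conf, builder.createTopology());

For (1) to actually bound the tuples in flight, the bolts also need to emit anchored to the input tuple and then ack it, e.g. inside execute():

    // inside BoltA.execute(Tuple input)
    collector.emit(input, new Values(/* output fields */));  // anchored emit
    collector.ack(input);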


Combining the above, your topology should be much less likely to hit backpressure. There may be other ways, but that's my current thinking at the moment. Hope it helps.

-Ethan

