Posted to issues@storm.apache.org by "Arun Mahadevan (JIRA)" <ji...@apache.org> on 2017/01/12 05:32:16 UTC

[jira] [Comment Edited] (STORM-2282) Streams api - provide options for handling errors

    [ https://issues.apache.org/jira/browse/STORM-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820222#comment-15820222 ] 

Arun Mahadevan edited comment on STORM-2282 at 1/12/17 5:31 AM:
----------------------------------------------------------------

[~roshan_naik]

bq. It's OK to keep retrying errors that are considered retry-worthy. The non-retry-worthy ones are the ones we must fail fast and move out, as they will cause a jam and prevent good tuples from flowing. There is also no good way to recover from them.

bq. One approach to do this...
bq. For non-retry-worthy error conditions (like bad data), any processing element in the pipeline can throw a specific exception. The runtime can then handle it by sending that tuple to a configurable dead letter queue. Ideally this needs a new kind of fail() notification in the spout to avoid a re-emit. Alternatively, we can send the spout an ACK instead of a FAIL to avoid the retry. The DeadLetterQ bolt's metrics will capture these failures. It is better not to have each spout/bolt explicitly deal with the dead letter queue ... that would complicate the topology definition, as every spout and bolt would need to be configured and wired up.
bq. For retry-worthy errors (timeouts, destination unavailable, etc.) ... the existing retry mechanism can kick in. However, in today's core API, there is one pain point for spout writers: each spout needs to implement the logic to track in-flight tuples and attempt a retry on fail(). The implementation is moderately complicated, as ACKs/FAILs can come in any order. All the spouts have to do the same thing but end up doing it slightly differently; some have retry limits, some don't. This retry logic should ideally be lifted out of the spout and handled in the API. This new API is a good opportunity to address this issue.

Yes, the idea is good if we can expose the right API to users and also keep the implementation simple. At a high level, there could be a global API on the StreamBuilder or a more granular one at a specific stage in the pipeline. Something like:

{code:java}
// globally at stream builder
streamBuilder.setRetryPolicy(...);
Stream<T> errors = streamBuilder.deadLetterQueue();

// at the stream API level, per stage
Stream<T>[] streams = stream.map(...).branchErrors();
Stream<T> success = streams[0];
Stream<T> failures = streams[1];
{code}
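
As a rough usage sketch of the per-stage option (everything here is proposed rather than existing API; {{process}}, {{deliver}} and the {{DeadLetterQueueBolt}} sink are purely hypothetical), the error branch could be wired to a dead letter queue without touching the success path:

{code:java}
// hypothetical usage of the proposed branchErrors(); none of these calls
// exist in the current Streams api
Stream<T>[] streams = stream.map(x -> process(x))   // user logic that may throw
                            .branchErrors();
streams[0].forEach(x -> deliver(x));                // success path as usual
streams[1].to(new DeadLetterQueueBolt());           // failed tuples to a DLQ sink
{code}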

We also need to see how this will work with Storm's current timeout-based replay mechanism and so on. We can discuss further and come up with the right approach.



> Streams api - provide options for handling errors
> -------------------------------------------------
>
>                 Key: STORM-2282
>                 URL: https://issues.apache.org/jira/browse/STORM-2282
>             Project: Apache Storm
>          Issue Type: Sub-task
>            Reporter: Arun Mahadevan
>
> Adding relevant discussions from PR 1693 below.
> Allow users to be explicit about how to handle errors. I don't know if any API out there does it... so this would be unique to Storm.
> In short:
> Broadly speaking, there are two kinds of tuple processing errors that users need to be concerned about:
> 1- Retry-worthy errors - for instance, failure to deliver to a destination service due to connection issues.
> 2- Not worth retrying - for instance, parsing errors due to bad data in the tuple. These problems can jam up the pipeline if they are retried repeatedly. Such tuples can be sent to a configurable Dead-Letter-Queue.
> ---
> > 1- Retry-worthy errors
> Right now there's no explicit `fail` API. When a stage in the stream completes processing (and possibly emits results), the underlying tuples are acked automatically. The only way the spout will re-emit is after the message timeout. It may be good to have a fail-fast API, but I am not sure how it would help: the replayed tuple could fail again in processing. Instead, the processing logic itself can have some retry logic (say, retry 3 times), forward failed tuples to an error stream, and ack them, as in the sketch below.
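> A minimal sketch of that idea using the existing `map` and `branch` primitives (the `deliver` call, retry count, and `Result` wrapper are illustrative, not an agreed design):
> {code:java}
> // illustrative only: retry a flaky call inside the processing logic and
> // route tuples that still fail to an error stream instead of replaying them
> Stream<Result>[] branches = stream
>         .map(t -> {
>             for (int attempt = 1; ; attempt++) {
>                 try {
>                     return Result.ok(deliver(t));    // the retry-worthy call
>                 } catch (Exception e) {
>                     if (attempt >= 3) {
>                         return Result.error(t, e);   // give up, don't replay
>                     }
>                 }
>             }
>         })
>         .branch(r -> r.isOk(),  // success stream
>                 r -> true);     // error stream
> Stream<Result> success = branches[0];
> Stream<Result> errors = branches[1];  // already acked; route to a handler or DLQ
> {code}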
> > 2- Not worth retrying
> This can be handled via branch logic, e.g. send valid values to stream1 and bad values to stream2, as sketched below.
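> A minimal sketch of that branch logic (the `isParseable` predicate is illustrative):
> {code:java}
> // route parseable values to stream1 and bad data to stream2; a value goes
> // to the first branch whose predicate matches it
> Stream<String>[] streams = stream.branch(
>         s -> isParseable(s),   // stream1: valid values
>         s -> true);            // stream2: everything else (bad data)
> Stream<String> stream1 = streams[0];
> Stream<String> stream2 = streams[1];  // candidate for a dead-letter-queue
> {code}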


