
Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2016/05/28 15:03:12 UTC

[jira] [Commented] (HADOOP-13211) Swift driver should have a configurable retry feature when ecounter 5xx error

    [ https://issues.apache.org/jira/browse/HADOOP-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305433#comment-15305433 ] 

Steve Loughran commented on HADOOP-13211:
-----------------------------------------

There's a retry policy mechanism built into Hadoop IPC, which can choose different actions based on the exception class. It'd be good to use that, and to make it configurable.
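
As a rough illustration of "different actions based on the exception class", here is a minimal plain-Java sketch. It is not Hadoop's actual retry API: the class ExceptionPolicySketch, its Action enum, and the on/decide methods are all hypothetical names made up for this example. It only shows the idea of looking up a retry-or-fail decision by exception type, with a default for anything unlisted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps exception classes to a retry/fail decision.
// None of these names come from Hadoop; they are illustrative only.
final class ExceptionPolicySketch {
    enum Action { RETRY, FAIL }

    // Insertion-ordered: the first rule whose class matches wins,
    // so more specific exception classes should be registered first.
    private final Map<Class<? extends Exception>, Action> rules = new LinkedHashMap<>();
    private final Action defaultAction;

    ExceptionPolicySketch(Action defaultAction) {
        this.defaultAction = defaultAction;
    }

    // Register a rule and return this policy so calls can be chained.
    ExceptionPolicySketch on(Class<? extends Exception> type, Action action) {
        rules.put(type, action);
        return this;
    }

    // Pick the action for an exception; subclasses match their parent's rule.
    Action decide(Exception e) {
        for (Map.Entry<Class<? extends Exception>, Action> rule : rules.entrySet()) {
            if (rule.getKey().isInstance(e)) {
                return rule.getValue();
            }
        }
        return defaultAction;
    }
}
```

A policy built with a FAIL default and RETRY rules for java.net.SocketException and java.net.UnknownHostException would retry a ConnectException (a SocketException subclass) and fail on anything not listed. Which classes to retry would presumably come from configuration.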

Network exceptions (socket, unknown host, etc.) may or may not be transient; they'll need a bounded retry with backoff and a bit of jitter. Things like a 500 error: probably the same. Any other error (404, 401, etc.): I'd keep as a failure.
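
To make "bounded retry with backoff and a bit of jitter" concrete, here is a small sketch in plain Java. Everything in it is an assumption for illustration: the class BackoffSketch, its method names, the choice to treat the whole 5xx range as retryable, and the jitter scheme (a random delay between half of the capped exponential value and the full value). It is not the Swift driver's code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

// Hypothetical sketch of a bounded retry loop keyed on HTTP status codes.
final class BackoffSketch {
    // Assumption: any 5xx may be transient; 4xx such as 401/404 fail at once.
    static boolean isRetryableStatus(int status) {
        return status >= 500 && status <= 599;
    }

    // Exponential backoff capped at maxDelayMs, with jitter: the result is
    // uniformly distributed between roughly half the cap and the cap itself.
    static long delayMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        long exponential = baseDelayMs * (1L << Math.min(attempt, 20));
        long ceiling = Math.min(maxDelayMs, exponential);
        long half = ceiling / 2;
        return (ceiling - half) + ThreadLocalRandom.current().nextLong(half + 1);
    }

    // Runs the request up to maxAttempts times and returns the last status.
    // The request is modelled as a supplier of an HTTP status code.
    static int callWithRetry(IntSupplier request, int maxAttempts,
                             long baseDelayMs, long maxDelayMs) {
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && isRetryableStatus(status); attempt++) {
            try {
                Thread.sleep(delayMillis(attempt - 1, baseDelayMs, maxDelayMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

The attempt limit and both delay values are the natural things to expose as configuration. A 404 or 401 returns after a single call, while a persistent 500 stops once maxAttempts is reached.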

Where retry may be most valuable is on upload: you really don't want an upload to fail from a transient failure, as that will then lose data. If something fails on a GET, well, client code in a Hadoop cluster should be designed to deal with failures from time to time.

> Swift driver should have a configurable retry feature when ecounter 5xx error
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-13211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13211
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/swift
>    Affects Versions: 2.7.2
>            Reporter: Chen He
>            Assignee: Chen He
>
> In the current code, if the Swift driver meets an HTTP 5xx, it throws an exception and stops. As a driver, it would be more sophisticated if it could retry a configurable number of times before reporting failure. There are two reasons that I can imagine:
> 1. If the server is really busy, it is possible that the server will drop some requests to avoid a DDoS attack.
> 2. If the server is accidentally unavailable for a short period of time and then comes back, we may not need to fail the whole driver. Just recording the exception and retrying may be more flexible.


