Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/09/12 14:16:00 UTC

[jira] [Comment Edited] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling

    [ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162889#comment-16162889 ] 

Steve Loughran edited comment on HADOOP-13761 at 9/12/17 2:15 PM:
------------------------------------------------------------------

Note that the DDB throttling exception surfaces as HTTP 400, the same status as an unrecoverable client error, so you can't use RESTy error-code logic here.
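A minimal sketch of what the check could look like instead, keying off the exception type and AWS error code rather than the HTTP status (assumes the AWS SDK v1 classes; {{isThrottleException}} is an illustrative name, not existing S3Guard code):

{code:java}
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

/**
 * Sketch: is this DDB failure throttling?
 * Throttling and genuine client errors both come back as HTTP 400,
 * so the status code alone cannot drive the retry decision.
 */
static boolean isThrottleException(Exception e) {
  if (e instanceof ProvisionedThroughputExceededException) {
    return true;
  }
  if (e instanceof AmazonServiceException) {
    // Fall back to the service error code, not the HTTP status.
    return "ThrottlingException".equals(
        ((AmazonServiceException) e).getErrorCode());
  }
  return false;
}
{code}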

Also, in {{waitForTableActive()}}, failures are remapped to {{IllegalArgumentException}}.

Plus, that stack trace comes from code which is meant to be handling throttling. Conclusion: either the retry logic is wrong or it's giving up too early.
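For reference, a rough sketch of the kind of bounded retry loop that code ought to be doing: plain exponential backoff around an idempotent call. The names and limits are illustrative, and {{isThrottleException()}} is the helper sketched above:

{code:java}
import java.util.concurrent.Callable;

/**
 * Sketch: retry an idempotent DDB operation on throttling, with
 * exponential backoff, surfacing the failure once attempts run out.
 */
static <T> T retryOnThrottle(Callable<T> op) throws Exception {
  final int maxAttempts = 9;      // illustrative limit
  final long baseDelayMs = 100;   // illustrative base delay
  for (int attempt = 1; ; attempt++) {
    try {
      return op.call();
    } catch (Exception e) {
      if (!isThrottleException(e) || attempt == maxAttempts) {
        throw e;  // not throttling, or out of attempts: give up
      }
      // Backoff: 100ms, 200ms, 400ms, ...
      Thread.sleep(baseDelayMs << (attempt - 1));
    }
  }
}
{code}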



> S3Guard: implement retries for DDB failures and throttling
> ----------------------------------------------------------
>
>                 Key: HADOOP-13761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Steve Loughran
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will create a separate JIRA for this, as it will likely require interface changes (e.g. a prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing to make sure we're covered.  Failure injection tests can be a separate JIRA to make this easier to review.
> We also need basic configuration parameters around the retry policy.  There should be a way to specify a maximum retry duration, as some applications would rather receive an error eventually than wait indefinitely.  We should also keep statistics on how often inconsistency is detected and we enter a retry loop.
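On the configuration side, a hedged sketch of wiring such limits from Hadoop {{Configuration}} into a retry policy via {{org.apache.hadoop.io.retry}}; the key names below are placeholders, not committed S3A constants:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Sketch: build a bounded retry policy from configuration so that
// applications can cap retry effort instead of waiting indefinitely.
// Both key names are placeholders for illustration only.
Configuration conf = new Configuration();
int maxRetries = conf.getInt("fs.s3a.s3guard.ddb.max.retries", 9);
long intervalMs = conf.getLong("fs.s3a.s3guard.ddb.retry.interval.ms", 100);
RetryPolicy policy = RetryPolicies.exponentialBackoffRetry(
    maxRetries, intervalMs, TimeUnit.MILLISECONDS);
{code}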


