Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/06/27 17:36:00 UTC

[jira] [Resolved] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

     [ https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13811.
-------------------------------------
    Resolution: Cannot Reproduce

No, Vinod, this has nothing to do with HADOOP-13786, except that the improved retry logic there may mean that transient problems go away.

I've actually got some insight into a possible cause of [~lminer]'s stack trace, by way of Ryan Blue and the Netflix Experience.

# the V1 list API always returns 5000 entries per page (as set in {{fs.s3a.paging.maximum}})
# except for the final page
# if you have versioning turned on in your bucket, deleted entries leave behind tombstone markers with references to their versions
# these are still scanned on the S3 side of list calls, but get stripped out of the response
# so... for a very large tree, S3 may end up having to keep a connection open while it skips over thousands to millions of deleted objects before it can find live ones to return
# which can time out connections.
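To make the cost in the list above concrete, here is a toy model of a V1-style page. It is a sketch under stated assumptions, not S3's actual implementation: the "bucket" is just a Python list of hypothetical `(key, is_tombstone)` pairs in listing order, `v1_list_page` is an illustrative name, and `page_size` stands in for {{fs.s3a.paging.maximum}}. It counts how many entries one request must scan before it can return a full (or final) page.

```python
from typing import List, Tuple

# Toy model only: a "bucket" is a list of (key, is_tombstone) pairs in
# listing order. This is NOT how S3 works internally.
Entry = Tuple[str, bool]


def v1_list_page(keys: List[Entry], start: int,
                 page_size: int) -> Tuple[List[str], int, int]:
    """Return (live_keys, next_start, entries_scanned) for one V1-style page.

    The "server" keeps scanning until it has page_size live keys or runs
    out of entries, so a long run of tombstones means a single request
    scans far more entries than it returns, holding the connection open.
    """
    live: List[str] = []
    i = start
    while i < len(keys) and len(live) < page_size:
        name, is_tombstone = keys[i]
        if not is_tombstone:  # tombstones are skipped, never returned
            live.append(name)
        i += 1
    return live, i, i - start


# 100,000 tombstones sorted ahead of 10 live objects in one directory.
bucket = [(f"dir/deleted-{n:06d}", True) for n in range(100_000)]
bucket += [(f"dir/live-{n:02d}", False) for n in range(10)]

live, next_start, scanned = v1_list_page(bucket, 0, page_size=5000)
print(len(live), scanned)  # 10 100010
```

In this model the single request returns only 10 keys but has to walk over 100,010 entries first; scale the tombstone count up to millions and it is easy to see how a connection could time out mid-scan.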

The v2 API apparently fixes this by returning smaller pages when needed. With the move to v2 by default (HADOOP-13421), this error may have gone away. Marking the issue as related to that and closing as Cannot Reproduce.
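A minimal sketch of the "smaller pages when needed" idea, again using a toy model rather than S3 itself. The per-request `scan_budget` is a hypothetical stand-in for whatever S3 does server-side (the comment above only says v2 "apparently" returns smaller pages), the integer continuation token is a simplification of S3's opaque token, and `v2_list_page` / `list_all` are illustrative names. The point it shows: each request does bounded work, and the client keeps paging until no continuation token comes back.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, bool]  # (key, is_tombstone), in listing order


def v2_list_page(keys: List[Entry], token: int, max_keys: int,
                 scan_budget: int) -> Tuple[List[str], Optional[int]]:
    """One V2-style page: stop at max_keys live keys OR after scan_budget
    entries, whichever comes first. Returns (live_keys, next_token), where
    next_token is None once the listing is complete."""
    live: List[str] = []
    i = token
    while i < len(keys) and len(live) < max_keys and i - token < scan_budget:
        name, is_tombstone = keys[i]
        if not is_tombstone:
            live.append(name)
        i += 1
    return live, (i if i < len(keys) else None)


def list_all(keys: List[Entry], max_keys: int = 5000,
             scan_budget: int = 10_000) -> List[List[str]]:
    """Client loop: keep requesting until there is no continuation token.
    In this model an empty page is NOT the end of the listing."""
    pages: List[List[str]] = []
    token: Optional[int] = 0
    while token is not None:
        page, token = v2_list_page(keys, token, max_keys, scan_budget)
        pages.append(page)
    return pages


bucket = [(f"dir/deleted-{n:06d}", True) for n in range(100_000)]
bucket += [(f"dir/live-{n:02d}", False) for n in range(10)]

pages = list_all(bucket)
print(len(pages), sum(len(p) for p in pages))  # 11 10
```

Here the same listing takes 11 short requests (the first 10 return empty pages), but no single request scans more than 10,000 entries, which is the property that keeps each connection short-lived.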

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13811
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
