Posted to user@hadoop.apache.org by Dave Maughan <da...@gmail.com> on 2016/10/06 11:07:54 UTC

S3AFileSystem & read-after-write consistency

Hi,

I'm investigating S3's read-after-write consistency model with
S3AFileSystem and something is not quite clear to me, so I'm hoping someone
more knowledgeable can clarify it for me.

Amazon state (http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html):

    "Amazon S3 provides read-after-write consistency for PUTS of new
    objects in your S3 bucket in all regions with one caveat. The caveat is
    that if you make a HEAD or GET request to the key name (to find if the
    object exists) before creating the object, Amazon S3 provides eventual
    consistency for read-after-write".

In S3AFileSystem, create -> exists -> getFileStatus ->
AmazonS3Client.getObjectMetadata (HEAD).
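
To make the concern concrete, here is a toy in-memory model of the caveat
quoted above. It is only a sketch: ToyS3, its negative cache, and the create
helper are hypothetical names for illustration, not the S3 or Hadoop API. The
model also makes the stale read deterministic, whereas on real S3 it would
only be a possibility.

```python
class ToyS3:
    """Toy model of the quoted caveat; NOT the real S3 API (hypothetical)."""

    def __init__(self):
        self._objects = {}            # key -> bytes
        self._negative_cache = set()  # keys recently reported as missing

    def head(self, key):
        """Return True if the key appears to exist."""
        if key in self._negative_cache:
            # A cached "not found" answer can be served even after the PUT.
            return False
        if key not in self._objects:
            # A miss before the object exists is what populates the cache.
            self._negative_cache.add(key)
            return False
        return True

    def put(self, key, data):
        self._objects[key] = data

    def expire_negative_cache(self):
        """Stand-in for the store eventually converging."""
        self._negative_cache.clear()


def create(store, key, data, check_exists_first):
    """Mimics create -> exists -> HEAD when check_exists_first is True."""
    if check_exists_first and store.head(key):
        raise FileExistsError(key)
    store.put(key, data)


if __name__ == "__main__":
    plain = ToyS3()
    create(plain, "part-0000", b"x", check_exists_first=False)
    print("no prior HEAD, visible:", plain.head("part-0000"))

    checked = ToyS3()
    create(checked, "part-0000", b"x", check_exists_first=True)
    print("prior HEAD, visible:", checked.head("part-0000"))
```

In this model, the write that skips the existence check is visible to the
next HEAD, while the write preceded by a HEAD stays invisible until the
negative cache expires.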

Does this mean that currently, S3AFileSystem cannot take advantage of S3's
read-after-write consistency?

Thanks
- Dave

Re: S3AFileSystem & read-after-write consistency

Posted by Chris Nauroth <cn...@hortonworks.com>.

Hello Dave,

You are correct that S3A currently may suffer unexpected effects from eventual consistency due to negative caching on the S3 side for the initial HEAD request.  In practice, I have never seen any negative consequences from this particular aspect of S3 eventual consistency, but in theory the problem is possible.

If you want to mitigate the effects of S3 eventual consistency, then you might be interested in following development of the S3Guard project, tracked in Apache JIRA HADOOP-13345.

https://issues.apache.org/jira/browse/HADOOP-13345

To summarize, we plan to support use of an external store with strong consistency guarantees for S3A file system metadata.  In the interaction you described, we could consult the consistent metadata store instead of sending a HEAD request to S3 to determine if the object already exists.
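
As a rough illustration of that idea only, the sketch below answers the existence check from a separate consistent index, so no HEAD request reaches the object store before the PUT. All class and function names here are hypothetical, and this is not the S3Guard design or API; see the JIRA for the actual work.

```python
class ConsistentMetadataStore:
    """Hypothetical strongly consistent index of known keys (a set here)."""

    def __init__(self):
        self._known = set()

    def exists(self, key):
        return key in self._known

    def record(self, key):
        self._known.add(key)


class CountingObjectStore:
    """Hypothetical object store that counts the HEAD requests it receives."""

    def __init__(self):
        self._objects = {}
        self.head_requests = 0

    def head(self, key):
        self.head_requests += 1
        return key in self._objects

    def put(self, key, data):
        self._objects[key] = data


def create_with_metadata_store(objects, metadata, key, data):
    """Check existence in the metadata store instead of issuing a HEAD."""
    if metadata.exists(key):
        raise FileExistsError(key)
    objects.put(key, data)
    # Record the new key so later existence checks also avoid the object store.
    metadata.record(key)


if __name__ == "__main__":
    objects = CountingObjectStore()
    metadata = ConsistentMetadataStore()
    create_with_metadata_store(objects, metadata, "part-0000", b"x")
    print("HEAD requests sent before the PUT:", objects.head_requests)
```

Because the key is never probed before it is written, the caveat about a HEAD or GET preceding the PUT would not be triggered in this sketch.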

--Chris Nauroth
