Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/07/23 19:31:00 UTC

[jira] [Created] (HADOOP-15625) S3A input stream to use etags to detect changed source files

Steve Loughran created HADOOP-15625:
---------------------------------------

             Summary: S3A input stream to use etags to detect changed source files
                 Key: HADOOP-15625
                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.2.0
            Reporter: Steve Loughran


The S3A input stream doesn't handle changing source files any better than the other cloud store connectors. Specifically: it doesn't notice that the file has changed, it caches the length from startup, and whenever a seek triggers a new GET, you may get old data, new data, or perhaps even go from new data back to old data due to eventual consistency.

We can't do anything to stop this, but we could detect changes by

# caching the etag of the first HEAD/GET (we don't get that HEAD on open with S3Guard, BTW)
# on future GET requests, verify the etag of the response
# raise an IOE if the remote file changed during the read.
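
The three steps above could be sketched roughly as follows. This is only an illustration of the idea, not the S3A implementation: GetResponse, RangedGetClient and EtagCheckingReader are hypothetical names invented for the sketch, and the client interface stands in for whatever actually issues the ranged GET.

```java
import java.io.IOException;
import java.util.Objects;

/** Hypothetical stand-in for one GET response: the object's etag plus the bytes read. */
final class GetResponse {
    final String etag;
    final byte[] data;

    GetResponse(String etag, byte[] data) {
        this.etag = etag;
        this.data = data;
    }
}

/** Hypothetical stand-in for the store client; not the S3A or AWS SDK API. */
interface RangedGetClient {
    GetResponse get(long position) throws IOException;
}

/** Remembers the etag of the first response and rejects later responses whose etag differs. */
final class EtagCheckingReader {
    private final RangedGetClient client;
    private String expectedEtag; // null until a response carrying an etag has been seen

    EtagCheckingReader(RangedGetClient client) {
        this.client = client;
    }

    /** Issues a new GET at the given position, as a seek would, and verifies the etag. */
    byte[] readAt(long position) throws IOException {
        GetResponse response = client.get(position);
        if (expectedEtag == null) {
            // Step 1: cache the etag of the first response.
            expectedEtag = response.etag;
        } else if (!Objects.equals(expectedEtag, response.etag)) {
            // Steps 2 and 3: verify the etag, and fail loudly on a mismatch.
            throw new IOException("Remote file changed during read: expected etag "
                + expectedEtag + " but got " + response.etag
                + " at position " + position);
        }
        return response.data;
    }
}
```

The sketch only detects a change when a later GET returns a different etag; it does nothing to prevent the change, and a read served entirely from one GET would never notice.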

It's a more dramatic failure, but it stops changes silently corrupting things.



