Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/04/27 10:28:00 UTC

[jira] [Commented] (HADOOP-15421) Stabilise/formalise the JSON _SUCCESS format used in the S3A committers

    [ https://issues.apache.org/jira/browse/HADOOP-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456209#comment-16456209 ] 

Steve Loughran commented on HADOOP-15421:
-----------------------------------------

+ [~rdblue], [~jzhuge]: I know Iceberg has its own format, which uses unique filenames to avoid update inconsistency, but they might have some suggestions here.

Current format: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java

There's always been a version marker in the file, to allow us to switch to a new format & let tests discover this by checking the version field alone...
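
To make the version check concrete, here is a minimal sketch of how a test could look at the version field alone before trusting anything else in the file. It is not the SuccessData code: the helper names and the SUPPORTED_VERSION value are hypothetical, and the only thing taken from the above is that the JSON carries a field called "version".

```python
import json

# Hypothetical value: the format version this reader understands.
# It is NOT taken from SuccessData.java.
SUPPORTED_VERSION = 1


def read_success_version(text):
    """Return the integer "version" field of a JSON _SUCCESS body, or None."""
    if not text.strip():
        # An empty marker file carries no JSON and therefore no version.
        return None
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    version = doc.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, int) and not isinstance(version, bool):
        return version
    return None


def is_supported(text):
    """True only when the file declares the version this reader understands."""
    return read_success_version(text) == SUPPORTED_VERSION


if __name__ == "__main__":
    # Unknown fields are ignored; only the version marker is inspected.
    sample = json.dumps({"version": 1, "filenames": ["part-0000"]})
    print(is_supported(sample))
```

A reader built this way can reject an unknown version up front rather than failing part-way through parsing fields it does not recognise.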

> Stabilise/formalise the JSON _SUCCESS format used in the S3A committers
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-15421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15421
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> The S3A committers rely on an atomic PUT to save a JSON summary of the job to the destination FS, containing files, statistics, etc. This is for internal testing, but it turns out to be useful for Spark integration testing, Hive, etc.
> IBM's Stocator also generates a manifest.
> Proposed: come up with an (extensible) design that we are happy with as a long-lived format.


