Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2023/07/06 16:35:00 UTC

[jira] [Commented] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints

    [ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740694#comment-17740694 ] 

Steve Loughran commented on HADOOP-18776:
-----------------------------------------

I see; and it's not so much the multi-process execution that gives you the speedup as the fact that tasks commit as the job goes along; only the final task is on the critical path.

there's also HADOOP-18757, which implies that we aren't threading enough in the magic committer. Even there, with thousands of files you'd suffer.
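The "O(files/threads)" point about driver-side commit can be illustrated with a minimal, self-contained sketch: completing the pending uploads through a bounded thread pool, so wall time scales with files divided by pool size. This is illustrative only; `commitOneUpload` is a hypothetical stand-in for `completeMultipartUpload`, not an S3A API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: driver-side commit of all pending uploads with a bounded pool,
// roughly what a threaded commitJob would do. Not real S3A code.
public class ParallelCommitSketch {
    static final AtomicInteger committed = new AtomicInteger();

    static void commitOneUpload(int id) {
        // stand-in for completing one multipart upload against S3
        committed.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        int files = 1000, threads = 16;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < files; i++) {
            final int id = i;
            futures.add(pool.submit(() -> commitOneUpload(id)));
        }
        for (Future<?> f : futures) {
            f.get(); // wall time ~ O(files / threads), not O(files)
        }
        pool.shutdown();
        System.out.println(committed.get()); // prints 1000
    }
}
```

With thousands of files the commit is still linear in file count; the pool only divides the constant, which is why the task-side commit in this proposal moves the work off the single driver entirely.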

bq. This issue can be solved by using this type of committer only for the use case where there is no task attempts and if any of the taskAttempts fails the job will also fail


that was the intent of the v2 committer, but too many people discovered a "faster" committer and switched to it without realising the flaw; at some point it even became the default. I don't want to enable people to make the same mistake again, sorry. 

So I'm afraid I don't want this in, sorry. But if you do it externally we can add a link to it in the docs, with the "conditions needed for use" section well covered.


Now, if Spark were modified to recognise that a task timeout/failure meant job failure, then we could think about it, as the failure would then be observable to all.

> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-18776
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18776
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter*, another type of S3 magic committer with better performance, achieved by accepting a few tradeoffs.
> The following are the differences in MagicCommitter vs OptimizedMagicCommitter
>  
> ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*||
> |commitTask|1. Lists all {{.pending}} files in its task attempt directory. 2. Loads their contents into a list of single pending uploads. 3. Saves them to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its task attempt directory. 2. Loads their contents into a list of single pending uploads. 3. Calls the commit operation (complete multipart upload) for each pending upload.|
> |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory. 2. Commits every pending commit in the job. 3. Creates the "SUCCESS" marker (if the config is enabled). 4. Cleans up the "__magic" directory.|1. Creates the "SUCCESS" marker (if the config is enabled). 2. Cleans up the "__magic" directory.|
>  
> *Performance Benefits :-*
>  # The primary performance boost comes from the complete-multipart-upload calls being made in a distributed fashion by the task attempts (task containers/executors) rather than by a single job driver. In the MagicCommitter, the driver-side commit is O(files/threads).
>  # It also saves a couple of S3 calls: the PUT of the {{.pendingset}} files and the READ calls to load them back in the job driver.
>  
> *TradeOffs :-*
> The tradeoffs are similar to those of the FileOutputCommitter V2. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will see no behavioral change as such:
>  # During execution, intermediate data becomes visible after commitTask operation
>  # On a failure, all output must be deleted and the job needs to be restarted.
>  
> *Performance Benchmark :-*
> Cluster : c4.8xlarge (EC2 instances)
> Instances : 1 (primary) + 5 (core)
> Data Size : 3TB Partitioned(TPC-DS store_sales data)
> Engine     : Apache Spark 3.3.1 / Hadoop 3.3.3
> Query: The following query inserts around 3000+ files into the table directory (ran for 3 iterations)
> {code:java}
> insert into <table> select ss_quantity from store_sales; {code}
> ||Committer||Iteration 1||Iteration 2||Iteration 3||
> |Magic|126|127|122|
> |OptimizedMagic|50|51|58|
> So on average, OptimizedMagicCommitter was *~2.3x* faster than MagicCommitter.
>  
> _*Note: Unlike the MagicCommitter, the OptimizedMagicCommitter is not suitable for cases where the user requires the guarantee that files are not visible in failure scenarios. Given the performance benefit, users may choose it if they don't require such guarantees, or if they have some mechanism to clean up the data before retrying.*_
>  
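The core difference in the quoted commitTask column can be sketched in a few lines: the magic committer only records pending uploads at task commit (nothing becomes visible until commitJob), while the proposed committer completes them immediately, which is exactly where the v2-style tradeoff comes from. `PendingUpload` and `complete()` here are hypothetical stand-ins, not real S3A classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch contrasting where multipart uploads are completed in the two designs.
// Hypothetical names; not the actual committer implementation.
public class CommitterContrast {
    static final List<String> completed = new CopyOnWriteArrayList<>();

    record PendingUpload(String key) {
        void complete() { completed.add(key); } // stand-in for completeMultipartUpload
    }

    // Magic committer: commitTask only persists the pending set; no data visible yet.
    static List<PendingUpload> magicCommitTask(List<PendingUpload> pending) {
        return pending; // would be saved as a .pendingset file for commitJob to load
    }

    // Proposed committer: commitTask completes uploads right away. Data becomes
    // visible early, and a later failure leaves committed output behind.
    static void optimizedCommitTask(List<PendingUpload> pending) {
        for (PendingUpload p : pending) p.complete();
    }

    public static void main(String[] args) {
        List<PendingUpload> task = List.of(new PendingUpload("part-0"),
                                           new PendingUpload("part-1"));
        magicCommitTask(task);
        System.out.println(completed.size()); // prints 0: magic defers to commitJob
        optimizedCommitTask(task);
        System.out.println(completed.size()); // prints 2: optimized commits in the task
    }
}
```

This makes the objection in the comment concrete: once `optimizedCommitTask` has run, a retried or failed task attempt cannot be rolled back by skipping commitJob, so job failure must imply output cleanup.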



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
