Posted to common-issues@hadoop.apache.org by "Syed Shameerur Rahman (Jira)" <ji...@apache.org> on 2023/06/22 05:58:00 UTC

[jira] [Comment Edited] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints

    [ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735980#comment-17735980 ] 

Syed Shameerur Rahman edited comment on HADOOP-18776 at 6/22/23 5:57 AM:
-------------------------------------------------------------------------

[~stevel@apache.org] - Thanks a lot for taking a look at this.

I fully understand your concerns and am aware of these limitations.

> "it lacks the ability to recover from task failure"

Yes, this is true. When a task fails or the task JVM crashes during the commitTask operation, some files get committed (become visible) in the final path and some may not. If task re-attempts are enabled, a new task attempt will come up and write the files again, leading to partially duplicated data in the final path. This issue can be avoided by using this type of committer only for use cases where there are no task re-attempts, so that if any task attempt fails, the job also fails.

Even then, files written by the failed task attempts can remain in the final path, but since the job has failed, the user can clear off the data manually and re-run the same job. I believe the same issue is still possible with the MagicS3ACommitter, since commitJob is not atomic: if the Job Driver JVM crashes during the commitJob operation, it can likewise leave some files visible in the final path.
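For what it's worth, the "no task re-attempts" precondition can be enforced purely through engine configuration; a sketch, assuming a Spark job (the property names below are the standard Spark and MapReduce settings, not anything introduced by this committer):

```shell
# Fail the whole job on the first task failure, so a failed commitTask can
# never be followed by a duplicate-writing re-attempt (Spark):
spark-submit \
  --conf spark.task.maxFailures=1 \
  --conf spark.stage.maxConsecutiveAttempts=1 \
  your-job.jar

# Equivalent idea for MapReduce jobs (mapred-site.xml or per-job config):
#   mapreduce.map.maxattempts=1
#   mapreduce.reduce.maxattempts=1
```

With these set, the "task fails, new attempt duplicates data" scenario described above cannot occur; the failure mode collapses to "job fails, user cleans the output and re-runs".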

 

> Finally, I'd love to know size of jobs where you hit problems, use etc. If there's anything you can say publicly, that'd be great

My use case was writing a large number of files in a single query. Since commitJob runs in a single process (multi-threaded, as opposed to distributed in the proposed approach) and needs to call complete MPU for every one of these files, it can become a bottleneck, hence I explored other options ({*}~2.3x{*} faster as compared to the MagicCommitter).
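To make the bottleneck concrete, here is a toy cost model (my own illustration, not Hadoop code; the latency and thread/task counts are assumed numbers): with per-call S3 latency L, a single driver with T commit threads needs roughly ceil(files/T) sequential rounds, while distributing the completions over the task containers divides the same work across the whole cluster.

```java
// Illustrative sketch only: models why moving completeMultipartUpload out of a
// single driver (MagicCommitter's commitJob) into the distributed tasks helps.
// CALL_MS stands in for one S3 CompleteMultipartUpload round trip.
public class CommitCostModel {
    static final long CALL_MS = 50; // assumed per-call latency, milliseconds

    // Single-driver commitJob: ceil(files / threads) rounds of CALL_MS each.
    static long driverCommitMs(int files, int driverThreads) {
        return (long) Math.ceil((double) files / driverThreads) * CALL_MS;
    }

    // Distributed commitTask: each task completes only its own slice, in parallel.
    static long distributedCommitMs(int files, int tasks) {
        return (long) Math.ceil((double) files / tasks) * CALL_MS;
    }

    public static void main(String[] args) {
        int files = 3000; // roughly the file count from the benchmark below
        System.out.println("driver commitJob   ~ " + driverCommitMs(files, 32) + " ms");
        System.out.println("distributed commit ~ " + distributedCommitMs(files, 400) + " ms");
    }
}
```

The gap grows linearly with the number of files, which matches the O(files/threads) framing in the issue description.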

 

So my understanding is that with at most 1 task attempt, this committer tends to behave similarly to the MagicCommitter (with the same guarantees) and hence can be used for specific use cases.
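The commitTask change being discussed can be sketched as follows (a simplified stand-in, not the actual Hadoop S3A code; `S3Client` and the map of pending uploads are hypothetical types standing in for the real AWS SDK client and the `.pending` file contents):

```java
import java.util.Map;

// Hedged sketch of the proposed commitTask flow: instead of persisting a
// .pendingset file for the driver, each task completes its own pending
// multipart uploads immediately, making its files visible at commitTask time
// (the FileOutputCommitter-v2-like tradeoff described above).
public class OptimizedCommitTaskSketch {

    interface S3Client { // stand-in for the real S3 client
        void completeMultipartUpload(String key, String uploadId);
    }

    // pendingUploads: destination key -> multipart upload id, as recovered
    // from the task attempt's .pending files. Returns the number committed.
    static int commitTask(Map<String, String> pendingUploads, S3Client s3) {
        int committed = 0;
        for (Map.Entry<String, String> e : pendingUploads.entrySet()) {
            s3.completeMultipartUpload(e.getKey(), e.getValue());
            committed++;
        }
        return committed;
    }

    public static void main(String[] args) {
        Map<String, String> pending = Map.of(
            "table/part-0000", "upload-1",
            "table/part-0001", "upload-2");
        int n = commitTask(pending,
            (key, id) -> System.out.println("completed MPU for " + key));
        System.out.println(n + " uploads committed");
    }
}
```

This also makes the failure mode visible in the code: if the JVM dies partway through the loop, some keys are already visible and some are not, which is exactly the recovery limitation discussed above.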



> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-18776
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18776
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter*, another type of S3 magic committer with better performance, achieved by accepting a few tradeoffs.
> The following are the differences between the MagicCommitter and the OptimizedMagicCommitter:
>  
> ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*||
> |commitTask    |1. Lists all {{.pending}} files in its attempt directory.
>  
> 2. The contents are loaded into a list of single pending uploads.
>  
> 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory
>  
> 2. The contents are loaded into a list of single pending uploads.
>  
> 3. For each pending upload, commit operation is called (complete multiPartUpload)|
> |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory
>  
> 2. Then every pending commit in the job will be committed.
>  
> 3. "SUCCESS" marker is created (if config is enabled)
>  
> 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled)
>  
> 2.  "__magic" directory is cleaned up.|
>  
> *Performance Benefits :-*
>  # The primary performance boost comes from the complete multiPartUpload calls being made, distributed, in the task attempts (task containers/executors) rather than in a single job driver, where for the MagicCommitter the cost is O(files/threads).
>  # It also saves a couple of S3 calls: the PUT of the "{{{}.pendingset{}}}" files and the READ call to load them in the Job Driver.
>  
> *TradeOffs :-*
> The tradeoffs are similar to those of the FileOutputCommitter V2 algorithm. Users migrating from FileOutputCommitter V2 to the OptimizedS3AMagicCommitter will see no behavioral change as such:
>  # During execution, intermediate data becomes visible after commitTask operation
>  # On a failure, all output must be deleted and the job needs to be restarted.
>  
> *Performance Benchmark :-*
> Cluster : c4.8x large (ec2-instance)
> Instance : 1 (primary) + 5 (core)
> Data Size : 3TB Partitioned(TPC-DS store_sales data)
> Engine     : Apache Spark 3.3.1
> Query: The following query inserts 3000+ files into the table directory (ran for 3 iterations)
> {code:java}
> insert into <table> select ss_quantity from store_sales; {code}
> ||Committer||Iteration 1||Iteration 2||Iteration 3||
> |Magic|126|127|122|
> |OptimizedMagic|50|51|58|
> So on average, the OptimizedMagicCommitter was *~2.3x* faster as compared to the MagicCommitter.
>  
> _*Note: Unlike the MagicCommitter, the OptimizedMagicCommitter is not suitable for cases where the user requires the guarantee that files are not visible in failure scenarios. Given the performance benefit, users may choose it if they don't require that guarantee or have some mechanism to clean up the data before retrying.*_
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
