Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2023/06/28 20:06:00 UTC

[jira] [Updated] (HADOOP-18757) S3A Committer finalizing the commits in a single thread

     [ https://issues.apache.org/jira/browse/HADOOP-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-18757:
------------------------------------
    Summary: S3A Committer finalizing the commits in a single thread  (was: S3A Committer finalyzing the commits in a single thread)

> S3A Committer finalizing the commits in a single thread
> -------------------------------------------------------
>
>                 Key: HADOOP-18757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18757
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.5
>            Reporter: Moditha Hewasinghage
>            Priority: Major
>              Labels: pull-request-available
>
> S3A Committer is bottlenecked on the driver when finalizing the commits: only a single thread is used to finalize them. In our experiment, committing the output of 36,000 files takes almost 2 hours, with each file taking 0.1 - 0.5 seconds while all the executors stay idle and the driver commits alone. I have attached driver log snippets that support this theory, with Spark 3.4.0 for comparison.
> The most likely cause is the use of [ThreadPoolExecutor|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html] in
> [https://github.com/apache/hadoop/blob/706d88266abcee09ed78fbaa0ad5f74d818ab0e9/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitContext.java#L239-L244]
> where an unbounded LinkedBlockingQueue is combined with a corePoolSize of 0, which results in a single worker thread and no new threads ever being created.
> From the ThreadPoolExecutor documentation:
> {code:java}
> Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.{code}
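A minimal standalone sketch of that pool behaviour (not the committer code itself; the class name is made up for the demo, and the 0/32 pool parameters just mirror the shape of the construction at the GitHub link above):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SingleThreadDemo {

    /** Submit {@code tasks} no-op jobs and count how many distinct worker threads ran them. */
    static int distinctWorkerThreads(int tasks) throws InterruptedException {
        // Same shape as the suspect construction: corePoolSize = 0,
        // maximumPoolSize = 32, and an unbounded LinkedBlockingQueue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, 32, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        Set<String> workers = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> workers.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return workers.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // With an unbounded queue, no more than corePoolSize threads are ever
        // created, so all tasks are drained by a single worker.
        System.out.println("distinct worker threads: " + distinctWorkerThreads(100));
    }
}
```

Despite maximumPoolSize = 32, the unbounded queue means only one worker is ever created, which matches the serial commit timeline in the 3.5.0 log below.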
> Magic Committer, Spark 3.5.0-SNAPSHOT
> {code:java}
> 2023-05-26 15:35:04,852 DEBUG impl.CommitContext: creating thread pool of size 32
> 2023-05-26 15:35:04,922 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s) for resource profile id: 0.
> 2023-05-26 15:35:07,910 INFO commit.AbstractS3ACommitter: Starting: committing the output of 36000 task(s)
> 2023-05-26 15:35:07,914 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
> 2023-05-26 15:35:07,941 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
> 2023-05-26 15:35:08,036 DEBUG impl.CommitContext: creating thread pool of size 32
> 2023-05-26 15:35:08,037 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit
> {version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000000_1920855/__base/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='SCp78J9aYOmnPwrtfd5a.Q6B9Zu6olQw3eZcpyt.W.BCu6.M6fz54nlTe2ZYUDicSskFgPocsjlVXAbfiPn3Xu26MzXNdWcg5j_kBGg9iWpeoWh4K21gt7bbKetML95MXAck15yP.VGFeOleoAspUg--', created=1685113555232, saved=1685113555232, size=110636173, date='Fri May 26 15:05:55 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000000_1920855', notes='', etags=[64b9d14a89890cf7f931ff05074f6b6a,7a8c22d95d85736806205ad0f84094a5,82f55a1b0595b38420c6b55cb2763a46,212a3105afcf4a9045730f885a622e0a,b563ec91932a66498e8e9d717b625961,0425b7ae17a8a2af80734f0adf4debe9,496e9fbc08a583657fd29779dcd848d6,72bff33d090699baacb6a6171760a675,b2c6b9b76256416d304392bfbabca382,1a8f63d960cf50589b83b407ddfc77c7,d79cf49f16cd61fb00faf67034268529,e3b7ea2bd0fff2e20a89eccd0ebbeb4a,8edc254a94902fee2884350b4a89f223,f0858853a4542b562c6973e28ee40c19]}
> 2023-05-26 15:35:08,037 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173
> 2023-05-26 15:35:08,198 DEBUG impl.CommitOperations: Successful commit of file length 110636173
> 2023-05-26 15:35:08,199 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173: duration 0:00.162s
> 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset: duration 0:00.294s
> 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
> 2023-05-26 15:35:08,208 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
> 2023-05-26 15:35:08,232 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit
> {version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000001_1920856/__base/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='I.5Kns.n5JKLbupr5.AWbzApL9YDGhQyI438WQT3JXUjevkQS8u1aQwQhkVbwBoxnFXItCbKJCS8EwPdtoijtcUFrjU0eU3pNvt7lwcYRJ0iklLwXexHbYMpZq3pDj0udwsHCq0trnqyq2Ee.o.prw--', created=1685113548923, saved=1685113548923, size=110570244, date='Fri May 26 15:05:48 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000001_1920856', notes='', etags=[835dd18d4ce995c22c1a84de5cd26fd8,c1f2acd3360b2f94ce90ef0d80edd7e5,208787fce146d917dbc50b1cf4adff4c,a66b228379201a0a278061b096dc2faf,58134b7ae228f0d2a6879ec53a3f35c1,e847a7614639d3f1a43a1b3090b603e1,2d8eac2399238e6d356fe0bdfafd076e,e0c3a964ee2a3c270f67e7e4a7791440,84b05192e896e41e1d16aa363f0eb48b,c3cf764e4361da4573b8dfa6361e4174,78b145f682946c7e18750758da3578cd,d9cd438899d20522668b9cd7f61cc098,e8b3944929f17d927ed7aef31800bdbf,52d279a75e9382ead94fe4d9b81c3bf9]}
> 2023-05-26 15:35:08,232 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244
> 2023-05-26 15:35:08,438 DEBUG impl.CommitOperations: Successful commit of file length 110570244
> 2023-05-26 15:35:08,438 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244: duration 0:00.206s
> 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset: duration 0:00.240s
> 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
> 2023-05-26 15:35:08,448 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
> 2023-05-26 15:35:08,489 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit
> {version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000002_1920857/__base/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='AB982jhG0EAfnSxXWphX7R.QfqrkS2ipP6APqJZYGlqDl335WX4X2xn81wy7l2i5NBQrvG.eVcQgYFudk51wi5EMDJRNVz__SEWyF0xTncpIHpaC5xn8La.AY8gzCHQrAfRJ83nErQhe4Idmhcs04w--', created=1685113553072, saved=1685113553072, size=110136058, date='Fri May 26 15:05:53 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000002_1920857', notes='', etags=[f043628528fc9bdd6db4107fd175eb1d,11e36b4bde35bdb1d2e160ffd5fcc3f6,5105cb3ebee1b27cfec66f7682679316,1a85e5c9477e855e38f7bd4e81bed28e,7c57360865e12c19b3125cbfb8c786f8,59d928629e7afff4c4c038a1dadd0ce7,bfcedee2f51b45c20a097fbbae85a48f,7e3ff2665714d8cdc6ec20134856eb4c,0117e5c47f94a9ff4d8d0eafd1f6b76e,3c4f27e69544e4cc5297c56413231639,a32cd6422aaae63bdcbaafa7400ab889,1d5d07f0a0b258773993a456bc7ec7ee,a220e786aa36f643edc330a184e23d88,e19a85ab3accaa4dc697e7dfbf5d5325]}
> 2023-05-26 15:35:08,489 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058
> 2023-05-26 15:35:08,646 DEBUG impl.CommitOperations: Successful commit of file length 110136058
> 2023-05-26 15:35:08,646 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058: duration 0:00.157s
> {code}
> Magic Committer, Spark 3.4.0
> {code:java}
> 2023-05-24 09:47:19,906 INFO yarn.YarnAllocator: Driver requested a total number of 524 executor(s) for resource profile id: 0.
> 2023-05-24 09:47:23,064 INFO commit.AbstractS3ACommitter: Starting: committing the output of 36000 task(s)
> 2023-05-24 09:47:23,069 DEBUG commit.AbstractS3ACommitter: Task committer attempt_202305240933298436217226765687147_0000_m_000000_0: creating thread pool of size 32
> 2023-05-24 09:47:23,074 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,074 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000006.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
> 2023-05-24 09:47:23,078 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset{code}
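For comparison, a sketch of one possible remedy (illustrative only, not the actual patch in the linked pull request): keep the unbounded queue but set corePoolSize equal to the desired pool size, so the queue no longer pins the pool at one thread. The class name and the 32-thread figure are assumptions for the demo.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ScalingPoolDemo {

    /** Run {@code poolSize} blocking jobs and count the distinct worker threads used. */
    static int distinctWorkerThreads(int poolSize) throws InterruptedException {
        // corePoolSize == maximumPoolSize: the unbounded queue no longer caps
        // the pool, so up to poolSize commits can be finalized in parallel.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // let idle workers exit once the job is done
        Set<String> workers = ConcurrentHashMap.newKeySet();
        CountDownLatch allStarted = new CountDownLatch(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.execute(() -> {
                workers.add(Thread.currentThread().getName());
                allStarted.countDown();
                try {
                    allStarted.await(); // hold each thread until every task has started
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return workers.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("distinct worker threads: " + distinctWorkerThreads(32));
    }
}
```

With this construction all 32 tasks run on distinct workers, matching the interleaved pendingset loads seen in the Spark 3.4.0 log above.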



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org