Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/06/17 09:26:00 UTC

[jira] [Commented] (HADOOP-18298) Hadoop AWS | Staging committer Multipartupload not completing on minio

    [ https://issues.apache.org/jira/browse/HADOOP-18298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555499#comment-17555499 ] 

Steve Loughran commented on HADOOP-18298:
-----------------------------------------

I'm sorry, but I believe you have completely failed to realise the fundamental aspect of the S3A committers:

h2. We do not complete multipart uploads in task commit, because that is what we do in job commit.


The delayed commit is the core, critical part of the entire algorithm. So your statement that uploadFileToPendingCommit doesn't finish the upload is correct; that is why it is called uploadFileToPendingCommit and not uploadFile. Delaying the manifestation of the upload is how we ensure that none of the intermediate data is visible until job commit. This allows for speculation and for task failure during both task execution and task commit. It also ensures that if the entire job fails at any point prior to job commit, none of the work is visible.
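To make the split concrete, here is a minimal sketch of the idea written against the raw AWS SDK (v1) API. It is not the actual S3A committer code; the class, method and field names ({{DelayedCommitSketch}}, {{PendingUpload}}, {{taskCommit}}, {{jobCommit}}) are invented for illustration.

{code:java}
// A minimal sketch of the delayed-commit split, against the raw AWS SDK (v1).
// Task commit uploads the parts but only records what is needed to finish the
// upload; job commit replays those records and issues CompleteMultipartUpload.
// Until then nothing is visible at the destination.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DelayedCommitSketch {

  /** The information a task would persist as its "pending commit" data. */
  static class PendingUpload {
    String bucket;
    String key;
    String uploadId;
    List<PartETag> parts = new ArrayList<>();
  }

  /** Task commit: initiate the upload and upload the parts, but do NOT complete it. */
  static PendingUpload taskCommit(AmazonS3 s3, String bucket, String key, File file) {
    InitiateMultipartUploadResult init =
        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));
    PendingUpload pending = new PendingUpload();
    pending.bucket = bucket;
    pending.key = key;
    pending.uploadId = init.getUploadId();

    // Single part for brevity; real code splits large files into many parts.
    UploadPartResult part = s3.uploadPart(new UploadPartRequest()
        .withBucketName(bucket)
        .withKey(key)
        .withUploadId(pending.uploadId)
        .withPartNumber(1)
        .withFile(file)
        .withPartSize(file.length()));
    pending.parts.add(part.getPartETag());

    // In the real committer the pending data is persisted for job commit;
    // here it is simply returned. The destination object is still invisible.
    return pending;
  }

  /** Job commit: only now are the uploads completed and the files manifest. */
  static void jobCommit(AmazonS3 s3, List<PendingUpload> allPending) {
    for (PendingUpload p : allPending) {
      s3.completeMultipartUpload(
          new CompleteMultipartUploadRequest(p.bucket, p.key, p.uploadId, p.parts));
    }
  }
}
{code}

The key point: CompleteMultipartUpload is only ever issued in job commit; up to that point the uploaded parts exist in the store but the destination object does not.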
 
For more information, please read the algorithm:
https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_release_2021-05-17

If you have found errors in the algorithm, especially in the correctness of the protocol, you are welcome to submit changes, ideally including proofs of correctness.

And if you find that there is a problem with things working on Minio, well, Minio has quirks.

My suggestion to you is to run the entire hadoop-aws integration test suite against your S3 server. It includes running MapReduce jobs against the store and verifying that the output is present and correct. Precisely because we do this against AWS S3, I am confident it works. That, and the little *detail* that we have been using this in production for 3+ years.
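For reference, the hadoop-aws integration tests are run with {{mvn verify}} from {{hadoop-tools/hadoop-aws}}, with the target store configured in {{src/test/resources/auth-keys.xml}}. As a quick sanity check of the client configuration against a MinIO endpoint, something like the sketch below can be used; the endpoint, bucket and credentials are placeholders, not values from this issue.

{code:java}
// Minimal sketch: point an S3A filesystem at a MinIO endpoint and list a path.
// Endpoint, bucket and credentials are placeholders.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MinioS3aCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.endpoint", "http://localhost:9000");   // MinIO endpoint
    conf.set("fs.s3a.path.style.access", "true");           // MinIO needs path-style access
    conf.set("fs.s3a.connection.ssl.enabled", "false");
    conf.set("fs.s3a.access.key", "ACCESS_KEY_PLACEHOLDER");
    conf.set("fs.s3a.secret.key", "SECRET_KEY_PLACEHOLDER");

    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
    for (FileStatus status : fs.listStatus(new Path("s3a://example-bucket/"))) {
      System.out.println(status.getPath());
    }
  }
}
{code}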

I am going to close this issue as invalid. I have, however, changed the title to make clear that minio may be a factor.

> Hadoop AWS | Staging committer Multipartupload not completing on minio
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-18298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18298
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.1
>         Environment: minio
>            Reporter: Ayush Goyal
>            Priority: Major
>
> In the Hadoop AWS staging committer (org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter), the committer uploads files from the local filesystem to S3 (method commitTaskInternal), which calls uploadFileToPendingCommit of CommitOperation to upload each file using a multipart upload.
>  
> A multipart upload consists of three steps:
> 1) Initialise the multipart upload.
> 2) Break the file into parts and upload the parts.
> 3) Merge all the parts and finalise the multipart upload.
>  
> In the implementation of uploadFileToPendingCommit, the first two steps are implemented. However, the third step is missing, so the parts are uploaded but, because they are never merged, no files are present in the destination directory at the end of the job.
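> For illustration, step 3 in the raw AWS SDK (v1) is a single call, roughly as in the sketch below (generic SDK usage, not the committer's own code; the class and method names are made up):
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
> import com.amazonaws.services.s3.model.PartETag;
>
> import java.util.List;
>
> public class CompleteUploadSketch {
>   /** Step 3: merge the previously uploaded parts so the object becomes visible. */
>   static void completeUpload(AmazonS3 s3, String bucket, String key,
>                              String uploadId, List<PartETag> partETags) {
>     s3.completeMultipartUpload(
>         new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
>   }
> }
> {code}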
>  
> S3 logs before implementing the 3rd step:
>  
> {code:java}
> 2022-05-30T13:49:31:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               8.677ms      ↑ 137 B ↓ 724 B
> 2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1  240b:c1d1:123:664f:c5d2:2::                443.156ms    ↑ 51 KiB ↓ 325 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_SUCCESS%2F&fetch-owner=false  240b:c1d1:123:664f:c5d2:2::                3.414ms      ↑ 137 B ↓ 646 B
> 2022-05-30T13:49:32:000 [200 OK] s3.PutObject localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_SUCCESS 240b:c1d1:123:664f:c5d2:2::                52.734ms     ↑ 8.7 KiB ↓ 380 B
> 2022-05-30T13:49:32:000 [200 OK] s3.DeleteMultipleObjects localhost:9000/minio-feature-testing/?delete  240b:c1d1:123:664f:c5d2:2::                73.954ms     ↑ 350 B ↓ 432 B
> 2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_temporary 240b:c1d1:123:664f:c5d2:2::                2.658ms      ↑ 137 B ↓ 291 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_temporary%2F&fetch-owner=false  240b:c1d1:123:664f:c5d2:2::                 4.807ms      ↑ 137 B ↓ 648 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListMultipartUploads localhost:9000/minio-feature-testing/?uploads&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F  240b:c0e0:102:553e:b4c2:2::               1.081ms      ↑ 137 B ↓ 776 B
> 2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/.spark-staging-ce0a965f-622a-4950-bb4b-550470883134 240b:c1d1:123:664f:c5d2:2::                 5.68ms       ↑ 137 B ↓ 291 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2 localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F.spark-staging-ce0a965f-622a-4950-bb4b-550470883134%2F&fetch-owner=false  240b:c1d1:123:664f:c5d2:2::              2.452ms      ↑ 137 B ↓ 689 B
>   {code}
> Here, after s3.PutObjectPart there is no CompleteMultipartUpload call for the 3rd step.
>  
> S3 logs after implementing the 3rd step:
>  
> {code:java}
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               9.116ms      ↑ 137 B ↓ 750 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               9.416ms      ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               8.506ms      ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               9.815ms      ↑ 137 B ↓ 750 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               10.09ms      ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               9.851ms      ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               9.006ms      ↑ 137 B ↓ 750 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               9.217ms      ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f&partNumber=1  240b:c1d1:123:664f:c5d2:2::               817.474ms    ↑ 52 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=782769d0-43f1-43b8-aae0-54ac4c8c6603&partNumber=1  240b:c1d1:123:664f:c5d2:2::               818.363ms    ↑ 85 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c&partNumber=1  240b:c1d1:123:664f:c5d2:2::               819.765ms    ↑ 54 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c7e09609-6193-4d41-bc05-4020291725e4&partNumber=1  240b:c1d1:123:664f:c5d2:2::               818.782ms    ↑ 55 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590&partNumber=1  240b:c1d1:123:664f:c5d2:2::               817.97ms     ↑ 51 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=8fe799e3-c712-43b7-a074-a2359232de07&partNumber=1  240b:c1d1:123:664f:c5d2:2::               819.183ms    ↑ 80 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c2e1477b-5457-4cbe-8fdb-4e80eaca63fe&partNumber=1  240b:c1d1:123:664f:c5d2:2::               818.126ms    ↑ 53 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=992167c8-fbde-4a0d-bd4d-5ce7ddd51a87&partNumber=1  240b:c1d1:123:664f:c5d2:2::               818.176ms    ↑ 56 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f  240b:c1d1:123:664f:c5d2:2::               632.761ms    ↑ 272 B ↓ 1.1 KiB
> 2022-06-17T10:56:13:000 [200 OK] s3.NewMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads  240b:c1d1:123:664f:c5d2:2::               6.231ms      ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590  240b:c1d1:123:664f:c5d2:2::               697.946ms    ↑ 272 B ↓ 1.1 KiB
> 2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c  240b:c1d1:123:664f:c5d2:2::               714.377ms    ↑ 272 B ↓ 1.1 KiB
>  {code}
>  
>  
> Needs to be implemented:
>  
> After the uploadPart call, once all the upload IDs have been added to commitData, innerCommit should be called.



