Posted to dev@parquet.apache.org by "sean (Jira)" <ji...@apache.org> on 2022/05/13 19:21:00 UTC

[jira] [Created] (PARQUET-2146) AvroParquetWriter write to s3 bucket throws data integrity exception

sean created PARQUET-2146:
-----------------------------

             Summary: AvroParquetWriter write to s3 bucket throws data integrity exception
                 Key: PARQUET-2146
                 URL: https://issues.apache.org/jira/browse/PARQUET-2146
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.12.2
            Reporter: sean


 

Hi, we are trying to use [org.apache.parquet.avro|https://www.tabnine.com/code/java/packages/org.apache.parquet.avro].AvroParquetWriter to write a Parquet file to an S3 bucket. The file is written to the bucket successfully, but we get an exception:

com.amazonaws.SdkClientException: Unable to verify integrity of data upload.

We would like to resolve this exception. Note that the S3 bucket is encrypted with SSE-KMS, not SSE-S3.

 

The exception appears to be thrown by the code block linked below:

[https://github.com/aws/aws-sdk-java/blob/fd409dee8ae23fb8953e0bb4dbde65536a7e0514/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1876]

According to the Amazon documentation, the ETag is not the same as the MD5 when the S3 bucket is encrypted with SSE-KMS:

[https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]

 

*One possible fix is to pass the MD5 in the request header, or to set a system property that disables the validation in skipMd5CheckStrategy.skipClientSideValidationPerPutResponse, as indicated in the link below:*

[https://github.com/aws/aws-sdk-java/blob/99fe75a823d4b02f4e90fa0dda06a1558d5617a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L42]
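For reference, a minimal sketch of the system-property route. The property name is what I read in the linked SkipMd5CheckStrategy source (AWS SDK for Java v1), so it should be checked against the SDK version actually in use. The class and method names (Md5ValidationToggle, disablePutObjectMd5Validation) are hypothetical. The flag is JVM-wide and turns off the client-side integrity check for every PutObject call, and it does not go through any AvroParquetWriter configuration.

```java
public class Md5ValidationToggle {

    // Assumption: property name as read from SkipMd5CheckStrategy in
    // AWS SDK for Java v1; verify it against the SDK version on the classpath.
    static final String DISABLE_PUT_MD5 =
            "com.amazonaws.services.s3.disablePutObjectMD5Validation";

    /** Sets the JVM-wide flag and returns the previous value (null if it was unset). */
    static String disablePutObjectMd5Validation() {
        return System.setProperty(DISABLE_PUT_MD5, "true");
    }

    public static void main(String[] args) {
        disablePutObjectMd5Validation();
        // The AvroParquetWriter would be built and used after this point,
        // so that the S3 client sees the property when the upload completes.
        System.out.println(DISABLE_PUT_MD5 + "=" + System.getProperty(DISABLE_PUT_MD5));
    }
}
```

The same flag could presumably be passed on the command line as -Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true instead of being set in code.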

The issue is that I have not found a proper way to inject such configuration into AvroParquetWriter. Is this possible? If so, could you show how to do it?

 

Thanks

 

Sean

 


