Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2024/01/11 12:34:00 UTC

[jira] [Commented] (HADOOP-19033) S3A: disable checksum validation

    [ https://issues.apache.org/jira/browse/HADOOP-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805554#comment-17805554 ] 

ASF GitHub Bot commented on HADOOP-19033:
-----------------------------------------

steveloughran opened a new pull request, #6441:
URL: https://github.com/apache/hadoop/pull/6441

   HADOOP-19033. S3A: disable checksums when fs.s3a.checksum.validation == false
   
       
   Add a new option, fs.s3a.checksum.validation (default: false), which
   is used when creating S3 clients.
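
A minimal sketch of how a client factory might read the new option. Hadoop code would normally call `Configuration.getBoolean(name, defaultValue)`; to keep the example self-contained, `java.util.Properties` stands in for it, and `ChecksumOption` / `checksumValidationEnabled` are hypothetical names, not code from the PR. How the flag then reaches the SDK client (for example through the v2 SDK's `S3Configuration.builder().checksumValidationEnabled(...)`) is an assumption and is not shown.

```java
import java.util.Properties;

public class ChecksumOption {

    // Option name taken from the PR; validation stays off unless explicitly enabled.
    static final String CHECKSUM_VALIDATION = "fs.s3a.checksum.validation";
    static final boolean CHECKSUM_VALIDATION_DEFAULT = false;

    /**
     * Hypothetical stand-in for Configuration.getBoolean(name, default):
     * "true"/"false" (any case) are honoured, anything else falls back to the default.
     */
    static boolean checksumValidationEnabled(Properties conf) {
        String value = conf.getProperty(CHECKSUM_VALIDATION, "").trim();
        if ("true".equalsIgnoreCase(value)) {
            return true;
        }
        if ("false".equalsIgnoreCase(value)) {
            return false;
        }
        return CHECKSUM_VALIDATION_DEFAULT; // unset or unparseable
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(checksumValidationEnabled(conf)); // false: option unset
        conf.setProperty(CHECKSUM_VALIDATION, "true");
        System.out.println(checksumValidationEnabled(conf)); // true
    }
}
```

Falling back to the default for unparseable values keeps a typo in the option from silently enabling the slower code path.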
   
   There are two interceptors for fs.s3a.audit.interceptors to assist
   here, but it turns out they aren't needed. They are retained in case
   they are, but we could cut them.
   
   
   ### How was this patch tested?
   
   Added a test in ITestS3AOpenCost to verify that disabling works.
   It has to use reflection to walk down the SDK's filter stream chain; when checksums
   are enabled, the test fails because the bottom stream in the chain is one of the two
   checksum-validating streams.
   
   ```
   Expecting:<so...@60179ebf>
   not to be an instance of:<software.amazon.awssdk.services.s3.internal.checksums.S3ChecksumValidatingInputStream>
   at org.apache.hadoop.fs.s3a.S3ATestUtils.assertStreamIsNotChecksummed(S3ATestUtils.java:1728)
   at org.apache.hadoop.fs.s3a.performance.ITestS3AOpenCost.testStreamIsNotChecksummed(ITestS3AOpenCost.java:173)
   ```
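
A minimal sketch of the reflection technique the test relies on: follow each wrapper's hidden inner stream until reaching one that is not a wrapper, then check its type. Reflecting into `java.io.FilterInputStream.in` itself is blocked on recent JDKs unless the JVM runs with `--add-opens java.base/java.io=ALL-UNNAMED`, so this self-contained version uses hypothetical `OpaqueWrapper` and `ChecksumValidatingStream` classes (stand-ins for the SDK's filter streams and `S3ChecksumValidatingInputStream`) and reads a private field instead. `innermost` and `assertNotChecksummed` are illustrative names, not the `S3ATestUtils` code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;

public class StreamChainWalker {

    /** Hypothetical wrapper that keeps its inner stream in a private field with no getter. */
    static class OpaqueWrapper extends InputStream {
        private final InputStream inner;
        OpaqueWrapper(InputStream inner) { this.inner = inner; }
        @Override public int read() throws IOException { return inner.read(); }
    }

    /** Hypothetical stand-in for a checksum-validating stream; the walk stops here. */
    static class ChecksumValidatingStream extends InputStream {
        private final InputStream source;
        ChecksumValidatingStream(InputStream source) { this.source = source; }
        @Override public int read() throws IOException { return source.read(); }
    }

    /** Follows OpaqueWrapper.inner reflectively until a non-wrapper stream is reached. */
    static InputStream innermost(InputStream stream) {
        try {
            Field innerField = OpaqueWrapper.class.getDeclaredField("inner");
            innerField.setAccessible(true);
            InputStream current = stream;
            while (current instanceof OpaqueWrapper) {
                current = (InputStream) innerField.get(current);
            }
            return current;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read inner stream", e);
        }
    }

    /** Mirrors the intent of the test: fail if the bottom stream validates checksums. */
    static void assertNotChecksummed(InputStream stream) {
        InputStream bottom = innermost(stream);
        if (bottom instanceof ChecksumValidatingStream) {
            throw new AssertionError("bottom stream is checksummed: " + bottom.getClass().getName());
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        assertNotChecksummed(new OpaqueWrapper(new OpaqueWrapper(raw))); // passes
        InputStream checksummed = new OpaqueWrapper(new ChecksumValidatingStream(raw));
        System.out.println(innermost(checksummed).getClass().getSimpleName()); // ChecksumValidatingStream
    }
}
```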
   
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> S3A: disable checksum validation
> --------------------------------
>
>                 Key: HADOOP-19033
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19033
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> The AWS v2 SDK turns on client-side checksum validation; this kills performance.
> Given we are using TLS to download from AWS S3, there is already implicit integrity checking on the channel, in addition to the IPv4 and TCP checksums.
> We don't need it; all it does is slow us down.
> Proposed: disable it in DefaultS3ClientFactory.
> I don't want to add an option to enable it, as it only complicates life (yet another config option), but I am open to persuasion.
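
To make the cost concrete: client-side validation means every downloaded byte is also fed through a digest, whose result is compared with a value from the server. A minimal sketch of that idea using the JDK's `CRC32` and `CheckedInputStream`; the SDK's own validating streams may use a different algorithm and compare against different response metadata, so `ValidatingRead` and `readAll` are hypothetical illustrations, not SDK code. Disabling validation corresponds to the `validate == false` path, which skips the wrapper entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class ValidatingRead {

    /**
     * Reads the whole stream. With validate == true the source is wrapped in a
     * CheckedInputStream, so every byte also updates a CRC32 that is compared
     * with expectedCrc at the end; with validate == false that work is skipped.
     */
    static byte[] readAll(InputStream source, boolean validate, long expectedCrc)
            throws IOException {
        CRC32 crc = new CRC32();
        InputStream in = validate ? new CheckedInputStream(source, crc) : source;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        if (validate && crc.getValue() != expectedCrc) {
            throw new IOException("checksum mismatch: expected " + expectedCrc
                + " but computed " + crc.getValue());
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hello, s3".getBytes(StandardCharsets.UTF_8);
        CRC32 reference = new CRC32();
        reference.update(body, 0, body.length);
        byte[] checked = readAll(new ByteArrayInputStream(body), true, reference.getValue());
        byte[] unchecked = readAll(new ByteArrayInputStream(body), false, 0L); // expectedCrc ignored
        System.out.println(checked.length + " " + unchecked.length); // 9 9
    }
}
```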



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
