You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/12/01 17:25:57 UTC

[GitHub] [hadoop] steveloughran commented on pull request #5172: HADOOP-18543. AliyunOSSFileSystem#open(Path path, int bufferSize) use buffer size as its downloadPartSize

steveloughran commented on PR #5172:
URL: https://github.com/apache/hadoop/pull/5172#issuecomment-1334108437

   sorry, but I'm going to say -1 to using the normal IO buffer size as the GET range. The default value of 4k is way too small even for parquet/orc reads, it will break all existing apps in performance terms: distcp, parquet library, avro, ORC, everything, as they all use the default value.
   
   1. there is a configuration option for multipart download size, which is filesystem-wide. Not as flexible, but something everyone will expect to work.
   2. If you want better control of read policy, buffer sizes etc, then this connector needs to implement openFile(), as s3a and abfs do. that will let you add a new option to specify the range for GET calls.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org