Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/11/09 04:08:06 UTC

[GitHub] [hadoop] ahmarsuhail commented on a diff in pull request #5120: HADOOP-18246. Remove lower limit on s3a prefetching/caching block size

ahmarsuhail commented on code in PR #5120:
URL: https://github.com/apache/hadoop/pull/5120#discussion_r1017404868


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md:
##########
@@ -43,6 +43,12 @@ Multiple blocks may be read in parallel.
 |`fs.s3a.prefetch.block.size`    |Size of a block    |`8M`    |
 |`fs.s3a.prefetch.block.count`    |Number of blocks to prefetch    |`8`    |
 
+Although, default size of the block for prefetching the input stream is 8 MB, 
+minimum size allowed to set is 1 byte for a block.
+User should set the block size with the understanding that smaller block sizes increases the number of blocks.
+Thus, smaller block size affects the performance by increasing the overhead for reading and prefetching
+each block.

Review Comment:
   ```suggestion
   The default size of a block is 8MB, and the minimum allowed block size is 1 byte. 
   Decreasing block size will increase the number of blocks to be read for a file. 
   A smaller block size may negatively impact performance as the number of prefetches required will increase. 
   ```
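
To make the trade-off in the suggested wording concrete, here is a rough sketch of how block count scales with block size. It assumes the number of blocks is simply the file size divided by `fs.s3a.prefetch.block.size`, rounded up; the class and method names are hypothetical and are not part of the S3A code.

```java
// Illustrative only: BlockCountSketch and blockCount are hypothetical names,
// not part of hadoop-aws. Assumes blocks = ceil(fileSize / blockSize).
final class BlockCountSketch {

    // Number of blocks needed to cover fileSize bytes with blocks of blockSize bytes.
    static long blockCount(long fileSize, long blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("block size must be at least 1 byte");
        }
        if (fileSize < 0) {
            throw new IllegalArgumentException("file size must not be negative");
        }
        // Ceiling division written to avoid overflow near Long.MAX_VALUE.
        return fileSize / blockSize + (fileSize % blockSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024L * 1024L;
        long eightMiB = 8L * 1024L * 1024L;   // the documented default of 8M
        long oneMiB = 1024L * 1024L;

        System.out.println(blockCount(oneGiB, eightMiB)); // 128
        System.out.println(blockCount(oneGiB, oneMiB));   // 1024
        System.out.println(blockCount(oneGiB, 1L));       // 1073741824
    }
}
```

Under that assumption, a 1 GiB file needs 128 blocks at the default size but over a billion at the 1-byte minimum, which is why the docs should warn about the per-block overhead.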



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

