
Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2016/06/13 10:17:21 UTC

[jira] [Resolved] (HADOOP-13106) if fs.s3a.block.size option == 0, use partition size option for blocksize

     [ https://issues.apache.org/jira/browse/HADOOP-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13106.
-------------------------------------
    Resolution: Won't Fix

I'm going to mark this as wontfix.

Split calculation is usually driven by the directory listing, which doesn't know the details of individual files; querying them would mean O(n) HEAD calls during the serialized split operation.
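
To make the cost argument concrete, here is a minimal sketch of listing-driven split calculation. It is not Hadoop's actual FileInputFormat code: the names `FileEntry`, `compute_splits` and `head_calls_for_per_file_sizes` are hypothetical, and the logic is deliberately simplified (one global block size, no slop factor). It only illustrates that a single configured block size needs nothing beyond the listing, whereas a per-file partition size would cost one HEAD request per file.

```python
from typing import List, NamedTuple, Tuple


class FileEntry(NamedTuple):
    """One row of a directory listing (hypothetical type for this sketch)."""
    path: str
    length: int  # size in bytes, as reported by the listing


def compute_splits(listing: List[FileEntry],
                   block_size: int) -> List[Tuple[str, int, int]]:
    """Return (path, offset, length) splits using only listing data."""
    if block_size <= 0:
        # A zero or negative block size would make the loop below never advance.
        raise ValueError("block_size must be positive")
    splits = []
    for entry in listing:
        offset = 0
        while offset < entry.length:
            size = min(block_size, entry.length - offset)
            splits.append((entry.path, offset, size))
            offset += size
    return splits


def head_calls_for_per_file_sizes(listing: List[FileEntry]) -> int:
    """Requests needed if each file's own partition size had to be looked up."""
    return len(listing)  # one HEAD per file, i.e. O(n)


listing = [FileEntry("data/a", 250), FileEntry("data/b", 100)]
print(compute_splits(listing, block_size=100))
print(head_calls_for_per_file_sizes(listing))
```

With a global block size the only input is the listing itself; the second function shows the extra per-file round trips that the serialized split phase would otherwise have to absorb.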

We may still want to think about tagging every uploaded file with some partition info, but more for information than for automated split calculation.

> if fs.s3a.block.size option == 0, use partition size option for blocksize
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-13106
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13106
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>
> Input performance is best if the blocksize matches the partition size used to upload the data.
> There's currently no way of knowing this, but S3A does know the partition size to use when uploading data.
> I propose picking that up as the blocksize if the configured value of {{fs.s3a.block.size}} is zero. This value is utterly illegal today (more precisely, it will break any app using it to calculate splits), even though there is no check for its range.
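
The fallback proposed in the description (and not adopted, given the Won't Fix resolution) could be sketched roughly as below. This is an illustration only, not S3A code: `effective_block_size` is a hypothetical helper, and its two arguments stand for the configured {{fs.s3a.block.size}} value and the upload partition size, whose option name is not given in the issue.

```python
def effective_block_size(configured_block_size: int, partition_size: int) -> int:
    """Pick the block size to report, per the proposal in this issue.

    configured_block_size: value of fs.s3a.block.size (0 means "unset").
    partition_size: partition size used for uploads (assumed to be known).
    """
    if configured_block_size < 0:
        raise ValueError("block size must not be negative")
    if configured_block_size == 0:
        # Proposed behaviour: fall back to the upload partition size
        # instead of handing a zero block size to split calculation.
        if partition_size <= 0:
            raise ValueError("partition size must be positive")
        return partition_size
    return configured_block_size


MB = 1024 * 1024
print(effective_block_size(0, 64 * MB))        # falls back to the partition size
print(effective_block_size(32 * MB, 64 * MB))  # explicit setting wins
```

The range checks are an addition of this sketch; the description notes that no such check exists today.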


