
Posted to mapreduce-issues@hadoop.apache.org by "Sandy Ryza (JIRA)" <ji...@apache.org> on 2013/03/18 22:23:17 UTC

[jira] [Commented] (MAPREDUCE-5076) CombineFileInputFormat can create splits that exceed maxSplitSize

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605659#comment-13605659 ] 

Sandy Ryza commented on MAPREDUCE-5076:
---------------------------------------

It looks like this is not nearly as bad as I first thought.  The last 16 MB were being added to the second split, so the max split size was being exceeded, but no data was lost.
                
> CombineFileInputFormat can create splits that exceed maxSplitSize
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5076
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> I ran a local job with CombineFileInputFormat using an 80 MB file and a max split size of 32 MB (the default local FS block size).  The job ran with two splits of 32 MB, and the last 16 MB were just omitted.
> This appears to be caused by a subtle bug in getMoreSplits: the code that generates the splits from the blocks expects the 16 MB block to be at the end of the block list, but the code that generates the blocks does not guarantee that ordering.
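
To make the ordering dependence concrete, here is a toy model of greedy split packing. It is a sketch only, not the actual getMoreSplits code: the class SplitPackingSketch and the helper pack are hypothetical names, and the assumption is that a split is closed as soon as the accumulated block size reaches the max split size. With the short block last, every split stays within 32 MB; with the short block in the middle, the second split grows to 48 MB.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of greedy split packing; NOT the real CombineFileInputFormat code. */
public class SplitPackingSketch {

    /**
     * Hypothetical helper: walks the blocks in order and closes a split as soon
     * as the accumulated size reaches maxSplitMb. Blocks left over at the end
     * form a final, smaller split.
     */
    public static List<Long> pack(long[] blockSizesMb, long maxSplitMb) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long block : blockSizesMb) {
            current += block;
            if (current >= maxSplitMb) {
                // The check happens after adding the block, so a split can overshoot.
                splits.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Short block last: prints [32, 32, 16]
        System.out.println(pack(new long[] {32, 32, 16}, 32));
        // Short block in the middle: prints [32, 48]
        System.out.println(pack(new long[] {32, 16, 32}, 32));
    }
}
```

In both orderings the split sizes sum to 80 MB, which matches the observation that the max split size is exceeded but no data is lost.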
