Posted to mapreduce-dev@hadoop.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/08/31 21:19:55 UTC

[jira] Created: (MAPREDUCE-2046) A input split cannot be less than a dfs block

A input split cannot be less than a dfs block 
----------------------------------------------

                 Key: MAPREDUCE-2046
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2046
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Namit Jain


I ran into this while testing some Hive features.

Whether we use HiveInputFormat or CombineHiveInputFormat, a split cannot be smaller than a DFS block.
This is a problem if we want to increase the block size for older data to reduce memory consumption on the
NameNode, because larger blocks then force larger splits and therefore fewer map tasks.

It would be useful if the input split size were independent of the DFS block size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (MAPREDUCE-2046) A input split cannot be less than a dfs block

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-2046.
--------------------------------------

    Resolution: Cannot Reproduce

This isn't true. InputSplits can be arbitrarily sized by the InputFormat. With mapred.TextInputFormat, if you set the number of maps very high, you will generate a large number of maps, each with a correspondingly small split. In the new mapreduce.lib.input.TextInputFormat, there are knobs that set the minimum and maximum split size.
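
As a rough illustration of the point above, the following is a minimal sketch of how a file-based InputFormat can derive a split size from the block size and the configured bounds. It is a standalone approximation, not the Hadoop source: the class name SplitSizeSketch and both method names are hypothetical, and the formulas are assumed to mirror what FileInputFormat does in the old and new APIs. In the new API the bounds are set with FileInputFormat.setMinInputSplitSize and FileInputFormat.setMaxInputSplitSize. The Hive input formats mentioned in the report may compute splits differently.

```java
public class SplitSizeSketch {
    // Assumed new-API rule: clamp the block size between the minimum
    // and maximum split size configured on the job.
    public static long newApiSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Assumed old-API rule: the requested number of maps is a hint that
    // yields a goal size; the split is the smaller of goal and block size,
    // but never below the minimum.
    public static long oldApiSplitSize(long totalSize, int numMapsHint,
                                       long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(numMapsHint, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long blockSize = 256 * MB;

        // With no maximum configured, the split equals the block size.
        System.out.println(newApiSplitSize(blockSize, 1, Long.MAX_VALUE) / MB);

        // A 64 MB maximum gives splits smaller than the 256 MB block.
        System.out.println(newApiSplitSize(blockSize, 1, 64 * MB) / MB);

        // Old API: asking for 16 maps over 1 GB of input gives 64 MB splits.
        System.out.println(oldApiSplitSize(1024 * MB, 16, 1, blockSize) / MB);
    }
}
```

Under these assumptions a split is only pinned to the block size when no maximum split size (new API) or no large map count (old API) is set, which is consistent with the behaviour the reporter observed with default settings.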

[jira] Reopened: (MAPREDUCE-2046) A input split cannot be less than a dfs block

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reopened MAPREDUCE-2046:
-----------------------------------------

      Assignee: dhruba borthakur
