Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2007/01/30 19:00:35 UTC

[jira] Commented: (HADOOP-960) Incorrect number of map tasks when there are multiple input files

    [ https://issues.apache.org/jira/browse/HADOOP-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468718 ] 

Owen O'Malley commented on HADOOP-960:
--------------------------------------

Oops, sorry about that. To get exactly 128 splits, you would need to write your own InputFormat and implement getSplits yourself. That said, it is usually better to accept the extra maps and keep data locality on the map input.

> Incorrect number of map tasks when there are multiple input files
> -----------------------------------------------------------------
>
>                 Key: HADOOP-960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-960
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Andrew McNabb
>
> This problem happens with hadoop-streaming and possibly elsewhere.  If there are 5 input files, it will create 130 map tasks, even if mapred.map.tasks=128.  The number of map tasks is incorrectly set to a multiple of the number of files.  (I wrote a much more complete bug report, but Jira lost it when it had an error, so I'm not in the mood to write it all again)
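
One plausible reading of the 130 figure in the report above, sketched under assumptions: the requested map count is turned into a target split size (total input bytes divided by the requested maps), and each file is then split on its own, so every file rounds up separately. The function name and the file sizes are hypothetical, and block size and minimum split size are ignored here, so this is an illustration of the arithmetic rather than the actual Hadoop code.

```python
def estimate_map_count(file_sizes, requested_maps):
    """Estimate the split count when each file is split independently.

    Assumption: the target split size is total bytes // requested maps.
    Block size and minimum split size are deliberately ignored.
    """
    total = sum(file_sizes)
    goal_size = max(1, total // requested_maps)
    # A split never spans two files, so each file rounds up by itself.
    return sum((size + goal_size - 1) // goal_size for size in file_sizes)


# Five equal-sized files (hypothetical sizes) with 128 maps requested.
print(estimate_map_count([1_000_000] * 5, 128))  # prints 130
```

With five equal files, each one needs a little under 26 target-sized splits and rounds up to 26, giving 5 x 26 = 130 maps instead of 128.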
