Posted to common-dev@hadoop.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2011/07/16 20:26:00 UTC
[jira] [Resolved] (HADOOP-960) Incorrect number of map tasks when there are multiple input files
[ https://issues.apache.org/jira/browse/HADOOP-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved HADOOP-960.
----------------------------
Resolution: Invalid
The ability to specify "mapred.map.tasks" is going away with the new MapReduce API. The only _right_ way to control splits is to write your own InputFormat that does it the way you need it to. The default behavior has served most users well (it is sensitive to data locality, as long as such information is available, and its split size is tunable), and it can also be asked to process whole files via a very simple subclass or configuration change.
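A minimal sketch of the whole-file case, assuming the old-API TextInputFormat is on the classpath (the class name WholeFileTextInputFormat is hypothetical):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical subclass: declining to split a file means getSplits()
// emits exactly one split, and hence one map task, per input file.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // never split: one map per whole file
    }
}
```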
Resolving as invalid (now, and onwards) since InputFormat#getSplits(…) is not going anywhere, and can do what you want it to.
Regarding splitting by record count: MR now has NLineInputFormat as well, which indeed opens and reads through the file.
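As a sketch, the lines-per-map count can be set through job configuration (the property name below is from the newer mapreduce.* namespace; the older equivalent was mapred.line.input.format.linespermap):

```xml
<!-- Sketch: have NLineInputFormat give each map task 1000 input lines. -->
<property>
  <name>mapreduce.input.lineinputformat.linespermap</name>
  <value>1000</value>
</property>
```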
> Incorrect number of map tasks when there are multiple input files
> -----------------------------------------------------------------
>
> Key: HADOOP-960
> URL: https://issues.apache.org/jira/browse/HADOOP-960
> Project: Hadoop Common
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.10.1
> Reporter: Andrew McNabb
> Priority: Minor
>
> This problem happens with hadoop-streaming and possibly elsewhere. If there are 5 input files, it will create 130 map tasks, even if mapred.map.tasks=128. The number of map tasks is incorrectly set to a multiple of the number of files. (I wrote a much more complete bug report, but Jira lost it when it had an error, so I'm not in the mood to write it all again)
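The arithmetic the reporter observed can be sketched in isolation. This is an assumption-laden simplification of the old FileInputFormat.getSplits() behavior: it ignores block size and the minimum split size, and just carves each file into chunks of roughly totalSize / requestedMaps, with at least one split per file.

```java
// Simplified model of per-file splitting: the requested map count is only
// a hint used to derive a goal split size; per-file ceiling rounding (and
// the one-split-per-file floor) pushes the real task count above the hint.
public class SplitMath {
    static int numSplits(long[] fileSizes, int requestedMaps) {
        long total = 0;
        for (long size : fileSizes) total += size;
        // Goal split size derived from the hint (integer division).
        long goal = Math.max(1, total / Math.max(1, requestedMaps));
        int splits = 0;
        for (long size : fileSizes) {
            // Each file is split independently, rounding up,
            // and contributes at least one split.
            splits += Math.max(1, (int) Math.ceil(size / (double) goal));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five equal files, 128 requested maps: goal = 5000 / 128 = 39,
        // ceil(1000 / 39) = 26 splits per file, 5 * 26 = 130 map tasks.
        long[] files = {1000, 1000, 1000, 1000, 1000};
        System.out.println(numSplits(files, 128)); // prints 130
    }
}
```

With a single input file the same model yields a count much closer to the hint, which is why the inflation only shows up once several files are involved.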
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira