Posted to common-dev@hadoop.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/10/03 15:29:53 UTC
[jira] Resolved: (HADOOP-1054) Add more than one input file per map?
[ https://issues.apache.org/jira/browse/HADOOP-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar resolved HADOOP-1054.
-----------------------------------
Resolution: Duplicate
HADOOP-1515 addresses exactly the same problem.
> Add more than one input file per map?
> -------------------------------------
>
> Key: HADOOP-1054
> URL: https://issues.apache.org/jira/browse/HADOOP-1054
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.11.2
> Reporter: Johan Oskarsson
> Priority: Trivial
>
> I've got a problem with MapReduce overhead when it comes to small input files.
> Roughly 100 MB comes into the DFS every few hours. Afterwards, data related to that batch may continue to be added for another few weeks.
> The problem is that this additional data is roughly 4-5 KB per file, so for every reasonably big file we might have 4-5 small ones.
> As far as I understand it, each small file gets assigned a map task of its own. This causes performance issues, since the per-task overhead for such small files is considerable.
> Would it be possible to have Hadoop assign multiple files to a map task, up to a configurable limit?
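The behavior requested above (and later provided by HADOOP-1515's multi-file input format) amounts to greedily packing file sizes into splits until a configurable byte limit is reached. A minimal, Hadoop-free sketch of that packing logic in Java (the class and method names here are illustrative, not Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: pack many small input files into fewer splits. */
public class SplitPacker {
    /**
     * Greedily groups file sizes into splits no larger than maxSplitBytes.
     * A single file larger than the limit still gets a split of its own,
     * since files are not subdivided in this sketch.
     */
    public static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close out the current split if adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
```

With a 10-byte limit, files of sizes 4, 5, 100, and 3 would pack into three splits: [4, 5], [100], and [3], so the two small leading files share a single map task instead of getting one each.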
--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.