Posted to common-dev@hadoop.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/10/03 15:29:53 UTC

[jira] Resolved: (HADOOP-1054) Add more than one input file per map?

     [ https://issues.apache.org/jira/browse/HADOOP-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar resolved HADOOP-1054.
-----------------------------------

    Resolution: Duplicate

HADOOP-1515 addresses exactly the same problem.

> Add more than one input file per map?
> -------------------------------------
>
>                 Key: HADOOP-1054
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1054
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Johan Oskarsson
>            Priority: Trivial
>
> I've got a problem with MapReduce overhead when it comes to small input files.
> Roughly 100 MB comes into the DFS every few hours. Data related to that batch may then be appended for another few weeks.
> The problem is that this data arrives at roughly 4-5 KB per file, so for every reasonably big file we might have 4-5 small ones.
> As far as I understand it, each small file is assigned a map task of its own. This causes performance issues, since the per-task
> overhead for such small files is considerable.
> Would it be possible to have Hadoop assign multiple files to a map task, up to a configurable limit?
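The grouping the reporter asks for amounts to greedily packing small files into splits until a size limit is reached. The following is a minimal, self-contained sketch of that logic only (the function name, the file list, and the byte limit are all hypothetical illustrations, not Hadoop's actual implementation; HADOOP-1515's MultiFileInputFormat is the real mechanism):

```python
def group_files(file_sizes, max_split_bytes):
    """Greedily pack (name, size) pairs into splits of at most
    max_split_bytes total. A single file larger than the limit
    still gets a split of its own."""
    splits = []          # list of splits; each split is a list of file names
    current = []         # files accumulated for the split being built
    current_size = 0     # total bytes in the current split
    for name, size in file_sizes:
        # Close the current split if adding this file would exceed the limit.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Hypothetical example: two 4-5 KB files plus one large file,
# with a 10 KB limit per split.
files = [("a.log", 5_000), ("b.log", 4_000), ("big.log", 100_000_000)]
print(group_files(files, 10_000))  # [['a.log', 'b.log'], ['big.log']]
```

Each map task would then process one split (one list of files) instead of one file, amortizing the per-task startup overhead across several small inputs.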

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.