Posted to mapreduce-issues@hadoop.apache.org by "David Mollitor (JIRA)" <ji...@apache.org> on 2019/03/05 14:00:00 UTC

[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

    [ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784463#comment-16784463 ] 

David Mollitor commented on MAPREDUCE-207:
------------------------------------------

I recently came across a situation where a user had the LZO compression codec enabled and installed across the cluster.  However, MR jobs that did not even require the codec were failing because the codec was not installed on the client node from which the jobs were being submitted.  As part of its role in calculating splits, the client loads the codec configuration and all of the associated codec implementations.  This fails on external clients that do not have the codec installed.  The user understandably did not want to install the LZO codec on every client node, but avoiding that came at the cost of maintaining separate hdfs-site files for different client hosts.

Moving all of this work into the cluster removes this dependency from the clients.
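
The failure mode described above can be illustrated with a minimal, self-contained sketch.  It is not Hadoop code: the class CodecLoadingSketch and the method loadCodecClasses are hypothetical names, loosely modelled on how a client eagerly resolves every class listed under the io.compression.codecs property before it computes splits.  java.util.zip.Deflater is only a stand-in for a codec that is present on the client, and com.hadoop.compression.lzo.LzoCodec is assumed to be absent from the client classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class CodecLoadingSketch {

    // Hypothetical stand-in for the eager codec loading a client performs
    // while preparing to compute input splits. Every configured class is
    // resolved, whether or not the job's input actually uses that codec.
    static List<Class<?>> loadCodecClasses(String codecList) {
        List<Class<?>> loaded = new ArrayList<>();
        if (codecList == null || codecList.trim().isEmpty()) {
            return loaded;
        }
        for (String name : codecList.split(",")) {
            String className = name.trim();
            if (className.isEmpty()) {
                continue;
            }
            try {
                loaded.add(Class.forName(className));
            } catch (ClassNotFoundException e) {
                // One missing implementation fails the whole load,
                // and with it the job submission.
                throw new IllegalArgumentException(
                        "Compression codec " + className + " not found.", e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Assumption: a JDK class stands in for a codec installed on the client.
        String installedOnClient = "java.util.zip.Deflater";
        // Assumption: the LZO codec jar is not on this client's classpath.
        String clusterConfig = installedOnClient + ",com.hadoop.compression.lzo.LzoCodec";

        System.out.println("Loaded: " + loadCodecClasses(installedOnClient).size());
        try {
            loadCodecClasses(clusterConfig);
            System.out.println("LZO codec is present on this classpath");
        } catch (IllegalArgumentException e) {
            System.out.println("Job submission would fail: " + e.getMessage());
        }
    }
}
```

If the same resolution ran in a task on the cluster, where the codec is installed, the client would not need the codec jar at all.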

> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
>                 Key: MAPREDUCE-207
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Philip Zeyliger
>            Assignee: Gera Shegalov
>            Priority: Major
>         Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, MAPREDUCE-207.v07.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could have a separate "job task type" that computes the input splits, therefore allowing that computation to happen on the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)