Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/10/01 18:33:23 UTC

[jira] [Updated] (MAPREDUCE-4421) Remove dependency on deployed MR jars

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4421:
----------------------------------

    Attachment: MAPREDUCE-4421-3.patch

Thanks for taking another look, Hitesh.

bq. Regarding addMRFrameworkToDistributedCache() - one minor question: the code allows for a non-qualified URI. Should we enforce provision of a fully-qualified path always?

I thought it would be easier to let the cluster's configured defaults qualify the URI when it is not already fully qualified.  Otherwise users and admins would have to specify "hdfs://namenode:port/path/to/archive" rather than just "hdfs:/path/to/archive", and the setting would break if the name or port of the filesystem ever changed.  If we let it be qualified by cluster defaults, then admins can update the default filesystem in core-site and the simpler forms continue to work unmodified.
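
As a rough illustration of the qualification behavior described above, here is a minimal sketch using only java.net.URI.  It is not the code in the patch: the class name, the qualify() helper, and the "namenode:8020" default filesystem are all hypothetical, chosen only to show how a missing scheme or authority could be filled in from the cluster default.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop or of the attached patch.
final class FrameworkUriQualifier {

    // Fills in a missing scheme and/or authority from the default filesystem URI.
    // A URI that already has both is returned unchanged.
    static URI qualify(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        if (scheme != null && authority != null) {
            return path; // already fully qualified
        }
        if (scheme == null) {
            // No scheme at all: borrow the scheme, and the authority if absent.
            scheme = defaultFs.getScheme();
            if (authority == null) {
                authority = defaultFs.getAuthority();
            }
        } else if (scheme.equals(defaultFs.getScheme())) {
            // Same scheme as the default filesystem: borrow only its authority.
            authority = defaultFs.getAuthority();
        }
        try {
            // The fragment is preserved; any query string is dropped in this sketch.
            return new URI(scheme, authority, path.getPath(), null, path.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + path, e);
        }
    }

    public static void main(String[] args) {
        // Assumed default filesystem; a real cluster would take this from core-site.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        System.out.println(qualify(URI.create("/path/to/archive"), defaultFs));
        System.out.println(qualify(URI.create("hdfs:/path/to/archive"), defaultFs));
        System.out.println(qualify(URI.create("hdfs://other:9000/path/to/archive"), defaultFs));
    }
}
```

With this approach, changing the assumed default filesystem changes where the short forms resolve, while a fully-qualified URI is left alone.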

bq. Minor nit: I believe there should be nothing in the implementation that requires HDFS as the storage for the MR tarball?

Good point.  I updated the documentation to refer to a distributed cache deploy rather than an HDFS deploy.  However, I did call out in the docs the performance ramifications of placing the archive anywhere other than a publicly-readable path on the cluster's default filesystem.  In that case the job submitter could end up re-uploading the framework, and the nodes re-localizing it, for each job or each user.  It will work, but it will be slower than necessary.

> Remove dependency on deployed MR jars
> -------------------------------------
>
>                 Key: MAPREDUCE-4421
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-4421-2.patch, MAPREDUCE-4421-3.patch, MAPREDUCE-4421.patch, MAPREDUCE-4421.patch
>
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit dependency on YARN_APPLICATION_CLASSPATH. 
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, probably, just rely on adding a shaded MR jar along with job.jar to the dist-cache.


