Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/10/01 18:33:23 UTC
[jira] [Updated] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated MAPREDUCE-4421:
----------------------------------
Attachment: MAPREDUCE-4421-3.patch
Thanks for taking another look, Hitesh.
bq. Regarding addMRFrameworkToDistributedCache() - one minor question: the code allows for a non-qualified URI. Should we enforce provision of a fully-qualified path always?
I thought it would be easier to let the path be qualified by the cluster's configured defaults when it is not already fully qualified. Otherwise users/admins would have to specify "hdfs://namenode:port/path/to/archive" rather than just "hdfs:/path/to/archive", and the setting would break if/when the name or port of the filesystem changes. If we let it be qualified by cluster defaults, then admins can update the default filesystem in core-site and the simpler forms continue to work unmodified.
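To illustrate the qualification behavior described above, here is a minimal sketch using only java.net.URI. It is not the code in the patch and not Hadoop's own path-qualification logic; the class name FrameworkPathSketch, the qualify() helper, and the namenode:8020 address are hypothetical, and the sketch assumes an absolute path.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FrameworkPathSketch {

    /**
     * Fills in a missing scheme and/or authority from the default filesystem URI.
     * Hypothetical helper for illustration; assumes path is absolute.
     */
    public static URI qualify(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        if (scheme != null && authority != null) {
            return path; // already fully qualified, leave untouched
        }
        if (scheme == null) {
            scheme = defaultFs.getScheme();
        }
        // Borrow the authority only when the scheme matches the default filesystem
        // (a design choice of this sketch).
        if (authority == null && scheme.equals(defaultFs.getScheme())) {
            authority = defaultFs.getAuthority();
        }
        try {
            // Query is dropped; any fragment on the original path is preserved.
            return new URI(scheme, authority, path.getPath(), null, path.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + path, e);
        }
    }

    public static void main(String[] args) {
        // Assumed default filesystem, standing in for the value configured in core-site.
        URI defaultFs = URI.create("hdfs://namenode:8020");

        System.out.println(qualify(URI.create("/path/to/archive"), defaultFs));
        System.out.println(qualify(URI.create("hdfs:/path/to/archive"), defaultFs));
        System.out.println(qualify(URI.create("hdfs://other:9000/path/to/archive"), defaultFs));
    }
}
```

With the assumed default filesystem, the first two forms both resolve to hdfs://namenode:8020/path/to/archive, while the already-qualified third form is returned unchanged. Changing only defaultFs changes where the short forms point, which is the property the reply relies on.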
bq. Minor nit: I believe there should be nothing in the implementation that requires HDFS as the storage for the MR tarball?
Good point. I updated the documentation to refer to a distributed cache deploy rather than an HDFS deploy. However, I did call out in the docs the performance ramifications of not using the cluster's default filesystem and a publicly-readable path for the archive. If either condition is not met, the job submitter could end up re-uploading the framework, and the nodes re-localizing it, for each job or each user. It will work, but it will be slower than necessary.
> Remove dependency on deployed MR jars
> -------------------------------------
>
> Key: MAPREDUCE-4421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Jason Lowe
> Attachments: MAPREDUCE-4421-2.patch, MAPREDUCE-4421-3.patch, MAPREDUCE-4421.patch, MAPREDUCE-4421.patch
>
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit dependency on YARN_APPLICATION_CLASSPATH.
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, probably, just rely on adding a shaded MR jar along with job.jar to the dist-cache.
--
This message was sent by Atlassian JIRA
(v6.1#6144)