Posted to mapreduce-issues@hadoop.apache.org by "Chris Trezzo (JIRA)" <ji...@apache.org> on 2015/03/18 03:46:39 UTC

[jira] [Updated] (MAPREDUCE-5951) Add support for the YARN Shared Cache

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated MAPREDUCE-5951:
------------------------------------
    Attachment: MAPREDUCE-5951-trunk-v7.patch

[~kasha@cloudera.com] Thanks again for the comments!

Attached is v7 of the patch. This version is rebased and addresses your comments above. I removed the DistributedCache changes and addressed the comments about Job, JobID, and JobImpl. With respect to comment 5.2, the patch does not hard-code MR job submission to always use the shared cache. Please see whether the new patch makes that clearer, and let me know if you have more questions. Two changes take effect even if the shared cache is disabled:
1. The SharedCacheConfig class will be used to parse configuration in JobResourceUploader. If the shared cache config parameters do not exist, then it is a no-op.
2. The MR classpath around job jars has been changed slightly (that is the reason for the MRApps and TestMRApps changes), but this should present no behavioral changes to the user. The change handles the case where the job jar used by a job comes from the shared cache and is named anything other than job.jar. Note that the current code assumes that whatever is localized in the job.jar directory is a single file named job.jar (i.e. job.jar/job.jar in the classpath). If the job jar is named something else, it is not put on the classpath. This change simply puts everything in the job.jar directory (currently only the job jar) on the classpath (i.e. job.jar/*).
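
To illustrate the job.jar classpath point above, here is a minimal standalone sketch. It is not the code in the patch: the class and the helpers `oldEntry`, `newEntry`, and `covers` are hypothetical names made up for this example. The only behavior it relies on is the standard Java rule that a classpath entry of the form `dir/*` picks up the jar files directly inside `dir`.

```java
// Hypothetical illustration only; none of these names come from the patch.
class JobJarClasspathSketch {
    // Directory under which the job jar is localized, as described in the message.
    static final String JOB_JAR_DIR = "job.jar";

    // Current behavior as described: a single entry that assumes the file is named job.jar.
    static String oldEntry() {
        return JOB_JAR_DIR + "/job.jar";
    }

    // Proposed behavior as described: a wildcard entry over the job.jar directory.
    static String newEntry() {
        return JOB_JAR_DIR + "/*";
    }

    // Simplified model of whether a classpath entry covers a localized file.
    // A "dir/*" entry covers jar files directly inside dir; any other entry
    // must match the path exactly.
    static boolean covers(String entry, String localizedPath) {
        if (entry.endsWith("/*")) {
            String dirWithSlash = entry.substring(0, entry.length() - 1);
            if (!localizedPath.startsWith(dirWithSlash)) {
                return false;
            }
            String fileName = localizedPath.substring(dirWithSlash.length());
            return !fileName.contains("/") && fileName.toLowerCase().endsWith(".jar");
        }
        return entry.equals(localizedPath);
    }

    public static void main(String[] args) {
        // A job jar that came from the shared cache under a different name (made-up file name).
        String localized = JOB_JAR_DIR + "/myapp-1.0.jar";
        System.out.println("old entry covers it: " + covers(oldEntry(), localized));
        System.out.println("new entry covers it: " + covers(newEntry(), localized));
    }
}
```

With a job jar named job.jar both entries cover it, which is why the change should not be visible to existing users.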

With respect to the comment about fool-proof config: did you have anything specific in mind? Currently the config should only recognize disabled, enabled, jobjar, libjars, files, and archives. I could split each into a separate boolean config parameter if that seems safer. I was trying to come up with a concise single parameter for all the modes, but maybe separate boolean parameters are better. I can also see the JOBJAR_VISIBILITY parameter being slightly confusing, and I will think about whether there is a better way to do that. Again, let me know if you have suggestions.
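
For discussion, one way a single-parameter mode could be validated is sketched below. This is not the SharedCacheConfig class from the patch. The class name, the `Resource` enum, and the `parse` method are hypothetical, and the sketch assumes the value is a comma-separated list of the six recognized tokens. It treats a missing value as a no-op and rejects anything it does not recognize, which is one possible reading of "fool-proof".

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the SharedCacheConfig class in the patch.
class SharedCacheModeSketch {
    // Resource types a job could ask to have cached.
    enum Resource { JOBJAR, LIBJARS, FILES, ARCHIVES }

    // Parses a mode string. Assumption: tokens are comma-separated.
    // A null or empty value means the shared cache is not used at all.
    static Set<Resource> parse(String value) {
        EnumSet<Resource> result = EnumSet.noneOf(Resource.class);
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        for (String raw : value.split(",")) {
            String token = raw.trim().toLowerCase(Locale.ROOT);
            if (token.isEmpty()) {
                continue; // tolerate stray commas
            }
            switch (token) {
                case "disabled":
                    return EnumSet.noneOf(Resource.class);
                case "enabled":
                    return EnumSet.allOf(Resource.class);
                case "jobjar":
                    result.add(Resource.JOBJAR);
                    break;
                case "libjars":
                    result.add(Resource.LIBJARS);
                    break;
                case "files":
                    result.add(Resource.FILES);
                    break;
                case "archives":
                    result.add(Resource.ARCHIVES);
                    break;
                default:
                    // Fail loudly on typos instead of silently ignoring them.
                    throw new IllegalArgumentException(
                        "Unrecognized shared cache mode: " + raw.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("jobjar, libjars"));
        System.out.println(parse("enabled"));
        System.out.println(parse(null));
    }
}
```

Separate boolean parameters would avoid the token parsing entirely, at the cost of four parameters instead of one; the sketch only shows that the single-parameter form can still reject typos such as "jobjars".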

Also, let me know if you want me to split this patch further. I could see splitting it into the following (although the splits won't be fully functional):
1. JobResourceUploader changes. The diff is still a little wonky with the code restructure from adding shared cache checks.
2. TaskImpl changes.
3. JobImpl changes.
4. job.jar classpath changes.

> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch
>
>
> Implement the necessary changes so that the MapReduce application can leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set of resources they would like to cache (i.e. jobjar, libjars, archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)