You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Karthik Kambatla (JIRA)" <ji...@apache.org> on 2015/02/13 02:22:12 UTC

[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319375#comment-14319375 ] 

Karthik Kambatla commented on MAPREDUCE-5951:
---------------------------------------------

Sorry for the delay in getting to this. Getting a continuous chunk of time to look at this somewhat large patch was hard. 

Here are my first round of comments - a combination of high-level and detailed comments. Let us see if we can get some of this in through other JIRAs first, to allow for a more thorough review.
# DistributedCache changes aren’t central to what this JIRA is trying to address. Could we leave them out and address in another JIRA? 
## This has nothing to do with this patch, but it would be nice to make the code around setting CLASSPATH_FILES a little more readable. Could we define another String prefix to hold “” or classpath, based on whether classpath is null. 
# Job
## The new APIs should all be @Unstable
## Let us make the javadoc for the new APIs a little more formal - we don’t need to mention SCMClientProtocol.use, or that the APIs are intended for user use. Even for the return value, I would go with something like “If shared cache is enabled and the resource added successfully, return true. Otherwise, return false.”
## How about renaming the methods to addFileToSharedCache, addArchiveToSharedCache, addFileToSharedCacheAndClasspath? 
## Make both new methods private static instead of static private.
# JobID changes might not be required. Use ConverterUtils#toApplicationId? 
# JobImpl
## cleanupSharedCacheResources - nit: I would check for (checksums == null || checksums.length == 0) and return to save on indentations. 80 chars is already too small.
## cleanupSharedCacheUploadPolicies - javadoc should use block comments. Well, may be a nit.
# JobSubmitter
## Can we do the code moving from JobSumitter to FileUploader (may be, we need a more descriptive name) to another JIRA and look at that first if needed. Otherwise, it is hard to review the changes.
## May be, I am misreading the patch. Is this patch hardcoding MR job submission to always use SharedCache? If yes, we should definitely avoid that. 
# mapred-default.xml: We need a little more fool-proof config. The way the patch currently is, a typo will lead to unexpected behavior without any warnings.


> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch
>
>
> Implement the necessary changes so that the MapReduce application can leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set of resources they would like to cache (i.e. jobjar, libjars, archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)