You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Sangjin Lee (JIRA)" <ji...@apache.org> on 2017/01/11 05:13:58 UTC
[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817206#comment-15817206 ] 

Sangjin Lee commented on MAPREDUCE-5951:
----------------------------------------

I went over the patch with Chris in some detail today, and am posting the review comments here for the record.

(MRJobConfig.java)
- {{mapreduce.job.jobjar.visibility}} and {{mapreduce.job.jobjar.sharedcache.uploadpolicy}} are computed values that are not user-facing; in that case we should not even define the defaults so that there is no confusion that these values are computed

(JobResourceUploader.java)
- l.159-160: I understand lines 161-165 are there to support programmatic use cases of the distributed cache that come in outside of the job submitter code path. Can we make the comments clearer so that the intent of this comes through? We could also annotate them in {{MRJobConfig}}.
- l.171-172: it might be slightly better to use {{LinkedHashMap}}. That way, we'd have a predictable iteration order (the order in which they are specified in the user values).
- l.219: To be fair, this is a bug we need to fix, right? Then can we file a JIRA and add the JIRA id here?
- l.237-240: I would not worry about handling previous values here. Having duplicate paths is not really supported and the worst case scenario here is to reset this upload policy with the same value.
- l.260-263: I think we can improve on this, and reconcile the shared cache with the wildcard feature. We could see if any resource is uploaded to the staging directory, and if so, still preserve the wildcard entry. We also need to consider the case where the shared cache is disabled but the wildcard is enabled.
- l.291-294: same comment as above
- l.348-351: same comment as above
- l.388: Since we're dealing with a local filesystem URI based on l.381-382, the authority check is not meaningful. We should remove this check.
- On a larger note, the path/URI handling between the job jar, libjars, files, and archives is not very consistent, which is an existing behavior. We need to see if they need to get the same consistent treatment for this to work.

(Job.java)
- l.1446: I realized later that passing an empty map has an effect of nulling out the config value; perhaps we could make that more explicit in the javadoc and/or comments/code?
- l.1449-1463: nit: it might be slightly easier to read by using a simple string concatenation with "+" (JVM internally uses the {{StringBuilder}})
- l.1490: here also it might be better to use {{LinkedHashMap}}


> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch, MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, MAPREDUCE-5951-trunk.018.patch
>
>
> Implement the necessary changes so that the MapReduce application can leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set of resources they would like to cache (i.e. jobjar, libjars, archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org