Posted to mapreduce-issues@hadoop.apache.org by "Miklos Szegedi (JIRA)" <ji...@apache.org> on 2017/11/01 23:41:00 UTC

[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234968#comment-16234968 ] 

Miklos Szegedi commented on MAPREDUCE-6994:
-------------------------------------------

Thank you, [~yufeigu].
bq. Method expandEnvironmentVariables() seems like it would fit better in class Shell.
I intentionally want to keep this change self-contained and separate from the other projects in the repo.
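For context, the helper is roughly along these lines (an illustrative sketch only; the class name, regex, and fallback behavior here are simplified assumptions rather than the exact patch contents):

{code:java}
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of environment variable expansion. It replaces
// $VAR tokens with values from the supplied environment map and leaves
// unknown variables untouched rather than failing.
public final class EnvExpansionSketch {
  private static final Pattern VAR =
      Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

  public static String expandEnvironmentVariables(
      String input, Map<String, String> env) {
    Matcher m = VAR.matcher(input);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String value = env.get(m.group(1));
      m.appendReplacement(sb,
          Matcher.quoteReplacement(value != null ? value : m.group(0)));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints the expanded class path entry, e.g. /opt/hadoop/share/...
    System.out.println(expandEnvironmentVariables(
        "$HADOOP_HOME/share/hadoop/common/", System.getenv()));
  }
}
{code}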
bq. The current design requires users to understand which directories should be collected, by providing multiple directories as "input". For example, users need to input $HADOOP_HOME/share/hadoop/client/:$HADOOP_HOME/share/hadoop/common/, etc. The benefit of this solution is that it is flexible no matter how the input directories are organized. However, the directory hierarchy under $HADOOP_HOME is fixed, especially in upstream builds, so how about providing an option to input just $HADOOP_HOME and let the tool figure out which sub-directories to take jars from? It seems like making the method collectPackages traverse the input directory recursively would be enough, since there is a predefined whitelist.
So the main design point is to include whatever is needed to run MapReduce jobs, and that is by default the class path, so the class path is the default input. Changing it to a root directory would add to the traversal time, and I think it is unnecessary in this case. The white list and the black list filter the class path to make sure only the necessary jars are included. Walking through the root risks including jars with the same name, or jars that are not needed because they were not on the class path in the original scenario. Changing the input to anything other than the class path is possible, but not advised.
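To illustrate the filtering idea (a hypothetical sketch only; the class name and list contents are made up for the example, not taken from the patch):

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: walk the class path entries and keep only the
// jars whose names match the white list and are not on the black list.
public final class ClasspathFilterSketch {
  public static List<File> collectJars(String classPath,
      List<String> whiteList, List<String> blackList) {
    List<File> result = new ArrayList<>();
    for (String entry : classPath.split(File.pathSeparator)) {
      for (File jar : listJars(new File(entry))) {
        String name = jar.getName();
        if (whiteList.stream().anyMatch(name::contains)
            && blackList.stream().noneMatch(name::contains)) {
          result.add(jar);
        }
      }
    }
    return result;
  }

  // A class path entry can be a directory of jars or a single jar.
  private static File[] listJars(File entry) {
    if (entry.isDirectory()) {
      File[] jars = entry.listFiles((dir, name) -> name.endsWith(".jar"));
      return jars != null ? jars : new File[0];
    }
    return entry.getName().endsWith(".jar")
        ? new File[] {entry} : new File[0];
  }

  public static void main(String[] args) {
    // Example invocation with made-up list contents.
    List<File> jars = collectJars(System.getProperty("java.class.path"),
        Arrays.asList("hadoop-"), Arrays.asList("-tests"));
    jars.forEach(j -> System.out.println(j.getAbsolutePath()));
  }
}
{code}

Because the input is the class path itself, only jars that were actually resolvable at run time are ever considered, which is what avoids the duplicate-name problem described above.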
I addressed all other comments.


> Uploader tool for Distributed Cache Deploy code changes
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6994
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>            Priority: Major
>         Attachments: MAPREDUCE-6994.000.patch
>
>
> The proposal is to create a tool that collects all available jars in the Hadoop classpath and adds them to a single tarball file. It then uploads the resulting archive to an HDFS directory. This saves the cluster administrator from having to set this up manually for Distributed Cache Deploy.
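
For readers skimming the thread, here is a rough, hypothetical outline of the tarball-and-upload step described above, assuming commons-compress (a common Hadoop dependency) is available; the real class and method names in the attached patch may differ:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.utils.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical outline: pack the collected jars into a single tar.gz
// archive and write it straight to the target HDFS path.
public final class UploaderSketch {
  public static void buildAndUpload(List<File> jars, String hdfsDest)
      throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    try (TarArchiveOutputStream tar = new TarArchiveOutputStream(
        new GZIPOutputStream(fs.create(new Path(hdfsDest))))) {
      for (File jar : jars) {
        tar.putArchiveEntry(new TarArchiveEntry(jar, jar.getName()));
        try (FileInputStream in = new FileInputStream(jar)) {
          IOUtils.copy(in, tar);
        }
        tar.closeArchiveEntry();
      }
    }
  }
}
{code}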


