Posted to common-issues@hadoop.apache.org by "Gera Shegalov (JIRA)" <ji...@apache.org> on 2016/02/02 00:17:39 UTC

[jira] [Commented] (HADOOP-12747) support wildcard in libjars argument

    [ https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127227#comment-15127227 ] 

Gera Shegalov commented on HADOOP-12747:
----------------------------------------

bq. It is true that today one can pass in a directory for -files and also for -libjars. In case of MR, the entire directory (including all files and directories recursively) does get copied over and localized to nodes. For libjars, however, as you observed, the classpath basically doesn't work if you meant it as a list of jars as it simply references the directory. On the other hand, if you meant it as a real directory root (consisting of class files), it still works correctly.

My [solution|https://issues.apache.org/jira/browse/HADOOP-12747?focusedCommentId=15123058&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15123058] (tested many times with production pipelines I own) using dir uploads is obviously geared to the former case and not the case of an exploded archive, as you can see from the value of the application classpath. I was just suggesting automating it via this property or mapreduce.job.classpath.files.

bq. When it comes to classpaths (after which libjars is modeled), directory and directory/* are different as you're undoubtedly aware. directory/* is specifically interpreted as the list of jars in that directory by the JVM. IMO it would be good to maintain that definition for libjars. That would lead to a consistent expectation.

I don't know whether libjars (<comma separated list of jars>) is modeled after CLASSPATH. But I think there should be a separation of concerns: the syntax vs. how it's implemented. In the end, I am saying let us not bloat the configuration, regardless of whether it's 'libs/*' or 'libs/'.
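
To make the 'libs/*' semantics concrete, here is a minimal sketch of a JVM-style expansion: a trailing /* becomes the list of .jar files directly inside that directory, with no recursion into child directories. The class name LibJarsWildcard and the method expand are hypothetical, made up for illustration; this is not the code from the attached patches. Sorting is also an assumption added for repeatable output, since the JVM does not promise any particular order.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class LibJarsWildcard {

    /**
     * Expands a comma-separated libjars value. An entry ending in "/*" is
     * replaced by the .jar/.JAR files directly inside that directory; child
     * directories are not traversed. Other entries are passed through as-is.
     */
    public static List<String> expand(String libjars) {
        List<String> result = new ArrayList<>();
        for (String raw : libjars.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.endsWith("/*")) {
                File dir = new File(entry.substring(0, entry.length() - 2));
                File[] children = dir.listFiles();
                if (children == null) {
                    continue; // not a directory, or not readable
                }
                Arrays.sort(children); // assumption: sorted for deterministic output
                for (File child : children) {
                    String name = child.getName().toLowerCase();
                    if (child.isFile() && name.endsWith(".jar")) {
                        result.add(child.getPath());
                    }
                }
            } else {
                result.add(entry);
            }
        }
        return result;
    }
}
```

With this reading, 'libs/*' yields individual jar paths, while a bare 'libs/' is passed through untouched and would have to be handled by whatever consumes the list.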

bq. Also, I learned of this interesting nugget while looking at GenericOptionsParser: the value of libjars is added to the client classpath:

This is the code we had discussed with [~ianoc]. It is very brittle code because you have no control over when frameworks start using GOP. E.g., Scalding needs Scala before it gets to GOP. Once you start asking people to put more stuff on HADOOP_CLASSPATH to bootstrap anyway, why do this in libjars as well? With [Pants|https://pantsbuild.github.io/build_dictionary.html] we don't need it at all because the jar manifest already includes all jars on the classpath; see MAPREDUCE-6128. Maybe we should deprecate the client-side effect of libjars and not try to add more in this JIRA.
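
For readers who have not looked at that code, the client-side effect is roughly of the following kind. This is a simplified sketch under my own assumptions, not the actual GenericOptionsParser source; the class ClientLibJars and its methods are hypothetical names. The point is that the jars only become visible once this code has run, so anything a framework needs earlier must already be on the classpath.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration of a client-side libjars effect; not Hadoop code.
public class ClientLibJars {

    /** Converts a comma-separated list of local jar paths into URLs. */
    public static URL[] toUrls(String libjars) throws MalformedURLException {
        String[] parts = libjars.split(",");
        URL[] urls = new URL[parts.length];
        for (int i = 0; i < parts.length; i++) {
            urls[i] = new File(parts[i].trim()).toURI().toURL();
        }
        return urls;
    }

    /**
     * Wraps the current thread's context class loader in a URLClassLoader
     * that can also see the given jars, and installs it on the thread.
     * Classes loaded before this call cannot benefit from the jars.
     */
    public static ClassLoader install(String libjars) throws MalformedURLException {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        URLClassLoader loader = new URLClassLoader(toUrls(libjars), parent);
        Thread.currentThread().setContextClassLoader(loader);
        return loader;
    }
}
```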

Regarding YARN-1492, since it's a recent feature, it should accommodate directory uploads anyway. It can negotiate at the directory level (e.g., a recursive checksum of files/dirs). But that's not a subject for this JIRA.

> support wildcard in libjars argument
> ------------------------------------
>
>                 Key: HADOOP-12747
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12747
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch
>
>
> There is a problem when a user job adds too many dependency jars in their command line. The HADOOP_CLASSPATH part can be addressed, including using wildcards (\*). But the same cannot be done with the -libjars argument. Today it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this situation. The idea is to handle it the same way the JVM does it: \* expands to the list of jars in that directory. It does not traverse into any child directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't do it for -files and -archives).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)