Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2016/03/01 21:20:18 UTC

[jira] [Commented] (HADOOP-12747) support wildcard in libjars argument

    [ https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174355#comment-15174355 ] 

Chris Nauroth commented on HADOOP-12747:
----------------------------------------

bq. You mentioned earlier that libjars don't support non-local paths, but strictly speaking HADOOP-7112 addresses only the aspect of adding libjars back to the client classpath.

That's very interesting.  I missed the point that non-local jars are skipped only when adding to the client's own classpath.  {{JobResourceUploader}} separately parses libjars and does not do the same filtering.  Since non-local libjars are already supported for the tasks, we would certainly have to maintain that behavior for backward compatibility.
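
To make the asymmetry concrete, here is a minimal Java sketch that models only the filtering difference described above. It assumes that an entry counts as "local" when it has no URI scheme or the {{file}} scheme, and that entries are URI-style paths. The names {{LibjarsSplitSketch}}, {{clientClasspathEntries}} and {{uploadEntries}} are hypothetical; this is not the actual client classpath code or {{JobResourceUploader}}.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Hadoop's code): the client classpath keeps only
 * local libjars, while the resource upload path keeps every entry.
 */
final class LibjarsSplitSketch {

    // Splits the comma-separated -libjars value into trimmed, non-empty entries.
    static List<String> split(String libjars) {
        List<String> entries = new ArrayList<>();
        for (String s : libjars.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                entries.add(t);
            }
        }
        return entries;
    }

    // Assumption: "local" means no URI scheme, or the "file" scheme.
    static boolean isLocal(String entry) {
        String scheme = URI.create(entry).getScheme();
        return scheme == null || scheme.equals("file");
    }

    // Client side: non-local entries are skipped for the client's own classpath.
    static List<String> clientClasspathEntries(String libjars) {
        List<String> result = new ArrayList<>();
        for (String e : split(libjars)) {
            if (isLocal(e)) {
                result.add(e);
            }
        }
        return result;
    }

    // Upload side: every entry, local or not, is passed on for the tasks.
    static List<String> uploadEntries(String libjars) {
        return split(libjars);
    }

    public static void main(String[] args) {
        String libjars = "/opt/lib/a.jar,hdfs://nn:8020/lib/b.jar,file:///opt/lib/c.jar";
        // Prints [/opt/lib/a.jar, file:///opt/lib/c.jar]
        System.out.println(clientClasspathEntries(libjars));
        // Prints all three entries, including the hdfs:// one
        System.out.println(uploadEntries(libjars));
    }
}
```

Under these assumptions the hdfs:// jar reaches the tasks but never the client classpath, which is the inconsistency discussed here.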

I find the lack of consistency quite confusing.  It's unclear to me how much of this behavior is by design and how much is accidental.  I assume non-local jars were filtered out of the client's classpath to avoid the complexity of running some kind of "mini-localization" on the client side.

Regarding the proposed options, I have a question on this con for option 2:

bq. con: need to re-interpret or deprecate (minor) behavior, such as adding libjar entries to the client classpath and allowing directories as a set of classfiles

This sounds backwards-incompatible, right?  If so, then that would tip my opinion towards option 1.

Also, if wildcard expansion is delayed, then it seems there could be a risk of unexpected behavior if the contents of the directory change after job submission but before launch of the container.  Maybe rolling upgrade scenarios would get weird.  (Maybe not if the directories themselves are version-stamped properly.)

> support wildcard in libjars argument
> ------------------------------------
>
>                 Key: HADOOP-12747
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12747
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch, HADOOP-12747.03.patch
>
>
> There is a problem when a user job adds too many dependency jars on the command line. The HADOOP_CLASSPATH part can be addressed, including by using wildcards (*). But the same cannot be done with the -libjars argument. Today it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this situation. The idea is to handle it the same way the JVM does: * expands to the list of jars in that directory. It does not traverse into any child directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't do it for -files and -archives).
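
The proposed expansion rule can be sketched in Java as below. This is a minimal illustration, not any of the attached patches: {{LibjarsWildcardSketch}}, {{expand}} and {{jarsIn}} are hypothetical names. It assumes that only an entry that is exactly {{*}} or ends in {{/*}} is a wildcard, that matching files are regular files ending in {{.jar}} or {{.JAR}} (mirroring the JVM classpath wildcard), and it sorts results only to make output deterministic, since the JVM does not guarantee an order.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of JVM-style wildcard expansion for -libjars entries. */
final class LibjarsWildcardSketch {

    // Expands each comma-separated entry; "dir/*" becomes the jars directly inside dir.
    static List<String> expand(String libjars) throws IOException {
        List<String> result = new ArrayList<>();
        for (String raw : libjars.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.equals("*") || entry.endsWith("/*")) {
                // Drop the trailing "*"; a bare "*" means the current directory.
                String dirPart = entry.substring(0, entry.length() - 1);
                if (dirPart.isEmpty()) {
                    dirPart = ".";
                }
                result.addAll(jarsIn(Paths.get(dirPart)));
            } else {
                result.add(entry); // fully specified paths pass through unchanged
            }
        }
        return result;
    }

    // Lists *.jar / *.JAR regular files in dir without descending into subdirectories.
    static List<String> jarsIn(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (Files.isRegularFile(p) && (name.endsWith(".jar") || name.endsWith(".JAR"))) {
                    jars.add(p.toString());
                }
            }
        }
        // Sorting is only for deterministic output; it is not part of the JVM rule.
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("libjars");
        Files.createFile(dir.resolve("a.jar"));
        Files.createFile(dir.resolve("b.JAR"));
        Files.createFile(dir.resolve("notes.txt"));
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.createFile(sub.resolve("c.jar"));
        // a.jar and b.JAR are listed, then /opt/extra.jar; notes.txt and sub/c.jar are not.
        System.out.println(expand(dir + "/*,/opt/extra.jar"));
    }
}
```

Because expansion happens when {{expand}} is called, invoking it at submission time fixes the jar list at that moment; a jar added to the directory afterwards would not be picked up.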



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)