Posted to dev@hive.apache.org by "Anant Nag (JIRA)" <ji...@apache.org> on 2015/02/12 05:32:11 UTC

[jira] [Commented] (HIVE-9664) Hive "add jar" command should be able to download and add jars from a repository

    [ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317573#comment-14317573 ] 

Anant Nag commented on HIVE-9664:
---------------------------------

The Gradle-like notation can also be extended to other commands such as ADD FILE and ADD ARCHIVE. The LIST and DELETE commands should continue to work and produce sensible output.

The following syntax for the command is proposed: 

{code}add [FILE|JAR|ARCHIVE] <ivy://org:module:version?exclude=org1:module1&org2:module2> <ivy://org3:module3:version?exclude=org4:module4&org5:module5>*{code}
{code}add [FILE|JAR|ARCHIVE] <http://url_of_the_jar> <http://url_of_the_jar>*{code}
{code}add [FILE|JAR|ARCHIVE] <https://url_of_the_jar> <https://url_of_the_jar>*{code}
{code}add [FILE|JAR|ARCHIVE] <file://location_of_the_jar> <file://location_of_the_jar>*{code}
{code}add [FILE|JAR|ARCHIVE] <location_of_the_jar> <location_of_the_jar>*{code}

The motivation for the above syntax is to make it clear how a jar (or file) is obtained. A form like <ivy://org:module:version> indicates that the file is downloaded from an artifact repository such as a Maven repository, whereas <file:///tmp/abc.jar> indicates that the jar is added from the local file system.

We're assuming that a jar can be added by any of the following methods:

1) A jar can be added from an artifact repository (such as a Maven repository). In that case, transitive dependencies (if enabled) should also be downloaded and added to the classpath. Any dependencies that have to be excluded should be listed in the command itself.
Command: 
{code}
add jar <ivy://org:module:version> <ivy://org:module:version?exclude=org1:module1>* 
{code}

exclude=org1:module1 means that org1:module1 should be excluded when transitive dependencies are resolved.
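
To make the proposed notation concrete, here is a minimal Python sketch of how an ivy:// location could be split into its parts. It is only an illustration of the syntax proposed in this comment, not Hive's implementation; the names IvyCoordinate and parse_ivy_uri are hypothetical, and it assumes that "&" separates excluded org:module pairs as in the syntax line above.

```python
from typing import NamedTuple, Tuple


class IvyCoordinate(NamedTuple):
    """Hypothetical container for the parts of an ivy:// location."""
    org: str
    module: str
    version: str
    excludes: Tuple[Tuple[str, str], ...]  # (org, module) pairs to exclude


def parse_ivy_uri(uri: str) -> IvyCoordinate:
    """Split ivy://org:module:version?exclude=o1:m1&o2:m2 into its parts."""
    prefix = "ivy://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an ivy location: {uri!r}")

    # Everything before '?' is the coordinate, everything after is the query.
    body, _, query = uri[len(prefix):].partition("?")
    parts = body.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected org:module:version, got {body!r}")

    excludes = []
    if query:
        key, _, value = query.partition("=")
        if key != "exclude":
            raise ValueError(f"unsupported parameter: {key!r}")
        # In the proposed syntax, '&' separates the excluded org:module pairs.
        for item in value.split("&"):
            pair = item.split(":")
            if len(pair) != 2 or not all(pair):
                raise ValueError(f"expected org:module, got {item!r}")
            excludes.append((pair[0], pair[1]))

    return IvyCoordinate(parts[0], parts[1], parts[2], tuple(excludes))


coord = parse_ivy_uri("ivy://org:module:version?exclude=org1:module1&org2:module2")
print(coord.org, coord.module, coord.version, coord.excludes)
```

A location without a query string, such as ivy://org:module:version, yields an empty excludes tuple in this sketch.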

2) An http or https URL of the jar can be provided directly. In that case, the jar will be downloaded and added to the classpath. This might be useful when a single jar is required that is not present in the artifact repository but for which a download link is available.
Command:
{code}
add jar <http://xyz.com/abc.jar> 
add jar <https://xyz.com/abc.jar>
{code}

3) The jar can be added from the local file system. This is what Hive already supports with the add command: the file is already present on the file system and is simply added to the classpath.
Command:
{code}
add jar jarname
add jar file:///tmp/sample.jar
{code}

4) The jar can be added from HDFS.

Command: 
{code}
add jar hdfs:/user/abc/dwh-udf.jar;
add jar hdfs:///user/abc/dwh-udf.jar;
{code}
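
The two hdfs forms above differ only in whether an (empty) authority is written out. As a rough illustration using Python's standard URL parser, and not Hive's or Hadoop's own path handling, both reduce to the same path; the helper name hdfs_path is hypothetical.

```python
from urllib.parse import urlparse


def hdfs_path(location: str) -> str:
    """Return the path component of an hdfs: location (hypothetical helper)."""
    # Drop surrounding whitespace and the trailing ';' that ends the command.
    parsed = urlparse(location.strip().rstrip(";"))
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an hdfs location: {location!r}")
    return parsed.path


# Both spellings from the commands above name the same path.
print(hdfs_path("hdfs:/user/abc/dwh-udf.jar;"))
print(hdfs_path("hdfs:///user/abc/dwh-udf.jar;"))
```

In both cases no host is given, so which cluster is used would presumably come from the configured default file system; that part is an assumption and is outside this sketch.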

Syntax like the commands above clearly distinguishes both the location of the jar and the method used to obtain it. Please share your questions and thoughts in the comments.
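
As a rough sketch of the idea, the scheme of each location could be mapped to one of the four methods listed in this comment. The Python below is only an illustration under that assumption; classify_location and the method labels are hypothetical names, not part of Hive.

```python
from urllib.parse import urlparse

# Hypothetical labels for the four ways of obtaining a jar described above.
METHOD_BY_SCHEME = {
    "ivy": "repository",   # resolve from an artifact repository
    "http": "download",    # fetch from a direct link
    "https": "download",
    "file": "local",       # already on the local file system
    "": "local",           # bare path such as "jarname"
    "hdfs": "hdfs",        # stored in HDFS
}


def classify_location(location: str) -> str:
    """Return how a jar at the given location would be obtained."""
    # Drop surrounding whitespace and a trailing ';' before parsing.
    scheme = urlparse(location.strip().rstrip(";")).scheme.lower()
    if scheme not in METHOD_BY_SCHEME:
        raise ValueError(f"unsupported scheme {scheme!r} in {location!r}")
    return METHOD_BY_SCHEME[scheme]


for loc in (
    "ivy://org:module:version?exclude=org1:module1",
    "https://xyz.com/abc.jar",
    "file:///tmp/sample.jar",
    "jarname",
    "hdfs:///user/abc/dwh-udf.jar;",
):
    print(loc, "->", classify_location(loc))
```

Treating a location without a scheme as local keeps the existing add jar behaviour unchanged, which is the backward compatibility the issue description asks for.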

> Hive "add jar" command should be able to download and add jars from a repository
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-9664
>                 URL: https://issues.apache.org/jira/browse/HIVE-9664
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Anant Nag
>              Labels: hive
>
> Currently Hive's "add jar" command takes a local path to the dependency jar. This clutters the local file system, as users may forget to remove the jar later.
> It would be nice if Hive supported a Gradle-like notation to download the jar from a repository.
> Example:  add jar org:module:version
>
> It should also be backward compatible and should accept a jar from the local file system as well.


