Posted to dev@hive.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2010/02/11 03:20:29 UTC

[jira] Created: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

UDFs can't be loaded via "add jar" when jar is on HDFS
------------------------------------------------------

                 Key: HIVE-1157
                 URL: https://issues.apache.org/jira/browse/HIVE-1157
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Philip Zeyliger
            Priority: Minor


As discussed on the mailing list, it would be nice if you could use UDFs that are on jars on HDFS.  The proposed implementation would be for "add jar" to recognize that the target file is on HDFS, copy it locally, and load it into the classpath.
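
A minimal sketch of the proposed behavior, not the attached patch: decide from the URI scheme whether the resource is remote and, if so, copy it into a local scratch directory before anything touches the classpath. The class JarLocalizer, the RemoteOpener interface, and the "added-" file prefix are hypothetical names chosen for illustration; a real implementation would presumably fetch the bytes through Hadoop's FileSystem API rather than through a pluggable opener.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; none of these names come from Hive. */
final class JarLocalizer {

    /** Opens a remote resource. Stands in for whatever actually reads from HDFS. */
    interface RemoteOpener {
        InputStream open(URI uri) throws IOException;
    }

    /**
     * Treats a resource as remote when its URI has a scheme other than "file".
     * Plain paths such as "/tmp/a.jar" or "../a.jar" have no scheme and count as local.
     * Assumes the string is a syntactically valid URI (no spaces or backslashes).
     */
    static boolean isRemote(String resource) {
        String scheme = URI.create(resource).getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    /** Copies a remote jar into scratchDir and returns the path of the local copy. */
    static Path localize(String resource, Path scratchDir, RemoteOpener opener) {
        URI uri = URI.create(resource);
        // Keep the original file name visible in the local copy's name.
        String name = Paths.get(uri.getPath()).getFileName().toString();
        try {
            Files.createDirectories(scratchDir);
            Path local = Files.createTempFile(scratchDir, "added-", "-" + name);
            try (InputStream in = opener.open(uri)) {
                Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
            }
            return local;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller would check isRemote first and only call localize for remote resources; the returned local path is what gets added to the classpath, and it is also what needs to be deleted when the resource is removed.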

{quote}
Hi folks,

I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  Can you use a UDF where the jar which contains the function is on HDFS, and not on the local filesystem?  Specifically, the following does not seem to work:

# This is Hive 0.5, from svn
$bin/hive                                              
Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
hive> add jar hdfs://localhost/FooTest.jar;                                                  
Added hdfs://localhost/FooTest.jar to class path
hive> create temporary function cube as 'com.cloudera.FooTestUDF';                    
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask

Does this work for other people?  I could probably fix it by changing "add jar" to download remote jars locally, when necessary (to load them into the classpath), or update URLClassLoader (or whatever is underneath there) to read directly from HDFS, which seems a bit more fragile.  But I wanted to make sure that my interpretation of what's going on is right before I have at it.

Thanks,

-- Philip
{quote}

{quote}
Yes that's correct. I prefer to download the jars in "add jar".

Zheng
{quote}
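
To illustrate the two options under discussion, here is a small sketch in plain Java. It assumes a stock JVM with no Hadoop URL handler installed; in that case an hdfs:// address cannot even be turned into a java.net.URL, whereas a jar that has been downloaded locally can be handed to a URLClassLoader directly. LocalJarLoading and its method names are made up for illustration and are not Hive code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

/** Hypothetical illustration; not taken from Hive. */
final class LocalJarLoading {

    /** True if this JVM has a URL protocol handler for the scheme of the given absolute URI. */
    static boolean jvmCanOpen(String spec) {
        try {
            URI.create(spec).toURL();
            return true;
        } catch (MalformedURLException e) {
            // Thrown when no handler is registered, e.g. "unknown protocol: hdfs".
            return false;
        }
    }

    /** Builds a class loader that can see classes in a jar on the local filesystem. */
    static URLClassLoader loaderFor(Path localJar, ClassLoader parent) {
        try {
            URL url = localJar.toUri().toURL();
            return new URLClassLoader(new URL[] { url }, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a usable local path: " + localJar, e);
        }
    }
}
```

Teaching the JVM to read hdfs:// URLs directly would mean installing a URL stream handler for the whole process, and URL.setURLStreamHandlerFactory can be called at most once per JVM. That constraint is one reason the direct route could be the more fragile of the two.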

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by Ted Yu <yu...@gmail.com>.
Philip:
hive> add jar hdfs://localhost/FooTest.jar;
Unable to validate hdfs://localhost/FooTest.jar
Exception: Call to localhost/127.0.0.1:8020 failed on connection exception:
java.net.ConnectException: Connection refused

Do you know how the port (8020) is configured for the 'add jar' command?

Thanks

On Sat, Mar 27, 2010 at 9:04 PM, Philip Zeyliger (JIRA) <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850628#action_12850628]
>
> Philip Zeyliger commented on HIVE-1157:
> ---------------------------------------
>
> Edward,
>
> I'm having trouble reproducing the error you're seeing.
>
> {quote}
>
> create temporary function geoip as
> 'com.jointhegrid.hive.udf.GenericUDFGeoIP';
>
> hive> select geoip(theIp ,'COUNTRY_NAME', './GeoLiteCity.dat.gz' ) from ip
> ;
> java.lang.ClassNotFoundException: com.jointhegrid.hive.udf.GenericUDFGeoIP
> Continuing ...
> {quote}
>
> On my machine, if I create temporary function with a class name that
> doesn't exist, it fails.  So it makes no sense to me that "create temporary
> function" is succeeding, but then it's immediately not finding it.  Do you
> have any theories on what's going on?  Can you try to run it with debug on?
>
> Thanks!
>
> > UDFs can't be loaded via "add jar" when jar is on HDFS
> > ------------------------------------------------------
> >
> >                 Key: HIVE-1157
> >                 URL: https://issues.apache.org/jira/browse/HIVE-1157
> >             Project: Hadoop Hive
> >          Issue Type: Improvement
> >          Components: Query Processor
> >            Reporter: Philip Zeyliger
> >            Priority: Minor
> >         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt,
> HIVE-1157.v2.patch.txt, output.txt
> >

[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915469#action_12915469 ] 

Namit Jain commented on HIVE-1157:
----------------------------------

This is good to have; I will take a look.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, output.txt
>


[jira] Resolved: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach resolved HIVE-1157.
----------------------------------

    Resolution: Duplicate

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt, HIVE-1157.v2.patch.txt, output.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915619#action_12915619 ] 

Namit Jain commented on HIVE-1157:
----------------------------------

The changes looked good, but I got the following error:

    [junit] Begin query: alter1.q
    [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I Location -I transient_lastDdlTime -I last_modified_ -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/njain/hive_commit2/hive_commit2/build/ql/test/logs/clientpositive/alter1.q.out /data/users/njain/hive_commit2/hive_commit2/ql/src/test/results/clientpositive/alter1.q.out
    [junit] 778d777
    [junit] < Resource ../data/files/TestSerDe.jar already added.


Philip, can you take care of that?

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, output.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832381#action_12832381 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------

Removing local file dependencies is much cleaner.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1157:
---------------------------------

    Status: Patch Available  (was: Open)

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, output.txt
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: HIVE-1157.patch.v6.txt

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt, HIVE-1157.v2.patch.txt, output.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848218#action_12848218 ] 

Philip Zeyliger commented on HIVE-1157:
---------------------------------------

Anyone care to take a look?

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: hive-1157.patch.txt

This patch changes SessionState.java to copy jar resources locally, if they're not local already.

Because I had to manage additional per-resource state (namely, the location of the local copy, so that it can be cleaned up), I modified the ResourceType enum to be simply an enum, and now there is one ResourceHook object per resource, not per resource type.  I changed the container map to be an EnumMap.
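
The shape described above might look roughly like the following sketch. Only the names ResourceType and ResourceHook come from this comment; the ResourceRegistry wrapper, the two enum constants shown, the field names, and the method signatures are assumptions made for illustration and are not copied from the patch.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical container; the real SessionState code may differ. */
final class ResourceRegistry {

    /** A plain enum: it no longer carries any per-type hook object. */
    enum ResourceType { FILE, JAR }

    /** One instance per added resource, so each can remember its own local copy. */
    static final class ResourceHook {
        final String original;   // what the user passed to "add jar"
        final String localCopy;  // where the downloaded copy lives, or null if already local

        ResourceHook(String original, String localCopy) {
            this.original = original;
            this.localCopy = localCopy;
        }
    }

    // EnumMap keyed by resource type; each value maps resource name to its hook.
    private final Map<ResourceType, Map<String, ResourceHook>> resources =
            new EnumMap<>(ResourceType.class);

    /** Registers a resource; returns false if it was already added. */
    boolean add(ResourceType type, String original, String localCopy) {
        Map<String, ResourceHook> hooks =
                resources.computeIfAbsent(type, t -> new LinkedHashMap<>());
        return hooks.putIfAbsent(original, new ResourceHook(original, localCopy)) == null;
    }

    /** Unregisters a resource and hands back its hook so the local copy can be cleaned up. */
    ResourceHook remove(ResourceType type, String original) {
        Map<String, ResourceHook> hooks = resources.get(type);
        return hooks == null ? null : hooks.remove(original);
    }
}
```

Keeping the local copy's location on the per-resource hook means the cleanup step needs no separate bookkeeping: whoever removes the resource gets the path to delete along with it.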

It turns out that you can't specify an HDFS path to "-libjars", so I had to also modify ExecDriver.java to call a special method when it's getting jar resources.

I would appreciate some guidance on how to test this best.  So far, I've manually done the following steps:
{noformat}
create table t (x int);
# Create a file with "1\n2\n3\n" as /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF';  # I wrote this
select cube(x) from t;
{noformat}
What else would it be reasonable for me to do?  It looks like there's no DFS in the test environment.  I might be able to register an ad-hoc file system implementation of some sort or use mockito or some such...  What do you recommend?

I'm running the existing tests to make sure that I haven't broken anything.  These seem to take a while, so I'll report back.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: HIVE-1157.patch.v3.txt

Ed,

Indeed, I've been able to reproduce that.  I traced it down to some bad error handling when scratch_dir doesn't exist.  The new patch creates a scratch dir if it doesn't already exist, and adds an if/else to make sure localFile.delete() isn't called if localFile is null.
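
The two guards described above could be sketched as follows. This is an illustration only: ScratchDirCleanup and its method names are hypothetical, and the actual patch may structure the checks differently.

```java
import java.io.File;

/** Hypothetical helper showing the two guards; not the patch itself. */
final class ScratchDirCleanup {

    /** Creates the scratch directory (and any missing parents) if it does not exist yet. */
    static File ensureScratchDir(File scratchDir) {
        // mkdirs() returns false when the directory already exists, so re-check
        // with isDirectory() before treating that as a failure.
        if (!scratchDir.mkdirs() && !scratchDir.isDirectory()) {
            throw new IllegalStateException("could not create scratch dir " + scratchDir);
        }
        return scratchDir;
    }

    /** Deletes the local copy if there is one; a null localFile means nothing was downloaded. */
    static boolean deleteLocalCopy(File localFile) {
        if (localFile == null) {
            return false;
        }
        return localFile.delete();
    }
}
```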

Sorry about that.  I'm not sure whether something changed on trunk between when I created the patch and now to change how the scratch dir works, or if I had the scratch dir created by other tests in my local checkout.  Either way, this should fix it.

Thanks!

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.v2.patch.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916956#action_12916956 ] 

Philip Zeyliger commented on HIVE-1157:
---------------------------------------

Namit,

Thanks for the review.  I've fixed the test failures.  The one you pointed out was a missing log line from the results.  And there was a second one having to do with relative paths.

Oddly enough, when I tried to bring the changes up to current trunk, HIVE-1624 conflicted, and when I looked at it, it turned out to supply the same feature as this patch.  I'll upload the fixed patch for posterity, but it looks like this issue is no longer necessary.

-- Philip

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1157:
---------------------------------

    Attachment: HIVE-1157.patch.v5.txt

Attaching an updated version of Phil's patch that applies cleanly with -p0.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832389#action_12832389 ] 

Philip Zeyliger commented on HIVE-1157:
---------------------------------------

Edward,

I'm not sure what you mean.

-- Philip

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: HIVE-1157.v2.patch.txt

I've uploaded a new patch with a bug fix (wasn't unregistering the jars correctly) and with a test.

The test starts a MiniDFSCluster and runs add jar and delete jar explicitly, without using the ".q" framework.  I felt this was the best way to test just the new behavior.
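For readers following along, the "copy it locally, then load it" behavior and the unregistering step mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patch: the class name JarLocalizerSketch, the RemoteOpener interface, and the method names are all hypothetical, and the remote read is stubbed out so the sketch needs nothing beyond the JDK (for HDFS the opener would wrap Hadoop's FileSystem API).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of "add jar" / "delete jar" for jars that are not on the local filesystem. */
public class JarLocalizerSketch {

    /** Assumed stand-in for a remote filesystem client; not a Hive or Hadoop interface. */
    public interface RemoteOpener {
        InputStream open(URI remote) throws IOException;
    }

    private final RemoteOpener opener;
    // URI as typed by the user -> local path that goes on the classpath.
    private final Map<URI, Path> registered = new LinkedHashMap<>();
    // Local copies this class created, so that delete can clean them up.
    private final Set<Path> downloaded = new HashSet<>();

    public JarLocalizerSketch(RemoteOpener opener) {
        this.opener = opener;
    }

    /** Registers a jar; anything without a "file" (or empty) scheme is first copied to a temp file. */
    public Path addJar(URI jar) throws IOException {
        String scheme = jar.getScheme();
        Path local;
        if (scheme == null) {
            local = Paths.get(jar.getPath());
        } else if (scheme.equals("file")) {
            local = Paths.get(jar);
        } else {
            local = Files.createTempFile("added-jar-", ".jar");
            try (InputStream in = opener.open(jar)) {
                Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
            }
            downloaded.add(local);
        }
        registered.put(jar, local);
        return local;
    }

    /** Unregisters a jar by the URI it was added under and removes any local copy made for it. */
    public boolean deleteJar(URI jar) throws IOException {
        Path local = registered.remove(jar);
        if (local == null) {
            return false;
        }
        if (downloaded.remove(local)) {
            Files.deleteIfExists(local);
        }
        return true;
    }

    /** Builds a class loader over the local paths of all currently registered jars. */
    public URLClassLoader newClassLoader(ClassLoader parent) throws IOException {
        URL[] urls = new URL[registered.size()];
        int i = 0;
        for (Path p : registered.values()) {
            urls[i++] = p.toUri().toURL();
        }
        return new URLClassLoader(urls, parent);
    }

    public static void main(String[] args) throws IOException {
        // The "remote" side here is faked with an in-memory stream.
        JarLocalizerSketch sketch =
            new JarLocalizerSketch(uri -> new ByteArrayInputStream(new byte[0]));
        URI remote = URI.create("hdfs://localhost/FooTest.jar");
        Path local = sketch.addJar(remote);
        System.out.println("Local copy exists: " + Files.exists(local));
        sketch.deleteJar(remote);
        System.out.println("Local copy exists after delete: " + Files.exists(local));
    }
}
```

Keying the registry on the URI the user typed, rather than on the local temp path, is one way to make "delete jar" work with the same string that was passed to "add jar"; whether the patch does it this way is not stated in this thread.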

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt
>
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842013#action_12842013 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------

Philip, I will apply and test the code tonight. In the meantime, I do not see a unit test .q file.

Since the test target runs after the build target, you could bundle CubeSampleUDF into a jar file and put your test code in a .q file. You can take a look at:

* contrib/src/test/queries/clientpositive/udf_example_add.q
* contrib/src/test/queries/clientpositive/dboutput.q

Both show how to test a UDF that is not hard-coded into the FunctionRegistry.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1157:
-----------------------------

    Status: Open  (was: Patch Available)

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: HIVE-1157.patch.v4.txt

Carl,

I updated the patch to current trunk.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.patch.v4.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850628#action_12850628 ] 

Philip Zeyliger commented on HIVE-1157:
---------------------------------------

Edward,

I'm having trouble reproducing the error you're seeing.

{quote}

create temporary function geoip as 'com.jointhegrid.hive.udf.GenericUDFGeoIP';

hive> select geoip(theIp ,'COUNTRY_NAME', './GeoLiteCity.dat.gz' ) from ip ; 
java.lang.ClassNotFoundException: com.jointhegrid.hive.udf.GenericUDFGeoIP
Continuing ...
{quote}

On my machine, if I run "create temporary function" with a class name that doesn't exist, it fails.  So it makes no sense to me that "create temporary function" succeeds but the class is then immediately not found.  Do you have any theories on what's going on?  Can you try to run it with debug on?

Thanks!

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902208#action_12902208 ] 

Carl Steinbach commented on HIVE-1157:
--------------------------------------

Hi Philip, please rebase the patch and I will take a look. Thanks.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1157:
----------------------------------

    Attachment: output.txt

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.v2.patch.txt, output.txt
>
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848261#action_12848261 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------


{noformat}
[edward@ec hive]$ ant -Dtestcase=TestAddJarFromHDFS test

<testcase classname="org.apache.hadoop.hive.ql.session.TestAddJarFromHDFS" name="testAddJarFromHDFS" time="6.73">
    <error type="java.lang.NullPointerException">java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.session.SessionState$JarResourceHook.preHook(SessionState.java:391)
	at org.apache.hadoop.hive.ql.session.SessionState.add_resource(SessionState.java:474)
	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:52)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:173)
	at org.apache.hadoop.hive.ql.session.TestAddJarFromHDFS.testAddJarFromHDFS(TestAddJarFromHDFS.java:71)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:154)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at junit.framework.TestSuite.runTest(TestSuite.java:208)
	at junit.framework.TestSuite.run(TestSuite.java:203)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
</error>
  </testcase>
  <system-out><![CDATA[Starting DataNode 0 with dfs.data.dir: build/test/data/dfs/data/data1,build/test/data/dfs/data/data2
]]></system-out>
  <system-err><![CDATA[Waiting for the Mini HDFS Cluster to start...
]]></system-err>
</testsuite>
{noformat}

{noformat}
String jarURI = fs.getUri().toString() + "/addJarFromHdfs.jar";
int ret = cliDriver.processCmd("ADD JAR " + jarURI);
{noformat}

This is a clean checkout and build. I will try to trace this down more.


> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt
>
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832554#action_12832554 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------

Sorry about that. I am not sure if I had an incomplete thought, or cut off half my message.

In any case, I like your idea of bringing jars into HDFS. The fact that the jar file has to live on the local filesystem where the job is launched from is very constraining. You cannot leverage your distributed file system. The same can be said for SerDes. Maybe these files should live on HDFS in the warehouse directory somehow.

+1 on your thinking




[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848242#action_12848242 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------

Looking at this now. One quick thing: you generated the patch from the ql directory rather than from the trunk root, so you will need to regenerate it.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841987#action_12841987 ] 

Philip Zeyliger commented on HIVE-1157:
---------------------------------------

Has anyone had a chance to look at this?  Would appreciate the feedback!

Thanks!

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840332#action_12840332 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------

Very cool. I will take a look at this.

Even though the Hive test cases are abstracted, I believe you can use the DFS/MiniMRCluster, since the Hive classpath inherits the Hadoop one. If I understand your problem, you might be able to do this:

{noformat}
dfs -put your.jar /wherever;
add jar hdfs://localhost:8020/wherever/your.jar;
{noformat}

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>


[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848790#action_12848790 ] 

Edward Capriolo commented on HIVE-1157:
---------------------------------------

Philip,

1) You are generating your patch from the ql subdirectory. For the final commit you have to generate it from the build root.

2) I tried this on a cluster with a local job tracker and a locally running (pseudo-distributed) namenode and datanode. It did not run. I am attaching the output.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, HIVE-1157.v2.patch.txt
>