Posted to user@hive.apache.org by Nitin Pawar <ni...@gmail.com> on 2014/12/30 17:31:14 UTC

Re: CREATE FUNCTION: How to automatically load extra jar file?

Just copy-pasting Jason's reply from the other thread:

If you have a recent version of Hive (0.13+), you could try registering
your UDF as a "permanent" UDF which was added in HIVE-6047:

1) Copy your JAR somewhere on HDFS, say
hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar.
2) In Hive, run CREATE FUNCTION zeroifnull AS 'com.test.udf.ZeroIfNullUDF'
USING JAR 'hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar';

The function definition should be saved in the metastore and Hive should
remember to pull the JAR from the location you specified in the CREATE
FUNCTION call.
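The two steps above only require composing one DDL statement. As a small illustration, the helper below (a hypothetical sketch, not anything from Hive's API) builds the HIVE-6047 permanent-UDF statement as a string; you would still run the result in the Hive CLI or via JDBC:

```python
# Hypothetical helper: compose the permanent-UDF DDL from HIVE-6047.
# It only produces the statement text; executing it against Hive is separate.
def create_function_ddl(name, class_name, jar_uri):
    return (
        f"CREATE FUNCTION {name} AS '{class_name}' "
        f"USING JAR '{jar_uri}'"
    )

ddl = create_function_ddl(
    "zeroifnull",
    "com.test.udf.ZeroIfNullUDF",
    "hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar",
)
print(ddl)
```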

On Tue, Dec 30, 2014 at 9:54 PM, Arthur.hk.chan@gmail.com <
arthur.hk.chan@gmail.com> wrote:

> Thank you.
>
> Will this work for hiveserver2?
>
>
> Arthur
>
> On 30 Dec, 2014, at 2:24 pm, vic0777 <vi...@163.com> wrote:
>
>
> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar.
> Then, the file is automatically loaded when Hive is started.
>
> Wantao
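Concretely, the .hiverc approach above amounts to a one-line file in the Hive user's home directory; the jar path below is a placeholder, not a path from this thread:

```sql
-- $HOME/.hiverc: statements here run automatically when the Hive CLI starts
ADD JAR /full/path/of/the/jar;
```

Note this only affects the CLI; it relies on the jar being present on the local filesystem of whichever machine runs Hive.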
>
>
>
>
> At 2014-12-30 11:01:06, "Arthur.hk.chan@gmail.com" <
> arthur.hk.chan@gmail.com> wrote:
>
> Hi,
>
> I am using Hive 0.13.1 on Hadoop 2.4.1, and I need to automatically load an
> extra JAR file into Hive for a UDF. Below are my steps to create the UDF
> function; I have tried the following but still have had no luck getting through.
>
> Please help!!
>
> Regards
> Arthur
>
>
> Step 1: (make sure the jar is in HDFS)
> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
> -rw-r--r--   3 hadoop hadoop      57388 2014-12-30 10:02 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>
> Step 2: (drop the function if it exists)
> hive> drop function sysdate;
> OK
> Time taken: 0.013 seconds
>
> Step 3: (create function using the jar in HDFS)
> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
> Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path
> Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
> OK
> Time taken: 0.034 seconds
>
> Step 4: (test)
> hive> select sysdate();
> Automatically selecting local only mode for query
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/hadoop/hbase-0.98.5-hadoop2/lib/phoenix-4.1.0-client-hadoop2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: yarn.nodemanager.loacl-dirs;  Ignoring.
> 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
> Execution log at: /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
> java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
> 	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
> 	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
> 	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
> 	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar)'
> Execution failed with exit status: 1
> Obtaining error information
> Task failed!
> Task ID:
>   Stage-1
> Logs:
> /tmp/hadoop/hive.log
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>
>
> Step 5: (check the file)
> hive> dfs -ls /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar;
> ls: `/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar': No such file or directory
> Command failed with exit code = 1
> Query returned non-zero code: 1, cause: null
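One detail worth noticing in the failure above: the missing file is reported as hdfs://tmp/... with no namenode between the scheme and "tmp", which suggests the local /tmp resource path was joined onto the hdfs:// scheme without an authority, so "tmp" ends up in the host position of the URI rather than in the path. Python's standard urlparse shows the same split (a quick illustration of generic URI parsing, not Hive's actual resolution code):

```python
# Show how a URI parser splits "hdfs://tmp/...": "tmp" lands in the
# authority (host) slot, not the path, so an HDFS client would try to
# treat it as a namenode address instead of a /tmp directory.
from urllib.parse import urlparse

uri = urlparse(
    "hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/"
    "nexr-hive-udf-0.2-SNAPSHOT.jar"
)
print(uri.netloc)  # the "host" part: tmp
print(uri.path)    # the path part, with /tmp missing from the front
```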


-- 
Nitin Pawar

Re: CREATE FUNCTION: How to automatically load extra jar file?

Posted by "Arthur.hk.chan@gmail.com" <ar...@gmail.com>.
Hi

I have already placed it in another folder, not the /tmp/ one:

>>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>>> -rw-r--r--   3 hadoop hadoop      57388 2014-12-30 10:02 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar

However, Hive copies it into a /tmp/ folder during "CREATE FUNCTION USING JAR":
>>> Step 3: (create function using the jar in HDFS)
>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path
>>> Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> OK
>>> Time taken: 0.034 seconds


Any ideas on how to stop Hive from using the /tmp/ folder?

Arthur
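For reference, the local directory where Hive stages downloaded resources (the /tmp/<session-id>_resources path seen above) is governed by the hive.downloaded.resources.dir property; whether relocating it resolves this particular failure depends on the root cause, but a hive-site.xml sketch would look like this (the target path is an illustrative assumption, not from the thread):

```xml
<!-- hive-site.xml: move Hive's per-session resource staging area off /tmp -->
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/var/hive/resources/${hive.session.id}_resources</value>
</property>
```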



On 31 Dec, 2014, at 2:27 pm, Nitin Pawar <ni...@gmail.com> wrote:

> If you put a file inside /tmp, there is no guarantee it will stay there permanently; that depends on your cluster configuration.
> 
> You may want to put it in a place where all users can access it, e.g. create a dedicated folder and give it read permission for everyone.
> 


Re: CREATE FUNCTION: How to automatically load extra jar file?

Posted by Nitin Pawar <ni...@gmail.com>.
If you put a file inside /tmp, there is no guarantee it will stay there
permanently; that depends on your cluster configuration.

You may want to put it in a place where all users can access it, e.g. create
a dedicated folder and give it read permission for everyone.
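This suggestion amounts to staging the jar once in a stable, world-readable HDFS directory instead of a session /tmp path. A sketch of the commands involved (the target directory and the helper function are illustrative assumptions, not from the thread):

```python
# Build the hdfs commands that publish a UDF jar to a stable, world-readable
# HDFS location. Printing them keeps this runnable anywhere; on a real
# cluster you would pass each list to subprocess.run(cmd, check=True).
def staging_commands(local_jar, hdfs_dir="/apps/hive/udf"):
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],           # create the shared folder
        ["hdfs", "dfs", "-put", "-f", local_jar, hdfs_dir],  # upload (overwrite) the jar
        ["hdfs", "dfs", "-chmod", "-R", "o+rx", hdfs_dir],   # readable by all users
    ]

for cmd in staging_commands("nexr-hive-udf-0.2-SNAPSHOT.jar"):
    print(" ".join(cmd))
```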

On Wed, Dec 31, 2014 at 11:40 AM, Arthur.hk.chan@gmail.com <
arthur.hk.chan@gmail.com> wrote:

>
> Hi,
>
> Thanks.
>
> Below are my steps: I copied my JAR to HDFS and ran CREATE FUNCTION using
> the JAR in HDFS; however, during my smoke test I got a FileNotFoundException.
>
> java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar


-- 
Nitin Pawar

Re: CREATE FUNCTION: How to automatically load extra jar file?

Posted by "Arthur.hk.chan@gmail.com" <ar...@gmail.com>.
Hi,

Thanks.

Below are my steps: I copied my JAR to HDFS and ran CREATE FUNCTION using the JAR in HDFS; however, during my smoke test I got a FileNotFoundException.

>> java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar



>> Step 1: (make sure the jar is in HDFS)
>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>> -rw-r--r--   3 hadoop hadoop      57388 2014-12-30 10:02 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>> 
>> Step 2: (drop the function if it exists)
>> hive> drop function sysdate;                                                  
>> OK
>> Time taken: 0.013 seconds
>> 
>> Step 3: (create function using the jar in HDFS)
>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>> Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path
>> Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>> OK
>> Time taken: 0.034 seconds
>> 
>> Step 4: (test)
>> hive> select sysdate(); 
>> Execution log at: /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
>> java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar


Please help!

Arthur



On 31 Dec, 2014, at 12:31 am, Nitin Pawar <ni...@gmail.com> wrote:

>> 	at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>> 	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
>> 	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
>> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
>> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
>> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.lang.reflect.Method.invoke(Method.java:606)
>> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar)'
>> Execution failed with exit status: 1
>> Obtaining error information
>> Task failed!
>> Task ID:
>>   Stage-1
>> Logs:
>> /tmp/hadoop/hive.log
>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>> 
>> 
>> Step 5: (check the file)
>> hive> dfs -ls /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar;
>> ls: `/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar': No such file or directory
>> Command failed with exit code = 1
>> Query returned non-zero code: 1, cause: null
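>> When the CREATE FUNCTION itself succeeds but queries later fail like this, it can help to first confirm how the function got registered before digging into job logs. Two standard diagnostic statements (the function name matches the one above; the exact output will vary by Hive version):

```sql
-- List all registered functions; in recent Hive versions a permanent
-- function created in the default database may appear as default.sysdate.
SHOW FUNCTIONS;

-- Show the implementing class and, with EXTENDED, additional details
-- about how the function is registered.
DESCRIBE FUNCTION EXTENDED sysdate;
```

>> If the function is listed with the expected class, the failure is more likely in resource localization (note the malformed hdfs://tmp/... URI in the stack trace above) than in the function definition itself.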
>> 
> 
> -- 
> Nitin Pawar