Posted to user@hive.apache.org by Jianhua Wang <wj...@163.com> on 2011/03/01 03:19:45 UTC

about User scripte in HiveQL

Hi all,

I recently ran into a problem that I have not been able to solve after some effort, so I am asking for help here. Any help will be appreciated. Thanks!

My situation is as follows.

I want to execute this HiveQL command:

select transform(a.col) using '/home/pc/mypython.py' as (col string) from tmp_table a where a.col2='01';

where 'mypython.py' is a Python script of mine.

I set up a single-node Hadoop environment in a VMware virtual machine on my home PC, and the command works well there.

I also have a cluster of three PC servers: nodes A, B, and C.

I stored the script at '/home/pc/mypython.py' on node A.

However, every time I issue the command on the cluster, I get an error like this:

-------------------------------------------------------------------------------------------------------------------
Caused by: java.io.IOException: Cannot run program "/home/pc/mypython.py": java.io.IOException: error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
... 20 more
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 21 more
-------------------------------------------------------------------------------------------------------------------
According to the job logs, these errors were reported by nodes B and C. It seems that the TaskTrackers on B and C cannot find the script.
On the Hive wiki, I did not find any instructions on where to place a user script.
Where should I put my script so that every node can find it?
Thanks in advance for any reply!

2011-03-01

Jianhua Wang

Re: about User scripte in HiveQL

Posted by Roberto Congiu <ro...@openx.org>.

You have to add the file to the query, as in the example at

http://wiki.apache.org/hadoop/Hive/GettingStarted

Look at the add FILE statement (highlighted in red in the original message).

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
add FILE weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;
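
For readers without the wiki page at hand, here is a minimal sketch of what a streaming script such as weekday_mapper.py could look like. It assumes Hive's default streaming format: one row per line on stdin, columns separated by tabs, and output rows written to stdout in the same layout. The function name map_row and the choice of UTC are assumptions made for this illustration, not part of the wiki example.

```python
import datetime
import sys


def map_row(line):
    """Replace the trailing unixtime column with the ISO weekday number."""
    userid, movieid, rating, unixtime = line.rstrip("\n").split("\t")
    # Assumption: interpret the timestamp in UTC so the result does not
    # depend on the time zone of the node that runs the task.
    moment = datetime.datetime.fromtimestamp(float(unixtime), tz=datetime.timezone.utc)
    # isoweekday() returns 1 for Monday through 7 for Sunday.
    return "\t".join([userid, movieid, rating, str(moment.isoweekday())])


if __name__ == "__main__":
    # Hive streams the selected columns to stdin, one row per line.
    for row in sys.stdin:
        if row.strip():
            print(map_row(row))
```

Keeping the per-row logic in a function makes it possible to try the script on a few sample lines locally before shipping it to the cluster with add FILE.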





-- 
Roberto Congiu -Data Engineer - OpenX
20 E Del Mar blvd, Pasadena, CA

Re: about User scripte in HiveQL

Posted by Wil - <wi...@yahoo.com>.

Hi,

You need to add the file to the distributed cache so that the other machines can access it.

http://wiki.apache.org/hadoop/Hive/GettingStarted#STREAMING
http://wiki.apache.org/hadoop/Hive/LanguageManual/Cli#Hive_Resources

hive> add file /home/pc/mypython.py;
hive> select transform(a.col) using 'mypython.py' as (col string)
from tmp_table a where a.col2='01';
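
Because the query above invokes 'mypython.py' by name without an interpreter, the script presumably has to be directly executable on each node, which means a shebang line and the execute bit. If that is in doubt, using 'python mypython.py' as in the wiki example sidesteps the issue. Below is a hedged sketch of such a single-column script. The upper-casing in transform_value is a made-up placeholder, since the thread never shows what mypython.py actually does. The pass-through of \N reflects Hive's convention of writing NULL as the literal string \N when it streams rows to a script.

```python
#!/usr/bin/env python
import sys

HIVE_NULL = "\\N"  # Hive streams NULL values as the two characters \N


def transform_value(value):
    # Placeholder logic (hypothetical): upper-case the column, keep NULLs as-is.
    if value == HIVE_NULL:
        return value
    return value.upper()


if __name__ == "__main__":
    # One input column arrives per line; emit one output column per line.
    for line in sys.stdin:
        print(transform_value(line.rstrip("\n")))
```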



