Posted to user@pig.apache.org by CF <th...@genkiearth.com> on 2010/04/05 08:59:16 UTC

using DEFINE...SHIP() for programs residing on HDFS or S3

Hi,

Can anyone advise on how to ship a Ruby program with the "DEFINE ... SHIP()" command when the Ruby program actually resides on S3 or in HDFS rather than on the local disk?

This Pig script runs fine on a single-node Hadoop installation on my local computer. Note that the Ruby program is read from my local disk.

    messages = LOAD 'msg.tsv';                            -- msg.tsv is in HDFS
    DEFINE message_to_words `words.rb` SHIP('words.rb');  -- words.rb is on my local computer
    words = STREAM messages THROUGH message_to_words;
    dump words;
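For context, here is a minimal sketch of what such a words.rb might look like (hypothetical; the actual script was not posted). Pig's STREAM ... THROUGH feeds input records to the program on stdin and reads output records from its stdout:

```ruby
#!/usr/bin/env ruby
# Hypothetical words.rb sketch: split each incoming message line into
# whitespace-separated words and emit one word per output line.
def to_words(line)
  line.split
end

# When run by Pig streaming, records arrive on stdin.
if __FILE__ == $0
  STDIN.each_line do |line|
    to_words(line).each { |word| puts word }
  end
end
```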

However, I am trying to run this on an Amazon Elastic MapReduce instance, which means that I have to ship from either S3 or HDFS. None of these SHIP attempts worked for me:

    copyToLocal s3://bucketname/words.rb /home/hadoop/words.rb   -- copy to local drive
    cp s3://bucketname/words.rb words.rb                         -- copy to HDFS

    DEFINE message_to_words `words.rb` SHIP('hdfs:///words.rb');              -- not working
    DEFINE message_to_words `words.rb` SHIP('S3://bucketname/words.rb');      -- not working
    DEFINE message_to_words `words.rb` SHIP('words.rb');                      -- not working

Can anyone advise on the proper SHIP() syntax?

Thanks,
Chiew


Re: using DEFINE...SHIP() for programs residing on HDFS or S3

Posted by CF <th...@genkiearth.com>.
Hi Rekha,

Thank you very much for your response. You've pointed me in the right direction.

Just for reference, I've successfully shipped the Ruby code on an Amazon Elastic MapReduce Pig run by doing:

	messages = LOAD '$S3PATH/msg.tsv';
	cp $S3PATH/words.rb words.rb                 -- copy from S3 into HDFS first
	copyToLocal words.rb /home/hadoop/words.rb   -- copy from HDFS onto the local Amazon instance
	DEFINE message_to_words `ruby -Ku /home/hadoop/words.rb` SHIP('/home/hadoop/words.rb');
	words = STREAM messages THROUGH message_to_words;
	dump words;
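The same staging can be done from the master node's shell before launching Pig. A sketch, assuming the placeholder bucket and paths used above, and that the cluster's hadoop fs understands the s3:// scheme (as EMR's does):

```shell
# Sketch only: stage the script from S3 into HDFS, then onto local disk,
# so that SHIP() can pick it up from a local path.
hadoop fs -cp s3://bucketname/words.rb /words.rb
hadoop fs -copyToLocal /words.rb /home/hadoop/words.rb
chmod +x /home/hadoop/words.rb   # make sure the shipped script is executable
```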

cheers,
Chiew

On Apr 6, 2010, at 3:20 PM, Rekha Joshi wrote:

> Not sure what is the error on pig logs when you say ship failed, so hazarding a guess..
> 
> Assuming permissions are alright, sometimes the rb executable on compute node is not same as on local, so you might need ensure the ruby path.
> Also the file needs to be there locally to ship, if copyToLocal has not worked try hadoop fs -get. In the ship command, you may try giving fully-qualified local path to words.rb.,as
> 
> DEFINE message_to_words `/path/to/ruby words.rb` SHIP('/home/Cflocalpath/words.rb');
> 
> Cheers,
> /
> 
> On 4/5/10 12:29 PM, "CF" <th...@genkiearth.com> wrote:
> 
> Hi,
> 
> Can anyone advice on how to ship a ruby program with the "DEFINE.... SHIP( )" command, when the ruby program is actually on an S3 or HDFS instance instead of on local HDD?
> 
> This pig script runs fine on a single hadoop installation on my local computer. Note that the ruby program is sourced from my local HDD.
> 
>    messages = LOAD 'msg.tsv';                                                                -- msg.tsv is in HDFS
>    DEFINE message_to_words `words.rb` SHIP('words.rb');             -- words.rb is in my local computer
>    words = STREAM messages THROUGH message_to_words;
>    dump words;
> 
> However, I am trying to run this on an Amazon MapReduce instance, which means that I either have to ship from S3 or from HDFS. None of these SHIP commands worked for me:
> 
>    copyToLocal s3://bucketname/words.rb /home/hadoop/words.rb        -- copy to local drive
>    cp s3://bucketname/words.rb words.rb                                                      -- copy to HDFS
> 
>    DEFINE message_to_words `words.rb` SHIP('hdfs:///words.rb');                          -- not working
>    DEFINE message_to_words `words.rb` SHIP('S3://bucketname/words.rb');      -- not working
>    DEFINE message_to_words `words.rb` SHIP('words.rb');                                     -- not working
> 
> Can anyone advice on the proper SHIP() syntax?
> 
> Thanks,
> Chiew
> 
> 


Re: using DEFINE...SHIP() for programs residing on HDFS or S3

Posted by Rekha Joshi <re...@yahoo-inc.com>.
Not sure what error the Pig logs show when you say the ship failed, so I'm hazarding a guess.

Assuming permissions are all right: sometimes the Ruby executable on the compute node is not the same as on your local machine, so you might need to verify the Ruby path.
Also, the file needs to be present locally to ship; if copyToLocal has not worked, try hadoop fs -get. In the SHIP command, you may try giving the fully-qualified local path to words.rb, as in:

DEFINE message_to_words `/path/to/ruby words.rb` SHIP('/home/Cflocalpath/words.rb');

Cheers,
/
