Posted to user@pig.apache.org by meghana narasimhan <me...@gmail.com> on 2012/11/30 21:59:49 UTC

Pig 0.9.2 and avro on S3

Hi all,

Is this bug https://issues.apache.org/jira/browse/PIG-2540 applicable to
plain EC2 instances as well? I seem to have hit a snag with Apache Pig
version 0.9.2-cdh4.0.1 (rexported) and Avro files on S3. My Hadoop cluster
is made of Amazon EC2 instances.

Here is my load statement :

dimRad = LOAD 's3n://credentials@bucket/dimensions/2012/11/29/20121129-000159123456/dim'
USING
  AVRO_STORAGE AS
   (a:int
  , b:chararray
  );

and it gives me:

2012-11-30 20:42:44,205 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: Wrong FS:
s3n://credentials@bucket/dimensions/2012/11/29/20121129-000159123456/dim,
expected: hdfs://ec2-1xxxx.compute-1.amazonaws.com:8020
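For context on what this error means: a Hadoop FileSystem handle is bound to one scheme/authority, and it raises "Wrong FS" when handed a path with a different scheme. PIG-2540 boils down to AvroStorage in 0.9.x resolving the FileSystem from the cluster's default (HDFS) configuration instead of from the path's own URI. Here is a minimal, self-contained sketch of that check, using only java.net.URI; this is an illustration of the behavior, not the actual Hadoop source, and the namenode address is made up:

```java
import java.net.URI;

// Hypothetical sketch of the check behind Hadoop's "Wrong FS" error:
// a FileSystem handle is bound to one scheme, and any path with a
// different scheme is rejected. Not the actual Hadoop code.
public class WrongFsSketch {

    // Mimics the spirit of FileSystem.checkPath: reject paths whose
    // scheme differs from the filesystem this handle was created for.
    static void checkPath(URI fsUri, URI pathUri) {
        String fsScheme = fsUri.getScheme();
        String pathScheme = pathUri.getScheme();
        if (pathScheme != null && !pathScheme.equalsIgnoreCase(fsScheme)) {
            throw new IllegalArgumentException(
                "Wrong FS: " + pathUri + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // The cluster's default filesystem (what Pig 0.9.x AvroStorage used)
        URI defaultFs = URI.create("hdfs://namenode:8020");
        // The S3 location from the LOAD statement
        URI s3Path = URI.create("s3n://bucket/dimensions/dim");
        try {
            checkPath(defaultFs, s3Path);
        } catch (IllegalArgumentException e) {
            // Prints: Wrong FS: s3n://bucket/dimensions/dim, expected: hdfs://namenode:8020
            System.out.println(e.getMessage());
        }
    }
}
```

The fix in later Pig versions amounts to resolving the FileSystem from the load path's own URI rather than from the default filesystem, which is why upgrading makes the error go away.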


Thanks,
Meg

Re: Pig 0.9.2 and avro on S3

Posted by meghana narasimhan <me...@gmail.com>.
Ah, thanks William. I am trying it with an upgraded piggybank,
0.10.0-cdh4.1.1, and it seems to be running; otherwise Pig 0.10 would be
the way to go.



Re: Pig 0.9.2 and avro on S3

Posted by William Oberman <ob...@civicscience.com>.
I should have read more closely; you're not using EMR.

I'm guessing that if you upgrade to Pig 0.10 the issue will go away.



Re: Pig 0.9.2 and avro on S3

Posted by William Oberman <ob...@civicscience.com>.
A couple of weeks ago I spent a bunch of time trying to get EMR + S3 + Avro
working:
https://forums.aws.amazon.com/thread.jspa?messageID=398194&#398194

Short story: yes, I think PIG-2540 is the issue. I'm currently trying to
get Pig 0.10 running in EMR with help from AWS support. You have to do:
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/run-if --args
"instance.isMaster=true,s3://yourbucket/path/install_pig_0.10.0.sh"

install_pig_0.10.0.sh contents:
---------------------
#!/usr/bin/env bash
set -e  # stop on the first failed step
cd /home/hadoop
# Fetch and unpack the Pig 0.10.0 release
wget http://apache.mirrors.hoobly.com/pig/pig-0.10.0/pig-0.10.0.tar.gz
tar zxf pig-0.10.0.tar.gz
mv pig-0.10.0 pig
echo "export HADOOP_HOME=/home/hadoop" >> ~/.bashrc
echo "export PATH=/home/hadoop/pig/bin/:\$PATH" >> ~/.bashrc
# Build Pig, then build piggybank against it
cd pig
ant
cd contrib/piggybank/java
ant
cp piggybank.jar /home/hadoop/lib/.
# AvroStorage needs json-simple on the classpath
cd /home/hadoop/lib
wget "http://json-simple.googlecode.com/files/json_simple-1.1.jar"
------------------

But note, I have NOT gotten around to testing this yet! If you do and it
works, let me know :-)

will


Re: Pig 0.9.2 and avro on S3

Posted by meghana narasimhan <me...@gmail.com>.
Oh, I should also mention piggybank: 0.9.2-cdh4.0.1.

