Posted to user@pig.apache.org by Ruslan Al-Fakikh <me...@gmail.com> on 2013/12/24 19:15:34 UTC
AvroStorage schema_uri pointing to local file doesn't work
Hey guys,
I am using AvroStorage like this:
STORE alias INTO '$OUTPUT'
USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"index" : 1,
"schema_uri": "file:///path/schema.avsc"}');
so it explicitly reads schema.avsc from the local file system, not from
HDFS.
It works on a pseudo-distributed cluster but fails on a fully distributed
cluster with a java.io.FileNotFoundException for the schema file.
It looks like this happens in the backend.
I assume this is because the backend instance of AvroStorage runs on a node
other than the one I launch the Pig script from, so it cannot find the file
on that node's local file system.
Why can't it use the schema file already loaded by the front-end invocation?
Does this mean I am limited to either HDFS locations for schema_uri
or embedding the schema string in the AvroStorage parameters?
Thanks in advance
Ruslan Al-Fakikh
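A related detail when writing schema_uri values: in a URI, the text between "file://" and the next slash is the authority (host), so file://path/schema.avsc names /schema.avsc on a host called "path", whereas an absolute local path needs three slashes, as in file:///path/schema.avsc. The sketch below uses only java.net.URI (the class name SchemaUriCheck is made up for illustration) to show how each form is split; it does not model how Hadoop's FileSystem layer resolves them afterwards.

```java
import java.net.URI;

/** Illustration only: how java.net.URI splits file: URIs into host and path. */
public class SchemaUriCheck {

    /** Returns "host|path"; the host part prints as "null" when the URI has no authority. */
    static String hostAndPath(String uri) {
        URI parsed = URI.create(uri);
        return parsed.getHost() + "|" + parsed.getPath();
    }

    public static void main(String[] args) {
        // Two slashes: "path" is parsed as the host, leaving only /schema.avsc as the path.
        System.out.println(hostAndPath("file://path/schema.avsc"));   // path|/schema.avsc
        // Three slashes: empty authority, so the full absolute path is kept.
        System.out.println(hostAndPath("file:///path/schema.avsc"));  // null|/path/schema.avsc
    }
}
```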
Re: AvroStorage schema_uri pointing to local file doesn't work
Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Thank you, Cheolsoo!
OK, I'll have Pig 0.12 when my team upgrades to a newer CDH release.
For now I am using this workaround:
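-- Runs on the client during preprocessing: read schema.avsc from the
-- launch directory and substitute its text into the script as a literal.
%declare WORK_DIR `pwd`
%declare SCHEMA_LITERAL `cat $WORK_DIR/schema.avsc`
...
STORE inputs INTO 'output'
USING com.magnetic.org.apache.pig.piggybank.storage.avro.AvroStorage('{
"index" : 1,
"schema": $SCHEMA_LITERAL}');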
Best Regards,
Ruslan Al-Fakikh
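The workaround avoids the problem because %declare runs pwd and cat once, on the machine that launches the script, during Pig's preprocessing. The schema text is substituted into the script, so the backend tasks receive it as part of the AvroStorage constructor argument instead of a file path they would have to open. The Java sketch below mirrors only that string-building step; InlineSchemaArgs and its methods are hypothetical names, not part of Pig or Piggybank, and the option layout simply copies the one used in the STORE statement above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper mirroring the %declare workaround: the schema file is read
 * once on the client and its text is embedded in the AvroStorage option string.
 */
public class InlineSchemaArgs {

    /** Reads the schema from the client's local file system, like `cat schema.avsc`. */
    static String readSchema(Path schemaFile) {
        try {
            return new String(Files.readAllBytes(schemaFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the option string with the schema text inlined under the "schema" key. */
    static String buildOptions(int index, String schemaJson) {
        // Trim surrounding whitespace, e.g. the trailing newline most editors leave in the file.
        return "{\"index\" : " + index + ", \"schema\": " + schemaJson.trim() + "}";
    }

    public static void main(String[] args) {
        Path schemaFile = Paths.get(args.length > 0 ? args[0] : "schema.avsc");
        System.out.println(buildOptions(1, readSchema(schemaFile)));
    }
}
```

Run with the path to schema.avsc, it prints the option string that ends up between the quotes of the USING clause. No extra escaping is done, which assumes the schema contains no single quotes that would terminate the Pig string literal.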
On Wed, Dec 25, 2013 at 11:48 AM, Cheolsoo Park <pi...@gmail.com> wrote:
> avro to bcc:
>
> >> Why can't it use the schema file from front-end invocation?
>
> You're right. It should load the schema file in the front-end and pass it
> to the back-end via properties. Unfortunately, Piggybank AvroStorage
> doesn't do this.
>
> However, the new built-in AvroStorage in Pig 0.12 does exactly what you
> want. Can you use it instead?
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/AvroStorage.java#L120
Re: AvroStorage schema_uri pointing to local file doesn't work
Posted by Cheolsoo Park <pi...@gmail.com>.
avro to bcc:
>> Why can't it use the schema file from front-end invocation?
You're right. It should load the schema file in the front-end and pass it
to the back-end via properties. Unfortunately, Piggybank AvroStorage
doesn't do this.
However, the new built-in AvroStorage in Pig 0.12 does exactly what you
want. Can you use it instead?
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/AvroStorage.java#L120
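The fix Cheolsoo describes follows a common pattern: read the local file where the script is launched (the front end), store its contents in properties that are shipped with the job, and have the tasks (the back end) read the property instead of the file. Pig exposes org.apache.pig.impl.util.UDFContext for this kind of handoff. The self-contained Java sketch below does not use Pig at all: a java.util.Properties object stands in for the shipped properties, a store/load round trip stands in for serialization to the task nodes, and the class name and property key are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Simplified model of "load on the front end, pass to the back end via properties". Not Pig's API. */
public class SchemaHandoff {
    // Hypothetical key; a real storage function would use its own, signature-specific keys.
    static final String SCHEMA_KEY = "example.avro.output.schema";

    /** Front end: runs where the script is launched, so the local schema file is readable there. */
    static void frontEnd(Properties props, String schemaText) {
        props.setProperty(SCHEMA_KEY, schemaText);
    }

    /** Stands in for shipping the properties to task nodes: serialize, then deserialize. */
    static Properties ship(Properties props) {
        try {
            StringWriter out = new StringWriter();
            props.store(out, null);
            Properties received = new Properties();
            received.load(new StringReader(out.toString()));
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Back end: never touches the file system, only the shipped property. */
    static String backEnd(Properties props) {
        String schema = props.getProperty(SCHEMA_KEY);
        if (schema == null) {
            throw new IllegalStateException("schema was not set on the front end");
        }
        return schema;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        frontEnd(props, "{\"type\":\"record\",\"name\":\"R\",\"fields\":[]}");
        System.out.println(backEnd(ship(props)));
    }
}
```

The back end fails fast with an explicit message when the front end never set the property, which is easier to diagnose than a FileNotFoundException on a task node.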