Posted to user@pig.apache.org by Ruslan Al-Fakikh <me...@gmail.com> on 2013/12/24 19:15:34 UTC

AvroStorage schema_uri pointing to local file doesn't work

Hey guys,

I am using AvroStorage like this:

STORE alias INTO '$OUTPUT'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "index" : 1,
    "schema_uri": "file://path/schema.avsc"}');

So the schema_uri explicitly points to schema.avsc on the local file system,
not HDFS.
It works in a pseudo-distributed cluster, but fails on a normal cluster
with java.io.FileNotFoundException for the schema file.
It looks like this is happening in the backend.
I assume this is because the backend invocation of AvroStorage runs on a node
different from the one I am running the Pig script from, so it cannot find
the file in its local file system.
Why can't it use the schema file from the front-end invocation?
Does this mean that I am limited to either HDFS locations for schema_uri
or embedding the schema string in the AvroStorage parameters?

Thanks in advance

Ruslan Al-Fakikh

Re: AvroStorage schema_uri pointing to local file doesn't work

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Thank you, Cheolsoo!

Ok, I'll have Pig 0.12 when my team upgrades to a newer CDH.
For now I am using this workaround:
%declare WORK_DIR `pwd`
%declare SCHEMA_LITERAL `cat $WORK_DIR/schema.avsc`
...
STORE inputs INTO 'output'
    USING com.magnetic.org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "index" : 1,
    "schema": $SCHEMA_LITERAL}');

Best Regards,
Ruslan Al-Fakikh
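
For illustration, the idea behind the %declare workaround above can be sketched outside Pig: read the local schema file once, on the machine that launches the script, and embed its contents in the storage options so that no back-end task needs the file. This is only a rough sketch in Python. The helper name build_storage_options and the sample schema are hypothetical, and nothing here calls Pig or AvroStorage.

```python
import json
import os
import tempfile


def build_storage_options(schema_path, index=1):
    """Return a JSON options string with the schema embedded inline.

    Hypothetical helper: it mirrors the '"schema": $SCHEMA_LITERAL' idea by
    reading the local file once, on the launching machine.
    """
    with open(schema_path) as f:
        schema = json.load(f)  # fails early if the .avsc is not valid JSON
    # Compact separators keep the whole literal on a single line.
    return json.dumps({"index": index, "schema": schema}, separators=(",", ":"))


if __name__ == "__main__":
    # Made-up sample schema, written to a temporary file for the demo.
    sample = {"type": "record", "name": "Example",
              "fields": [{"name": "id", "type": "long"}]}
    with tempfile.NamedTemporaryFile("w", suffix=".avsc", delete=False) as tmp:
        json.dump(sample, tmp, indent=2)
    try:
        print(build_storage_options(tmp.name))
    finally:
        os.remove(tmp.name)
```

Because the schema travels inside the options string itself, it reaches every node together with the script, which is presumably why the embedded form does not hit the FileNotFoundException.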


On Wed, Dec 25, 2013 at 11:48 AM, Cheolsoo Park <pi...@gmail.com> wrote:

> avro to bcc:
>
> >> Why can't it use the schema file from front-end invocation?
>
> You're right. It should load the schema file in the front-end and pass it
> to the back-end via properties. Unfortunately, Piggybank AvroStorage
> doesn't do this.
>
> However, the new built-in AvroStorage in Pig 0.12 does exactly what you
> want. Can you use it instead?
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/AvroStorage.java#L120
>
>
> On Tue, Dec 24, 2013 at 10:15 AM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
>
> > Hey guys,
> >
> > I am using AvroStorage like this:
> >
> > STORE alias INTO '$OUTPUT'
> >     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> >     "index" : 1,
> >     "schema_uri": "file://path/schema.avsc"}');
> >
> > So the schema_uri explicitly points to schema.avsc on the local file
> > system, not HDFS.
> > It works in a pseudo-distributed cluster, but fails on a normal cluster
> > with java.io.FileNotFoundException for the schema file.
> > It looks like this is happening in the backend.
> > I assume this is because the backend invocation of AvroStorage runs on a
> > node different from the one I am running the Pig script from, so it cannot
> > find the file in its local file system.
> > Why can't it use the schema file from the front-end invocation?
> > Does this mean that I am limited to either HDFS locations for schema_uri
> > or embedding the schema string in the AvroStorage parameters?
> >
> > Thanks in advance
> >
> > Ruslan Al-Fakikh
> >
>

Re: AvroStorage schema_uri pointing to local file doesn't work

Posted by Cheolsoo Park <pi...@gmail.com>.
avro to bcc:

>> Why can't it use the schema file from front-end invocation?

You're right. It should load the schema file in the front-end and pass it
to the back-end via properties. Unfortunately, Piggybank AvroStorage
doesn't do this.

However, the new built-in AvroStorage in Pig 0.12 does exactly what you
want. Can you use it instead?
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/AvroStorage.java#L120
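
The pattern described above (load the schema in the front-end, pass it to the back-end via properties) can be sketched roughly as follows. This is an illustrative Python simulation, not Pig's actual implementation: a plain dict stands in for the job properties, and the names SCHEMA_KEY, front_end and back_end are hypothetical.

```python
import json
import os
import tempfile

SCHEMA_KEY = "example.avro.schema"  # hypothetical property name


def front_end(schema_path, properties):
    """Runs where the script is launched, so the local file is readable here."""
    with open(schema_path) as f:
        properties[SCHEMA_KEY] = f.read()


def back_end(properties):
    """Runs on a task node: uses only the shipped properties, never the file."""
    return json.loads(properties[SCHEMA_KEY])


if __name__ == "__main__":
    # Made-up sample schema, written to a temporary local file.
    sample = {"type": "record", "name": "Example",
              "fields": [{"name": "id", "type": "long"}]}
    with tempfile.NamedTemporaryFile("w", suffix=".avsc", delete=False) as tmp:
        json.dump(sample, tmp)

    props = {}
    front_end(tmp.name, props)
    os.remove(tmp.name)  # simulate a node where the local file does not exist
    print(back_end(props)["name"])
```

The point of the sketch is that the back-end step still succeeds after the file is gone, because the schema text was copied into the properties before the job was shipped.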


On Tue, Dec 24, 2013 at 10:15 AM, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> Hey guys,
>
> I am using AvroStorage like this:
>
> STORE alias INTO '$OUTPUT'
>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
>     "index" : 1,
>     "schema_uri": "file://path/schema.avsc"}');
>
> So the schema_uri explicitly points to schema.avsc on the local file system,
> not HDFS.
> It works in a pseudo-distributed cluster, but fails on a normal cluster
> with java.io.FileNotFoundException for the schema file.
> It looks like this is happening in the backend.
> I assume this is because the backend invocation of AvroStorage runs on a node
> different from the one I am running the Pig script from, so it cannot find
> the file in its local file system.
> Why can't it use the schema file from the front-end invocation?
> Does this mean that I am limited to either HDFS locations for schema_uri
> or embedding the schema string in the AvroStorage parameters?
>
> Thanks in advance
>
> Ruslan Al-Fakikh
>
