Posted to user@pig.apache.org by pRaShAnT <pr...@gmail.com> on 2011/11/17 19:52:30 UTC

bytearray to chararray cast exception

As per Alan F. Gates in "Programming Pig":

*Pig does not know whether integer values in baseball are stored as ASCII
strings, Java serialized values, binary coded decimal, or some other
format. So it asks the load function. It is the responsibility of the load
function to cast bytearrays to other types. In general this works nicely,
but it does lead to a few corner cases where Pig does not know how to cast
a bytearray. In particular, if a UDF returns a bytearray Pig will not know
how to perform casts on it, because that bytearray is not generated by a
load function.*

I have a UDF that does exactly this: it returns a tuple of bytearrays, and
I am unable to cast them to other types. *How can I get around this?*

For example, UDF(A..Z) returns a tuple (bytearray, bytearray...bytearray).

A = load 'input' using PigStorage();
B = FOREACH A GENERATE FLATTEN(UDF('arg1', 'arg2', 'arg3')) as (p,q,r);

But if I try

C = FOREACH B GENERATE (chararray)p as p; -- fails with a cast exception: bytearray cannot be cast to chararray

or if I try

C = FILTER B BY p matches 'pig'; -- fails with a cast exception: bytearray cannot be cast to chararray

Thanks,
Prashant

Re: bytearray to chararray cast exception

Posted by Prashant Kommireddi <pr...@gmail.com>.
Thanks Xuting. I don't think the problem is about accessing a non-existent
field; it's more about the cast between bytearray and other types when the
bytearray is returned by a UDF (and not generated by a load function).

I am trying to find a workaround for this.

Best,
Prashant

Re: bytearray to chararray cast exception

Posted by XT Z <bi...@gmail.com>.
Hi Prashant,

   The Pig wiki says: "If a UDF returns a tuple or a bag and schema
information is not provided, Pig assumes that the tuple contains a single
field of type bytearray. If this is not the case, then not specifying the
schema can cause failures." It also offers some examples of this case. It
seems the solution is to override the outputSchema method of your EvalFunc
and declare the schema there. I found this at the following
link:
http://wiki.apache.org/pig/UDFManual
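
One way to apply this advice, sketched under assumptions: Pig maps
java.lang.String values to chararray, so if the UDF decodes its raw bytes
itself, puts Strings in the returned tuple, and declares those fields as
chararray in outputSchema, the script should no longer need to cast a
UDF-produced bytearray. The helper below shows only the decoding step in
plain Java. ByteArrayFields and decode are hypothetical names, UTF-8 is an
assumed encoding, and the Pig-specific wiring (building the Tuple and
overriding outputSchema) is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pig: turns raw byte fields into Strings
// so that a UDF can return chararray-compatible values.
final class ByteArrayFields {
    private ByteArrayFields() {}

    // Decodes each field as UTF-8 (an assumption about the data).
    // A null field stays null, which Pig would treat as a null value.
    static String[] decode(byte[][] rawFields) {
        String[] decoded = new String[rawFields.length];
        for (int i = 0; i < rawFields.length; i++) {
            byte[] raw = rawFields[i];
            decoded[i] = (raw == null) ? null : new String(raw, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}
```

Inside the UDF's exec method, the decoded Strings would be set on the
output tuple in place of the bytearray values; whether that fits depends on
how the UDF produces its bytes in the first place.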

   Hope this helps.
Best,
Xuting
