Posted to user@pig.apache.org by pRaShAnT <pr...@gmail.com> on 2011/11/17 19:52:30 UTC
bytearray to chararray cast exception
As per Alan F. Gates in "Programming Pig":
*Pig does not know whether integer values in baseball are stored as ASCII
strings, Java serialized values, binary coded decimal, or some other
format. So it asks the load function. It is the responsibility of the load
function to cast bytearrays to other types. In general this works nicely,
but it does lead to a few corner cases where Pig does not know how to cast
a bytearray. In particular, if a UDF returns a bytearray Pig will not know
how to perform casts on it, because that bytearray is not generated by a
load function.*
I have a UDF that does exactly this: it returns a tuple of bytearrays, and I am
unable to cast them to other types. *How can I get around this?*
For example, UDF(A..Z) returns a tuple (bytearray, bytearray, ..., bytearray).
A = load 'input' using PigStorage();
B = FOREACH A GENERATE FLATTEN(UDF('arg1', 'arg2', 'arg3')) as (p,q,r);
But if I try
C = FOREACH B GENERATE (chararray)p as p; -- FAILS WITH CAST EXCEPTION: bytearray cannot be cast to chararray
or if I try
C = FILTER B BY p matches 'pig'; -- FAILS WITH CAST EXCEPTION: bytearray cannot be cast to chararray
Thanks,
Prashant
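The book's point is that a bytearray carries no type information, so only the component that wrote the bytes knows how to decode them. The plain-Java sketch below (not Pig code) makes that concrete: the same four bytes yield two different integers depending on whether they are read as ASCII digits or as a big-endian binary int. The class and method names (`ByteInterpretations`, `asAsciiInt`, `asBinaryInt`) are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not Pig's API): raw bytes do not say how they
// were written, so whoever interprets them must already know the format.
public class ByteInterpretations {

    // Read the bytes as ASCII decimal digits, roughly how a text loader
    // would treat the characters "1234" in a delimited file.
    static int asAsciiInt(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.US_ASCII));
    }

    // Read the very same bytes as a 4-byte big-endian binary integer.
    static int asBinaryInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = "1234".getBytes(StandardCharsets.US_ASCII);
        System.out.println(asAsciiInt(raw));  // prints 1234
        System.out.println(asBinaryInt(raw)); // prints 825373492 (0x31323334)
    }
}
```

Neither reading is "wrong"; without a load function (or the UDF itself) to say which format applies, Pig has no basis for choosing one, which is why the cast is refused.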
Re: bytearray to chararray cast exception
Posted by Prashant Kommireddi <pr...@gmail.com>.
Thanks, Xuting. I don't think the problem is about accessing a non-existent
field; it's more about the cast between bytearray and other types when the
bytearray is returned by a UDF (and not generated by a load function).
I am trying to find a workaround for this.
Best,
Prashant
On Thu, Nov 17, 2011 at 11:56 AM, XT Z <bi...@gmail.com> wrote:
> Hi Prashant,
>
> The Pig wiki says: "If a UDF returns a tuple or a bag and schema
> information is not provided, Pig assumes that the tuple contains a single
> field of type bytearray. If this is not the case, then not specifying the
> schema can cause failures." It also offers some examples for this case. It
> seems the solution is to override the outputSchema method of your EvalFunc
> and declare your schema information there. I found this at the following
> link:
> http://wiki.apache.org/pig/UDFManual
>
> Hope it can help.
> Best,
> Xuting
Re: bytearray to chararray cast exception
Posted by XT Z <bi...@gmail.com>.
Hi Prashant,
The Pig wiki says: "If a UDF returns a tuple or a bag and schema
information is not provided, Pig assumes that the tuple contains a single
field of type bytearray. If this is not the case, then not specifying the
schema can cause failures." It also offers some examples for this case. It
seems the solution is to override the outputSchema method of your EvalFunc
and declare your schema information there. I found this at the following
link:
http://wiki.apache.org/pig/UDFManual
Hope it can help.
Best,
Xuting
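Declaring chararray fields in outputSchema presumably only helps if the UDF also returns values of the matching Java class; in Pig a chararray is held as a java.lang.String and a bytearray as a DataByteArray. The sketch below models that check in plain Java so it runs without Pig on the classpath: `byte[]` stands in for DataByteArray, UTF-8 is assumed as the UDF's own encoding, and the names `UdfRowCheck`, `decodeRow` and `matchesSchema` are hypothetical. In a real UDF the decoding would sit in `exec()` and the type declaration in `outputSchema()` of an `EvalFunc`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model (not Pig code) of a UDF whose declared
// output schema says every field is a chararray.
public class UdfRowCheck {

    // Pig type names mapped to the Java classes that hold them;
    // byte[] stands in for Pig's DataByteArray.
    static final Map<String, Class<?>> JAVA_CLASS_FOR = Map.of(
            "chararray", String.class,
            "int", Integer.class,
            "bytearray", byte[].class);

    // The UDF knows its own encoding (UTF-8 assumed here), so it decodes
    // the raw fields itself instead of handing bytearrays back to Pig.
    static List<Object> decodeRow(List<byte[]> rawFields) {
        List<Object> row = new ArrayList<>();
        for (byte[] field : rawFields) {
            row.add(new String(field, StandardCharsets.UTF_8));
        }
        return row;
    }

    // True if each value is an instance of the class its declared type implies.
    static boolean matchesSchema(List<?> row, List<String> declaredTypes) {
        if (row.size() != declaredTypes.size()) {
            return false;
        }
        for (int i = 0; i < row.size(); i++) {
            Class<?> expected = JAVA_CLASS_FOR.get(declaredTypes.get(i));
            if (expected == null || !expected.isInstance(row.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("chararray", "chararray", "chararray");
        List<byte[]> raw = List.of(
                "pig".getBytes(StandardCharsets.UTF_8),
                "hive".getBytes(StandardCharsets.UTF_8),
                "hbase".getBytes(StandardCharsets.UTF_8));

        // Raw bytes under a chararray declaration: mismatch.
        System.out.println(matchesSchema(raw, schema));            // prints false
        // Decoded strings under the same declaration: consistent.
        System.out.println(matchesSchema(decodeRow(raw), schema)); // prints true
    }
}
```

The design choice here is to push decoding into the UDF, the only place that knows the byte format, so that the types Pig sees downstream are already correct and no bytearray-to-chararray cast is needed.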