Posted to user@pig.apache.org by Michael May <mi...@gowalla.com> on 2011/06/23 00:59:25 UTC

Exception when using REGEX_EXTRACT with chararray, what gives?

Hello All,

I'm having an issue where I get a 'ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String' when passing in something of type chararray to REGEX_EXTRACT.

e.g.
A = load '/path/to/some/data' .... 
where A has a schema of something like ( f1:chararray, .... )

B = foreach A generate REGEX_EXTRACT( f1, <the regex>, 1 ) as regex_extract;

This gives me the above error.

Now, the kicker is that if f1 is of type bytearray (i.e. the schema is ( f1:bytearray, ..... )), this works as expected.


What gives? Am I using REGEX_EXTRACT wrong? Is this a bug? 
My understanding is that chararray is supposed to be used for things that are Strings, which is why I find the 'cannot be cast to java.lang.String' exception a bit funky. I've looked through the REGEX_EXTRACT source and the Javadocs for the data types without being able to crack this.

Any help and information is appreciated!
Thanks for your time,

Michael

Re: Exception when using REGEX_EXTRACT with chararray, what gives?

Posted by Michael May <mi...@gowalla.com>.
I'm on Pig 0.8.0.

I am using a custom loader that extends LoadFunc and implements LoadMetadata. I think my custom loader is essentially attempting to do what PigStorageSchema in Piggybank does.
After reading through the PigStorageSchema source, it was pretty obvious that I had overlooked several things in my implementation. I'm going to go ahead and try to use PigStorageSchema instead.
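
A minimal sketch of the likely failure mode, not the actual loader code: it assumes the loader reported f1 as chararray through its schema while still placing raw byte objects in each tuple, and that the function casts the field straight to String. RawBytes is a hypothetical stand-in for org.apache.pig.data.DataByteArray so the example runs without Pig on the classpath, and every method name here is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ChararrayCastSketch {

    // Hypothetical stand-in for org.apache.pig.data.DataByteArray: raw bytes, not a String.
    static final class RawBytes {
        private final byte[] data;

        RawBytes(byte[] data) {
            this.data = data;
        }

        String decode() {
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    // A loader that advertises the field as chararray but leaves raw bytes in the tuple.
    static List<Object> loadWithoutConversion(String line) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(new RawBytes(line.getBytes(StandardCharsets.UTF_8)));
        return tuple;
    }

    // A loader that converts the bytes to a String, matching the schema it advertises.
    static List<Object> loadWithConversion(String line) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(new RawBytes(line.getBytes(StandardCharsets.UTF_8)).decode());
        return tuple;
    }

    // Mimics a function that trusts the declared type and casts the field to String.
    static String firstFieldAsString(List<Object> tuple) {
        return (String) tuple.get(0);
    }

    // Reports whether the cast above throws ClassCastException for the given tuple.
    static boolean castFails(List<Object> tuple) {
        try {
            firstFieldAsString(tuple);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("raw bytes, cast fails: " + castFails(loadWithoutConversion("abc123")));
        System.out.println("converted, cast fails: " + castFails(loadWithConversion("abc123")));
    }
}
```

The first call reports a failed cast and the second does not. This would also fit the bytearray case working: with the field declared as bytearray, Pig presumably inserts its own conversion to chararray before calling the function, so the function never sees the raw bytes.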

Thanks for the help,
Michael 
  
On Jun 23, 2011, at 3:35 AM, Dmitriy Ryaboy wrote:

> Which version of Pig? Are you using a special loader?
> I just tried with 0.8.1:
> 
> n = load 'tmp/numbers.txt' as (num:chararray);
> f = foreach n generate REGEX_EXTRACT($0, '(\\d)', 1);
> dump f;
> (1)
> (2)
> (3)
> (4)
> (5)
> 
> 
> -D


Re: Exception when using REGEX_EXTRACT with chararray, what gives?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Which version of Pig? Are you using a special loader?
I just tried with 0.8.1:

n = load 'tmp/numbers.txt' as (num:chararray);
f = foreach n generate REGEX_EXTRACT($0, '(\\d)', 1);
dump f;
(1)
(2)
(3)
(4)
(5)


-D
