Posted to user@pig.apache.org by Michael May <mi...@gowalla.com> on 2011/06/23 00:59:25 UTC
Exception when using REGEX_EXTRACT with chararray, what gives?
Hello All,
I'm having an issue where I get a 'ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String' when passing in something of type chararray to REGEX_EXTRACT.
e.g.
A = load '/path/to/some/data' ....
where A has a schema of something like ( f1:chararray, .... )
B = foreach A generate REGEX_EXTRACT( f1, <the regex>, 1 ) as regex_extract;
This gives me the above error.
Now, the kicker is that if f1 is of type bytearray (i.e. the schema is ( f1:bytearray, ..... )), this works as expected.
What gives? Am I using REGEX_EXTRACT wrong? Is this a bug?
My understanding is that chararray is supposed to be used for things that are Strings, which is why I find the 'cannot cast to String' exception a bit funky. I've looked through the REGEX_EXTRACT source and the Javadocs pertaining to DataTypes without being able to crack this.
Any help and information is appreciated!
Thanks for your time,
Michael
Re: Exception when using REGEX_EXTRACT with chararray, what gives?
Posted by Michael May <mi...@gowalla.com>.
I'm on Pig 0.8.0.
I am using a custom loader that extends LoadFunc and implements LoadMetadata. I think my custom loader is essentially attempting to do what PigStorageSchema does in PiggyBank.
After reading through the PigStorageSchema source it was pretty obvious that I had overlooked several things in my implementation. I'm going to go ahead and try to use PigStorageSchema.
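One possible explanation of the symptom, offered as a sketch rather than a confirmed diagnosis: if a loader declares a field as chararray in its schema but still puts the raw byte wrapper into each tuple, nothing converts the value, and a UDF that casts the field straight to String fails. With a bytearray schema a conversion to String would happen first, which would explain why the same data works there. The Java below reproduces only that mechanism and does not depend on Pig; `RawBytes`, `loadField`, and `extract` are hypothetical stand-ins, not Pig classes or APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemaMismatchSketch {
    // Hypothetical stand-in for a byte-wrapper type such as DataByteArray.
    static final class RawBytes {
        final byte[] data;
        RawBytes(byte[] data) { this.data = data; }
    }

    // Hypothetical loader step: returns the raw wrapper unless asked to convert.
    static Object loadField(String text, boolean convertToString) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return convertToString
                ? new String(bytes, StandardCharsets.UTF_8)
                : new RawBytes(bytes);
    }

    // Hypothetical UDF body: trusts the declared type and casts directly to String.
    static String extract(Object field, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher((String) field);
        return (m.find() && group <= m.groupCount()) ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // The value really is a String, as the declared type promises: extraction works.
        System.out.println(extract(loadField("abc7", true), "(\\d)", 1));
        // The value is declared as a string but is still a byte wrapper: the cast fails.
        try {
            extract(loadField("abc7", false), "(\\d)", 1);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

If this is what is happening, the fix belongs in the loader: the objects it emits have to match the types its schema declares.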
Thanks for the help,
Michael
On Jun 23, 2011, at 3:35 AM, Dmitriy Ryaboy wrote:
> Which version of Pig? Are you using a special loader?
> I just tried with 0.8.1:
>
> n = load 'tmp/numbers.txt' as (num:chararray);
> f = foreach n generate REGEX_EXTRACT($0, '(\\d)', 1);
> dump f;
> (1)
> (2)
> (3)
> (4)
> (5)
>
>
> -D
Re: Exception when using REGEX_EXTRACT with chararray, what gives?
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Which version of Pig? Are you using a special loader?
I just tried with 0.8.1:
n = load 'tmp/numbers.txt' as (num:chararray);
f = foreach n generate REGEX_EXTRACT($0, '(\\d)', 1);
dump f;
(1)
(2)
(3)
(4)
(5)
-D
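For reference, the REGEX_EXTRACT call above can be approximated in plain Java, since Pig regexes are Java regular expressions. This is only a sketch of the matching behavior, assuming first-match semantics and an input file holding the digits 1 through 5, one per line; it is not Pig's source, and the class and method names are made up. Note that the Pig string '(\\d)' denotes the regex (\d) once unescaped, which is the same pattern the Java string literal below produces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractSketch {
    // Returns the requested capture group of the first match, or null if there is none.
    static String regexExtract(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find() && index <= m.groupCount()) {
            return m.group(index);
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed file contents: one digit per line.
        for (String line : new String[] {"1", "2", "3", "4", "5"}) {
            System.out.println("(" + regexExtract(line, "(\\d)", 1) + ")");
        }
    }
}
```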