Posted to user@pig.apache.org by Badrinarayanan S <ba...@fifthgentech.com> on 2011/04/07 14:26:49 UTC

Pig filter against flatten column

Hi,

I am trying to run a filter against a column that is the result of a
FLATTEN operation, but the filter clause throws an exception:
org.apache.pig.data.DataByteArray cannot be cast to java.lang.String. The
exception points at the line doing the MATCHES filter. If I change MATCHES
to eq, the exception goes away, but I get no results even though some rows
have 'Page' as their ColumnName.

I suspect the datatype of ColumnName (which is the result of the flatten) in
the relation VisitPages is still bytearray. I have tried casting it to
chararray, but I still get the same exception. However, DESCRIBE VisitDetails
shows it as chararray. Any suggestions?

Below is the Pig script:

 

Visits = LOAD 'cassandra://test/Visits' USING CassandraStorage() as (Id,
DetailsBag:bag{DetailsTuple:tuple(VisitTimestamp:chararray,
VisitDetails:bag{VisitColumns:tuple(ColumnName:chararray,
ColumnValue:chararray)})});

VisitsFlattened = FOREACH Visits GENERATE Id,
FLATTEN(DetailsBag.VisitDetails);

VisitDetailsFlattened = FOREACH VisitsFlattened GENERATE Id,
FLATTEN(VisitDetails);

VisitDetails = FOREACH VisitDetailsFlattened GENERATE (chararray)ColumnName,
(chararray)ColumnValue, (chararray)Id;

DESCRIBE VisitDetails;

VisitPages = FILTER VisitDetails BY (ColumnName MATCHES 'Page');

dump VisitPages;

......
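
The two symptoms described above (MATCHES throws, eq quietly returns nothing) are what one would expect if the field still holds undecoded bytes at run time even though the declared schema says chararray. The sketch below is only an analogy in Python, not Pig's implementation: bytes stands in for DataByteArray, str stands in for chararray, and the helper name matches is hypothetical.

```python
import re


def matches(value, pattern):
    """Whole-string regex match, loosely analogous to Pig's MATCHES."""
    return re.fullmatch(pattern, value) is not None


raw = b"Page"                    # undecoded bytes, standing in for a bytearray field
decoded = raw.decode("utf-8")    # explicit conversion, standing in for a real chararray

# Equality across the two types is simply False: no error, but no rows either.
eq_on_raw = (raw == "Page")

# A string pattern applied to bytes raises instead of returning False.
try:
    matches(raw, "Page")
    match_on_raw = "no error"
except TypeError:
    match_on_raw = "TypeError"

# Once the value really is a string, the match behaves as intended.
match_on_decoded = matches(decoded, "Page")

print(eq_on_raw, match_on_raw, match_on_decoded)
```

If the analogy holds, the cast in the script is recorded in the schema (which is what DESCRIBE reports) while the values reaching the filter are still raw bytes.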

 

Thanks,
badri



Disclaimer: This message (including any attachments) is being sent from Fifth Generation Technologies India (P) Ltd. (5G) and may contain information that is proprietary, confidential and privileged. If you are not the intended recipient, please inform the sender immediately by reply e-mail and delete this message and attachments from your system, without retaining a copy. Any unauthorized use or dissemination of this message in whole or in part is strictly prohibited. 5G shall  not be liable for the improper or incomplete transmission of the information contained in this  communication nor for any delay in its receipt or damage to your system. 5G does not guarantee that the integrity of this communication has been maintained nor that this communication is free of viruses, interceptions or interference.

Re: Pig filter against flatten column

Posted by Jeremy Hanna <je...@gmail.com>.
The 0.7.4 version is here:
http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.4/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

The latest from the 0.7 branch, though, contains a way to get the Cassandra schema for the column family it is querying:
http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

The contrib/pig directory has the build script. If you download the source from either
http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.4/apache-cassandra-0.7.4-src.tar.gz
or
http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/
everything is in the contrib/pig directory.

I'm in #hadoop-pig on freenode (jeromatron), as is Brandon (driftx), if you have any questions about it.

On Apr 8, 2011, at 1:31 PM, Daniel Dai wrote:

> I tried PigStorage, seems it is Ok. Suspect the issue is in CassandraStorage only. Where can I find the source code of it?


Re: Pig filter against flatten column

Posted by Daniel Dai <ji...@yahoo-inc.com>.
I tried PigStorage and it seems to work fine, so I suspect the issue is only
in CassandraStorage. Where can I find its source code?
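
One way to repeat this comparison is to feed the same nested schema to PigStorage from a small text file. The following is a minimal sketch in Python, assuming PigStorage's default tab delimiter and Pig's usual text notation for bags ({...}) and tuples ((...)). The helper names and the sample values are hypothetical and do not come from this thread.

```python
def to_pig_text(value):
    """Render nested Python data in Pig's text notation: list -> bag, tuple -> tuple."""
    if isinstance(value, list):
        return "{" + ",".join(to_pig_text(v) for v in value) + "}"
    if isinstance(value, tuple):
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    return str(value)


def to_pig_row(fields, delimiter="\t"):
    """Join top-level fields with the delimiter PigStorage is assumed to use by default."""
    return delimiter.join(to_pig_text(f) for f in fields)


# Hypothetical sample row shaped like the schema in the script: (Id, DetailsBag)
row = (
    "visit1",
    [("20110407", [("Page", "/home"), ("Referrer", "example.org")])],
)
line = to_pig_row(row)

print(line)
```

Saved to a file, the printed line could then be loaded with PigStorage() using the same AS schema as the original script and run through the same FLATTEN, cast, and FILTER steps.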




RE: Pig filter against flatten column

Posted by Badrinarayanan S <ba...@fifthgentech.com>.
Hi Daniel, 

I was using Pig 0.8.0. I also ran it against the latest trunk and still see the same issue.

Thanks,
badri 



Re: Pig filter against flatten column

Posted by Daniel Dai <ji...@yahoo-inc.com>.
Which version of Pig are you using? Previous versions of Pig have trouble
casting nested types. Can you try the latest trunk?

Daniel
