Posted to user@pig.apache.org by Michael Moore <im...@live.com> on 2012/03/19 20:49:10 UTC

Filter based on results of previous filter

Hi All,

I have a statement like this:

-- A is omitted, loads data
B = FOREACH A GENERATE FLATTEN(data1.b.v) as dataPoint1, FLATTEN(data2.b.v) as dataPoint2;
C = FILTER B BY dataPoint1 == 'sampleDataPoint';

I'd like to generate a new filter based on the results of C.  For instance, I'd like to do something like this:

D = FILTER B BY dataPoint1 == C.dataPoint2;

(This would look for all rows in B where dataPoint1 equals the dataPoint2 of a row matching 'sampleDataPoint'.)

For example (format: dataPoint1,dataPoint2), B would return:

1,2
1,4
2,8
2,1
3,7
8,7

If sampleDataPoint = 2, C would return:

2,8
2,1

I'd like D to return:

1,2
1,4
8,7

Is there a clever way to do this that I'm missing?  Thanks!

-Mike

Re: Filter based on results of previous filter

Posted by Thejas Nair <th...@hortonworks.com>.
To get what you want, you need to do an (inner) join on B and C, using
the lhs and rhs of the equality as join keys.

-Thejas
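
The join suggested above, read as a join between B and C (D being the desired result), might be sketched as follows. This is an untested sketch, not a confirmed solution from the thread: it assumes B and C are defined as in the question, and the aliases C_proj, C_keys, J and the field name matchKey are hypothetical names introduced only for illustration.

```pig
-- Assumes B and C already exist as defined in the question.

-- Keep only the column of C that serves as the right-hand join key.
C_proj = FOREACH C GENERATE dataPoint2 AS matchKey;

-- De-duplicate the keys so a row of B is not repeated when the same
-- dataPoint2 value occurs more than once in C.
C_keys = DISTINCT C_proj;

-- Inner join: keep rows of B whose dataPoint1 equals some dataPoint2 in C.
J = JOIN B BY dataPoint1, C_keys BY matchKey;

-- Project back to B's two columns, dropping the join key.
D = FOREACH J GENERATE B::dataPoint1 AS dataPoint1, B::dataPoint2 AS dataPoint2;
```

With the sample data from the question, C_keys would hold 8 and 1, so D should contain the rows 1,2 and 1,4 and 8,7 (row order is not guaranteed). The DISTINCT step is a design choice: without it, duplicate dataPoint2 values in C would duplicate the matching rows of B.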



RE: Filter based on results of previous filter

Posted by Michael Moore <im...@live.com>.
Apologies for the formatting of the previous email.  Here's the question properly formatted:

Hi All,

I have a statement like this:

-- A is omitted, loads data
B = FOREACH A GENERATE FLATTEN(data1.b.v) as dataPoint1, FLATTEN(data2.b.v) as dataPoint2;
C = FILTER B BY dataPoint1 == 'sampleDataPoint';

I'd like to generate a new filter based on the results of C.  For instance, I'd like to do something like this:

D = FILTER B BY dataPoint1 == C.dataPoint2;

(This would look for all rows in B where dataPoint1 equals the dataPoint2 of a row matching 'sampleDataPoint'.)

For example (format: dataPoint1,dataPoint2), B would return:

1,2
1,4
2,8
2,1
3,7
8,7

If sampleDataPoint = 2, C would return:

2,8
2,1

I'd like D to return:

1,2
1,4
8,7

Is there a clever way to do this that I'm missing?  Thanks!

-Mike