Posted to dev@pig.apache.org by "Haishan Liu (JIRA)" <ji...@apache.org> on 2015/07/30 08:35:04 UTC
[jira] [Updated] (PIG-4646) PushUpFilter should not push before nested projection with FILTER operators
[ https://issues.apache.org/jira/browse/PIG-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haishan Liu updated PIG-4646:
-----------------------------
Description:
Verified the problem in 0.11.1. In short, a filter should not be pushed before a nested foreach in which another filter operator is present. See the following minimal example:
{code}
-- Contents of the input file
cat data;
(1, {(1000, 'a'), (1001, 'b')})
(2, {(2000, 'a'), (2001, 'b'), (2002, 'c')})
A = load 'data' as (id:int, hits:{(score:int, name:chararray)});
B = foreach A {
    filtered = filter hits by score > 2000;
    generate id, filtered;
};
dump B;
-- Output of dump B (as expected)
(1,{})
(2,{(2001,'b'),(2002,'c')})
C = filter B by SIZE(filtered) > 0;
dump C;
-- Output of dump C (incorrect: the row with the empty bag should have been removed)
(1,{})
(2,{(2001,'b'),(2002,'c')})
{code}
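The difference between the intended and the observed result can be reproduced outside Pig with a small model. The sketch below is plain Python, not Pig, and it does not reproduce the optimizer itself. It assumes (this is an inference from the issue title, not something confirmed here) that the pushed-up filter evaluates SIZE against the original, unfiltered bag. The names C_expected, C_observed, and pushed are illustrative only.

```python
# Plain-Python model of the Pig example above; not Pig code.
data = [
    (1, [(1000, 'a'), (1001, 'b')]),
    (2, [(2000, 'a'), (2001, 'b'), (2002, 'c')]),
]

# B = foreach A { filtered = filter hits by score > 2000; generate id, filtered; }
B = [(rid, [hit for hit in hits if hit[0] > 2000]) for rid, hits in data]

# Intended order: the SIZE test runs on the already-filtered bag.
C_expected = [row for row in B if len(row[1]) > 0]

# Assumed model of the reported behavior: the SIZE test runs first,
# against the unfiltered bag, so no row is dropped before the nested filter.
pushed = [row for row in data if len(row[1]) > 0]
C_observed = [(rid, [hit for hit in hits if hit[0] > 2000]) for rid, hits in pushed]

print(C_expected)
print(C_observed)
```

Under that assumption, C_expected contains only the row with id 2, while C_observed still contains the row with id 1 and an empty bag, which matches the output of dump C reported above.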
The desired result can be achieved either by passing '-optimizer_off PushUpFilter' when invoking Pig or by using the following convoluted workaround:
{code}
C = foreach B generate SIZE(filtered) as size, id, filtered;
D = filter C by size > 0;
E = foreach D generate id, filtered;
dump E;
(2,{(2001,'b'),(2002,'c')})
{code}
was:
Verified the problem in 0.11.1. In short, filter should not be pushed before a nested foreach in which another filter operator is present. See the following minimum example:
{code}
cat data;
(1, {(1000, 'a'), (1001, 'b')})
(2, {(2000, 'a'), (2001, 'b'), (2002, 'c')})
A = load 'data' as (id:int, hits:{(score:int, name:chararray)});
B = foreach A {
filtered = filter hits by score > 2000;
generate id, filtered;
};
dump B;
(1,{})
(2,{(2001,'b'),(2002,'c')})
C = filter B by SIZE(filtered) > 0;
dump C;
(1,{})
(2,{(2001,'b'),(2002,'c')})
{code}
> PushUpFilter should not push before nested projection with FILTER operators
> ---------------------------------------------------------------------------
>
> Key: PIG-4646
> URL: https://issues.apache.org/jira/browse/PIG-4646
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Haishan Liu
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)