Posted to dev@pig.apache.org by "Haishan Liu (JIRA)" <ji...@apache.org> on 2015/07/30 08:35:04 UTC

[jira] [Updated] (PIG-4646) PushUpFilter should not push before nested projection with FILTER operators

     [ https://issues.apache.org/jira/browse/PIG-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haishan Liu updated PIG-4646:
-----------------------------
    Description: 
Verified the problem in 0.11.1. In short, PushUpFilter should not push a filter ahead of a nested foreach that itself contains a filter operator. In the minimal example below, C should contain only the tuple with id 2, yet (1,{}) is returned as well:

{code}
cat data;

(1, {(1000, 'a'), (1001, 'b')})
(2, {(2000, 'a'), (2001, 'b'), (2002, 'c')})

A = load 'data' as (id:int, hits:{(score:int, name:chararray)});
B = foreach A {
  filtered = filter hits by score > 2000;
  generate id, filtered;
};

dump B;

(1,{})
(2,{(2001,'b'),(2002,'c')})

C = filter B by SIZE(filtered) > 0;

dump C;

(1,{})
(2,{(2001,'b'),(2002,'c')})
{code}
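
To make the expected and the observed results easier to compare, here is a small plain-Python model of the example. It is not Pig code and does not reproduce the optimizer's implementation. The function names are hypothetical, and pushed_up_plan encodes an assumption: that after the push-up the SIZE test is evaluated against the original, unfiltered bag. That assumption is consistent with the output reported above, but it has not been confirmed against the Pig source.

```python
# Plain-Python model of the Pig example; names are illustrative only.
data = [
    (1, [(1000, "a"), (1001, "b")]),
    (2, [(2000, "a"), (2001, "b"), (2002, "c")]),
]


def nested_filter(rows):
    """Model of relation B: keep only hits with score > 2000 in each bag."""
    return [(rid, [h for h in hits if h[0] > 2000]) for rid, hits in rows]


def intended_plan(rows):
    """Model of C as written: the outer filter sees the already-filtered bag."""
    return [(rid, bag) for rid, bag in nested_filter(rows) if len(bag) > 0]


def pushed_up_plan(rows):
    """Assumed effect of the push-up: the size test sees the unfiltered bag."""
    kept = [(rid, hits) for rid, hits in rows if len(hits) > 0]
    return nested_filter(kept)


print(intended_plan(data))   # only id 2 survives
print(pushed_up_plan(data))  # id 1 survives with an empty bag
```

Under this model the two plans differ only in the order of the size test and the nested filter, which is why reordering them is not safe when the nested foreach changes the bag being measured.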

The desired result can be obtained either by invoking Pig with '-optimizer_off PushUpFilter', or with the following roundabout rewrite, which first projects SIZE(filtered) into its own column:
{code}
C = foreach B generate SIZE(filtered) as size, id, filtered;
D = filter C by size > 0;
E = foreach D generate id, filtered;

dump E;

(2,{(2001,'b'),(2002,'c')})
{code}

> PushUpFilter should not push before nested projection with FILTER operators
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4646
>                 URL: https://issues.apache.org/jira/browse/PIG-4646
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Haishan Liu


