Posted to user@pig.apache.org by Iman Elghandour <ie...@yahoo.com> on 2009/03/23 03:40:43 UTC
implicit splits in multiquery plans
Hello,
I have just noticed that the implicit split is added in the wrong place in this plan. I am just examining the plan for the Pig script that is available in the jira issue: https://issues.apache.org/jira/browse/PIG-627
A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group B by b;
store C into 'output2';
The logical plan is below. I think the split operator
should be placed before the filter, so that the filter is
performed on only one branch rather than on both.
Store 1-14 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: Unknown
|
|---SplitOutput[B] 1-21 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
| |
| Const 1-20 FieldSchema: boolean Type: boolean
|
|---Split 1-19 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
|
|---Filter 1-13 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
| |
| GreaterThan 1-12 FieldSchema: boolean Type: boolean
| |
| |---Const 1-11 FieldSchema: int Type: int
| |
| |---Cast 1-18 FieldSchema: int Type: int
| |
| |---Project 1-10 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
| Input: Load 1-9
|
|---Load 1-9 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
Store 1-17 Schema: {group: bytearray,B: {a: bytearray,b: bytearray,c: bytearray}} Type: Unknown
|
|---CoGroup 1-16 Schema: {group: bytearray,B: {a: bytearray,b: bytearray,c: bytearray}} Type: bag
| |
| Project 1-15 Projections: [1] Overloaded: false FieldSchema: b: bytearray Type: bytearray
| Input: SplitOutput[B] 1-23
|
|---SplitOutput[B] 1-23 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
| |
| Const 1-22 FieldSchema: boolean Type: boolean
|
|---Split 1-19 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
|
|---Filter 1-13 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
| |
| GreaterThan 1-12 FieldSchema: boolean Type: boolean
| |
| |---Const 1-11 FieldSchema: int Type: int
| |
| |---Cast 1-18 FieldSchema: int Type: int
| |
| |---Project 1-10 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
| Input: Load 1-9
|
|---Load 1-9 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
Thanks,
Iman.
Re: implicit splits in multiquery plans
Posted by Gunther Hagleitner <ha...@yahoo-inc.com>.
Hi,
I believe the split is in the right place. Both B and C need to have the
filter performed before they are stored. Also, the filter is only going to
be run once: load (1-9), filter (1-13) and split (1-19) are the same
operator instances in both paths, and the text dump simply prints them once
under each store. I've attached a graphical representation of the same
logical plan (which I think is easier to read).
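
Editorial sketch (not part of the original message, and not the attachment mentioned above): the point about shared operators can be checked with a small model of the plan. The operator names and ids are copied from the plan dump; the Op class and both helper functions are hypothetical and have nothing to do with Pig's own classes. The textual dump walks from each store back to the load, so shared operators are printed twice even though each exists only once.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: two Ops are equal only if they are the same object
class Op:
    name: str
    inputs: list = field(default_factory=list)


# Build each operator exactly once, as in the plan (ids taken from the dump).
load = Op("Load 1-9")
filt = Op("Filter 1-13", [load])
split = Op("Split 1-19", [filt])
out_b = Op("SplitOutput[B] 1-21", [split])
out_c = Op("SplitOutput[B] 1-23", [split])
store1 = Op("Store 1-14", [out_b])
cogroup = Op("CoGroup 1-16", [out_c])
store2 = Op("Store 1-17", [cogroup])


def path_to_load(op):
    """Operator names seen when walking from a store back to the load."""
    names = []
    while True:
        names.append(op.name)
        if not op.inputs:
            return names
        op = op.inputs[0]


def distinct_ops(roots):
    """Every operator object reachable from the given stores, counted once."""
    seen = set()
    stack = list(roots)
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(op.inputs)
    return seen


printed = path_to_load(store1) + path_to_load(store2)
distinct = distinct_ops([store1, store2])

filters_printed = sum(name.startswith("Filter") for name in printed)
filters_real = sum(op.name.startswith("Filter") for op in distinct)
print(len(printed), "operators printed,", len(distinct), "distinct")
print(filters_printed, "Filter lines printed,", filters_real, "Filter operator in the plan")
```

In this model the filter shows up under both stores in the printout but is a single object feeding the split, which matches the repeated ids (1-9, 1-13, 1-19) in the dump.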
If you wanted the filter to be performed only on the non-cogroup path, for
instance, the script would have to read:
A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group A by b; -- group the pre-filter alias A instead of B
store C into 'output2';
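
Editorial sketch (not part of the original message): the difference between grouping B and grouping A can be seen on a few rows. The sample data is made up, and the two helper functions are hypothetical plain-Python stand-ins for Pig's filter and group, not Pig APIs.

```python
from collections import defaultdict

# Made-up sample rows in (a, b, c) order; not taken from the thread.
rows = [(3, "x", 1), (7, "x", 2), (9, "y", 3), (4, "y", 4)]


def filter_a_gt_5(data):
    """Stand-in for: filter ... by a > 5"""
    return [r for r in data if r[0] > 5]


def group_by_b(data):
    """Stand-in for: group ... by b"""
    groups = defaultdict(list)
    for r in data:
        groups[r[1]].append(r)
    return dict(groups)


# Original script: C = group B by b (the group sees only filtered rows).
B = filter_a_gt_5(rows)
C_original = group_by_b(B)

# Variant script: C = group A by b (the group sees every loaded row).
C_variant = group_by_b(rows)

print("output1:", B)
print("output2, group B:", C_original)
print("output2, group A:", C_variant)
```

With these rows, grouping B keeps one row per group, while grouping A keeps all four rows, so the two scripts write different data to 'output2' and are not interchangeable.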
Thanks,
Gunther.