Posted to dev@pig.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2010/07/01 02:13:50 UTC

[jira] Commented: (PIG-1321) Logical Optimizer: Merge cascading foreach

    [ https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884105#action_12884105 ] 

Xuefu Zhang commented on PIG-1321:
----------------------------------

Here is the scope of this type of optimization:

Pre-condition: 
1. Two consecutive foreach statements, where the second one consumes the output of the first.
2. The second foreach statement has a simple inner plan whose only statement is a GENERATE statement. In other words, the second foreach statement must be of the form "FOREACH A GENERATE ...".
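The two pre-conditions above can be sketched as a check over a toy plan model. The classes `ForEach`, `Generate`, and `Filter` and the functions `is_simple_generate` and `can_merge` are hypothetical names for illustration only, not Pig's logical-plan API. In this model a foreach is reduced to its output alias, its input alias, and its list of nested statements.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical toy model of a logical plan (not Pig's actual classes).

@dataclass
class Generate:
    exprs: List[str]          # expressions, kept as plain strings here

@dataclass
class Filter:                 # stands in for any other nested statement
    condition: str

@dataclass
class ForEach:
    alias: str                # alias this foreach produces, e.g. "B"
    input_alias: str          # alias it reads from, e.g. "A"
    inner_plan: List[Union[Filter, Generate]]

def is_simple_generate(fe: ForEach) -> bool:
    """Pre-condition 2: the inner plan is a single GENERATE statement."""
    return len(fe.inner_plan) == 1 and isinstance(fe.inner_plan[0], Generate)

def can_merge(first: ForEach, second: ForEach) -> bool:
    """Pre-condition 1 (second reads first's output directly) plus pre-condition 2."""
    return second.input_alias == first.alias and is_simple_generate(second)

b = ForEach("B", "A", [Generate(["a+b", "c-b"])])
c = ForEach("C", "B", [Generate(["$0+5", "v"])])
print(can_merge(b, c))   # True

# A nested FILTER in the second foreach violates pre-condition 2.
c2 = ForEach("C", "B", [Filter("v > 0"), Generate(["v"])])
print(can_merge(b, c2))  # False
```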

Optimization result:
The two foreach statements are merged into one. The new foreach statement keeps the inner plan of the first foreach statement but gets new expressions for its GENERATE statement. Each new expression is derived from the corresponding expression in the second foreach's GENERATE statement by replacing every reference to a field of the first foreach's output with the expression that produced that field in the first foreach's GENERATE statement. For instance, suppose we have the following pig script:

A = load 'file.txt' as (a, b, c);
B = foreach A generate a+b as u, c-b as v;
C = foreach B generate $0+5, v;
dump C;

After the merge-foreach optimization, the plan is equivalent to the following pig script:

A = load 'file.txt' as (a, b, c);
C = foreach A generate a+b+5, c-b;
dump C;
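The expression rewriting in the example above can be sketched as a substitution over expression trees: each field reference in the second GENERATE (by position, like `$0`, or by alias, like `v`) is replaced with the expression that the first GENERATE used to produce that field. The names `Col`, `Const`, `BinOp`, `substitute`, and `render` are hypothetical, only `+` and `-` are modeled, and Pig's optimizer works on its own expression classes rather than these.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical expression tree, for illustration only (not Pig's classes).

@dataclass(frozen=True)
class Col:              # reference to an input field, by alias or by position
    ref: Union[str, int]

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str             # only '+' and '-' in this sketch
    left: "Expr"
    right: "Expr"

Expr = Union[Col, Const, BinOp]

def substitute(expr: Expr, first_exprs: List[Expr], first_names: List[str]) -> Expr:
    """Replace each field reference with the first foreach's producing expression."""
    if isinstance(expr, Col):
        pos = expr.ref if isinstance(expr.ref, int) else first_names.index(expr.ref)
        return first_exprs[pos]
    if isinstance(expr, BinOp):
        return BinOp(expr.op,
                     substitute(expr.left, first_exprs, first_names),
                     substitute(expr.right, first_exprs, first_names))
    return expr  # constants are unchanged

def render(expr: Expr) -> str:
    """'+' and '-' are left-associative, so only a compound right operand needs parentheses."""
    if isinstance(expr, Col):
        return f"${expr.ref}" if isinstance(expr.ref, int) else expr.ref
    if isinstance(expr, Const):
        return str(expr.value)
    right = render(expr.right)
    if isinstance(expr.right, BinOp):
        right = f"({right})"
    return f"{render(expr.left)}{expr.op}{right}"

# B = foreach A generate a+b as u, c-b as v;
first_exprs = [BinOp("+", Col("a"), Col("b")), BinOp("-", Col("c"), Col("b"))]
first_names = ["u", "v"]
# C = foreach B generate $0+5, v;
second_exprs = [BinOp("+", Col(0), Const(5)), Col("v")]

merged = [render(substitute(e, first_exprs, first_names)) for e in second_exprs]
print(merged)  # ['a+b+5', 'c-b']
```

Because `render` parenthesizes a compound right operand, an expression such as `c-(a+b)` keeps its parentheses after substitution.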

The first foreach can have an arbitrarily complex inner plan; it is carried over unchanged into the new foreach statement.
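Carrying the first foreach's inner plan over can be sketched at the plan level. In this minimal sketch all names (`ForEach`, `Generate`, `merge`) are hypothetical, expressions are plain strings, and the second foreach's references to the first foreach's output fields are written as `{0}`, `{1}`, ... placeholders instead of parsed expression trees.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical plan model for illustration only; not Pig's API.

@dataclass
class Generate:
    exprs: List[str]

@dataclass
class ForEach:
    alias: str
    input_alias: str
    nested: List[str]     # nested statements that precede GENERATE
    generate: Generate

def merge(first: ForEach, second: ForEach) -> ForEach:
    """Keep first's input and nested statements; inline first's GENERATE
    expressions into the {i} placeholders of second's GENERATE."""
    if second.input_alias != first.alias or second.nested:
        raise ValueError("pre-conditions for merging are not met")
    # Parenthesize each inlined expression so operator precedence is preserved.
    inlined = [f"({e})" for e in first.generate.exprs]
    new_exprs = [template.format(*inlined) for template in second.generate.exprs]
    return ForEach(second.alias, first.input_alias, list(first.nested), Generate(new_exprs))

# B = foreach A { D = filter bag by x > 0; generate group, COUNT(D); };
b = ForEach("B", "A", ["D = filter bag by x > 0"], Generate(["group", "COUNT(D)"]))
# C = foreach B generate $1 + 1;
c = ForEach("C", "B", [], Generate(["{1} + 1"]))

m = merge(b, c)
print(m.input_alias, m.nested, m.generate.exprs)
# A ['D = filter bag by x > 0'] ['(COUNT(D)) + 1']
```

Copying `first.nested` rather than sharing the list keeps the merged foreach independent of the original `first` object.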

A patch for this optimization is coming soon.

> Logical Optimizer: Merge cascading foreach
> ------------------------------------------
>
>                 Key: PIG-1321
>                 URL: https://issues.apache.org/jira/browse/PIG-1321
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Xuefu Zhang
>
> We can merge consecutive foreach statements.
> E.g.:
> b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
> c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
> => c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;
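The rewrite in the issue description can be sanity-checked by evaluating both forms on one sample row. This sketch models Pig maps as Python dicts and `#'key'` lookups as dict indexing, and it assumes every key is present (Pig yields null for a missing key, whereas a dict raises `KeyError`). The sample values are made up.

```python
# Sample input row for relation a, with made-up values; maps become dicts.
row = {"a0": {"key1": {"kk1": 1, "kk2": 2}, "key2": 3}, "a1": 4}

# b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
b = {"b0": row["a0"]["key1"], "b1": row["a0"]["key2"], "a1": row["a1"]}
# c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
c = (b["b0"]["kk1"], b["b0"]["kk2"], b["b1"], b["a1"])

# Merged: c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;
merged = (row["a0"]["key1"]["kk1"], row["a0"]["key1"]["kk2"],
          row["a0"]["key2"], row["a1"])

print(c == merged, merged)  # True (1, 2, 3, 4)
```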
