Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2008/06/17 01:33:44 UTC

[jira] Issue Comment Edited: (PIG-161) Rework physical plan

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605456#action_12605456 ] 

sms edited comment on PIG-161 at 6/16/08 4:32 PM:
------------------------------------------------------------------

Consider the following example, a modification of Case 2 in the previous comment:

{code}
A = load 'myfile';
B = group A by $0;
C = foreach B {
    C1 = distinct $1;
    generate group + SUM(C1);
};
{code}

Top level plan:

load -> group -> foreach

The foreach will have a nested plan:

plan 1: project(1) -> distinct -> accumulate
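As a minimal sketch of how this nested pipeline pulls data, assume a simplified Java interface in which each operator exposes `getNext()` and returns `null` when it has nothing more to produce. The names `PullOp`, `Project`, and `Distinct` and the `null` exhaustion convention are hypothetical stand-ins, not Pig's actual physical operator API. The sketch covers the first two stages: foreach attaches its input tuple to the root, project(1), and distinct pulls the bag from it.

```java
import java.util.*;

// Hypothetical, simplified pull interface; not Pig's real PhysicalOperator API.
interface PullOp {
    // Returns the next output, or null when there is nothing more to produce.
    Object getNext();
}

// project(1): root of the nested plan; the enclosing foreach attaches a tuple to it.
class Project implements PullOp {
    private final int column;
    private List<Object> input; // tuple attached by foreach

    Project(int column) { this.column = column; }

    void attachInput(List<Object> tuple) { this.input = tuple; }

    public Object getNext() {
        if (input == null) return null;
        Object out = input.get(column);
        input = null; // each attached tuple is consumed exactly once
        return out;
    }
}

// distinct: pulls a bag from its predecessor and drops duplicate tuples.
class Distinct implements PullOp {
    private final PullOp pred;

    Distinct(PullOp pred) { this.pred = pred; }

    public Object getNext() {
        Object bag = pred.getNext();
        if (bag == null) return null;
        @SuppressWarnings("unchecked")
        List<List<Object>> tuples = (List<List<Object>>) bag;
        // LinkedHashSet removes duplicates while keeping first-seen order.
        return new ArrayList<>(new LinkedHashSet<>(tuples));
    }
}
```

Here the tuple from group is assumed to be laid out as (group, bag), so project(1) selects the bag; accumulate would sit after distinct and pull from it in the same way.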

The accumulate operator will have the following nested plan:

{noformat}
project( * )
            \
             SUM()
            /
project(group)
{noformat}

The accumulate operator requires two inputs:

1. The tuple from foreach for projecting 'group'
2. The bag from distinct for the aggregate SUM

Under the proposed changes, accumulate cannot receive inputs from both foreach and distinct. To solve this problem, accumulate has to be made a proxy root: the input from foreach is attached directly to accumulate, while the second input, the bag from distinct, is retrieved using getNext().
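The proxy-root idea can be sketched as follows, again with hypothetical simplified types (`PullOp`, `Accumulate`) rather than Pig's real classes. The sketch assumes that `group` is numeric and that each tuple in the distinct bag holds a single numeric field, so that `group + SUM(C1)` reduces to a `long` addition.

```java
import java.util.*;

// Hypothetical, simplified pull interface; not Pig's real PhysicalOperator API.
interface PullOp {
    Object getNext(); // next output, or null when exhausted
}

// Sketch of accumulate as a "proxy root": foreach attaches its input tuple directly,
// while the bag from distinct is still pulled from the predecessor via getNext().
class Accumulate implements PullOp {
    private final PullOp distinct;  // predecessor in the nested plan
    private final int groupColumn;  // position of 'group' in foreach's input tuple
    private List<Object> attached;  // input 1: tuple attached by foreach

    Accumulate(PullOp distinct, int groupColumn) {
        this.distinct = distinct;
        this.groupColumn = groupColumn;
    }

    void attachInput(List<Object> tuple) { this.attached = tuple; }

    // Evaluates group + SUM(C1) under the numeric assumptions stated above.
    public Object getNext() {
        if (attached == null) return null;
        long group = ((Number) attached.get(groupColumn)).longValue(); // project(group)
        // Input 2: the distinct bag, pulled from the predecessor.
        @SuppressWarnings("unchecked")
        List<List<Object>> bag = (List<List<Object>>) distinct.getNext();
        attached = null; // consume the attached tuple
        if (bag == null) return null;
        long sum = 0;
        for (List<Object> t : bag) sum += ((Number) t.get(0)).longValue(); // SUM over project(*)
        return group + sum;
    }
}
```

Because the tuple arrives through `attachInput` while the bag is pulled, accumulate needs only one predecessor edge in the plan graph.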

In addition to the changes proposed in the previous comment, the following changes are required:

1. In the logical layer, indicate whether accumulate requires its input from foreach.
2. In the physical layer, when foreach attaches its input, it should attach the tuple to accumulate in addition to all the roots of its nested plans.
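The two changes can be sketched together: a flag set by the logical layer (here the hypothetical `needsForeachInput` field) tells foreach which accumulate operators to treat as proxy roots when it attaches its input. All type names (`InputAttachable`, `AccumulateRef`, `ForEach`) are illustrative assumptions, not Pig classes.

```java
import java.util.*;

// Hypothetical: anything foreach can push its current input tuple into.
interface InputAttachable {
    void attachInput(List<Object> tuple);
}

// Change 1 (assumed shape): the logical layer records whether this accumulate
// needs the tuple from foreach.
final class AccumulateRef {
    final InputAttachable op;
    final boolean needsForeachInput;

    AccumulateRef(InputAttachable op, boolean needsForeachInput) {
        this.op = op;
        this.needsForeachInput = needsForeachInput;
    }
}

// Change 2: foreach attaches each input tuple to every nested-plan root and,
// additionally, to each accumulate flagged as a proxy root.
class ForEach {
    private final List<InputAttachable> nestedPlanRoots;
    private final List<AccumulateRef> accumulates;

    ForEach(List<InputAttachable> nestedPlanRoots, List<AccumulateRef> accumulates) {
        this.nestedPlanRoots = nestedPlanRoots;
        this.accumulates = accumulates;
    }

    void attachInput(List<Object> tuple) {
        for (InputAttachable root : nestedPlanRoots) root.attachInput(tuple);
        for (AccumulateRef acc : accumulates) {
            if (acc.needsForeachInput) acc.op.attachInput(tuple);
        }
    }
}
```

Keeping the decision in a flag computed by the logical layer means the physical foreach does not have to inspect accumulate's nested plan at run time.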

> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, BinCondAndNegative.patch, CastAndMapLookUp.patch, incr2.patch, incr3.patch, incr4.patch, incr5.patch, logToPhyTranslator.patch, missingOps.patch, MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch, physicalOps_latest.patch, POCast.patch, POCast.patch, podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch, POUserFuncCorrection.patch, TEST-org.apache.pig.test.TestLocalJobSubmission.txt, TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, TEST-org.apache.pig.test.TestMapReduce.txt, TEST-org.apache.pig.test.TestTypeCheckingValidator.txt, TEST-org.apache.pig.test.TestUnion.txt, translator.patch, translator.patch, translator.patch, translator.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.