Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2008/06/17 01:33:44 UTC
[jira] Issue Comment Edited: (PIG-161) Rework physical plan
[ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605456#action_12605456 ]
sms edited comment on PIG-161 at 6/16/08 4:32 PM:
------------------------------------------------------------------
Consider the following example, a modification of Case 2 in the previous comment:
{code}
A = load 'myfile';
B = group A by $0;
C = foreach B {
C1 = distinct $1;
generate group + SUM(C1);
};
{code}
Top level plan:
load -> group -> foreach
The foreach will have a nested plan:
plan 1: project(1) -> distinct -> accumulate
The accumulate will have a nested plan of:
{noformat}
project( * )
            \
             SUM()
            /
project(group)
{noformat}
The accumulate operator requires two inputs:
1. The tuple from foreach for projecting 'group'
2. The bag from distinct for the aggregate SUM
With the proposed changes, accumulate will not be able to receive inputs from both foreach and distinct. To solve this problem, accumulate has to be made a proxy root: the input from foreach is attached to accumulate directly, and the second input, from distinct, is retrieved using getNext().
In addition to the changes proposed in the previous comment, the following changes have to be made:
1. In the logical layer, indicate whether accumulate requires its input from foreach.
2. In the physical layer (for foreach), attaching input should attach the tuple to accumulate in addition to all the roots in the nested plans of foreach.
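The proxy-root idea can be sketched as a small toy model. This is only an illustration in Python, not Pig's actual Java operators: the class names, the needs_foreach_input flag, and the use of a plain list of integers as the bag are all assumptions made for the sketch. It shows foreach attaching its tuple to the real root (project) and to accumulate, while accumulate pulls the bag from distinct.

```python
class Operator:
    """Toy pull-based operator: input is attached, then get_next() is called."""

    def __init__(self, source=None):
        self.source = source      # upstream operator to pull from, if any
        self.attached = None      # tuple attached by the enclosing foreach

    def attach_input(self, tup):
        self.attached = tup


class Project(Operator):
    """Root of the nested plan: projects one column of the attached tuple."""

    def __init__(self, column):
        super().__init__()
        self.column = column

    def get_next(self):
        return self.attached[self.column]


class Distinct(Operator):
    """Removes duplicates from the bag pulled from its source."""

    def get_next(self):
        return list(dict.fromkeys(self.source.get_next()))


class Accumulate(Operator):
    """Proxy root: not a real root, but still needs the foreach tuple."""

    def __init__(self, source, needs_foreach_input):
        super().__init__(source)
        # Hypothetical flag standing in for change 1 (set by the logical layer).
        self.needs_foreach_input = needs_foreach_input

    def get_next(self):
        group = self.attached[0]       # input 1: 'group' from the foreach tuple
        bag = self.source.get_next()   # input 2: bag pulled from distinct
        return group + sum(bag)


class ForEach:
    """Attaches each tuple to the roots and to any proxy roots (change 2)."""

    def __init__(self, roots, proxy_roots, leaf):
        self.roots = roots
        self.proxy_roots = proxy_roots
        self.leaf = leaf

    def process(self, tup):
        for op in self.roots + self.proxy_roots:
            op.attach_input(tup)
        return self.leaf.get_next()


def build_foreach():
    # Nested plan: project(1) -> distinct -> accumulate
    project = Project(1)
    distinct = Distinct(project)
    accumulate = Accumulate(distinct, needs_foreach_input=True)
    proxies = [accumulate] if accumulate.needs_foreach_input else []
    return ForEach(roots=[project], proxy_roots=proxies, leaf=accumulate)


foreach = build_foreach()
# Input tuple is (group, bag); the bag is simplified to a list of ints.
result = foreach.process((10, [1, 2, 2, 3]))
```

Without the proxy-root attachment, accumulate would have no tuple from which to project 'group', since its only upstream operator is distinct.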
was (Author: sms):
Consider the following example, a modification of Case 2 in the previous comment:
{code}
A = load 'myfile';
B = group A by $0;
C = foreach B {
C1 = distinct $1;
generate group + SUM(C1);
};
{code}
Top level plan:
load -> group -> foreach
The foreach will have a nested plan:
plan 1: project(1) -> distinct -> accumulate
The accumulate will have a nested plan of:
{noformat}
project( * )
\
COUNT()
/
project(group)
{noformat}
The accumulate operator requires two inputs:
1. The tuple from foreach for projecting 'group'
2. The bag from distinct for the aggregate SUM
With the proposed changes, accumulate will not be able to receive inputs from both foreach and distinct. In order to solve this problem, accumulate has to be made a proxy root by attaching the input from foreach to accumulate. The second input from distinct will be retrieved using getNext()
In addition to the changes proposed in the previous comment, the following changes have to be made:
1. In the logical layer indicate if accumulate requires its input from foreach
2. In the physical layer (for foreach), attach input should attach the tuple to accumulate in addition to all the roots in the nested plans of foreach
> Rework physical plan
> --------------------
>
> Key: PIG-161
> URL: https://issues.apache.org/jira/browse/PIG-161
> Project: Pig
> Issue Type: Sub-task
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: arithmeticOperators.patch, BinCondAndNegative.patch, CastAndMapLookUp.patch, incr2.patch, incr3.patch, incr4.patch, incr5.patch, logToPhyTranslator.patch, missingOps.patch, MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch, physicalOps_latest.patch, POCast.patch, POCast.patch, podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch, POUserFuncCorrection.patch, TEST-org.apache.pig.test.TestLocalJobSubmission.txt, TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, TEST-org.apache.pig.test.TestMapReduce.txt, TEST-org.apache.pig.test.TestTypeCheckingValidator.txt, TEST-org.apache.pig.test.TestUnion.txt, translator.patch, translator.patch, translator.patch, translator.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec