You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@pig.apache.org by Swati Jain <sw...@aggiemail.usu.edu> on 2010/07/30 22:34:42 UTC

Group By with multiple key is not working with new 'Logical Optimizer'

Hello,

I have tried below mentioned script when new optimizer is enabled. I am
wondering, is this possible to have filter condition on any column of the
"group" (a tuple) column generated by Group By clause with multiple key.

The following script does a GroupBy multiple columns:

A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
G1 = GROUP A by (a1,a2);
D = Filter G1 by group.$0 > 1;
explain D;

The above fails with the following error when the new optimizer is enabled (
*it fails with the old framework too but only when it gets to the execution
stage*):

Caused by: java.lang.NullPointerException
       at org.apache.pig.experimental.
logical.LogicalPlanMigrationVistor$LogicalExpPlanMigrationVistor.visit(LogicalPlanMigrationVistor.java:424)
       at
org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:404)
       at
org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58)
       at
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
       at
org.apache.pig.experimental.logical.LogicalPlanMigrationVistor.translateExpressionPlan(LogicalPlanMigrationVistor.java:155)
       at
org.apache.pig.experimental.logical.LogicalPlanMigrationVistor.visit(LogicalPlanMigrationVistor.java:295)
       at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:116)
       at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:41)
       at
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
       at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
       at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:237)

I really appreciate if someone put some light on this.

Thanks,
Swati