Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/07/18 19:24:31 UTC

[jira] Commented: (PIG-306) count with multiple group by keys fails

    [ https://issues.apache.org/jira/browse/PIG-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614805#action_12614805 ] 

Alan Gates commented on PIG-306:
--------------------------------

The issue is that the types of two of the projects are not being set correctly.  The logical plan for this query looks like:

{code}
Logical Plan:
ForEach gates-Fri Jul 18 10:11:47 PDT 2008-12 Schema: {name: (null),age: (null),long} Type: bag
|   |
|   Project gates-Fri Jul 18 10:11:47 PDT 2008-5 Projections: [0] Overloaded: false FieldSchema: name: tuple Type: tuple
|   Input: Project gates-Fri Jul 18 10:11:47 PDT 2008-4 Projections: [0] Overloaded: false|
|   |---Project gates-Fri Jul 18 10:11:47 PDT 2008-4 Projections: [0] Overloaded: false FieldSchema: group: tuple({bytearray,bytearray}) Type: tuple
|       Input: CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3
|   |
|   Project gates-Fri Jul 18 10:11:47 PDT 2008-7 Projections: [1] Overloaded: false FieldSchema: age: tuple Type: tuple
|   Input: Project gates-Fri Jul 18 10:11:47 PDT 2008-6 Projections: [0] Overloaded: false|
|   |---Project gates-Fri Jul 18 10:11:47 PDT 2008-6 Projections: [0] Overloaded: false FieldSchema: group: tuple({bytearray,bytearray}) Type: tuple
|       Input: CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3
|   |
|   UserFunc gates-Fri Jul 18 10:11:47 PDT 2008-10 function: org.apache.pig.builtin.COUNT FieldSchema: long Type: long
|   |
|   |---Project gates-Fri Jul 18 10:11:47 PDT 2008-9 Projections: [2] Overloaded: false FieldSchema: gpa: bag({gpa: bytearray}) Type: bag
|       Input: Project gates-Fri Jul 18 10:11:47 PDT 2008-8 Projections: [1] Overloaded: false|
|       |---Project gates-Fri Jul 18 10:11:47 PDT 2008-8 Projections: [1] Overloaded: false FieldSchema: a: bag({name: bytearray,age: bytearray,gpa: bytearray}) Type: bag
|           Input: CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3
|
|---CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3 Schema: {group: (bytearray,bytearray),a: {name: bytearray,age: bytearray,gpa: bytearray}} Type: bag
    |   |
    |   Project gates-Fri Jul 18 10:11:47 PDT 2008-1 Projections: [0] Overloaded: false FieldSchema: name: bytearray cn: 0 Type: bytearray
    |   Input: Load gates-Fri Jul 18 10:11:47 PDT 2008-
    |   |
    |   Project gates-Fri Jul 18 10:11:47 PDT 2008-2 Projections: [1] Overloaded: false FieldSchema: age: bytearray cn: 1 Type: bytearray
    |   Input: Load gates-Fri Jul 18 10:11:47 PDT 2008-
    |
    |---Load gates-Fri Jul 18 10:11:47 PDT 2008-0 Schema: {name: bytearray,age: bytearray,gpa: bytearray} Type: bag
{code}

Projects "Project gates-Fri Jul 18 10:11:47 PDT 2008-5" and "Project gates-Fri Jul 18 10:11:47 PDT 2008-7" should have type bytearray, not type tuple.
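
To make the expected behavior concrete, here is a minimal sketch (in Python, not Pig's actual Java code) of how the type of a projection over the group tuple could be derived. The names FieldSchema and project_type are hypothetical and exist only for this illustration. The rule that a multi-column projection stays a tuple is also an assumption, not something the plan above confirms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSchema:
    """Hypothetical stand-in for a schema entry: an alias, a type name, and nested fields."""
    alias: Optional[str]
    type: str  # e.g. "bytearray", "tuple", "bag", "long"
    fields: List["FieldSchema"] = field(default_factory=list)


def project_type(input_schema: FieldSchema, columns: List[int]) -> str:
    """Return the type a projection over a tuple-typed input should carry.

    Projecting exactly one column yields that column's own type.
    Projecting several columns is assumed here to yield a tuple.
    """
    if input_schema.type != "tuple":
        raise TypeError("this sketch only handles tuple-typed inputs")
    if len(columns) == 1:
        return input_schema.fields[columns[0]].type
    return "tuple"


# The group key from the plan: group: tuple({bytearray,bytearray}).
# The inner fields carry no alias in the CoGroup schema, so alias is None.
group = FieldSchema(
    "group",
    "tuple",
    [FieldSchema(None, "bytearray"), FieldSchema(None, "bytearray")],
)

name_type = project_type(group, [0])  # corresponds to group.name
age_type = project_type(group, [1])   # corresponds to group.age
```

Under these assumptions both name_type and age_type come out as bytearray, which is what the two projects should report. The plan above instead shows them with the type of their input (tuple), which is consistent with the ClassCastException at run time.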

> count with multiple group by keys fails
> ---------------------------------------
>
>                 Key: PIG-306
>                 URL: https://issues.apache.org/jira/browse/PIG-306
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>             Fix For: types_branch
>
>
> The query:
> {code}
> a = load 'myfile' as (name, age, gpa);
> b = group a by (name, age);
> c = foreach b generate group.name, group.age, COUNT(a.gpa);
> store c into 'outfile';
> {code}
> generates
> 07-12 16:55:54,348 [main] ERROR org.apache.pig.impl.mapReduceLayer.Launcher - Error message from task (reduce) tip_200807090821_0580_r_000000 java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
> at org.apache.pig.impl.physicalLayer.expressionOperators.POProject.getNext(POProject.java:262)
> at org.apache.pig.impl.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:189)
> at org.apache.pig.impl.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:148)
> at org.apache.pig.impl.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:164)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:333)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
