Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/07/18 19:24:31 UTC
[jira] Commented: (PIG-306) count with multiple group by keys fails
[ https://issues.apache.org/jira/browse/PIG-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614805#action_12614805 ]
Alan Gates commented on PIG-306:
--------------------------------
The issue is that the types of two of the Project operators are not being set correctly. The logical plan for this query looks like:
{code}
Logical Plan:
ForEach gates-Fri Jul 18 10:11:47 PDT 2008-12 Schema: {name: (null),age: (null),long} Type: bag
| |
| Project gates-Fri Jul 18 10:11:47 PDT 2008-5 Projections: [0] Overloaded: false FieldSchema: name: tuple Type: tuple
| Input: Project gates-Fri Jul 18 10:11:47 PDT 2008-4 Projections: [0] Overloaded: false|
| |---Project gates-Fri Jul 18 10:11:47 PDT 2008-4 Projections: [0] Overloaded: false FieldSchema: group: tuple({bytearray,bytearray}) Type: tuple
| Input: CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3
| |
| Project gates-Fri Jul 18 10:11:47 PDT 2008-7 Projections: [1] Overloaded: false FieldSchema: age: tuple Type: tuple
| Input: Project gates-Fri Jul 18 10:11:47 PDT 2008-6 Projections: [0] Overloaded: false|
| |---Project gates-Fri Jul 18 10:11:47 PDT 2008-6 Projections: [0] Overloaded: false FieldSchema: group: tuple({bytearray,bytearray}) Type: tuple
| Input: CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3
| |
| UserFunc gates-Fri Jul 18 10:11:47 PDT 2008-10 function: org.apache.pig.builtin.COUNT FieldSchema: long Type: long
| |
| |---Project gates-Fri Jul 18 10:11:47 PDT 2008-9 Projections: [2] Overloaded: false FieldSchema: gpa: bag({gpa: bytearray}) Type: bag
| Input: Project gates-Fri Jul 18 10:11:47 PDT 2008-8 Projections: [1] Overloaded: false|
| |---Project gates-Fri Jul 18 10:11:47 PDT 2008-8 Projections: [1] Overloaded: false FieldSchema: a: bag({name: bytearray,age: bytearray,gpa: bytearray}) Type: bag
| Input: CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3
|
|---CoGroup gates-Fri Jul 18 10:11:47 PDT 2008-3 Schema: {group: (bytearray,bytearray),a: {name: bytearray,age: bytearray,gpa: bytearray}} Type: bag
| |
| Project gates-Fri Jul 18 10:11:47 PDT 2008-1 Projections: [0] Overloaded: false FieldSchema: name: bytearray cn: 0 Type: bytearray
| Input: Load gates-Fri Jul 18 10:11:47 PDT 2008-
| |
| Project gates-Fri Jul 18 10:11:47 PDT 2008-2 Projections: [1] Overloaded: false FieldSchema: age: bytearray cn: 1 Type: bytearray
| Input: Load gates-Fri Jul 18 10:11:47 PDT 2008-
|
|---Load gates-Fri Jul 18 10:11:47 PDT 2008-0 Schema: {name: bytearray,age: bytearray,gpa: bytearray} Type: bag
{code}
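The operators "Project gates-Fri Jul 18 10:11:47 PDT 2008-5" and "Project gates-Fri Jul 18 10:11:47 PDT 2008-7" should have type bytearray, not type tuple. Each one projects a single bytearray field (name and age, respectively) out of the group tuple, so the declared tuple type does not match the value produced at runtime, which is consistent with the ClassCastException in the report.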
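To illustrate the mismatch, here is a minimal Java sketch. It does not use Pig's real classes: ProjectTypeSketch, FieldType, Field, buggyProjectType and fixedProjectType are all hypothetical names, and the "buggy" rule (a project simply inheriting its input's type) is only an assumption about how the wrong type could arise, not a description of the actual code in the types branch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Pig's classes.
public class ProjectTypeSketch {
    enum FieldType { BYTEARRAY, TUPLE, BAG }

    /** A named field with a type and, for tuples, its nested fields. */
    static final class Field {
        final String name;
        final FieldType type;
        final List<Field> children;

        Field(String name, FieldType type, List<Field> children) {
            this.name = name;
            this.type = type;
            this.children = children;
        }

        static Field leaf(String name, FieldType type) {
            return new Field(name, type, Collections.<Field>emptyList());
        }
    }

    /** Assumed buggy rule: the project inherits the type of its input. */
    static FieldType buggyProjectType(Field input, int column) {
        return input.type;
    }

    /** Intended rule: projecting one column of a tuple yields that column's type. */
    static FieldType fixedProjectType(Field input, int column) {
        if (input.type == FieldType.TUPLE) {
            return input.children.get(column).type;
        }
        return input.type;
    }

    /** Mirrors "group: tuple({bytearray,bytearray})" from the plan above. */
    static Field groupField() {
        return new Field("group", FieldType.TUPLE, Arrays.asList(
                Field.leaf("name", FieldType.BYTEARRAY),
                Field.leaf("age", FieldType.BYTEARRAY)));
    }

    public static void main(String[] args) {
        Field group = groupField();
        System.out.println("buggy: " + buggyProjectType(group, 0)); // buggy: TUPLE
        System.out.println("fixed: " + fixedProjectType(group, 0)); // fixed: BYTEARRAY
    }
}
```

Under the assumed buggy rule, group.name is typed as a tuple, so a downstream operator would try to treat a bytearray value as a tuple; under the intended rule it is typed as bytearray, matching the value actually produced.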
> count with multiple group by keys fails
> ---------------------------------------
>
> Key: PIG-306
> URL: https://issues.apache.org/jira/browse/PIG-306
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Alan Gates
> Fix For: types_branch
>
>
> The query:
> {code}
> a = load 'myfile' as (name, age, gpa);
> b = group a by (name, age);
> c = foreach b generate group.name, group.age, COUNT(a.gpa);
> store c into 'outfile';
> {code}
> generates
> 07-12 16:55:54,348 [main] ERROR org.apache.pig.impl.mapReduceLayer.Launcher - Error message from task (reduce) tip_200807090821_0580_r_000000 java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
> at org.apache.pig.impl.physicalLayer.expressionOperators.POProject.getNext(POProject.java:262)
> at org.apache.pig.impl.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:189)
> at org.apache.pig.impl.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:148)
> at org.apache.pig.impl.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:164)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:333)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)