Posted to dev@pig.apache.org by "Anitha Raju (JIRA)" <ji...@apache.org> on 2011/01/05 06:52:45 UTC

[jira] Created: (PIG-1787) Error in logical plan generated

Error in logical plan generated
-------------------------------

                 Key: PIG-1787
                 URL: https://issues.apache.org/jira/browse/PIG-1787
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Anitha Raju


Here is a sample Pig script:

set default_parallel 2
ALLDATA = load 'sample.txt' using PigStorage() as (id, spaceid, type, pcid);
C1 = filter ALLDATA by (type == 'p' and
                   (spaceid == '1196250013'
                    or spaceid == '1196250024'
                    or spaceid == '1196250011'));
C2 = group C1 by pcid;
C3 = foreach C2 generate flatten(group) as (pc_id), COUNT(C1) as tot;
C4 = order C3 by tot desc;
C5 = limit C4 3;
C6 = join C5 by pc_id, C1 by pcid;
dump C6;


sample.txt:
1       1196250013      p       1234
2       1196250024      p       2314
3       1196250011      t       1111
4       1111111111      p       1231
5       1196250013      p       1254
6       1196250024      p       9007


This fails with the error 
java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableLongWritable, recieved
org.apache.pig.impl.io.NullableBytesWritable
when both pc_id and pcid are of type bytearray.

The script seems to work when any one of the following changes is made:
	a) a replicated join is substituted for the regular join
	b) pcid is cast to long in the loader
	c) any statement before C6 is dumped
	d) default_parallel is set to 1 or removed
	
One possible cause seems to be the logical plan generated for the projection operation in C4, as can be observed from the output of the describe statement.
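
For reference, the result the script is expected to produce on sample.txt can be worked out outside Pig. The following is a minimal Python sketch, not Pig code; the helper name run_script and the in-memory row tuples are assumptions made for illustration. It mirrors statements C1 through C6 on the six sample rows. All four pcid values that survive the filter have a count of 1, so which three groups the limit keeps depends on how ties are ordered.

```python
from collections import Counter

# Rows of sample.txt as (id, spaceid, type, pcid). Every field is kept as a
# string, standing in for Pig's untyped bytearray fields.
SAMPLE = [
    ("1", "1196250013", "p", "1234"),
    ("2", "1196250024", "p", "2314"),
    ("3", "1196250011", "t", "1111"),
    ("4", "1111111111", "p", "1231"),
    ("5", "1196250013", "p", "1254"),
    ("6", "1196250024", "p", "9007"),
]
SPACE_IDS = {"1196250013", "1196250024", "1196250011"}


def run_script(rows, limit=3):
    """Hypothetical helper mirroring C1..C6 of the sample script."""
    # C1: filter by type and spaceid
    c1 = [r for r in rows if r[2] == "p" and r[1] in SPACE_IDS]
    # C2/C3: group by pcid and count the rows in each group
    c3 = Counter(r[3] for r in c1)
    # C4/C5: order by count descending, keep the first `limit` groups
    c5 = sorted(c3.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    # C6: inner join of C5 (pc_id, tot) with C1 on pc_id == pcid
    return [(pc_id, tot) + r for pc_id, tot in c5 for r in c1 if r[3] == pc_id]


c6 = run_script(SAMPLE)
print(len(c6))  # 3 joined rows; ties in tot decide which groups are kept
```

Each joined row has the form (pc_id, tot, id, spaceid, type, pcid), so a correct run of the Pig script should likewise dump three rows, each with tot equal to 1.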


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1787) Error in logical plan generated

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979897#action_12979897 ] 

Richard Ding commented on PIG-1787:
-----------------------------------

+1.



[jira] Assigned: (PIG-1787) Error in logical plan generated

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1787:
-----------------------------------

    Assignee: Daniel Dai



[jira] Commented: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978554#action_12978554 ] 

Daniel Dai commented on PIG-1787:
---------------------------------

Simplified test case:
{code}
a = load '1.txt' as (a0, a1);
b = group a by a0;
c = foreach b generate group as c0, COUNT(a) as c1;
d = order c by c1 parallel 2;
e = limit d 10;
f = join e by c0, a by a0;
dump f;
{code}
1.txt:
1       1
1       2

Error message:
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
        at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:84)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:113)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:262)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:255)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
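
The ClassCastException above and the IOException in the original report point the same way: COUNT returns a long, so the expected key type matches the sort key c1, while the value actually emitted is the untyped join key. That reading is an inference from the two messages, not something the trace states. As a rough illustration only, the Python toy below models a map output key whose declared type disagrees with the value handed to it. The name wrap_key is hypothetical, and this is not how Pig's HDataType is implemented.

```python
def wrap_key(value, declared_type):
    """Toy stand-in for wrapping a map output key by its declared type."""
    if declared_type == "long":
        if not isinstance(value, int):
            raise TypeError(f"{type(value).__name__} cannot be used as a long key")
        return ("long", value)
    if declared_type == "bytearray":
        if not isinstance(value, bytes):
            raise TypeError(f"{type(value).__name__} cannot be used as a bytearray key")
        return ("bytearray", value)
    raise ValueError(f"unknown key type: {declared_type}")


# a0 and c0 are loaded without a declared type, so the join key stays bytes.
join_key = b"1"

# Declared type agrees with the value: wrapping succeeds.
ok = wrap_key(join_key, "bytearray")

# Declared type is long (the type of COUNT's result) but the value is bytes:
# wrapping fails, analogous to the ClassCastException in the trace.
try:
    wrap_key(join_key, "long")
    failed = False
except TypeError as exc:
    failed = True
    message = str(exc)
```

In this toy, the failure comes from the declared type alone; the data is unchanged between the two calls, which matches the report that the same input works once the plan is altered.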



[jira] Commented: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979685#action_12979685 ] 

Daniel Dai commented on PIG-1787:
---------------------------------

Review request: https://reviews.apache.org/r/265/



[jira] Updated: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1787:
----------------------------

    Attachment: PIG-1787-1.patch



[jira] Commented: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979854#action_12979854 ] 

Daniel Dai commented on PIG-1787:
---------------------------------

Note that the test case only works in mapreduce mode. In local mode, parallel 2 is not granted.



[jira] Updated: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1787:
----------------------------

    Attachment: PIG-1787-2.patch

PIG-1787-2.patch fixes unit test failures.



[jira] Resolved: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1787.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.8 branch.



[jira] Updated: (PIG-1787) Error in logical plan generated

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1787:
----------------------------

    Fix Version/s: 0.8.0
