
Posted to dev@pig.apache.org by "Ankur (JIRA)" <ji...@apache.org> on 2009/11/25 06:52:39 UTC

[jira] Created: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Incorrect map output key type in MultiQuery optimization
--------------------------------------------------------

                 Key: PIG-1108
                 URL: https://issues.apache.org/jira/browse/PIG-1108
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: Ankur


When merging two split plans, one of which never crosses an M/R boundary, Pig sets the map output key type incorrectly, resulting in the following error:

java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:159)

Here is a small script that can be used as a reproducible test case:

rmf plan1
rmf plan2
A = LOAD 'data' USING PigStorage() as (a: int, b: chararray);
SPLIT A into plan1 IF (a>5), plan2 IF (a<5);
B = GROUP plan1 BY b;
C = FOREACH B {
              tmp = ORDER plan1 BY a desc;
              GENERATE FLATTEN(group) as b, tmp;
              };
D = FILTER C BY b is not null;
STORE D into 'plan1';
STORE plan2 into 'plan2';


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782636#action_12782636 ] 

Olga Natkovich commented on PIG-1108:
-------------------------------------

How much work would it be to fix this?

Also, were you able to verify that disabling MQ works on 0.6.0 branch?


[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783876#action_12783876 ] 

Hadoop QA commented on PIG-1108:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426434/PIG-1108.patch
  against trunk revision 885465.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/67/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/67/console

This message is automatically generated.


[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782575#action_12782575 ] 

Richard Ding commented on PIG-1108:
-----------------------------------

The problem is that the secondary key optimization doesn't work well with the multiquery optimization when one splittee has a secondary key and the other does not.

The workaround for now is to disable either the multiquery optimization (with -M) or the secondary key optimization (with -Dpig.exec.nosecondarykey=true).

To fix this, we can modify the multiquery optimizer (since the secondary key optimizer now runs before the multiquery optimizer) to take into account the existence of the secondary key optimization.
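
To make the failure mode concrete, here is a minimal Java sketch of the kind of check that produces the "Type mismatch in key from map" error. It is not Pig or Hadoop code: TextKey, TupleKey and KeyTypeCheckSketch are hypothetical stand-ins, and it assumes (as the error message suggests) that the merged job declares the plain chararray group key while the secondary-key branch emits a compound tuple key.

```java
import java.io.IOException;

// Hypothetical stand-in for a chararray key wrapper such as NullableText.
class TextKey {
    final String value;
    TextKey(String value) { this.value = value; }
}

// Hypothetical stand-in for a tuple key wrapper such as NullableTuple.
class TupleKey {
    final Object[] fields;
    TupleKey(Object... fields) { this.fields = fields; }
}

// Simplified model of a map-side collector that validates every key
// against the single key class declared for the job.
class KeyTypeCheckSketch {
    private final Class<?> declaredKeyClass;
    private int collected = 0;

    KeyTypeCheckSketch(Class<?> declaredKeyClass) {
        this.declaredKeyClass = declaredKeyClass;
    }

    // Accepts the key only if its runtime class matches the declared one.
    void collect(Object key) throws IOException {
        if (key.getClass() != declaredKeyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + declaredKeyClass.getName() + ", received "
                    + key.getClass().getName());
        }
        collected++;
    }

    int collectedCount() { return collected; }

    public static void main(String[] args) {
        // Assumption: the merged job declares the plain group key type.
        KeyTypeCheckSketch collector = new KeyTypeCheckSketch(TextKey.class);
        try {
            collector.collect(new TextKey("x"));     // matches the declared type
            collector.collect(new TupleKey("x", 7)); // compound (b, a) key is rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A job has only one declared map output key class, so merging a branch whose key was rewritten by another optimizer with a branch that kept its original key can only satisfy one of them.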

 


[jira] Updated: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1108:
------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1108:
------------------------------

    Attachment: PIG-1108.patch

With this patch, the multiquery optimizer doesn't merge MR jobs that use a secondary key.


[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Ankur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782787#action_12782787 ] 

Ankur commented on PIG-1108:
----------------------------

In my test run on the 0.6.0 branch, disabling MQ did not work. The Pig client logs showed that MQ was still kicking in, and the mappers failed with the same error message as in the description. It would be good to add a few points about "SecondaryKey" here: http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification


[jira] Updated: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1108:
--------------------------------

    Fix Version/s: 0.6.0


[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782651#action_12782651 ] 

Richard Ding commented on PIG-1108:
-----------------------------------

Disabling either the MQ optimization or the secondary key optimization works on the 0.6.0 branch.

In the short term, the proposed solution is for the MQ optimizer not to merge any MR job that is annotated with "UseSecondaryKey". We need further investigation to see whether there is any performance advantage in having the MQ optimizer merge MR jobs with the "UseSecondaryKey" annotation.
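
The short-term rule described above can be sketched roughly as follows. This is an illustrative model only, not the contents of the patch: MergeGuardSketch, Job and mergeCandidates are hypothetical names, and the useSecondaryKey flag simply stands in for the "UseSecondaryKey" annotation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeGuardSketch {
    // Hypothetical, simplified model of one MR job produced from a split branch.
    static final class Job {
        final String name;
        final boolean useSecondaryKey; // stands in for the "UseSecondaryKey" annotation

        Job(String name, boolean useSecondaryKey) {
            this.name = name;
            this.useSecondaryKey = useSecondaryKey;
        }
    }

    // Returns the jobs the optimizer may merge: every job that is NOT
    // flagged as using a secondary key. Flagged jobs are left to run unmerged.
    static List<Job> mergeCandidates(List<Job> splittees) {
        List<Job> candidates = new ArrayList<>();
        for (Job job : splittees) {
            if (!job.useSecondaryKey) {
                candidates.add(job);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Mirrors the script in the description: plan1 has a nested ORDER BY
        // (secondary key), plan2 is stored directly.
        List<Job> splittees = Arrays.asList(
                new Job("plan1", true),
                new Job("plan2", false));
        for (Job job : mergeCandidates(splittees)) {
            System.out.println("merge candidate: " + job.name);
        }
    }
}
```

Excluding flagged jobs outright is the conservative choice: it gives up any benefit from merging them, which is the performance question left open for further investigation.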


[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784378#action_12784378 ] 

Olga Natkovich commented on PIG-1108:
-------------------------------------

+1. I will commit this patch now to both trunk and the branch.


[jira] Updated: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1108:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch 0.6.0. Thanks, Richard!


[jira] Assigned: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding reassigned PIG-1108:
---------------------------------

    Assignee: Richard Ding
