Posted to dev@pig.apache.org by "Ankur (JIRA)" <ji...@apache.org> on 2009/11/30 07:43:20 UTC

[jira] Created: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

MultiQuery optimization throws error when merging 2 level splits
----------------------------------------------------------------

                 Key: PIG-1114
                 URL: https://issues.apache.org/jira/browse/PIG-1114
             Project: Pig
          Issue Type: Bug
            Reporter: Ankur
            Priority: Critical


Multi-query optimization throws an error when merging 2-level splits. The following script reproduces the error:

data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);

ids = FOREACH data GENERATE id;
allId = GROUP ids all;
allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
idGroup = GROUP ids by id;
idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
countTotal = cross idGroupCount, allIdCount;
idCountTotal = foreach countTotal generate
        id,
        count,
        total,
        (double)count / (double)total as proportion;
orderedCounts = order idCountTotal by count desc;
STORE orderedCounts INTO 'mq_problem/ids';

names = FOREACH data GENERATE name;
allNames = GROUP names all;
allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as total;
nameGroup = GROUP names by name;
nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as count;
namesCrossed = cross nameGroupCount, allNamesCount;
nameCountTotal = foreach namesCrossed generate
        name,
        count,
        total,
        (double)count / (double)total as proportion;
nameCountsOrdered = order nameCountTotal by count desc;
STORE nameCountsOrdered INTO 'mq_problem/names';




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783991#action_12783991 ] 

Richard Ding commented on PIG-1114:
-----------------------------------

We got this exception because the MultiQuery optimizer doesn't recursively set the data type in local rearrange operators (it only sets it on the first level). Setting it recursively is required when merged jobs don't have the same map key types.
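
The difference between first-level and recursive type setting can be sketched as follows. This is only an illustration, not the contents of the patch: Operator, Rearrange, KeyTypeSketch and the string key types are hypothetical stand-ins, not Pig's real classes. The choice of "tuple" as the common key type is an assumption based on the error reported in this thread, where a NullableTuple was expected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan operator; NOT one of Pig's real classes.
class Operator {
    final String name;
    final List<Operator> children = new ArrayList<>();

    Operator(String name) { this.name = name; }

    Operator add(Operator child) {
        children.add(child);
        return this;
    }
}

// Hypothetical stand-in for a local rearrange operator that carries a key type.
class Rearrange extends Operator {
    String keyType;

    Rearrange(String name, String keyType) {
        super(name);
        this.keyType = keyType;
    }
}

class KeyTypeSketch {
    // First-level only: rearranges under a nested split are never visited.
    static void setKeyTypeShallow(Operator root, String keyType) {
        for (Operator child : root.children) {
            if (child instanceof Rearrange) {
                ((Rearrange) child).keyType = keyType;
            }
        }
    }

    // Recursive: descends into nested splits so every rearrange is updated.
    static void setKeyTypeDeep(Operator root, String keyType) {
        for (Operator child : root.children) {
            if (child instanceof Rearrange) {
                ((Rearrange) child).keyType = keyType;
            }
            setKeyTypeDeep(child, keyType);
        }
    }

    // Collects the key types of all rearranges in pre-order.
    static List<String> keyTypes(Operator root) {
        List<String> out = new ArrayList<>();
        for (Operator child : root.children) {
            if (child instanceof Rearrange) {
                out.add(((Rearrange) child).keyType);
            }
            out.addAll(keyTypes(child));
        }
        return out;
    }

    // Builds a two-level split: one rearrange at level 1, two at level 2.
    static Operator twoLevelSplit() {
        Operator inner = new Operator("innerSplit")
            .add(new Rearrange("r2", "chararray"))
            .add(new Rearrange("r3", "chararray"));
        return new Operator("outerSplit")
            .add(new Rearrange("r1", "int"))
            .add(inner);
    }

    public static void main(String[] args) {
        Operator shallow = twoLevelSplit();
        setKeyTypeShallow(shallow, "tuple");
        System.out.println(keyTypes(shallow)); // [tuple, chararray, chararray]

        Operator deep = twoLevelSplit();
        setKeyTypeDeep(deep, "tuple");
        System.out.println(keyTypes(deep));    // [tuple, tuple, tuple]
    }
}
```

In the shallow case the second-level rearranges keep their original key types, so a merged job would still see keys of mixed types.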



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783553#action_12783553 ] 

Ankur commented on PIG-1114:
----------------------------

The error thrown is:

java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableTuple, recieved org.apache.pig.impl.io.NullableText
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:159)
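
The trace above shows the map output collector rejecting a key whose class differs from the key class the job was configured with. A minimal sketch of that kind of check follows. It is a simplified illustration, not Hadoop's actual code: StrictCollector, TupleKey and TextKey are hypothetical names standing in for the collector and for the NullableTuple and NullableText key classes.

```java
import java.io.IOException;

// Hypothetical stand-ins for the two key classes named in the trace.
class TupleKey {}
class TextKey {}

// Hypothetical, simplified collector: it is configured with one key class
// and rejects any key whose class differs from it.
class StrictCollector {
    private final Class<?> keyClass;
    private int collected = 0;

    StrictCollector(Class<?> keyClass) {
        this.keyClass = keyClass;
    }

    void collect(Object key, Object value) throws IOException {
        // Exact class comparison: a key of any other class fails the check.
        if (key.getClass() != keyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                + keyClass.getName() + ", received " + key.getClass().getName());
        }
        collected++;
    }

    int collected() {
        return collected;
    }
}

class CollectorDemo {
    public static void main(String[] args) {
        StrictCollector collector = new StrictCollector(TupleKey.class);
        try {
            collector.collect(new TupleKey(), "ok");   // accepted
            collector.collect(new TextKey(), "boom");  // rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, a job whose declared key class is a tuple fails as soon as any map-side operator emits a plain text key.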





[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1114:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784600#action_12784600 ] 

Hadoop QA commented on PIG-1114:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426576/PIG-1114.patch
  against trunk revision 885953.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/72/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/72/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/72/console

This message is automatically generated.



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784070#action_12784070 ] 

Ankur commented on PIG-1114:
----------------------------

Richard,

I ran the above script again with the -M option to confirm that MultiQuery was not disabled; instead, it worked on 2 separate parts of the script. I am attaching the Pig client logs from the run for your reference.



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785115#action_12785115 ] 

Olga Natkovich commented on PIG-1114:
-------------------------------------

+1 on the changes. I will commit them now to trunk and the 0.6 branch.



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784003#action_12784003 ] 

Richard Ding commented on PIG-1114:
-----------------------------------

Ankur,

Can you look into the log file to make sure that MQ is not disabled when you use the -M option? The MultiQuery optimizer always logs at info level when it is applied.



[jira] Assigned: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1114:
-----------------------------------

    Assignee: Richard Ding



[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1114:
-----------------------

    Attachment: Pig_1114_Client.log



[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1114:
------------------------------

    Attachment: PIG-1114.patch

This patch fixes the problem.



[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1114:
-----------------------

    Fix Version/s: 0.6.0



[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1114:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to trunk and the 0.6 branch. Thanks, Richard!



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784397#action_12784397 ] 

Richard Ding commented on PIG-1114:
-----------------------------------

Ankur, I just checked, and the latest Pig 0.6 jar doesn't disable MQ optimization completely. The problem was fixed as part of PIG-1060, and the fix has been checked in to both trunk and the 0.6 branch.



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783554#action_12783554 ] 

Ankur commented on PIG-1114:
----------------------------

The same script works with the -M option (multi-query disabled), but surprisingly the run indicates that multi-query optimization is now being applied separately to the first STORE and the second STORE. This is just a workaround, but it also indicates that in cases like this, disabling multi-query does NOT disable it completely; it just makes it run on parts of the script.
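
For reference, the workaround could be invoked roughly as sketched below. The file name mq_problem.pig is an assumption for illustration only (the thread does not name the script file); -M is the short form of Pig's -no_multiquery flag. The sketch only prints the command when pig or the script is not available locally.

```shell
#!/bin/sh
# Hypothetical file holding the reproduction script from this issue.
SCRIPT="mq_problem.pig"

# -M (same as -no_multiquery) turns multi-query optimization off.
CMD="pig -M $SCRIPT"

if command -v pig >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
  # Run the script with multi-query optimization disabled.
  pig -M "$SCRIPT"
else
  # Pig or the script is missing here, so just show the command.
  echo "would run: $CMD"
fi
```

Dropping -M from the command runs the script with multi-query optimization on, which is the default and is the case that fails in this report.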
