Posted to dev@pig.apache.org by "Ankur (JIRA)" <ji...@apache.org> on 2009/12/01 06:45:20 UTC

[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

     [ https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1114:
-----------------------

    Attachment: Pig_1114_Client.log

> MultiQuery optimization throws error when merging 2 level splits
> ----------------------------------------------------------------
>
>                 Key: PIG-1114
>                 URL: https://issues.apache.org/jira/browse/PIG-1114
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankur
>            Assignee: Richard Ding
>            Priority: Critical
>             Fix For: 0.6.0
>
>         Attachments: Pig_1114_Client.log
>
>
> Multi-query optimization throws an error when merging two-level splits. The following script reproduces the error:
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
>         id,
>         count,
>         total,
>         (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
>         name,
>         count,
>         total,
>         (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';
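
For readers comparing outputs, the following is a rough Python sketch of what each half of the script above is meant to compute: a per-value count, the overall row count, and their ratio, ordered by count descending. It is not part of the original report. The sample rows and the helper name count_proportions are hypothetical, and the sketch assumes no null values, so it does not model how Pig's COUNT treats nulls.

```python
from collections import Counter


def count_proportions(values):
    """Rough equivalent of GROUP BY + GROUP ALL + CROSS + ORDER BY count DESC."""
    counts = Counter(values)   # like idGroupCount / nameGroupCount
    total = len(values)        # like allIdCount / allNamesCount
    rows = [(v, c, total, c / total) for v, c in counts.items()]
    # ORDER ... BY count DESC; ties are broken by value here only to
    # keep the output deterministic (Pig gives no such guarantee).
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


# Hypothetical sample rows matching the schema (id:int, name:chararray).
data = [(1, "a"), (1, "b"), (2, "a"), (3, "a")]

ids = count_proportions([i for i, _ in data])
names = count_proportions([n for _, n in data])

print(ids)
print(names)
```

Running the Pig script with multi-query execution turned off (the -M or -no_multiquery command-line option) should produce output of this shape, which can help confirm that the failure comes from the optimization rather than from the script itself.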
