Posted to dev@pig.apache.org by "Kyungho Jeon (JIRA)" <ji...@apache.org> on 2014/03/13 19:35:44 UTC

[jira] [Commented] (PIG-3809) AddForEach optimization doesn't set the alias of the added foreach

    [ https://issues.apache.org/jira/browse/PIG-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933844#comment-13933844 ] 

Kyungho Jeon commented on PIG-3809:
-----------------------------------

I became aware of this a few weeks ago, but didn't know it was a bug. :) 

> AddForEach optimization doesn't set the alias of the added foreach
> ------------------------------------------------------------------
>
>                 Key: PIG-3809
>                 URL: https://issues.apache.org/jira/browse/PIG-3809
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.13.0
>
>         Attachments: PIG-3809-1.patch
>
>
> AddForEach inserts a foreach operator into the plan, but it doesn't set the alias of the added foreach. This is usually harmless, but if the foreach is followed by a join, the missing alias confuses Pig.
> For example, consider the following query (a dummy example to demonstrate the issue):
> {code}
> a = LOAD 'foo' AS (x, y, z);
> b = LOAD 'bar' AS (i, j, k);
> c = JOIN a BY x, b BY i;
> d = FOREACH c GENERATE a::x, b::i;
> DUMP d;
> {code}
> Without the AddForEach optimization, the output schema of 'c' is as follows:
> {code}
> a::x, a::y, a::z, b::i, b::j, b::k
> {code}
> But since 'a::y', 'a::z', 'b::j', and 'b::k' are not used in 'd', a foreach operator is inserted after each of 'a' and 'b'. That is:
> {code}
> a = LOAD 'foo' AS (x, y, z);
> ? = FOREACH a GENERATE x; -- no alias is set
> b = LOAD 'bar' AS (i, j, k);
> ? = FOREACH b GENERATE i; -- no alias is set
> c = JOIN ? BY x, ? BY i;
> d = FOREACH c GENERATE ?::x, ?::i;
> DUMP d;
> {code}
> Because these added foreach operators have no alias, the output schema of the join is wrong. The aliases show up as null, so printing the output schema of the join gives 'null::x, null::i'.
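
To illustrate the 'null::x, null::i' symptom, here is a minimal Java sketch. It assumes that a join output field is named by concatenating the input alias, '::', and the field name. The class AliasPrefixSketch and its methods qualify and joinSchema are hypothetical names made up for this illustration; this is not Pig's actual schema code.

```java
// Hypothetical sketch; this is not Pig's actual schema code.
import java.util.ArrayList;
import java.util.List;

class AliasPrefixSketch {
    // Assumed rule: a join output field is named "<input alias>::<field name>".
    static String qualify(String alias, String field) {
        // Java string concatenation renders a null reference as "null".
        return alias + "::" + field;
    }

    // Builds a comma-separated schema string from parallel alias/field arrays.
    static String joinSchema(String[] aliases, String[] fields) {
        List<String> out = new ArrayList<>();
        for (int n = 0; n < aliases.length; n++) {
            out.add(qualify(aliases[n], fields[n]));
        }
        return String.join(", ", out);
    }

    public static void main(String[] args) {
        // Inputs that carry their aliases, as in the original query.
        System.out.println(joinSchema(new String[] {"a", "b"}, new String[] {"x", "i"}));
        // Inputs whose alias was never set, as with the inserted foreach operators.
        System.out.println(joinSchema(new String[] {null, null}, new String[] {"x", "i"}));
    }
}
```

Under that assumption, the first call prints 'a::x, b::i' and the second prints 'null::x, null::i', which matches the schema described in the issue.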



--
This message was sent by Atlassian JIRA
(v6.2#6252)