Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2014/02/27 18:15:23 UTC
[jira] [Commented] (PIG-3782) PushDownForEachFlatten +
ColumnMapKeyPrune with user defined schema failing due to incorrect UID
assignment
[ https://issues.apache.org/jira/browse/PIG-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914749#comment-13914749 ]
Koji Noguchi commented on PIG-3782:
-----------------------------------
This error occurs when PushDownForEachFlatten inserts a new FOREACH after 'd = join' so that the flatten runs after the join as an optimization. For some reason, this new foreach is assigned completely new UIDs for q1 and q2.
As the plans below show, the new foreach has q1#25 and q2#26 instead of the q1#13 and q2#14 that 'e' uses later.
This breaks the lineage tracking of ColumnMapKeyPrune.
Before PushDownForEachFlatten
{noformat}
|---e: (Name: LOForEach Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
| |
| (Name: LOGenerate[false,false,false] Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
| | |
| | c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
| | |
| | c::q1:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: (*))
| | |
| | c::q2:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: (*))
| |
| |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
| |
| |---(Name: LOInnerLoad[1] Schema: c::q1#13:bytearray)
| |
| |---(Name: LOInnerLoad[2] Schema: c::q2#14:bytearray)
|
|---d: (Name: LOJoin(HASH) Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray,b::b0#7:int,b::b1#8:bytearray)
| |
| a0:(Name: Project Type: int Uid: 1 Input: 0 Column: 0)
| |
| b0:(Name: Project Type: int Uid: 7 Input: 1 Column: 0)
|
|---c: (Name: LOForEach Schema: a0#1:int,q1#13:bytearray,q2#14:bytearray)
{noformat}
After PushDownForEachFlatten
{noformat}
|---e: (Name: LOForEach Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
| |
| (Name: LOGenerate[false,false,false] Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
| | |
| | c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
| | |
| | c::q1:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: (*))
| | |
| | c::q2:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: (*))
| |
| |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
| |
| |---(Name: LOInnerLoad[1] Schema: c::q1#13:bytearray)
| |
| |---(Name: LOInnerLoad[2] Schema: c::q2#14:bytearray)
|
|---d: (Name: LOForEach Schema: c::a0#1:int,q1#25:bytearray,q2#26:bytearray,b::b0#7:int,b::b1#8:bytearray)
| |
| (Name: LOGenerate[false,true,false,false] Schema: c::a0#1:int,q1#25:bytearray,q2#26:bytearray,b::b0#7:int,b::b1#8:bytearray)
| | |
| | c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
| | |
| | c::a2:(Name: Project Type: bag Uid: 3 Input: 1 Column: (*))
| | |
| | b::b0:(Name: Project Type: int Uid: 7 Input: 2 Column: (*))
| | |
| | b::b1:(Name: Project Type: bytearray Uid: 8 Input: 3 Column: (*))
| |
| |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
| |
| |---c::a2: (Name: LOInnerLoad[1] Schema: null)
| |
| |---(Name: LOInnerLoad[2] Schema: b::b0#7:int)
| |
| |---(Name: LOInnerLoad[3] Schema: b::b1#8:bytearray)
|
|---d: (Name: LOJoin(HASH) Schema: c::a0#1:int,c::a2#3:bag{#4:tuple()},b::b0#7:int,b::b1#8:bytearray)
| |
| a0:(Name: Project Type: int Uid: 1 Input: 0 Column: 0)
| |
| b0:(Name: Project Type: int Uid: 7 Input: 1 Column: 0)
|
|---c: (Name: LOForEach Schema: a0#1:int,a2#3:bag{#4:tuple()})
{noformat}
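To make the mismatch concrete, here is a small toy check in Python. It is only an illustration and not Pig's actual lineage code: the helper name find_missing_uids and the (alias, uid) list representation are assumptions made for this sketch, while the aliases and UIDs are copied from the two plan dumps above. It asks whether every UID that 'e' projects is still produced by its input 'd'.

```python
from typing import List, Tuple

# (alias, uid) pairs, mirroring the alias#uid notation in the plan dumps.
Schema = List[Tuple[str, int]]


def find_missing_uids(required_uids: List[int], input_schema: Schema) -> List[int]:
    """Return the required UIDs that the input schema does not provide.

    Hypothetical helper for illustration only; it is not part of Pig.
    """
    available = {uid for _, uid in input_schema}
    return [uid for uid in required_uids if uid not in available]


# UIDs projected by foreach 'e' (a0#1, q1#13, q2#14); same in both plans.
e_requires = [1, 13, 14]

# Output schema of 'd' before the rule runs (the LOJoin).
d_before = [("c::a0", 1), ("c::q1", 13), ("c::q2", 14),
            ("b::b0", 7), ("b::b1", 8)]

# Output schema of the new 'd' foreach inserted by PushDownForEachFlatten.
d_after = [("c::a0", 1), ("q1", 25), ("q2", 26),
           ("b::b0", 7), ("b::b1", 8)]

missing_before = find_missing_uids(e_requires, d_before)
missing_after = find_missing_uids(e_requires, d_after)

print("missing before rule:", missing_before)
print("missing after rule:", missing_after)
```

In this toy model nothing is missing before the rule runs, while UIDs 13 and 14 have no producer afterwards, which matches the "Couldn't find matching uid" failure for the projection with Uid 13.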
> PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment
> -----------------------------------------------------------------------------------------------------------
>
> Key: PIG-3782
> URL: https://issues.apache.org/jira/browse/PIG-3782
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
>
> {noformat}
> a = load '1.txt' as (a0:int, a1, a2:bag{});
> b = load '2.txt' as (b0:int, b1);
> c = foreach a generate a0, flatten(a2) as (q1, q2);
> d = join c by a0, b by b0;
> e = foreach d generate a0, q1, q2;
> f = foreach e generate a0, (int)q1, (int)q2;
> store f into 'output';
> {noformat}
> This Pig script fails with:
> 2014-02-27 11:49:45,657 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 13 Input: 0 Column: 1)