Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2013/12/16 00:18:07 UTC

[jira] [Commented] (PIG-3620) TezCompiler adds duplicate predecessors of blocking operators to TezPlan

    [ https://issues.apache.org/jira/browse/PIG-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848716#comment-13848716 ] 

Cheolsoo Park commented on PIG-3620:
------------------------------------

+1.

> TezCompiler adds duplicate predecessors of blocking operators to TezPlan
> ------------------------------------------------------------------------
>
>                 Key: PIG-3620
>                 URL: https://issues.apache.org/jira/browse/PIG-3620
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Rohini Palaniswamy
>             Fix For: tez-branch
>
>         Attachments: PIG-3620-1.patch
>
>
> Here is the simplest example that reproduces the issue:
> {code:title=test.pig}
> a = LOAD 'foo' AS (x:int, y:chararray);
> b = GROUP a BY x;
> c = FOREACH b GENERATE a.x;
> STORE c INTO 'c';
> d = FOREACH b GENERATE a.y;
> STORE d INTO 'd';
> {code}
> If you run {{pig \-x tez_local \-e 'explain \-script test.pig'}}, you will see two vertices that contain the following sub-plan:
> {code}
> Tez vertex scope-27
> # Plan on vertex
> b: Local Rearrange[tuple]{int}(false) - scope-10
> |   |
> |   Project[int][0] - scope-11
> |
> |---a: New For Each(false,false)[bag] - scope-7
>     |   |
>     |   Cast[int] - scope-2
>     |   |
>     |   |---Project[bytearray][0] - scope-1
>     |   |
>     |   Cast[chararray] - scope-5
>     |   |
>     |   |---Project[bytearray][1] - scope-4
>     |
>     |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
> {code}
> Because there are two stores (and thus two data flows, i.e. a=>c and a=>d), Pig generates two physical plans. TezCompiler then compiles them into a single Tez plan but adds the same sub-plan twice.
> This is an issue with any blocking operator (join, union, etc.) followed by a split.
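
To illustrate the kind of de-duplication the description calls for, here is a minimal Python sketch. It is not the code in PIG-3620-1.patch and does not use any Pig or Tez API. It assumes each physical plan can be modelled as a chain of (operator key, label) pairs, and it merges the chains into one graph while adding each operator only once. The keys scope-0, scope-7 and scope-10 come from the explain output above; store-c and store-d are hypothetical keys made up for the two STORE operators.

```python
from collections import OrderedDict


def merge_plans(plans):
    """Merge linear operator chains into one DAG, adding each operator once.

    Each plan is a list of (operator_key, label) pairs ordered from source to sink.
    Returns (nodes, edges): nodes maps operator_key -> label, edges is a set of
    (from_key, to_key) pairs.
    """
    nodes = OrderedDict()
    edges = set()
    for plan in plans:
        prev = None
        for key, label in plan:
            # setdefault keeps the first copy, so a shared predecessor
            # is not added a second time.
            nodes.setdefault(key, label)
            if prev is not None:
                edges.add((prev, key))
            prev = key
    return nodes, edges


# Sub-plan shared by both data flows (a=>c and a=>d).
shared = [
    ("scope-0", "a: Load"),
    ("scope-7", "a: New For Each"),
    ("scope-10", "b: Local Rearrange"),
]
plan_c = shared + [("store-c", "c: Store")]  # hypothetical key
plan_d = shared + [("store-d", "d: Store")]  # hypothetical key

nodes, edges = merge_plans([plan_c, plan_d])
naive_count = len(plan_c) + len(plan_d)  # what a merge without de-duplication adds
```

In this model a naive merge would add eight operators, while the keyed merge keeps five, with the Local Rearrange feeding both stores.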


