Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/02/10 21:30:13 UTC

[jira] [Resolved] (PIG-1734) Pig needs a more efficient DAG execution

     [ https://issues.apache.org/jira/browse/PIG-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy resolved PIG-1734.
-------------------------------------
    Resolution: Duplicate

Closing this JIRA as a duplicate, since PIG-3444 (Pig on Tez) and PIG-4059 (Pig on Spark) address this problem.

> Pig needs a more efficient DAG execution
> ----------------------------------------
>
>                 Key: PIG-1734
>                 URL: https://issues.apache.org/jira/browse/PIG-1734
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> The current code uses Hadoop's JobControl to execute one stage at a time. The first stage includes all jobs with no dependencies, the second stage contains jobs that depend only on jobs completed in the first stage, the third stage contains jobs that depend on jobs from the first and second stages, and so on.
> The problem with this simplistic approach is that each stage starts only when the previous stage is over, which means that some branches of the DAG are unnecessarily blocked.
> We would need to do our own DAG management to solve this issue, which would be a pretty significant undertaking. It is something we should look at in the future.
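
To illustrate the blocking described in the issue, here is a minimal Python sketch that compares the two scheduling strategies on a toy DAG. It is not Pig or Hadoop code. The job names, durations, and function names are all hypothetical, and it assumes unlimited parallel slots, so only dependencies constrain when a job may start.

```python
# Hypothetical jobs: name -> (duration, names of the jobs it depends on).
JOBS = {
    "a": (10, []),     # long job with no dependencies
    "b": (1, []),      # short job with no dependencies
    "c": (8, ["b"]),   # needs only b
    "d": (2, ["c"]),   # needs only c
}


def stage_of(job, jobs):
    """Stage 0 holds jobs with no dependencies; any other job lands one
    stage after the latest stage among its dependencies."""
    deps = jobs[job][1]
    if not deps:
        return 0
    return 1 + max(stage_of(dep, jobs) for dep in deps)


def staged_makespan(jobs):
    """Total time when a stage starts only after the whole previous
    stage has finished: the sum of the longest job in each stage."""
    longest_in_stage = {}
    for name, (duration, _deps) in jobs.items():
        stage = stage_of(name, jobs)
        longest_in_stage[stage] = max(longest_in_stage.get(stage, 0), duration)
    return sum(longest_in_stage.values())


def finish_time(job, jobs):
    """Finish time when a job starts as soon as its own dependencies
    are done, regardless of unrelated jobs."""
    duration, deps = jobs[job]
    return duration + max((finish_time(dep, jobs) for dep in deps), default=0)


def dependency_driven_makespan(jobs):
    """Total time under per-job dependency tracking."""
    return max(finish_time(name, jobs) for name in jobs)


print("staged:", staged_makespan(JOBS))
print("dependency-driven:", dependency_driven_makespan(JOBS))
```

In this toy DAG, job c depends only on the short job b, yet under staged execution it cannot start until the long job a has also finished, because a and b share the first stage. Tracking dependencies per job removes that wait for the b, c, d branch.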



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)