Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/04/07 03:15:06 UTC

[jira] [Created] (PIG-1970) Relax the condition to find output/input dependency

Relax the condition to find output/input dependency
---------------------------------------------------

                 Key: PIG-1970
                 URL: https://issues.apache.org/jira/browse/PIG-1970
             Project: Pig
          Issue Type: Improvement
            Reporter: Daniel Dai
            Priority: Minor


Pig creates an output/input dependency when the output generated by a Pig script feeds a later LOAD statement, so that Pig does not launch the two jobs simultaneously (which would result in an "input file does not exist" error). For example:
{code}
STORE A INTO '/user/myname/myoutputfolder';
D = LOAD '/user/myname/myoutputfolder';
{code}
The LOAD will be placed in a map-reduce job that runs after /user/myname/myoutputfolder has been generated.
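To make the ordering concrete, here is a minimal Python sketch of dependency-driven job scheduling. It is a hypothetical model, not Pig's actual (Java) implementation: each job is described only by the paths it stores to and loads from, `exact_match` models the exact-path rule, and jobs are grouped into "waves" where jobs in the same wave have no dependency on each other and could run at the same time.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical model of a compiled script: each map-reduce job lists the
# locations it stores to and the locations it loads from.
JOBS = {
    "job1": {"stores": ["/user/myname/myoutputfolder"], "loads": ["/user/myname/input"]},
    "job2": {"stores": ["/user/myname/final"], "loads": ["/user/myname/myoutputfolder"]},
}

def exact_match(load_path: str, store_path: str) -> bool:
    """Exact-match rule: a load depends on a store only if the paths are identical."""
    return load_path == store_path

def schedule(jobs, depends=exact_match):
    """Group jobs into waves; jobs within one wave do not depend on each other."""
    # Map each job to the set of jobs whose output it reads (its predecessors).
    graph = {name: set() for name in jobs}
    for consumer, c in jobs.items():
        for producer, p in jobs.items():
            if consumer != producer and any(
                depends(load, store) for load in c["loads"] for store in p["stores"]
            ):
                graph[consumer].add(producer)
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(schedule(JOBS))  # [['job1'], ['job2']]

# With a glob load path, exact matching finds no dependency, so both jobs
# land in the same wave and could be launched simultaneously.
glob_jobs = {**JOBS, "job2": {"stores": ["/user/myname/final"],
                              "loads": ["/user/myname/myoutputfolder/part*"]}}
print(schedule(glob_jobs))  # [['job1', 'job2']]
```

The second call reproduces the problem this issue describes: the `part*` load is not recognized as reading job1's output.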

However, currently we only check for an exact path match. If the script loads only part of the stored data, we cannot detect the dependency. For example:
{code}
STORE A INTO '/user/myname/myoutputfolder';
D = LOAD '/user/myname/myoutputfolder/part*';
{code}

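We should detect this dependency more intelligently, for instance by recognizing that a LOAD path lying under a STORE location, including a glob such as part*, depends on that STORE.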
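One possible relaxed check is sketched below in Python. It is an assumption for illustration, not the fix adopted for PIG-1970: paths are assumed to be absolute and scheme-less, comma-separated load lists are not handled, and single-component globs are matched with Python's `fnmatch`, which only approximates Hadoop glob syntax. Hadoop-style `{a,b}` alternation is treated conservatively as a match, and a load of an ancestor directory of the store location is also counted as a dependency, so the check errs toward adding dependencies rather than missing them.

```python
import fnmatch
import posixpath

def _components(path: str) -> list[str]:
    """Split an absolute, scheme-less path into its non-empty components."""
    return [c for c in posixpath.normpath(path).split("/") if c]

def _component_matches(load_part: str, store_part: str) -> bool:
    """Compare one path component; the load side may contain glob characters."""
    if "{" in load_part:
        # fnmatch does not understand {a,b} alternation; assume a match so
        # that we err on the side of adding a dependency.
        return True
    return fnmatch.fnmatchcase(store_part, load_part)

def may_depend(load_path: str, store_path: str) -> bool:
    """Return True if load_path might read files written at store_path."""
    load, store = _components(load_path), _components(store_path)
    # Compare the shared prefix component by component. If it matches, the
    # load path is the store path itself, a path (or glob) beneath it, or an
    # ancestor of it; all three cases are treated as a dependency.
    return all(_component_matches(l, s) for l, s in zip(load, store))

print(may_depend("/user/myname/myoutputfolder/part*", "/user/myname/myoutputfolder"))  # True
print(may_depend("/user/myname/otherfolder", "/user/myname/myoutputfolder"))  # False
```

Plugged into a dependency check in place of exact string equality, a predicate like this would keep the `part*` load in a job that runs after the STORE completes.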

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira