Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/04/07 03:15:06 UTC
[jira] [Created] (PIG-1970) Relax the condition to find output/input dependency
Relax the condition to find output/input dependency
---------------------------------------------------
Key: PIG-1970
URL: https://issues.apache.org/jira/browse/PIG-1970
Project: Pig
Issue Type: Improvement
Reporter: Daniel Dai
Priority: Minor
Pig creates an output/input dependency when the output generated by a Pig script feeds a load statement, so that Pig does not launch the two jobs simultaneously (which would result in an "input file does not exist" error). For example:
{code}
STORE A INTO '/user/myname/myoutputfolder';
D = LOAD '/user/myname/myoutputfolder';
{code}
The load will run in a map-reduce job that starts after /user/myname/myoutputfolder has been generated.
However, currently we only do an exact match on the path. If we load only part of the data, we cannot detect the dependency. E.g.:
{code}
STORE A INTO '/user/myname/myoutputfolder';
D = LOAD '/user/myname/myoutputfolder/part*' ;
{code}
We should be smarter about detecting this dependency, e.g. when the load path is a file pattern under the stored directory.
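
To illustrate what a relaxed check could look like, here is a minimal sketch in Python. It is not Pig's implementation: the function name load_depends_on_store and the matching rule are assumptions made for illustration only. It compares the two paths component by component and treats the load as dependent on the store when the leading components of the load pattern match the store path. It uses Python's fnmatch, which handles `*`, `?` and `[...]` but not Hadoop's `{a,b}` alternation, and it assumes both paths are absolute.

```python
from fnmatch import fnmatchcase
import posixpath


def _components(path):
    # Normalize the path (collapse '//' and a trailing '/') and split it
    # into its non-empty components.
    return [c for c in posixpath.normpath(path).split("/") if c]


def load_depends_on_store(store_path, load_path):
    """Hypothetical check: does load_path (possibly a glob) read from
    store_path itself or from something underneath it?"""
    store = _components(store_path)
    load = _components(load_path)
    # A load pattern shorter than the store path points at a parent
    # directory; this sketch does not treat that as a dependency.
    if len(load) < len(store):
        return False
    # Each leading component of the load pattern must match the
    # corresponding component of the store path; zip stops at the end
    # of the store path, so trailing parts such as 'part*' are ignored.
    return all(fnmatchcase(s, l) for s, l in zip(store, load))


store = "/user/myname/myoutputfolder"
print(load_depends_on_store(store, "/user/myname/myoutputfolder"))
print(load_depends_on_store(store, "/user/myname/myoutputfolder/part*"))
print(load_depends_on_store(store, "/user/myname/otherfolder/part*"))
```

With this rule both the exact-match case and the 'part*' case from the examples above are reported as dependencies, while a load from an unrelated folder is not. A real fix would also need to handle relative paths and URIs with a scheme, which this sketch leaves out.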