Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2014/05/11 00:08:55 UTC
[jira] [Commented] (CRUNCH-390) Planner is not adding dependencies
between jobs when planning is done in more than one stage.
[ https://issues.apache.org/jira/browse/CRUNCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992486#comment-13992486 ]
Josh Wills commented on CRUNCH-390:
-----------------------------------
[~cmarius] good looking patch, thank you so much! I'm running it through integration tests now and will commit it when it passes.
> Planner is not adding dependencies between jobs when planning is done in more than one stage.
> ---------------------------------------------------------------------------------------------
>
> Key: CRUNCH-390
> URL: https://issues.apache.org/jira/browse/CRUNCH-390
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.2
> Reporter: Ioan Marius Curelariu
> Assignee: Josh Wills
> Attachments: 0001-Patched-the-MSCRPlanner-to-correctly-add-dependencie.patch
>
>
> The planner does the planning in multiple stages when it finds job dependencies on ReadableData. One example of this case is when using the BloomFilterJoinStrategy.
> While the generated plan dot file looks good, the planner actually does not add dependencies between jobs that are created in different planning stages.
> I have a pipeline that reads 3 input sources. It joins 2 of them using a bloom filter join strategy. Later on, it joins this with the output of a job coming from the third source path.
> If the jobs on the branch using the bloom filter finish before the one reading the third source, the executor attempts to start the 4th job, which is supposed to join everything, before the 3rd one finishes, resulting in an input path not found exception.
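
For readers following the thread, the failure mode described above can be sketched with a toy dependency check. This is not Crunch code and does not reflect how the MSCRPlanner or the executor is implemented. The job names, the `ready_jobs` helper, and the choice of which edge is missing are all assumptions made for illustration, loosely following the report (jobs 1 and 2 on the bloom filter branch, job 3 reading the third source, job 4 joining everything).

```python
def ready_jobs(deps, finished, started):
    """Return the jobs not yet started whose recorded prerequisites have all finished."""
    return sorted(j for j in deps if j not in started and deps[j] <= finished)


# Hypothetical plan: job1 and job2 form the bloom filter branch,
# job3 processes the third source, job4 joins both branches.
correct_deps = {
    "job1": set(),
    "job2": {"job1"},
    "job3": set(),
    "job4": {"job2", "job3"},
}

# Assumed buggy plan: the edge from job4 to job3 is lost because the two
# jobs were created in different planning stages.
buggy_deps = dict(correct_deps, job4={"job2"})

# State from the report: the bloom filter branch is done, job3 is still running.
finished = {"job1", "job2"}
started = {"job1", "job2", "job3"}

premature = ready_jobs(buggy_deps, finished, started)
safe = ready_jobs(correct_deps, finished, started)

print(premature)  # ['job4']: launched although job3's output does not exist yet
print(safe)       # []: job4 waits until job3 has finished
```

With the edge missing, job4 looks runnable as soon as the bloom filter branch completes, which matches the reported input path not found error. With the edge present, job4 is held back until job3 has written its output.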
--
This message was sent by Atlassian JIRA
(v6.2#6252)