Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2014/08/07 03:15:11 UTC

[jira] [Updated] (CRUNCH-458) Eliminate potentially random MR split-point decisions

     [ https://issues.apache.org/jira/browse/CRUNCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-458:
------------------------------

    Attachment: CRUNCH-458.patch

First cut at this: use LinkedHashSet and LinkedHashMap inside Edge to ensure that we always process the node paths/PCollections in the same order when making split decisions.
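
For anyone following along, here is a rough sketch of the general idea, not the code in the patch. The Node class and visitOrder helper are hypothetical stand-ins, not Crunch classes. The sketch assumes the elements do not override hashCode(), in which case a plain HashSet may iterate in an order that changes between JVM runs, while a LinkedHashSet always iterates in insertion order. Whether that is the exact cause of the behavior in this issue is not confirmed here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DeterministicOrderSketch {

    // Hypothetical stand-in for a planner node; NOT a Crunch class.
    // It does not override hashCode(), so it uses the identity hash,
    // which is not guaranteed to be the same from one JVM run to the next.
    static final class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    // Adds one Node per name to a LinkedHashSet and returns the names
    // in the order the set iterates over them.
    static List<String> visitOrder(String... names) {
        Set<Node> nodes = new LinkedHashSet<>();
        for (String name : names) {
            nodes.add(new Node(name));
        }
        List<String> visited = new ArrayList<>();
        for (Node node : nodes) {
            visited.add(node.name);
        }
        return visited;
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order on every run.
        System.out.println(visitOrder("c", "a", "b")); // prints [c, a, b]
    }
}
```

Swapping LinkedHashSet for HashSet in visitOrder would still compile, but the returned order would then depend on the identity hash codes of the Node objects. LinkedHashMap gives the same insertion-order guarantee for map keys.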

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>
>                 Key: CRUNCH-458
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-458
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch
>
>
> I'm running into a pipeline in which the decision of where to split two dependent jobs seems to be random from run to run. (I only noticed it because one of the runs causes the pipeline to throw an NPE, and the other does not.) I'd like to investigate this and try to eliminate any potential sources of randomness in the way that two dependent GBK operations are split.



--
This message was sent by Atlassian JIRA
(v6.2#6252)