Posted to dev@giraph.apache.org by "Maja Kabiljo (JIRA)" <ji...@apache.org> on 2016/06/23 15:46:16 UTC

[jira] [Resolved] (GIRAPH-1077) Jobs getting stuck after channel failure

     [ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo resolved GIRAPH-1077.
----------------------------------
    Resolution: Fixed

> Jobs getting stuck after channel failure
> ----------------------------------------
>
>                 Key: GIRAPH-1077
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1077
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>
> When a channel fails, we currently only log the failure. Because we don't wait on open requests everywhere, the check for open requests isn't always called, and we've seen jobs get stuck, for example during the input stage, when a worker's request to the master for a split to read fails. When we know that a channel has failed, we should try to resend the requests from that channel.
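
For illustration only, the resend-on-failure idea described above might be sketched roughly as follows. This is not the actual GIRAPH-1077 patch or Giraph's client code: the class OpenRequestTracker, its methods, and the integer channel ids are all hypothetical names chosen for this sketch. It assumes each unacknowledged request remembers which channel it was sent on, so that a channel-failure callback can find those requests and hand them back for resending right away, instead of relying on a periodic check that may never run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker of sent-but-unacknowledged requests (not Giraph's real API). */
class OpenRequestTracker {
  /** One request that has been written to a channel and not yet acknowledged. */
  static final class OpenRequest {
    final long requestId;
    final int channelId;
    final String payload;

    OpenRequest(long requestId, int channelId, String payload) {
      this.requestId = requestId;
      this.channelId = channelId;
      this.payload = payload;
    }
  }

  /** Open requests keyed by request id. */
  private final Map<Long, OpenRequest> openRequests = new ConcurrentHashMap<>();

  /** Record that a request was written to the given channel. */
  void requestSent(long requestId, int channelId, String payload) {
    openRequests.put(requestId, new OpenRequest(requestId, channelId, payload));
  }

  /** Record that the receiver acknowledged the request, so it is no longer open. */
  void requestAcked(long requestId) {
    openRequests.remove(requestId);
  }

  /**
   * Called when a channel is known to have failed. Every open request that was
   * sent on the failed channel is reassigned to the replacement channel and
   * returned, ordered by request id, so the caller can resend it immediately.
   */
  List<OpenRequest> onChannelFailure(int failedChannelId, int replacementChannelId) {
    List<OpenRequest> toResend = new ArrayList<>();
    for (OpenRequest request : openRequests.values()) {
      if (request.channelId == failedChannelId) {
        OpenRequest moved =
            new OpenRequest(request.requestId, replacementChannelId, request.payload);
        openRequests.put(moved.requestId, moved);
        toResend.add(moved);
      }
    }
    toResend.sort(Comparator.comparingLong((OpenRequest r) -> r.requestId));
    return toResend;
  }

  /** Number of requests still waiting for an acknowledgement. */
  int openRequestCount() {
    return openRequests.size();
  }
}
```

In this sketch the caller would write each returned request to the replacement channel. Requests stay in the open set until they are acknowledged, so a second failure on the replacement channel would pick them up again.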



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)