Posted to dev@giraph.apache.org by majakabiljo <gi...@git.apache.org> on 2018/11/29 19:38:50 UTC

[GitHub] giraph pull request #96: GIRAPH-1213: Fix issues with network requests retri...

GitHub user majakabiljo opened a pull request:

    https://github.com/apache/giraph/pull/96

    GIRAPH-1213: Fix issues with network requests retries and add more logging

    Fixing two bugs:
    - When a channel fails, we currently retry all requests to the destination machine that channel was connected to, instead of only the requests that were sent on that specific channel.
    - In practice, we've seen that BlockingOperationException can be thrown while we wait for a channel to connect, in which case the request we are trying to send is silently dropped. This change catches the exception and retries instead.
    Also added logging of channel ids to make it easier to debug network requests that are not delivered.
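
As an illustration of the second fix described above, here is a minimal sketch of catching the exception and keeping the request for a retry. It is not the code from the pull request: `RetryOnBlockingSketch`, `Connector`, `awaitChannelId`, `pendingRetry` and `retryPending` are hypothetical names, and the nested `BlockingOperationException` is only a stand-in for Netty's `io.netty.util.concurrent.BlockingOperationException` so that the example runs without Netty on the classpath.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class RetryOnBlockingSketch {
  /** Stand-in for io.netty.util.concurrent.BlockingOperationException. */
  static class BlockingOperationException extends IllegalStateException { }

  /** Hypothetical hook that waits for a connected channel and returns its id. */
  interface Connector {
    String awaitChannelId();
  }

  /** Requests that could not be sent yet and still have to be retried. */
  final Queue<String> pendingRetry = new ArrayDeque<>();
  /** Requests that went out, recorded together with the channel id. */
  final List<String> sent = new ArrayList<>();

  /** Returns true if the request went out, false if it was queued for retry. */
  boolean send(String request, Connector connector) {
    try {
      String channelId = connector.awaitChannelId();
      sent.add(request + " via " + channelId);
      return true;
    } catch (BlockingOperationException e) {
      // Assumption: keep the request so that it is retried instead of lost.
      pendingRetry.add(request);
      return false;
    }
  }

  /** Retries every queued request once; requests that fail are queued again. */
  void retryPending(Connector connector) {
    int count = pendingRetry.size();
    for (int i = 0; i < count; i++) {
      send(pendingRetry.poll(), connector);
    }
  }

  public static void main(String[] args) {
    RetryOnBlockingSketch client = new RetryOnBlockingSketch();
    // First attempt: waiting for the channel throws, so the request is queued.
    client.send("request-1", () -> { throw new BlockingOperationException(); });
    System.out.println("queued for retry: " + client.pendingRetry);
    // Retry with a connector that succeeds.
    client.retryPending(() -> "channel-7");
    System.out.println("sent: " + client.sent);
  }
}
```

The sketch records the channel id next to each sent request, in the spirit of the extra logging mentioned above; how the real client schedules its retries is not shown in this thread.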

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/majakabiljo/giraph giraph-1213

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/giraph/pull/96.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #96
    
----
commit 581dd9bbf47d02ceddf0aba2e8c97e80d7d6f44c
Author: Maja Kabiljo <ma...@...>
Date:   2018-11-29T19:35:53Z

    GIRAPH-1213: Fix issues with network requests retries and add more logging
    

----


---

[GitHub] giraph pull request #96: GIRAPH-1213: Fix issues with network requests retri...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/giraph/pull/96


---

[GitHub] giraph issue #96: GIRAPH-1213: Fix issues with network requests retries and ...

Posted by dlogothetis <gi...@git.apache.org>.
GitHub user dlogothetis commented on the issue:

    https://github.com/apache/giraph/pull/96
  
    Can you also add a couple of comments about how this was tested?


---

[GitHub] giraph pull request #96: GIRAPH-1213: Fix issues with network requests retri...

Posted by dlogothetis <gi...@git.apache.org>.
GitHub user dlogothetis commented on a diff in the pull request:

    https://github.com/apache/giraph/pull/96#discussion_r239255770
  
    --- Diff: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ---
    @@ -1147,8 +1158,11 @@ private void checkRequestsAfterChannelFailure(final Channel channel) {
         resendRequestsWhenNeeded(new Predicate<RequestInfo>() {
           @Override
           public boolean apply(RequestInfo requestInfo) {
    -        return requestInfo.getDestinationAddress().equals(
    -            channel.remoteAddress());
    +        if (requestInfo.getWriteFuture() == null ||
    --- End diff --
    
    When is this condition true?


---

[GitHub] giraph pull request #96: GIRAPH-1213: Fix issues with network requests retri...

Posted by majakabiljo <gi...@git.apache.org>.
GitHub user majakabiljo commented on a diff in the pull request:

    https://github.com/apache/giraph/pull/96#discussion_r240745235
  
    --- Diff: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ---
    @@ -1147,8 +1158,11 @@ private void checkRequestsAfterChannelFailure(final Channel channel) {
         resendRequestsWhenNeeded(new Predicate<RequestInfo>() {
           @Override
           public boolean apply(RequestInfo requestInfo) {
    -        return requestInfo.getDestinationAddress().equals(
    -            channel.remoteAddress());
    +        if (requestInfo.getWriteFuture() == null ||
    --- End diff --
    
    It can happen if the request hasn't been sent out yet. I'm not sure whether there is any other scenario.
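
To make the difference between the removed address comparison and a channel-scoped check concrete, here is a minimal sketch. It does not reproduce the pull request: the diff above is cut off after the null check, so what the real predicate does next is unknown. `Chan` and `Req` are hypothetical stand-ins for a Netty channel and for `RequestInfo`, and skipping a request that has no write channel yet is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ChannelScopedResendSketch {
  /** Hypothetical stand-in for a channel: an id plus the remote address. */
  static final class Chan {
    final String id;
    final String remoteAddress;

    Chan(String id, String remoteAddress) {
      this.id = id;
      this.remoteAddress = remoteAddress;
    }
  }

  /** Hypothetical stand-in for RequestInfo; writeChannel is null until sent. */
  static final class Req {
    final long requestId;
    final String destinationAddress;
    final Chan writeChannel;

    Req(long requestId, String destinationAddress, Chan writeChannel) {
      this.requestId = requestId;
      this.destinationAddress = destinationAddress;
      this.writeChannel = writeChannel;
    }
  }

  /** Old behaviour: every open request to the same machine matches. */
  static Predicate<Req> sameAddress(Chan failed) {
    return r -> r.destinationAddress.equals(failed.remoteAddress);
  }

  /** Channel-scoped check; unsent requests are skipped (an assumption). */
  static Predicate<Req> sameChannel(Chan failed) {
    return r -> r.writeChannel != null && r.writeChannel.id.equals(failed.id);
  }

  /** Returns the ids of the open requests selected for resending. */
  static List<Long> select(List<Req> open, Predicate<Req> shouldResend) {
    List<Long> ids = new ArrayList<>();
    for (Req r : open) {
      if (shouldResend.test(r)) {
        ids.add(r.requestId);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    // Two channels to the same machine; only the first one fails.
    Chan failed = new Chan("ch-1", "worker-3:30000");
    Chan healthy = new Chan("ch-2", "worker-3:30000");
    List<Req> open = Arrays.asList(
        new Req(1L, "worker-3:30000", failed),
        new Req(2L, "worker-3:30000", healthy),
        new Req(3L, "worker-3:30000", null)); // not sent out yet
    System.out.println("by address: " + select(open, sameAddress(failed)));
    System.out.println("by channel: " + select(open, sameChannel(failed)));
  }
}
```

With two channels open to the same machine, the address comparison also selects the request on the healthy channel and the one that has not been sent yet, while the channel comparison selects only the request written on the failed channel.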


---

[GitHub] giraph issue #96: GIRAPH-1213: Fix issues with network requests retries and ...

Posted by majakabiljo <gi...@git.apache.org>.
GitHub user majakabiljo commented on the issue:

    https://github.com/apache/giraph/pull/96
  
    I used a pipeline which runs 100 jobs, and at least a few of them always got stuck with open network requests. Running it with more logging helped identify these two issues, and after the change the pipeline was 100% successful.


---