Posted to dev@giraph.apache.org by "Maja Kabiljo (JIRA)" <ji...@apache.org> on 2012/11/29 20:20:58 UTC

[jira] [Created] (GIRAPH-437) Missing progress calls when stopping Netty server

Maja Kabiljo created GIRAPH-437:
-----------------------------------

             Summary: Missing progress calls when stopping Netty server
                 Key: GIRAPH-437
                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
             Project: Giraph
          Issue Type: Improvement
            Reporter: Maja Kabiljo
         Attachments: GIRAPH-437.patch

At the end of a long-running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because of the awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509096#comment-13509096 ] 

Maja Kabiljo commented on GIRAPH-437:
-------------------------------------

Eli, thank you for your comment. We did get the timeout here again.
 
I've been looking through the netty.io site, but it states that we can use ChannelGroup.close():
http://static.netty.io/3.5/api/org/jboss/netty/channel/socket/nio/NioServerSocketChannelFactory.html
The problem seems to be that we are not keeping track of all connected channels, so we don't try to close all of them. 
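To illustrate the idea of keeping track of every connected channel so that all of them can be closed at shutdown, here is a minimal JDK-only sketch. It is an analog of the concept, not Netty's ChannelGroup and not Giraph code; the class name TrackedConnections and its methods are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical JDK-only analog of a channel group; not Netty or Giraph API. */
final class TrackedConnections {
  /** Every connection that has been opened or accepted and not yet closed. */
  private final Set<Closeable> open = ConcurrentHashMap.newKeySet();

  /** Registers a connection as soon as it is opened or accepted. */
  void register(Closeable connection) {
    open.add(connection);
  }

  /** Closes every registered connection; returns how many closed cleanly. */
  int closeAll() {
    int closed = 0;
    for (Closeable connection : open) {
      try {
        connection.close();
        closed++;
      } catch (IOException e) {
        // Keep going: one failed close should not leave the others open.
      }
      // Safe while iterating: the concurrent key set is weakly consistent.
      open.remove(connection);
    }
    return closed;
  }

  /** Number of connections still being tracked. */
  int size() {
    return open.size();
  }
}
```

The point of the sketch is only that a connection which is never registered can never be closed by the shutdown path, which matches the suspicion above.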

Please take a look at GIRAPH-441.

One thing about NettyClient.stop() - we don't wait for all the connections to close and resources to be released before returning from the call. Is that intentional?

                
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507611#comment-13507611 ] 

Eli Reisman commented on GIRAPH-437:
------------------------------------

You know, the problem you're monitoring here, the lockup on awaitUninterruptibly() in NettyServer, might be caused by that call itself. The snippet below is how the NettyClient shuts down; perhaps this is the pattern we need to add to the NettyServer ChannelGroup shutdown? There is more info about this pattern, and why to use it, on the netty.io site. The snippet is part of the netty.io way to avoid using awaitUninterruptibly() as NettyServer does. Here's the NettyClient#stop() method for reference:

{code}
public void stop() {
  // Close connections asynchronously, in a Netty-approved
  // way, without cleaning up thread pools until all channels
  // in addressChannelMap are closed (success or failure)
  int channelCount = 0;
  for (ChannelRotater channelRotater : addressChannelMap.values()) {
    channelCount += channelRotater.size();
  }
  final int done = channelCount;
  final AtomicInteger count = new AtomicInteger(0);
  for (ChannelRotater channelRotater : addressChannelMap.values()) {
    channelRotater.closeChannels(new ChannelFutureListener() {
      @Override
      public void operationComplete(ChannelFuture cf) {
        context.progress();
        if (count.incrementAndGet() == done) {
          if (LOG.isInfoEnabled()) {
            LOG.info("stop: reached wait threshold, " +
                done + " connections closed, releasing " +
                "NettyClient.bootstrap resources now.");
          }
          bossExecutorService.shutdownNow();
          workerExecutorService.shutdownNow();
          bootstrap.releaseExternalResources();
        }
      }
    });
  }
}
{code}

I might be way off here, but when I implemented the original version of this pattern I only did it in one of the two files (Client not Server) so this could maybe be the reason for the hangup in your logs? ...Or not! Anyway, just a thought. 
                
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506960#comment-13506960 ] 

Maja Kabiljo commented on GIRAPH-437:
-------------------------------------

Sure, here it is: https://reviews.apache.org/r/8286/
                
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506944#comment-13506944 ] 

Avery Ching commented on GIRAPH-437:
------------------------------------

This is useful.  Can you please add a reviewboard as well?
                
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo reassigned GIRAPH-437:
-----------------------------------

    Assignee: Maja Kabiljo
    
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507617#comment-13507617 ] 

Hudson commented on GIRAPH-437:
-------------------------------

Integrated in Giraph-trunk-Commit #300 (See [https://builds.apache.org/job/Giraph-trunk-Commit/300/])
    GIRAPH-437: Missing progress calls when stopping Netty server (Revision 1415806)

     Result = SUCCESS
maja : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1415806
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java

                
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-437) Missing progress calls when stopping Netty server

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-437:
--------------------------------

    Attachment: GIRAPH-437.patch

Changing awaitUninterruptibly() to periodic calls to await(). Added a log line at the end of NettyServer.stop() so that if it happens again we can be sure where the problem is. I also refactored ProgressableUtils so that new cases where we need to wait for something will be easier to write.

Passes mvn verify.
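
For readers unfamiliar with the approach, the general shape of "periodic calls to await" can be sketched with plain JDK classes. This is an illustration under stated assumptions, not the code from the patch: ProgressReporter is a hypothetical stand-in for Hadoop's Progressable, ProgressAwareWait and awaitWithProgress are made-up names, and a CountDownLatch stands in for the Netty future being waited on.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for Hadoop's Progressable; not Giraph code. */
interface ProgressReporter {
  void progress();
}

final class ProgressAwareWait {
  private ProgressAwareWait() { }

  /**
   * Waits for the latch in slices of sliceMillis instead of blocking
   * indefinitely, and reports progress each time a slice times out.
   * Returns the number of progress calls made while waiting.
   */
  static int awaitWithProgress(CountDownLatch done,
      ProgressReporter reporter, long sliceMillis)
      throws InterruptedException {
    int slices = 0;
    // await(timeout) returns false when the slice elapsed without completion.
    while (!done.await(sliceMillis, TimeUnit.MILLISECONDS)) {
      reporter.progress();
      slices++;
    }
    return slices;
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch done = new CountDownLatch(1);
    AtomicInteger calls = new AtomicInteger();
    // Simulates a shutdown that takes a while to complete.
    Thread closer = new Thread(() -> {
      try {
        Thread.sleep(250);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      done.countDown();
    });
    closer.start();
    int slices = awaitWithProgress(done, calls::incrementAndGet, 50);
    closer.join();
    System.out.println("progress calls while waiting: " + slices);
  }
}
```

The slice length is a trade-off: it has to be well below the framework's progress timeout, but very short slices only add wake-ups without any benefit.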
                
> Missing progress calls when stopping Netty server
> -------------------------------------------------
>
>                 Key: GIRAPH-437
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-437
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-437.patch
>
>
> At the end of a long running job I got an exception about not reporting progress. The last log line was: "stop: Halting netty server", so I suspect it's because awaitUninterruptibly() call there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira