Posted to dev@giraph.apache.org by Sergey Edunov <ed...@gmail.com> on 2014/06/06 00:26:44 UTC
Re: Review Request 21987: Detect crashes of Netty threads
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21987/
-----------------------------------------------------------
(Updated June 5, 2014, 10:26 p.m.)
Review request for giraph.
Repository: giraph-git
Description
-------
When one of the request-processing threads fails, the worker gets stuck, but the job doesn't fail and has to be killed manually. We should detect Netty thread crashes and fail the job automatically.
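For illustration only, the general idea can be sketched with a plain Thread.UncaughtExceptionHandler. This is not the code in this review request: the class name ThreadCrashDetector and all of its methods are hypothetical, and it uses only the JDK rather than Giraph or Netty classes. The handler records the first crash, and a supervising thread can then turn it into a job failure instead of waiting forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: records the first uncaught exception from any watched thread. */
public class ThreadCrashDetector implements Thread.UncaughtExceptionHandler {
  /** First failure seen; later failures are ignored. */
  private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();
  /** Released once a watched thread has crashed. */
  private final CountDownLatch failed = new CountDownLatch(1);

  @Override
  public void uncaughtException(Thread thread, Throwable cause) {
    if (firstFailure.compareAndSet(null, cause)) {
      failed.countDown();
    }
  }

  /** Creates a thread whose uncaught exceptions are reported to this detector. */
  public Thread newThread(Runnable task, String name) {
    Thread thread = new Thread(task, name);
    thread.setUncaughtExceptionHandler(this);
    return thread;
  }

  /** Waits up to the timeout; returns true if a watched thread crashed. */
  public boolean awaitFailure(long timeout, TimeUnit unit) throws InterruptedException {
    return failed.await(timeout, unit);
  }

  /** Returns the recorded failure, or null if no watched thread has crashed. */
  public Throwable getFailure() {
    return firstFailure.get();
  }

  /** Rethrows a recorded crash so the caller (for example, the task's main loop) fails. */
  public void checkFailure() {
    Throwable cause = firstFailure.get();
    if (cause != null) {
      throw new IllegalStateException("Request processing thread crashed", cause);
    }
  }
}
```

A main loop could call checkFailure() periodically, or block in awaitFailure(), so that a crashed request thread fails the task rather than leaving the worker stuck.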
Diffs (updated)
-----
findbugs-exclude.xml e0466f7
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ae40c3b
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java c982209
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java cb36c3e
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java 14d4ea8
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java 7541418
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java adb96cb
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ExceptionHandler.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java 601cd2f
giraph-core/src/main/java/org/apache/giraph/graph/GraphMapper.java c86a024
giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java ad5fc91
giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 90dc9f3
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java aff7084
giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java f4719cc
giraph-core/src/test/java/org/apache/giraph/comm/ConnectionTest.java e771e36
giraph-core/src/test/java/org/apache/giraph/comm/MockExceptionHandler.java PRE-CREATION
giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 236bc88
giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java fcdfa5c
giraph-core/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java c026cf8
Diff: https://reviews.apache.org/r/21987/diff/
Testing
-------
Ran some production jobs with this change.
Also introduced random bugs in the deserialization logic and confirmed that the job fails.
Thanks,
Sergey Edunov
Re: Review Request 21987: Detect crashes of Netty threads
Posted by Pavan Kumar Athivarapu <pa...@outlook.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21987/#review46705
-----------------------------------------------------------
Ship it!
- Pavan Kumar Athivarapu
Re: Review Request 21987: Detect crashes of Netty threads
Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21987/
-----------------------------------------------------------
(Updated June 25, 2014, 7:52 p.m.)
Review request for giraph.
Changes
-------
Addressed code review (CR) issues. I added exception tracking on the client side. I did not add application termination there, since it would break the resending logic (which may itself be broken, by the way).
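To make the trade-off concrete, here is a minimal sketch of tracking client-side exceptions without terminating, so that a request can still be resent. It is an assumption-laden illustration, not the patch: ClientExceptionTracker and callWithResend are hypothetical names, and the retry loop stands in for whatever resending logic the client really uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: counts client-side exceptions but lets resending continue. */
public class ClientExceptionTracker {
  /** Number of exceptions recorded so far. */
  private final AtomicInteger count = new AtomicInteger();
  /** Most recently recorded exception, or null if none. */
  private final AtomicReference<Throwable> last = new AtomicReference<>();

  /** Records an exception without stopping the application. */
  public void record(Throwable cause) {
    last.set(cause);
    count.incrementAndGet();
  }

  public int getCount() {
    return count.get();
  }

  public Throwable getLast() {
    return last.get();
  }

  /**
   * Runs the request, recording each failure and trying again up to maxAttempts times.
   * Only after the last attempt fails is the exception propagated to the caller.
   */
  public <T> T callWithResend(Callable<T> request, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception lastError = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return request.call();
      } catch (Exception e) {
        record(e);      // tracked for diagnostics, but not fatal
        lastError = e;
      }
    }
    throw lastError;
  }
}
```

Terminating on the first recorded exception would make the retry loop pointless, which is the conflict with resending described above.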
Repository: giraph-git
Description
-------
When one of the request-processing threads fails, the worker gets stuck, but the job doesn't fail and has to be killed manually. We should detect Netty thread crashes and fail the job automatically.
Diffs (updated)
-----
findbugs-exclude.xml e0466f7
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ae40c3b
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java c982209
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java cb36c3e
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java 14d4ea8
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java 7541418
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java adb96cb
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/MasterRequestServerHandler.java 3e06026
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java b6d0533
giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/WorkerRequestServerHandler.java f64c373
giraph-core/src/main/java/org/apache/giraph/graph/GraphMapper.java c86a024
giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java e13eedd
giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 02d4f2b
giraph-core/src/main/java/org/apache/giraph/utils/ThreadUtils.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java dbe6a45
giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java f4719cc
giraph-core/src/test/java/org/apache/giraph/comm/ConnectionTest.java e771e36
giraph-core/src/test/java/org/apache/giraph/comm/MockExceptionHandler.java PRE-CREATION
giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 157a543
giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java 32454f4
giraph-core/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java c026cf8
Diff: https://reviews.apache.org/r/21987/diff/
Testing
-------
Ran some production jobs with this change.
Also introduced random bugs in the deserialization logic and confirmed that the job fails.
Thanks,
Sergey Edunov