Posted to dev@giraph.apache.org by Sergey Edunov <ed...@gmail.com> on 2014/06/06 00:26:44 UTC

Re: Review Request 21987: Detect crashes of Netty threads

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21987/
-----------------------------------------------------------

(Updated June 5, 2014, 10:26 p.m.)


Review request for giraph.


Repository: giraph-git


Description
-------

When a request-processing thread fails, the worker gets stuck, but the job does not fail and has to be killed manually. We should detect Netty thread crashes and fail the job automatically.
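
For illustration only, the general idea of detecting a crashed thread can be sketched with the plain JDK, outside of Netty. This is not the code from the patch: the class names `CrashDetectingThreadFactory` and `FirstFailureRecorder` are hypothetical, and the sketch only records the first failure, where a real job would presumably go on to fail the task.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: creates threads that report uncaught exceptions. */
final class CrashDetectingThreadFactory implements ThreadFactory {
  private final String namePrefix;
  private final Thread.UncaughtExceptionHandler handler;
  private final AtomicInteger counter = new AtomicInteger();

  CrashDetectingThreadFactory(String namePrefix,
      Thread.UncaughtExceptionHandler handler) {
    this.namePrefix = namePrefix;
    this.handler = handler;
  }

  @Override
  public Thread newThread(Runnable task) {
    Thread thread = new Thread(task, namePrefix + "-" + counter.getAndIncrement());
    thread.setDaemon(true);
    // Invoked by the dying thread itself when an exception escapes run().
    thread.setUncaughtExceptionHandler(handler);
    return thread;
  }
}

/** Hypothetical sketch: remembers the first exception that killed a thread. */
final class FirstFailureRecorder implements Thread.UncaughtExceptionHandler {
  private final AtomicReference<Throwable> failure = new AtomicReference<>();

  @Override
  public void uncaughtException(Thread thread, Throwable cause) {
    // Keep only the first failure; later ones are usually follow-on errors.
    failure.compareAndSet(null, cause);
  }

  /** Returns the first recorded failure, or null if no thread has crashed. */
  Throwable getFailure() {
    return failure.get();
  }
}
```

A caller could poll `getFailure()` from the main task loop and abort the job when it is non-null. Note that with an `ExecutorService`, this handler only sees exceptions from tasks started via `execute()`; tasks started via `submit()` have their exceptions captured in the returned `Future` instead.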


Diffs (updated)
-----

  findbugs-exclude.xml e0466f7 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ae40c3b 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java c982209 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java cb36c3e 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java 14d4ea8 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java 7541418 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java adb96cb 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ExceptionHandler.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java 601cd2f 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphMapper.java c86a024 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java ad5fc91 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 90dc9f3 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java aff7084 
  giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java f4719cc 
  giraph-core/src/test/java/org/apache/giraph/comm/ConnectionTest.java e771e36 
  giraph-core/src/test/java/org/apache/giraph/comm/MockExceptionHandler.java PRE-CREATION 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 236bc88 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java fcdfa5c 
  giraph-core/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java c026cf8 

Diff: https://reviews.apache.org/r/21987/diff/


Testing
-------

Ran some production jobs with this change.
Also introduced random bugs into the deserialization logic and confirmed that the job fails.


Thanks,

Sergey Edunov


Re: Review Request 21987: Detect crashes of Netty threads

Posted by Pavan Kumar Athivarapu <pa...@outlook.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21987/#review46705
-----------------------------------------------------------

Ship it!


- Pavan Kumar Athivarapu




Re: Review Request 21987: Detect crashes of Netty threads

Posted by Sergey Edunov <ed...@gmail.com>.

(Updated June 25, 2014, 7:52 p.m.)


Review request for giraph.


Changes
-------

Addressed the code review issues. I added exception tracking on the client side. I did not add application termination there, because it would break the request-resending logic (which may itself be broken, by the way).
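
As a rough sketch of what "tracking without terminating" could look like, the class below counts client-side exceptions and keeps the most recent ones, leaving any resend logic free to continue. The name `ClientExceptionTracker` and its methods are hypothetical and are not taken from the patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: records client-side exceptions without stopping the app. */
final class ClientExceptionTracker {
  private final int maxKept;
  private final Deque<Throwable> recent = new ArrayDeque<>();
  private long count;

  ClientExceptionTracker(int maxKept) {
    this.maxKept = maxKept;
  }

  /** Records an exception; never throws and never terminates the process. */
  synchronized void record(Throwable cause) {
    count++;
    recent.addLast(cause);
    // Bound memory use by dropping the oldest entry once the limit is exceeded.
    if (recent.size() > maxKept) {
      recent.removeFirst();
    }
  }

  /** Total number of exceptions seen so far. */
  synchronized long getCount() {
    return count;
  }

  /** Copy of the most recent exceptions, oldest first. */
  synchronized List<Throwable> recent() {
    return new ArrayList<>(recent);
  }
}
```

The design choice here is that `record` only observes: the decision to retry or give up stays with whatever code owns the resend logic.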


Repository: giraph-git


Description
-------

When a request-processing thread fails, the worker gets stuck, but the job does not fail and has to be killed manually. We should detect Netty thread crashes and fail the job automatically.


Diffs (updated)
-----

  findbugs-exclude.xml e0466f7 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ae40c3b 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java c982209 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java cb36c3e 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java 14d4ea8 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java 7541418 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java adb96cb 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/MasterRequestServerHandler.java 3e06026 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java b6d0533 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/WorkerRequestServerHandler.java f64c373 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphMapper.java c86a024 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java e13eedd 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 02d4f2b 
  giraph-core/src/main/java/org/apache/giraph/utils/ThreadUtils.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java dbe6a45 
  giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java f4719cc 
  giraph-core/src/test/java/org/apache/giraph/comm/ConnectionTest.java e771e36 
  giraph-core/src/test/java/org/apache/giraph/comm/MockExceptionHandler.java PRE-CREATION 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 157a543 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java 32454f4 
  giraph-core/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java c026cf8 

Diff: https://reviews.apache.org/r/21987/diff/


Testing
-------

Ran some production jobs with this change.
Also introduced random bugs into the deserialization logic and confirmed that the job fails.


Thanks,

Sergey Edunov