Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2019/12/20 03:48:17 UTC

[GitHub] [samza] abhishekshivanna opened a new pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

abhishekshivanna opened a new pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240
 
 
   **Symptom:** 
   When a container heartbeat fails, the container shutdown
   sequence is triggered and the Container is never restarted.
   
   **Cause:**
   When a container heartbeat fails, the container shutdown
   sequence exits the Container with an exit code of `0`, which
   marks the container as `Completed` - preventing the JobCoordinator
   from restarting the container.
   
   **Changes:** 
   The container can shut down exceptionally in the following two ways:
   1) Exception in the container
   2) Heartbeat Expired
   In both paths the ContainerLaunchUtil previously expected a
   shared static variable to hold the exception. The change introduced
   gets rid of the static variable and checks each path explicitly
   and exits with code `1` in both cases.
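A minimal sketch of the "check each path explicitly" idea described above. It is not the actual patch: the class name `ExitStatusSketch`, the method `exitCode`, and both parameters are hypothetical names chosen for illustration. The only behavior taken from the description is that either failure path should produce exit code `1`.

```java
// Hypothetical sketch; none of these names come from the Samza code base.
final class ExitStatusSketch {
    static final int EXIT_OK = 0;
    static final int EXIT_FAILED = 1;

    /**
     * Decides the process exit code by checking each failure path on its own,
     * instead of reading a single shared static variable.
     *
     * @param containerException exception raised inside the container, or null
     * @param heartbeatExpired   true if the heartbeat monitor reported expiry
     */
    static int exitCode(Throwable containerException, boolean heartbeatExpired) {
        if (containerException != null) {
            return EXIT_FAILED; // path 1: exception in the container
        }
        if (heartbeatExpired) {
            return EXIT_FAILED; // path 2: heartbeat expired
        }
        return EXIT_OK;         // clean shutdown
    }

    public static void main(String[] args) {
        System.out.println(exitCode(null, false));                         // prints 0
        System.out.println(exitCode(new RuntimeException("boom"), false)); // prints 1
        System.out.println(exitCode(null, true));                          // prints 1
    }
}
```

Keeping the decision in a function of two explicit inputs means neither path can silently mask the other.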

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] abhishekshivanna commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
abhishekshivanna commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240#issuecomment-568024607
 
 
   @mynameborat Correct!


[GitHub] [samza] abhishekshivanna edited a comment on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
abhishekshivanna edited a comment on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240#issuecomment-568024607
 
 
   @mynameborat Correct! I updated the description to include this.


[GitHub] [samza] abhishekshivanna opened a new pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
abhishekshivanna opened a new pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240
 
 
   **Symptom:** 
   When a container heartbeat fails, the container shutdown
   sequence is triggered and the Container is never restarted.
   
   **Cause:**
   When a container heartbeat fails, the container shutdown
   sequence exits the Container with an exit code of `0`, which
   marks the container as `Completed` - preventing the JobCoordinator
   from restarting the container.
   The bug is caused by `containerException` being overwritten with the value
   returned by `listener.getContainerException` without checking whether
   `containerException` was already set by the heartbeat monitor.
   
   **Changes:** 
   The container can shut down exceptionally in the following two ways:
   1) Exception in the container
   2) Heartbeat Expired
   In both paths the ContainerLaunchUtil previously expected a
   shared static variable to hold the exception. The change introduced
   gets rid of the static variable and checks each path explicitly
   and exits with code `1` in both cases.
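The overwrite described under **Cause** can be reproduced in a few lines. This is an illustrative sketch under stated assumptions, not the Samza source: `OverwriteSketch`, `sharedException`, `buggyExitCode`, and `fixedExitCode` are hypothetical names, and the two parameters stand in for the heartbeat monitor's failure and the value the listener would return.

```java
// Hypothetical reproduction of the overwrite; names are illustrative only.
final class OverwriteSketch {
    // One shared static slot written by two independent sources.
    static Throwable sharedException;

    /** Buggy pattern: the second write discards whatever the first one stored. */
    static int buggyExitCode(Throwable heartbeatFailure, Throwable listenerException) {
        sharedException = heartbeatFailure;  // set by the heartbeat monitor
        sharedException = listenerException; // overwritten without a null check
        return sharedException != null ? 1 : 0;
    }

    /** Fixed pattern: no shared slot; each source is inspected separately. */
    static int fixedExitCode(Throwable heartbeatFailure, Throwable listenerException) {
        return (heartbeatFailure != null || listenerException != null) ? 1 : 0;
    }

    public static void main(String[] args) {
        Throwable heartbeat = new IllegalStateException("heartbeat expired");
        // The listener saw no exception, so it reports null.
        System.out.println(buggyExitCode(heartbeat, null)); // prints 0
        System.out.println(fixedExitCode(heartbeat, null)); // prints 1
    }
}
```

In the buggy variant a heartbeat failure followed by a `null` from the listener yields exit code `0`, which matches the `Completed` status reported in the symptom.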


[GitHub] [samza] mynameborat removed a comment on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
mynameborat removed a comment on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240#issuecomment-568097544
 
 
   > can we add a unit test for this?
   
   


[GitHub] [samza] mynameborat commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
mynameborat commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240#issuecomment-568097544
 
 
   > can we add a unit test for this?
   
   


[GitHub] [samza] mynameborat merged pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
mynameborat merged pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240
 
 
   


[GitHub] [samza] mynameborat closed pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
mynameborat closed pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240
 
 
   


[GitHub] [samza] abhishekshivanna commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
abhishekshivanna commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240#issuecomment-568105529
 
 
   I agree, looks like there is no way to mock objects that `ContainerLaunchUtil` instantiates in order to test this.


[GitHub] [samza] mynameborat commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown

Posted by GitBox <gi...@apache.org>.
mynameborat commented on issue #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240#issuecomment-568078148
 
 
   can we add a unit test for this?
