You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by GitBox <gi...@apache.org> on 2021/06/26 19:14:16 UTC

[GitHub] [mesos] cf-natali opened a new pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.

cf-natali opened a new pull request #397:
URL: https://github.com/apache/mesos/pull/397


   Pending `ticks` are used by `scheduleTick` to decide whether to schedule
   an event loop tick when a new timer is scheduled - since we only need to
   schedule the event loop tick if the new timer is supposed to expire
   earlier than the current earliest timer.
   Unfortunately `Clock::finalize` didn't clear `ticks`, which means that
   the following could happen:
   - schedule a timer T0 for expiration at time t0
   - call `process::reinitalize`, which calls `Clock::finalize` but doesn't
     clear `ticks`
   - schedule a new timer T1 for expiration at time t1 > t0: since
     `scheduleTick` would see that there was already the earlier pending
     tick for T0 in `ticks` with t0 < t1, it wouldn't actually schedule a
     tick of the event loop
   
   Therefore new timers would never fire again.
   
   See https://github.com/apache/mesos/blob/1b445c392a25e6c556acbf2bd8ed5214250ddace/3rdparty/libprocess/src/clock.cpp#L127
   for the check causing subsequent timers to never fire.
   
   This caused e.g.
   `DockerContainerizerIPv6Test.ROOT_DOCKER_LaunchIPv6HostNetwork` test to
   hang since it called `process::reinitialized` while having some active
   timers - e.g. the reaper periodic timer.
   
   Also added a test specifically for this bug.
   
   Original test hanging (log abridged for clarity):
   ```
   root@thinkpad:/home/cf/src/mesos/build# GLOG_v=9 ./bin/mesos-tests.sh --verbose gtest_filter=DockerContainerizerIPv6Test.ROOT_DOCKER_LaunchIPv6HostNetwork                                                       
   [...]
   I0626 19:12:49.205178 514538 process.cpp:3088] Cleaning up __reaper__(1)@192.168.1.3:45143                                                                                                                         
   I0626 19:12:49.205469 514545 process.cpp:935] Stopped the socket accept loop       
   [...]
   W0626 19:12:49.268715 514523 docker_containerizer_tests.cpp:205] Pulling alpine and mesosphere/inky. This might take a while...
   I0626 19:12:49.279342 514523 process.cpp:2918] Spawned process __latch__(5)@192.168.1.3:42489
   I0626 19:12:49.279457 514581 process.cpp:2928] Resuming __latch__(5)@192.168.1.3:42489 at 2021-06-26 18:12:49.279472128+00:00
   I0626 19:12:49.279515 514581 process.cpp:2928] Resuming __waiter__(5)@192.168.1.3:42489 at 2021-06-26 18:12:49.279522048+00:00
   I0626 19:12:49.279533 514581 process.cpp:4017] Running waiter process for __latch__(5)@192.168.1.3:42489
   I0626 19:12:49.279700 514581 clock.cpp:279] Created a timer for __waiter__(5)@192.168.1.3:42489 in 10mins in the future (2021-06-26 18:22:49.279692800+00:00)
   I0626 19:12:49.267994 514587 clock.cpp:279] Created a timer for __reaper__(2)@192.168.1.3:42489 in 100ms in the future (2021-06-26 18:12:49.367970048+00:00)
   I0626 19:12:49.280148 514523 process.cpp:2918] Spawned process __waiter__(5)@192.168.1.3:42489
   I0626 19:12:49.267753 514580 process.cpp:2928] Resuming __collect__(1)@192.168.1.3:42489 at 2021-06-26 18:12:49.267812864+00:00
   ^C
   root@thinkpad:/home/cf/src/mesos/build# 
   ```
   
   We can see that timers scheduled never fire.
   
   And new test:
   ```
   root@thinkpad:/home/cf/src/mesos/build# ./3rdparty/libprocess/libprocess-tests --gtest_filter=*Reinit*   
   Note: Google Test filter = *Reinit*-                                                                     
   [==========] Running 1 test from 1 test case.                                                                                                                                                                      
   [----------] Global test environment set-up.                                                             
   [----------] 1 test from ProcessTest                                                                                                                                                                               
   [ RUN      ] ProcessTest.TimerAfterReinitialize                                                                                                                                                                    
   WARNING: Logging before InitGoogleLogging() is written to STDERR                                                                                                                                                   
   I0626 18:34:56.618572 500810 process.cpp:935] Stopped the socket accept loop                                                                                                                                       
   ^C                                                                                                       
   root@thinkpad:/home/cf/src/mesos/build#
   ```
   
   @asekretenko @qianzhangxa @surahman 
   
   And also maybe @bmahler - I know you don't work on Mesos anymore but I think you might be quite familiar with this code and it's a small change so hopefully you can take a quick look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] bmahler commented on pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.

Posted by GitBox <gi...@apache.org>.
bmahler commented on pull request #397:
URL: https://github.com/apache/mesos/pull/397#issuecomment-870813317


   makes sense, thanks for fixing this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] cf-natali commented on pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.

Posted by GitBox <gi...@apache.org>.
cf-natali commented on pull request #397:
URL: https://github.com/apache/mesos/pull/397#issuecomment-870806793


   It's really  a simple change, so if anyone could have a look, that'd be great.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [mesos] bmahler merged pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.

Posted by GitBox <gi...@apache.org>.
bmahler merged pull request #397:
URL: https://github.com/apache/mesos/pull/397


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org