You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by GitBox <gi...@apache.org> on 2021/06/26 19:14:16 UTC
[GitHub] [mesos] cf-natali opened a new pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.
cf-natali opened a new pull request #397:
URL: https://github.com/apache/mesos/pull/397
Pending `ticks` are used by `scheduleTick` to decide whether to schedule
an event loop tick when a new timer is scheduled - since we only need to
schedule the event loop tick if the new timer is supposed to expire
earlier than the current earliest timer.
Unfortunately `Clock::finalize` didn't clear `ticks`, which means that
the following could happen:
- schedule a timer T0 for expiration at time t0
- call `process::reinitalize`, which calls `Clock::finalize` but doesn't
clear `ticks`
- schedule a new timer T1 for expiration at time t1 > t0: since
`scheduleTick` would see that there was already the earlier pending
tick for T0 in `ticks` with t0 < t1, it wouldn't actually schedule a
tick of the event loop
Therefore new timers would never fire again.
See https://github.com/apache/mesos/blob/1b445c392a25e6c556acbf2bd8ed5214250ddace/3rdparty/libprocess/src/clock.cpp#L127
for the check causing subsequent timers to never fire.
This caused e.g.
`DockerContainerizerIPv6Test.ROOT_DOCKER_LaunchIPv6HostNetwork` test to
hang since it called `process::reinitialized` while having some active
timers - e.g. the reaper periodic timer.
Also added a test specifically for this bug.
Original test hanging (log abridged for clarity):
```
root@thinkpad:/home/cf/src/mesos/build# GLOG_v=9 ./bin/mesos-tests.sh --verbose gtest_filter=DockerContainerizerIPv6Test.ROOT_DOCKER_LaunchIPv6HostNetwork
[...]
I0626 19:12:49.205178 514538 process.cpp:3088] Cleaning up __reaper__(1)@192.168.1.3:45143
I0626 19:12:49.205469 514545 process.cpp:935] Stopped the socket accept loop
[...]
W0626 19:12:49.268715 514523 docker_containerizer_tests.cpp:205] Pulling alpine and mesosphere/inky. This might take a while...
I0626 19:12:49.279342 514523 process.cpp:2918] Spawned process __latch__(5)@192.168.1.3:42489
I0626 19:12:49.279457 514581 process.cpp:2928] Resuming __latch__(5)@192.168.1.3:42489 at 2021-06-26 18:12:49.279472128+00:00
I0626 19:12:49.279515 514581 process.cpp:2928] Resuming __waiter__(5)@192.168.1.3:42489 at 2021-06-26 18:12:49.279522048+00:00
I0626 19:12:49.279533 514581 process.cpp:4017] Running waiter process for __latch__(5)@192.168.1.3:42489
I0626 19:12:49.279700 514581 clock.cpp:279] Created a timer for __waiter__(5)@192.168.1.3:42489 in 10mins in the future (2021-06-26 18:22:49.279692800+00:00)
I0626 19:12:49.267994 514587 clock.cpp:279] Created a timer for __reaper__(2)@192.168.1.3:42489 in 100ms in the future (2021-06-26 18:12:49.367970048+00:00)
I0626 19:12:49.280148 514523 process.cpp:2918] Spawned process __waiter__(5)@192.168.1.3:42489
I0626 19:12:49.267753 514580 process.cpp:2928] Resuming __collect__(1)@192.168.1.3:42489 at 2021-06-26 18:12:49.267812864+00:00
^C
root@thinkpad:/home/cf/src/mesos/build#
```
We can see that timers scheduled never fire.
And new test:
```
root@thinkpad:/home/cf/src/mesos/build# ./3rdparty/libprocess/libprocess-tests --gtest_filter=*Reinit*
Note: Google Test filter = *Reinit*-
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ProcessTest
[ RUN ] ProcessTest.TimerAfterReinitialize
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0626 18:34:56.618572 500810 process.cpp:935] Stopped the socket accept loop
^C
root@thinkpad:/home/cf/src/mesos/build#
```
@asekretenko @qianzhangxa @surahman
And also maybe @bmahler - I know you don't work on Mesos anymore but I think you might be quite familiar with this code and it's a small change so hopefully you can take a quick look?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [mesos] bmahler commented on pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.
Posted by GitBox <gi...@apache.org>.
bmahler commented on pull request #397:
URL: https://github.com/apache/mesos/pull/397#issuecomment-870813317
makes sense, thanks for fixing this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [mesos] cf-natali commented on pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.
Posted by GitBox <gi...@apache.org>.
cf-natali commented on pull request #397:
URL: https://github.com/apache/mesos/pull/397#issuecomment-870806793
It's really a simple change, so if anyone could have a look, that'd be great.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [mesos] bmahler merged pull request #397: Fixed a bug where timers wouldn't expire after `process:reinitialize`.
Posted by GitBox <gi...@apache.org>.
bmahler merged pull request #397:
URL: https://github.com/apache/mesos/pull/397
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org