Posted to reviews@mesos.apache.org by Armand Grillet <ag...@mesosphere.io> on 2017/12/06 15:57:58 UTC
Review Request 64379: Improved logs displayed after a slave failed recovery.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/
-----------------------------------------------------------
Review request for mesos and Alexander Rukletsov.
Repository: mesos
Description
-------
Add some steps to clean the Docker daemon
state used by the Docker containerizer.
Diffs
-----
src/slave/slave.cpp 49270013537356c8fe9150d757b064bc3bbae3cb
Diff: https://reviews.apache.org/r/64379/diff/1/
Testing
-------
None. The previous logs were:
```
Nov: To remedy this do as follows:
Nov: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
Nov: This ensures agent doesn't recover old live executors.
Nov: Step 2: Restart the agent.
```
I have thus removed the tab before `This ensures agent doesn't recover` as it did not appear in the logs.
Thanks,
Armand Grillet
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/#review193278
-----------------------------------------------------------
FAIL: Some Mesos libprocess-tests failed.
Reviews applied: `['64379']`
Failed command: `C:\DCOS\mesos\3rdparty\libprocess\src\tests\Debug\libprocess-tests.exe`
All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64379
Relevant logs:
- [libprocess-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64379/logs/libprocess-tests-stdout.log):
```
[ OK ] LimiterTest.THREADSAFE_Acquire (3 ms)
[ RUN ] LimiterTest.THREADSAFE_DiscardMiddle
[ OK ] LimiterTest.THREADSAFE_DiscardMiddle (3 ms)
[ RUN ] LimiterTest.THREADSAFE_DiscardLast
[ OK ] LimiterTest.THREADSAFE_DiscardLast (2 ms)
[----------] 3 tests from LimiterTest (101 ms total)
[----------] 4 tests from LoopTest
[ RUN ] LoopTest.Sync
[ OK ] LoopTest.Sync (1 ms)
[ RUN ] LoopTest.Async
[ OK ] LoopTest.Async (1 ms)
[ RUN ] LoopTest.DiscardIterate
[ OK ] LoopTest.DiscardIterate (1 ms)
[ RUN ] LoopTest.DiscardBody
[ OK ] LoopTest.DiscardBody (2 ms)
[----------] 4 tests from LoopTest (108 ms total)
[----------] 9 tests from MetricsTest
[ RUN ] MetricsTest.Counter
[ OK ] MetricsTest.Counter (3 ms)
[ RUN ] MetricsTest.THREADSAFE_Gauge
[ OK ] MetricsTest.THREADSAFE_Gauge (2 ms)
[ RUN ] MetricsTest.Statistics
[ OK ] MetricsTest.Statistics (7 ms)
[ RUN ] MetricsTest.THREADSAFE_Snapshot
[ OK ] MetricsTest.THREADSAFE_Snapshot (53 ms)
[ RUN ] MetricsTest.THREADSAFE_SnapshotTimeout
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\metrics_tests.cpp(335): error: Failed to wait 15secs for response
```
- Mesos Reviewbot Windows
On Dec. 8, 2017, 1:23 p.m., Armand Grillet wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64379/
> -----------------------------------------------------------
>
> (Updated Dec. 8, 2017, 1:23 p.m.)
>
>
> Review request for mesos, Alexander Rukletsov and Benno Evers.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Add some steps to clean the Docker daemon
> state used by the Docker containerizer.
>
>
> Diffs
> -----
>
> src/slave/slave.cpp 54d8bcc035227dd6896ffa6e692a91749c0b56a6
>
>
> Diff: https://reviews.apache.org/r/64379/diff/2/
>
>
> Testing
> -------
>
> None. The previous logs were:
> ```
> Nov: To remedy this do as follows:
> Nov: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
> Nov: This ensures agent doesn't recover old live executors.
> Nov: Step 2: Restart the agent.
> ```
> I have thus removed the tab before `This ensures agent doesn't recover` as it did not appear in the logs.
>
>
> Thanks,
>
> Armand Grillet
>
>
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/#review193266
-----------------------------------------------------------
Please check how the output actually looks in the console and paste it into the Testing Done section.
src/slave/slave.cpp
Lines 6795 (patched)
<https://reviews.apache.org/r/64379/#comment271806>
..., not just those started by Mesos!
- Alexander Rukletsov
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/#review193781
-----------------------------------------------------------
FAIL: Some Mesos tests failed.
Reviews applied: `['64379']`
Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`
All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64379
Relevant logs:
- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64379/logs/mesos-tests-stdout.log):
```
[ RUN ] SlaveTest.RunTaskGroupFailedSecretGeneration
[ OK ] SlaveTest.RunTaskGroupFailedSecretGeneration (232 ms)
[ RUN ] SlaveTest.RunTaskGroupInvalidExecutorSecret
[ OK ] SlaveTest.RunTaskGroupInvalidExecutorSecret (240 ms)
[ RUN ] SlaveTest.RunTaskGroupReferenceTypeSecret
[ OK ] SlaveTest.RunTaskGroupReferenceTypeSecret (238 ms)
[ RUN ] SlaveTest.RunTaskGroupGenerateSecretAfterShutdown
[ OK ] SlaveTest.RunTaskGroupGenerateSecretAfterShutdown (251 ms)
[ RUN ] SlaveTest.KillTaskGroupBetweenRunTaskParts
[ OK ] SlaveTest.KillTaskGroupBetweenRunTaskParts (223 ms)
[ RUN ] SlaveTest.KillQueuedTaskGroup
[ OK ] SlaveTest.KillQueuedTaskGroup (290 ms)
[ RUN ] SlaveTest.MaxCompletedExecutorsPerFrameworkFlag
[ OK ] SlaveTest.MaxCompletedExecutorsPerFrameworkFlag (1034 ms)
[ RUN ] SlaveTest.ShutdownV0ExecutorIfItReregistersWithoutReconnect
[ OK ] SlaveTest.ShutdownV0ExecutorIfItReregistersWithoutReconnect (261 ms)
[ RUN ] SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect
[ OK ] SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect (266 ms)
[ RUN ] SlaveTest.BrowseExecutorSandboxByVirtualPath
[ OK ] SlaveTest.BrowseExecutorSandboxByVirtualPath (307 ms)
[ RUN ] SlaveTest.DisconnectedExecutorDropsMessages
[ OK ] SlaveTest.DisconnectedExecutorDropsMessages (280 ms)
[ RUN ] SlaveTest.ResourceProviderSubscribe
[ OK ] SlaveTest.ResourceProviderSubscribe (200 ms)
[ RUN ] SlaveTest.ResourceVersions
[ OK ] SlaveTest.ResourceVersions (167 ms)
[ RUN ] SlaveTest.ReconfigurationPolicy
[ OK ] SlaveTest.ReconfigurationPolicy (252 ms)
[ RUN ] SlaveTest.ResourceProviderReconciliation
```
- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64379/logs/mesos-tests-stderr.log):
```
@ 00007FF785429FB0 google::LogMessage::SendToLog
@ 00007FF785429797 google::LogMessage::Flush
@ 00007FF78542B2D1 google::LogMessageFatal::~LogMessageFatal
@ 00007FF7833429AC mesos::internal::slave::Slave::handleResourceProviderMessage
@ 00007FF7834EDB05 ??
@ 00007FF7833DB318 std::_Invoker_functor::_Call<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,process::ProcessBase * __ptr64>
@ 00007FF783461008 std::invoke<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,process::ProcessBase * __ptr64>
@ 00007FF78347120B lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >::invoke_expand<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,std::tuple<process::Future<mesos::internal::ResourceProvi
@ 00007FF7833B756A )<process::ProcessBase * __ptr64
@ 00007FF7833E196C std::_Invoker_functor::_Call<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64>
@ 00007FF78346746C std::invoke<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64>
@ 00007FF7833BF4D1 )<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64
@ 00007FF7834FBA26 process::ProcessBase * __ptr64)>::CallableFn<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> > >::operator(
@ 00007FF784F199ED process::ProcessBase * __ptr64)>::operator(
@ 00007FF784DF2AC9 process::ProcessBase::consume
@ 00007FF784F6DB4A process::DispatchEvent::consume
@ 00007FF7813B43B7 process::ProcessBase::serve
@ 00007FF784E007AB process::ProcessManager::resume
@ 00007FF784F0A211 ??
@ 00007FF784E48CF0 std::_Invoker_functor::_Call<<lambda_124422ac022fa041208b80c1460630d7> >
@ 00007FF784E9E690 std::invoke<<lambda_124422ac022fa041208b80c1460630d7> >
@ 00007FF784E57AAC std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Execute<0>
@ 00007FF784F55B4A std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Run
@ 00007FF784F425D8 std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Go
@ 00007FF784F2A2BD std::_Pad::_Call_func
@ 00007FF7860CC078 invoke_thread_procedure
@ 00007FF7860CBB21 __cdecl*)(void * __ptr64)
@ 00007FFB34601FE4 BaseThreadInitThunk
@ 00007FFB3532EF91 RtlUserThreadStart
```
- Mesos Reviewbot Windows
On Dec. 13, 2017, 7:26 p.m., Armand Grillet wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64379/
> -----------------------------------------------------------
>
> (Updated Dec. 13, 2017, 7:26 p.m.)
>
>
> Review request for mesos, Alexander Rukletsov and Benno Evers.
>
>
> Bugs: MESOS-8328
> https://issues.apache.org/jira/browse/MESOS-8328
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Add some steps to clean the Docker daemon
> state used by the Docker containerizer.
>
>
> Diffs
> -----
>
> src/slave/slave.cpp d997b4272578efffed05d38771f17df387ccac48
>
>
> Diff: https://reviews.apache.org/r/64379/diff/4/
>
>
> Testing
> -------
>
> New logs:
> ```
> E1213 10:58:10.826020 10057 slave.cpp:6738] EXIT with status 1: Failed to perform recovery: <error>
> If recovery failed due to a change in configuration and you want to
> keep the current agent id, you might want to change the
> `--reconfiguration_policy` flag to a more permissive value.
>
> To restart this agent with a new agent id instead, do as follows:
> rm -f /tmp/agent/meta/slaves/latest
> This ensures that the agent does not recover old live executors.
>
> If you use the Docker containerizer and think that the Docker
> daemon state is broken, you can try to clear it. But be careful:
> these commands will erase all containers and images from this host,
> not just those started by Mesos!
> docker kill $(docker ps -q)
> docker rm $(docker ps -a -q)
> docker rmi $(docker images -q)
>
> Finally, restart the agent.
> ```
>
>
> Thanks,
>
> Armand Grillet
>
>
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/#review194155
-----------------------------------------------------------
Ship it!
- Alexander Rukletsov
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/#review193903
-----------------------------------------------------------
Ship it!
- Benno Evers
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Armand Grillet <ag...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/
-----------------------------------------------------------
(Updated Dec. 13, 2017, 7:26 p.m.)
Review request for mesos, Alexander Rukletsov and Benno Evers.
Changes
-------
Improved message.
Bugs: MESOS-8328
https://issues.apache.org/jira/browse/MESOS-8328
Repository: mesos
Description
-------
Add some steps to clean the Docker daemon
state used by the Docker containerizer.
Diffs (updated)
-----
src/slave/slave.cpp d997b4272578efffed05d38771f17df387ccac48
Diff: https://reviews.apache.org/r/64379/diff/4/
Changes: https://reviews.apache.org/r/64379/diff/3-4/
Testing (updated)
-------
New logs:
```
E1213 10:58:10.826020 10057 slave.cpp:6738] EXIT with status 1: Failed to perform recovery: <error>
If recovery failed due to a change in configuration and you want to
keep the current agent id, you might want to change the
`--reconfiguration_policy` flag to a more permissive value.

To restart this agent with a new agent id instead, do as follows:
rm -f /tmp/agent/meta/slaves/latest
This ensures that the agent does not recover old live executors.

If you use the Docker containerizer and think that the Docker
daemon state is broken, you can try to clear it. But be careful:
these commands will erase all containers and images from this host,
not just those started by Mesos!
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)

Finally, restart the agent.
```
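The recovery steps in the message above can be sketched as a small shell script. This is only an illustration: the function names, the `META_DIR` variable and its default are assumptions, not something this patch adds to Mesos. The guard in `docker_each` goes beyond the logged commands: `docker kill`, `docker rm` and `docker rmi` require at least one argument, so the plain `$(...)` forms print an error when the list is empty.

```shell
#!/bin/sh
# Sketch of the recovery steps printed by the agent. All function names and
# the META_DIR default are illustrative assumptions, not part of Mesos.

META_DIR="${META_DIR:-/tmp/agent/meta}"  # assumed agent meta directory

# Remove the "latest" entry so the agent restarts with a new agent id.
forget_agent_id() {
  rm -f "$META_DIR/slaves/latest"
}

# Run `docker <subcommand>` on the IDs read from stdin, but only when
# there is at least one ID; docker rejects an empty argument list.
docker_each() {
  ids=$(cat)
  if [ -n "$ids" ]; then
    # Intentionally unquoted so that each ID becomes its own argument.
    docker "$1" $ids
  fi
}

# DESTRUCTIVE: affects every container and image on the host,
# not just those started by Mesos.
clear_docker_state() {
  docker ps -q     | docker_each kill
  docker ps -a -q  | docker_each rm
  docker images -q | docker_each rmi
}
```

Sourcing the file only defines the functions. After checking what is running on the host, one would call `forget_agent_id`, optionally `clear_docker_state`, and then restart the agent.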
Thanks,
Armand Grillet
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Armand Grillet <ag...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/
-----------------------------------------------------------
(Updated Dec. 13, 2017, 4:04 p.m.)
Review request for mesos, Alexander Rukletsov and Benno Evers.
Changes
-------
Fixed issue.
Bugs: MESOS-8328
https://issues.apache.org/jira/browse/MESOS-8328
Repository: mesos
Description
-------
Add some steps to clean the Docker daemon
state used by the Docker containerizer.
Diffs (updated)
-----
src/slave/slave.cpp d997b4272578efffed05d38771f17df387ccac48
Diff: https://reviews.apache.org/r/64379/diff/3/
Changes: https://reviews.apache.org/r/64379/diff/2-3/
Testing (updated)
-------
New logs:
```
E1213 10:58:10.826020 10057 slave.cpp:6738] EXIT with status 1: Failed to perform recovery: <error>
If recovery failed due to a change in configuration and you want to
keep the current agent id, you might want to change the
`--reconfiguration_policy` flag to a more permissive value.

To restart this agent with a new agent id instead, do as follows:
rm -f /tmp/agent/meta/slaves/latest
This ensures that the agent does not recover old live executors.

If you were using the docker containerizer, you might want to clear
the docker daemon state. These commands will erase all containers
and images from this host, not just those started by Mesos!
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)

Finally, restart the agent.
```
Thanks,
Armand Grillet
Re: Review Request 64379: Improved logs displayed after a slave failed recovery.
Posted by Armand Grillet <ag...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64379/
-----------------------------------------------------------
(Updated Dec. 8, 2017, 1:23 p.m.)
Review request for mesos, Alexander Rukletsov and Benno Evers.
Changes
-------
Updated message.
Repository: mesos
Description
-------
Add some steps to clean the Docker daemon
state used by the Docker containerizer.
Diffs (updated)
-----
src/slave/slave.cpp 54d8bcc035227dd6896ffa6e692a91749c0b56a6
Diff: https://reviews.apache.org/r/64379/diff/2/
Changes: https://reviews.apache.org/r/64379/diff/1-2/
Testing
-------
None. The previous logs were:
```
Nov: To remedy this do as follows:
Nov: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
Nov: This ensures agent doesn't recover old live executors.
Nov: Step 2: Restart the agent.
```
I have thus removed the tab before `This ensures agent doesn't recover` as it did not appear in the logs.
Thanks,
Armand Grillet