Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2016/04/05 19:16:25 UTC
[jira] [Comment Edited] (MESOS-5114) empty quorum config causes
masters to fail replica recovery and fail
[ https://issues.apache.org/jira/browse/MESOS-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225569#comment-15225569 ]
Michael Park edited comment on MESOS-5114 at 4/5/16 5:16 PM:
-------------------------------------------------------------
[~jieyu] In the test that you posted above, it should be an error caused during {{load}} rather than {{Option}} being set to {{None}}.
{{None}} should represent an absence of the value, as opposed to a parse failure from a specified value.
{code}
TEST(FlagsTest, LoadFromEnvironmentEmptyInteger)
{
  TestFlags flags;
  Option<int> name6;
  flags.add(&name6,
            "name6",
            "Optional name6");
  os::setenv("FLAGSTEST_name6", "");
  Try<Nothing> load = flags.load("FLAGSTEST_");
  EXPECT_ERROR(load);
  os::unsetenv("FLAGSTEST_name6");
}
{code}
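To make the distinction between "absent" and "specified but unparsable" concrete, here is a minimal standalone sketch in standard C++17. It does not use stout: {{LoadResult}} and {{loadInt}} are hypothetical names that only stand in for the {{Try}} / {{Option}} semantics described above, and the error message text is made up for illustration.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Try/Option pairing; not stout's API.
struct LoadResult
{
  std::optional<int> value;          // Empty when the variable was not specified.
  std::optional<std::string> error;  // Set when a specified value failed to parse.
};

// `raw` mimics what std::getenv returns: nullptr when the variable is unset,
// an empty string when it is set to nothing (e.g. `MESOS_QUORUM=`).
LoadResult loadInt(const char* raw)
{
  if (raw == nullptr) {
    return {std::nullopt, std::nullopt};  // Absent: no value, no error.
  }

  std::istringstream in(raw);
  int v = 0;
  in >> v;

  // Accept only if extraction succeeded and consumed the whole input.
  if (in && in.eof()) {
    return {v, std::nullopt};
  }

  return {std::nullopt,
          "Failed to parse '" + std::string(raw) + "' as an integer"};
}

// Usage: the three cases the comment above distinguishes.
void loadIntExamples()
{
  assert(!loadInt(nullptr).value && !loadInt(nullptr).error);  // Absent.
  assert(loadInt("").error.has_value());                       // Parse failure.
  assert(loadInt("3").value == 3);                             // Parsed value.
}
```

Under these assumptions, an unset variable yields neither a value nor an error, while an empty string is reported as a parse failure, which is the behavior the test above expects from {{load}}.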
As we discussed briefly offline, {{!in.good() || !in.eof()}} will always evaluate to {{true}}: once {{eofbit}} is set, {{good()}} returns {{false}}, so the two conditions can never both be false. Refer to the matrix of states in http://en.cppreference.com/w/cpp/io/ios_base/iostate.
{{good()}} is a pretty useless state to check for. What we really want is to use {{operator bool()}} along with the {{eof}} check to make sure that there is no leftover input.
{code}
if (in && in.eof()) {
  return t;
}
return Error(...);
{code}
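The difference between the two conditions can be checked with plain {{std::istringstream}}. The following is a sketch in standard C++17, not the actual stout parsing code: {{parseStrict}} and {{oldCheckRejects}} are hypothetical helper names, and {{std::optional}} is used in place of {{Try}} so the example compiles on its own.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the parsed value only if extraction
// succeeded AND the whole input was consumed.
template <typename T>
std::optional<T> parseStrict(const std::string& s)
{
  std::istringstream in(s);
  T t{};
  in >> t;

  // `in` (operator bool) is false when failbit or badbit is set.
  // `in.eof()` is true when extraction reached the end of the input.
  if (in && in.eof()) {
    return t;
  }
  return std::nullopt;
}

// The condition under discussion, evaluated after the same extraction.
template <typename T>
bool oldCheckRejects(const std::string& s)
{
  std::istringstream in(s);
  T t{};
  in >> t;
  return !in.good() || !in.eof();
}

// Usage: the new check separates good input from bad input,
// while the old check rejects everything.
void parseStrictExamples()
{
  assert(parseStrict<int>("42").has_value());  // Fully consumed.
  assert(!parseStrict<int>(""));               // Nothing to extract.
  assert(!parseStrict<int>("42abc"));          // Leftover characters.
  assert(oldCheckRejects<int>("42"));          // Valid input is rejected too.
}
```

Note that with this check, trailing whitespace such as "42 " is also rejected, because extraction stops at the space without reaching end-of-file. Whether that is desirable for flag values is a separate design decision.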
was (Author: mcypark):
[~jieyu] In the test that you posted above, it should be an error caused during {{load}} rather than {{Option}} being set to {{None}}.
{{None}} should represent an absence of the value, as opposed to a parse failure from a specified value.
{code}
TEST(FlagsTest, LoadFromEnvironmentEmptyInteger)
{
  TestFlags flags;
  Option<int> name6;
  flags.add(&name6,
            "name6",
            "Optional name6");
  os::setenv("FLAGSTEST_name6", "");
  Try<Nothing> load = flags.load("FLAGSTEST_");
  EXPECT_ERROR(load);
}
{code}
As we discussed briefly offline, {{!in.good() || !in.eof()}} will always evaluate to {{true}}. Refer to the matrix of states in http://en.cppreference.com/w/cpp/io/ios_base/iostate.
{{good()}} is a pretty useless state to check for. What we really want is to use {{operator bool()}} along with the {{eof}} check to make sure that there is no leftover input.
{code}
if (in && in.eof()) {
  return t;
}
return Error(...);
{code}
> empty quorum config causes masters to fail replica recovery and fail
> --------------------------------------------------------------------
>
> Key: MESOS-5114
> URL: https://issues.apache.org/jira/browse/MESOS-5114
> Project: Mesos
> Issue Type: Bug
> Components: stout
> Affects Versions: 0.23.1, 0.24.1, 0.25.0, 0.26.0, 0.28.0, 0.27.2
> Environment: CentOS 7.1
> Reporter: Cosmin Lehene
> Assignee: Michael Park
> Labels: mesosphere
> Fix For: 0.26.1, 0.25.1, 0.24.2, 0.28.1, 0.27.3, 0.23.2
>
>
> A missing default for quorum size generated the following master config:
> {code}
> MESOS_WORK_DIR="/var/lib/mesos/master"
> MESOS_ZK="zk://zk1:2181,zk2:2181,zk3:2181/mesos"
> MESOS_QUORUM=
> MESOS_PORT=5050
> MESOS_CLUSTER="mesos"
> MESOS_LOG_DIR="/var/log/mesos"
> MESOS_LOGBUFSECS=1
> MESOS_LOGGING_LEVEL="INFO"
> {code}
> This was causing each elected leader to attempt replica recovery.
> E.g. {{group.cpp:700] Trying to get '/mesos/log_replicas/0000000012' in ZooKeeper}}
> And eventually:
> {{master.cpp:1458] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins}}
> Full log from one of the masters: https://gist.github.com/clehene/09a9ddfe49b92a5deb4c1b421f63479e
> All masters and zk nodes were reachable over the network.
> Also, once the quorum was configured, the master recovery protocol finished gracefully.