Posted to dev@mesos.apache.org by roshu10 <gi...@git.apache.org> on 2018/05/25 18:38:08 UTC
[GitHub] mesos pull request #292: Adding one more master to the cluster (quorum 1 to ...
GitHub user roshu10 opened a pull request:
https://github.com/apache/mesos/pull/292
Adding one more master to the cluster (quorum 1 to 2)
Hey,
Currently we are running 2 Mesos masters in our infrastructure, i.e. the quorum is 1.
We are planning to add one more master to the cluster to raise the quorum to 2.
Does this require any downtime in production?
What strategies should we follow to avoid downtime?
Thanks for any help!
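For reference, the quorum arithmetic behind the question can be sketched as below. This is a minimal sketch, assuming the usual rule that the quorum must be a strict majority of the masters; the function names are hypothetical and not part of Mesos. Under that assumption, 2 masters would already call for a quorum of 2, and 3 masters with a quorum of 2 can tolerate the loss of one master.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("a cluster needs at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Masters that can be lost while a majority quorum is still reachable."""
    return num_masters - majority_quorum(num_masters)


# Print the majority quorum and failure tolerance for a few cluster sizes.
for n in (1, 2, 3, 5):
    print(f"masters={n} quorum={majority_quorum(n)} "
          f"tolerated_failures={tolerated_failures(n)}")
```

The table this prints only covers the arithmetic; it says nothing about the order in which masters should be reconfigured or restarted.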
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/mesos master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/mesos/pull/292.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #292
----
commit 081c3114fefa18c6acd1e884e6d6583232e30d5c
Author: Harold Dost <h....@...>
Date: 2018-05-07T15:39:29Z
Documented the `--xfs-kill-containers` flag.
Added a description of the `--xfs-kill-containers` flag to the
`disk/xfs` isolator page and listed it in the upgrade documentation.
Review: https://reviews.apache.org/r/66975/
commit 01b618c416507b7d43bf435b870dc26b2361e9f4
Author: James Peach <jp...@...>
Date: 2018-05-07T16:03:42Z
Fixed a disk space check in the XFS tests.
The XFS tests were requiring that at least 2MB of data was written,
but the test can still correctly pass after only 1MB had been written.
Review: https://reviews.apache.org/r/66986/
commit aaf51e007044ad295949ad719776ef49d62dc3bf
Author: Qian Zhang <zh...@...>
Date: 2018-05-07T20:48:39Z
Increased the timeout for waiting for `reaped` to be invoked.
Previously, after the container process was reaped by the Docker
executor, we would wait 3 seconds for `reaped` to be invoked.
However, in some cases (e.g., launching a Docker container that uses
an external rexray volume), more than 3 seconds elapse between
the container process exiting and the `docker run` command
returning (i.e., `reaped` being invoked). This patch therefore
increases the timeout to 60 seconds.
Review: https://reviews.apache.org/r/66947/
commit b5b970381bc8e616720bb9ff7917920f0ab0974c
Author: Gilbert Song <so...@...>
Date: 2018-05-07T21:11:30Z
Added MESOS-8876 to 1.6.0 CHANGELOG.
commit bad10a88f4a7c291add62dfb91d7c2c077582c44
Author: Gilbert Song <so...@...>
Date: 2018-05-07T21:12:45Z
Added MESOS-8876 to 1.5.1 CHANGELOG.
commit 8b5b35b9f3a792d18c6ddb5dadb0df0657eb4e25
Author: Gilbert Song <so...@...>
Date: 2018-05-07T21:13:42Z
Added MESOS-8876 to 1.4.2 CHANGELOG.
commit 36522a29adcffebcbaaee66d67ee66b40dcdee8f
Author: Gilbert Song <so...@...>
Date: 2018-05-07T21:14:00Z
Added MESOS-8876 to 1.3.3 CHANGELOG.
commit 0262b41f8e3b40c63c1de42d556241f889320e7d
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T01:09:02Z
Re-enable epoll support for libevent.
Epoll support was disabled due to some undocumented "issues". Since
the original author is not responsive and a lot of libevent / SSL
issues have been fixed, we can try re-enabling epoll support.
Should this be an issue, epoll can be disabled once again using the
EVENT_NOEPOLL environment variable.
Review: https://reviews.apache.org/r/66977
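As an illustration of the fallback mentioned in this commit, the sketch below prepares a process environment with `EVENT_NOEPOLL` set. It is a minimal sketch under stated assumptions: the helper name is hypothetical, the value "1" is arbitrary (the assumption is that libevent only checks whether the variable is set), and it only matters for binaries that actually use libevent.

```python
import os
from typing import Mapping, Optional


def env_with_epoll_disabled(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with EVENT_NOEPOLL set.

    The base mapping (os.environ by default) is copied, not modified.
    """
    env = dict(os.environ if base is None else base)
    env["EVENT_NOEPOLL"] = "1"  # assumed: presence matters, not the value
    return env


# The result can be passed as the env= argument of subprocess.run() when
# launching the process that should fall back to another backend.
print(env_with_epoll_disabled({"PATH": "/usr/bin"}))
```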
commit 44c1321827e25a2ee2210954b7d180bca8cf5232
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T02:03:07Z
Disabled debug mode for libevent.
Debug mode enables additional tracking in libevent to check for common
errors. It is recommended to "only enable debug mode when actually
debugging your program" because "tracking which events are initialized
requires that Libevent use extra memory and CPU".
http://www.wangafu.net/~nickm/libevent-book/Ref1_libsetup.html
We could consider introducing libevent flags in order to be able to
toggle this behavior with an environment variable since it appears
that libevent does not provide one. However, since I don't believe
these assertions have been of value, we can just remove the debug mode
for now.
Review: https://reviews.apache.org/r/66978
commit c692354d0f374de0c94cd04e8e1f7f00a1a1ba36
Author: Greg Mann <gr...@...>
Date: 2018-05-07T21:24:22Z
Updated Mesos version to 1.7.0.
commit f9aadd03011b0dfcbba4a60f0e40ac79e889e954
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:20:26Z
Added MESOS-8881 to the 1.6.0 CHANGELOG.
commit 55ef28564c077470729a5bf04ca1674a52c7c5d7
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:25:42Z
Added MESOS-8885 to the 1.6.0 CHANGELOG.
commit 17454a62bbe5c8b4cfabcbd0b64f22acc0cf8704
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:29:16Z
Added MESOS-8881 to the 1.5.1 CHANGELOG.
commit 91ae3eb7f722cec75479404364c2735ac7156507
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:29:34Z
Added MESOS-8885 to the 1.5.1 CHANGELOG.
commit 82ffb94650c3b059c0862e4017cd2240544a1c52
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:38:01Z
Added MESOS-8881 to the 1.4.2 CHANGELOG.
commit e4919c44b1ef180461b587892ea1d644b66a5112
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:38:16Z
Added MESOS-8885 to the 1.4.2 CHANGELOG.
commit e313487c04f30587c1a42d56fbb1cc15cc708b3d
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:46:18Z
Added MESOS-8881 to the 1.3.3 CHANGELOG.
commit b11f7aefe28d8e221d976ec9a73417661f5e4629
Author: Benjamin Mahler <bm...@...>
Date: 2018-05-07T21:46:26Z
Added MESOS-8885 to the 1.3.3 CHANGELOG.
commit 5b42b52f5c932ad0d32f9718d544f75b604cb508
Author: Xudong Ni <xu...@...>
Date: 2018-05-07T21:39:46Z
Failure to update registry should abort the master process.
When the registrar fails to update the registry it would abort the
actor and fail all future operations. However when the registrar
update is requested by an operator API such as a maintenance update,
the master process doesn't shut down (a 500 error is returned to the
client instead) and all subsequent operations will fail.
This patch fixes the specific maintenance API case but we can follow
up with other call sites or put a fix in for the registrar itself.
Review: https://reviews.apache.org/r/66919/
commit e6298aef83039dacc80b8e2a8778efacbaa63efc
Author: Jiang Yan Xu <xu...@...>
Date: 2018-05-08T00:05:06Z
Minor style fix.
commit 39b27e1bb90aab3f10c1203d8f4f65de4f32e774
Author: Greg Mann <gr...@...>
Date: 2018-05-08T00:31:55Z
Made the 'SchedulerDriver' abort when operation's 'id' field is set.
Since the 'SchedulerDriver' does not support operation status updates,
this patch adds a check to the driver which will abort the scheduler
if the 'id' field is set in an offer operation.
Review: https://reviews.apache.org/r/66938/
commit b4c541b4d9677e2b84d8538f319a3dfe7987e327
Author: Gaston Kleiman <ga...@...>
Date: 2018-05-08T00:32:15Z
Made the master drop operations with an ID on agent default resources.
Review: https://reviews.apache.org/r/66992/
commit a570f9436b816d40ba3d01455211f5d61f77d66d
Author: Gaston Kleiman <ga...@...>
Date: 2018-05-08T00:32:56Z
Made the master include the operation ID in OPERATION_DROPPED updates.
Review: https://reviews.apache.org/r/66924/
commit 9d897259a39dc9f90e8fad191732a3fe45d63458
Author: Gaston Kleiman <ga...@...>
Date: 2018-05-08T00:33:32Z
Prevented master from sending operation updates to v0 frameworks.
Review: https://reviews.apache.org/r/66995/
commit 52ae7f0e6dd6952d243c37e8b8aa98ce7752a17d
Author: Gaston Kleiman <ga...@...>
Date: 2018-05-08T00:33:56Z
Improved validation messages for some operations.
Review: https://reviews.apache.org/r/66939/
commit 9c54841cbdb77a5c8f5fba0089b70330eed2e80b
Author: Greg Mann <gr...@...>
Date: 2018-05-08T01:20:08Z
Added MESOS-8784 to the CHANGELOG.
commit 25176ed1b30a9f7fb82a71bca16a423343ba6d5c
Author: Benjamin Bannier <be...@...>
Date: 2018-05-08T15:58:12Z
Fixed flakiness in a `MasterSlaveReconciliationTest`.
The test `ReconcileDroppedOperation` uses detection of a
`ReconcileOperationsMessage` to confirm correct agent reregistration
behavior. For that it drops an operation on its way to the agent, and
then tries to observe the `ReconcileOperationsMessage` when the agent
reregisters after a simulated master failover.
Since `ReconcileOperationsMessage` is sent whenever the master detects
discrepancy between its own operation state of the agent and the
information sent by the agent in an `UpdateSlaveMessage` we need to
make sure to only drop the operation once the agent has sent the
update which is part of its initial registration sequence.
Review: https://reviews.apache.org/r/67003/
commit 6d97f68e5a4bbd22a0b72cf7c2c1826e45142de4
Author: Benno Evers <be...@...>
Date: 2018-05-08T17:00:53Z
Added an option to keep downloaded patches to apply-reviews.py.
By default, the apply-reviews.py script will delete all patch files
it downloads. However, when a patch fails to apply, it might be
desired to edit and apply it manually. This change will make it easier
to do so.
Review: https://reviews.apache.org/r/67004/
commit 86523d3157d36bdaf4f7ce8fe001ae241e690a5f
Author: Benno Evers <be...@...>
Date: 2018-05-08T17:02:51Z
Fixed flakiness in 'MasterAPITest.MasterFailover'.
This test used to sporadically segfault as described in MESOS-8687.
The suspected cause is that in a master actor, the `httpSequence`
field was lazily initialized in `ProcessBase::consume()` and afterwards
a call to `ProcessBase::_consume()` was dispatched, where it was
assumed that `httpSequence` is already initialized.
However, during this test the master actor would be destroyed and a
new actor would be spawned with the same PID. The dispatched method
would be called on this new actor and find `httpSequence` to be not
initialized, leading to a crash.
This patch introduces a call to `Clock::settle()` after the master
is shut down to ensure the outstanding `_consume()` gets discarded
before starting the new master actor.
Review: https://reviews.apache.org/r/66799/
commit 50f561a29a897004f5865aa8ee38ba5cf1e49410
Author: Jan Schlicht <ja...@...>
Date: 2018-05-09T10:38:56Z
Added token-based authentication for resource providers.
If a token is provided, it will be used in HTTP requests to the resource
provider manager. This allows JWT-based authentication and authorization
for resource providers.
The (unimplemented) credential support in `resource_provider::Driver`
has been removed in favor of the token-based approach.
Review: https://reviews.apache.org/r/66932/
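The token handling described in this commit can be illustrated with a small client-side sketch. This is not the `resource_provider::Driver` code: the function name, the URL, the token value, and the use of the `Bearer` scheme in the `Authorization` header are all assumptions made for illustration. The request is only constructed, never sent.

```python
import urllib.request
from typing import Optional


def build_request(url: str, body: bytes,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request, attaching the token only if one is provided."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumed header format for a JWT: "Authorization: Bearer <token>".
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


# Hypothetical endpoint and placeholder token, for illustration only.
request = build_request(
    "http://agent.example:5051/api/v1/resource_provider",
    b"{}",
    token="example-token",
)
print(request.get_header("Authorization"))
```

Without a token, the same function returns a request with no `Authorization` header, which mirrors the "if a token is provided" wording above.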
----
---
[GitHub] mesos pull request #292: Adding one more master to the cluster (quorum 1 to ...
Posted by roshu10 <gi...@git.apache.org>.
GitHub user roshu10 closed the pull request at:
https://github.com/apache/mesos/pull/292
---