Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/09/11 22:01:05 UTC

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8032

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................

docs: clarify steps for removing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. We've had to rely on the existing docs and
have these users use their imagination to go through this. I've added
docs specifying the process and parameters to do so.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
---
M docs/administration.adoc
1 file changed, 24 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/8032/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8032

to look at the new patch set (#3).

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................

docs: clarify steps for changing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. We've had to rely on the existing docs and
have these users use their imagination to go through this. I've added
docs specifying the process and parameters to do so.

Additionally, this patch clarifies steps for multi-master recovery in
case the cluster was configured without DNS aliases.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
---
M docs/administration.adoc
1 file changed, 77 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/8032/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8032

to look at the new patch set (#2).

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................

docs: clarify steps for removing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. We've had to rely on the existing docs and
have these users use their imagination to go through this. I've added
docs specifying the process and parameters to do so.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
---
M docs/administration.adoc
1 file changed, 38 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/8032/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(5 comments)

See the rendering here:
https://github.com/andrwng/kudu/blob/006ca06da2a91f178ba21a31fa19d01e710c9fd8/docs/administration.adoc

http://gerrit.cloudera.org:8080/#/c/8032/2/docs/administration.adoc
File docs/administration.adoc:

Line 379: master.
> Nit: other WARNING text begins with a capital letter. Below too.
Done


PS2, Line 382: this workflow without also restarting the live masters. As such, the workflow requires a
             : maintenance window, albeit a 
> You are technically correct (the best kind of correct) but there are nuance
I added a warning to ensure the leader is kept (otherwise there is a risk of severe data loss).


PS2, Line 382: this workflow without also restarting the live masters. As such, the workflow requires a
             : maintenance window, albeit a 
> Please double check this with Mike.
Done


PS2, Line 392: 
> nit: master nodes?
Removing this line since I agree with Adar.


PS2, Line 392: 
> I don't really understand why this instruction is worth including. Yes, it 
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
> So this works around KUDU-1620, right? What if we were to restart the remai
The original nodes were A (dead), B* (leader), and C, and I attempted to replace A with D. I went through this process, and there are a few things to note:
* When I brought up D, both B and C's /masters pages successfully updated A's address to D's.
* Looking at B's logs, this was not the case; it was still trying to contact A, as expected.
* Looking at D's logs, I could see it losing a bunch of pre-elections since the remaining two masters already had a quorum (also, D's web UI showed four masters: D's own UUID appeared twice, both entries showing D's address).
* After updating the DNS aliases, I restarted C. Once it came up, B continued being leader, and D still was not allowed in.
* After restarting B, C was elected, and the logs appeared normal across B, C*, D.
* Interestingly, at the end of this all, B's, C's, and D's web UIs all showed an exact duplicate for D (rpc address and all).

So it seems like nothing "goes wrong" with this approach, but I think we were unavailable while C was restarting: a single leader but no voters, and an effectively bricked replacement node, resulting in an extremely familiar window of unavailability of size <length of master restart>.

If, after I updated the DNS aliases, I'd restarted B* instead, would things have been different? With no leader, would we have been forced into an election? No; things would be pretty much the same: D and C would not have been able to accept ops individually, and would not have elected a leader for the same unfortunate DNS alias reasons.

TL;DR: Doesn't seem like it.
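The sequence of observations above can be checked against a simple majority model. This is an illustration only, assuming the master Raft config makes progress only while a strict majority of its voters are up and can reach each other. The slot name `alias-A`, the step labels, and the helper functions are hypothetical; this is not Kudu's implementation.

```python
# Simplified availability model (an assumption, not Kudu's implementation):
# the master Raft config can commit operations only while a strict majority
# of its voters are up and able to reach one another.

def has_majority(config, participating):
    """True if the participating voters form a strict majority of `config`."""
    votes = len(set(config) & set(participating))
    return votes > len(config) // 2


# Voter slots in the on-disk config. The "alias-A" slot is a DNS alias that
# now points at the replacement master D instead of the dead master A.
CONFIG = ["alias-A", "B", "C"]

# Voters able to participate at each step of the experiment above.
STEPS = [
    # B still holds a cached proxy to dead A, so D cannot take part.
    ("C restarting", {"B"}),
    # C re-resolved the alias when it restarted, so C and D reach each other.
    ("B restarting", {"C", "alias-A"}),
    ("all masters restarted", {"alias-A", "B", "C"}),
]


def availability(config, steps):
    """Pair each step label with whether a majority was available."""
    return [(label, has_majority(config, voters)) for label, voters in steps]


for label, ok in availability(CONFIG, STEPS):
    print(f"{label}: {'available' if ok else 'unavailable'}")
```

Under this model the first restart is the one that hurts: whichever surviving master restarts first leaves a single master with stale proxies, and it cannot form a majority on its own. That matches the conclusion above.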



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8032/1/docs/administration.adoc
File docs/administration.adoc:

Line 374: ==== Removing masters from a Multi-Master Deployment
> Done.
I don't know; I think you should ask Mike for clarification. I don't think ksck can tell you which master is most 'up-to-date'.


http://gerrit.cloudera.org:8080/#/c/8032/2/docs/administration.adoc
File docs/administration.adoc:

Line 379: WARNING: in planning the new multi-master configuration, keep in mind that the number of masters
Nit: other WARNING text begins with a capital letter. Below too.


PS2, Line 382: WARNING: dropping the number of masters below the number of masters currently needed for a Raft
             : majority can incur data loss.
Please double check this with Mike.


PS2, Line 392: masters
> nit: master nodes?
I don't really understand why this instruction is worth including. Yes, it makes sense to ensure that the masters that you're removing aren't going to come back. But I think that's sort of implicit in these instructions, especially in the "Remove the directories..." step.

If you look at the other master workflows, none of them call this out explicitly, so at the very least, this workflow ought to be consistent with them.



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
> Thanks for running that experiment. What you observed makes sense.
> due to caching

What caching?



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/2/docs/administration.adoc
File docs/administration.adoc:

PS2, Line 382: WARNING: dropping the number of masters below the number of masters currently needed for a Raft
             : majority can incur data loss.
> Please double check this with Mike.
You are technically correct (the best kind of correct) but there are nuances we might consider explaining here.

Data loss is kind of a funny concept in this case because we're really talking about *metadata* loss, which can actually be catastrophic: for example it can cause the master to end up dropping whole tables, I think, depending on how far behind the remaining node was, like if it went offline before you created a new table or partition. However, I'm not sure whether we've ever tested this scenario and I'm pretty confident that we don't have an *automated* test for it either.

This is particularly scary if you had one master that was down or partitioned for a long time, and you end up removing the other two, and then this stale remaining master comes back online and is all we have left.

In general, if you remove more than a majority at once and then do some kind of manual repair, we don't give you any durability guarantees at all.
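The durability point can be made concrete with Raft's quorum-intersection argument. Every committed operation reached some majority, so a set of surviving masters is only guaranteed to hold every committed op if it overlaps every possible majority. A brute-force sketch, assuming the standard strict-majority commit rule; `keeps_committed_metadata` is a hypothetical helper, not a Kudu tool.

```python
from itertools import combinations


def majorities(masters):
    """Yield every strict-majority subset of `masters`."""
    n = len(masters)
    for size in range(n // 2 + 1, n + 1):
        for subset in combinations(masters, size):
            yield set(subset)


def keeps_committed_metadata(masters, kept):
    """True if `kept` shares at least one member with every possible majority.

    Each committed op was persisted on some majority, so a kept set that
    intersects every majority includes at least one master holding that op.
    """
    kept = set(kept)
    return all(kept & m for m in majorities(masters))


masters = ["A", "B", "C"]
print(keeps_committed_metadata(masters, ["A", "B"]))  # True
print(keeps_committed_metadata(masters, ["A"]))       # False: A may be stale
```

Note that this only says *some* kept master has each committed op; it says nothing about which one is most up to date, which is exactly the stale-master scenario described above.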



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
> Original nodes A (dead), B*, C, and attempted to replace A with D. Tried go
The problem is you only updated the config files on disk. You didn't add the configuration change to the WAL, and Raft replicates what's in the WAL. Come to think of it, it's quite dangerous to do it in a rolling fashion because the configuration is changing without updating the WAL and therefore 2 servers can have different configs but either one can get elected since they have the same last-logged opid in the WAL.

In short, until we support config change on the master we should bring down all of the masters before modifying all of their configs and bringing them all back up.
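A toy model of the hazard described above, assuming a simplified vote rule: a candidate needs a strict majority of the config *it* holds, and a responding voter grants its vote when the candidate's last-logged OpId (term, index) is at least as up to date as its own. The masters, OpIds, and the half-finished rewrite are hypothetical, and real Raft has additional conditions (for example, one vote per term).

```python
# Toy Raft vote model (an illustrative assumption, not Kudu's code).
# OpIds are (term, index) tuples; tuple ordering compares term, then index.

def wins_election(candidate, config, responders, last_opid):
    """Count votes from responding members of the candidate's own config
    whose log is no more up to date than the candidate's."""
    votes = sum(
        1 for voter in config
        if voter in responders and last_opid[candidate] >= last_opid[voter]
    )
    return votes > len(config) // 2


# Every master has the same last-logged OpId: the config was rewritten on
# disk, so nothing new was appended to the WAL.
last_opid = {"A": (5, 100), "B": (5, 100), "C": (5, 100)}

# Midway through a rolling rewrite: B already holds a new single-master
# config, while A and C still hold the old three-master config.
b_wins = wins_election("B", ["B"], {"B"}, last_opid)
a_wins = wins_election("A", ["A", "B", "C"], {"A", "C"}, last_opid)
print(b_wins, a_wins)  # True True: two masters each believe they lead
```

Stopping every master before rewriting any config, as suggested above, avoids ever running a mix of old and new configs at the same time.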



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
So this works around KUDU-1620, right? What if we were to restart the remaining masters one at a time? Would that allow us to avoid any downtime?



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8032/1/docs/administration.adoc
File docs/administration.adoc:

Line 374: ==== Removing masters from a Multi-Master Deployment
> You should incorporate the warning from earlier about how the number of mas
Done.

I left the warning nearly verbatim; do you think that's fine? Or is there a more actionable warning (e.g. run `ksck` and copy over the raft config from the most up-to-date master) that would be more fitting?
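Whatever tool ends up identifying the "most up-to-date" master, Raft's own definition of "more up to date" is simple: compare the last-logged OpIds, where a higher term wins and, within the same term, a higher index wins. A sketch, assuming each master's last-logged OpId has already been collected somehow (that collection step, the master names, and the values below are hypothetical):

```python
from typing import Dict, Tuple

OpId = Tuple[int, int]  # (term, index)


def most_up_to_date(last_opids: Dict[str, OpId]) -> str:
    """Return the master whose last-logged OpId is most up to date.

    A higher term wins; within the same term, a higher index wins.
    Python's tuple ordering gives exactly this comparison.
    """
    return max(last_opids, key=last_opids.__getitem__)


# Hypothetical values for illustration only.
opids = {"master-1": (7, 1042), "master-2": (8, 990), "master-3": (7, 1100)}
print(most_up_to_date(opids))  # master-2: its term (8) is the highest
```

Note that a lower index in a newer term still beats a higher index in an older term.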


PS1, Line 376:  has
> nit: 'had' or just drop?
I think "been" is fine here; I'm using it here as a verb, as in:
"in the event that a multi-master deployment has been configured with too many nodes"


PS1, Line 376: t
> that a
Done


PS1, Line 381: 
> Is it worth adding a notion on disabling the kudu-master services for the m
Done.


Line 383: majority can incur data loss.
> Let's be a little more explicit and say that in order to remove the unwante
Done


Line 385: . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
> We should add a step to blow away the data/WAL directories of the unwanted 
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


docs: clarify steps for changing master from multi-master deployment

The current docs for multi-master migration discuss moving up from a
single-master deployment to multi-master, but some users may want to
move in the other direction. We've had to rely on the existing docs and
have these users use their imagination to go through this. I've added
docs specifying the process and parameters to do so.

Additionally, this patch clarifies steps for multi-master recovery in
case the cluster was configured without DNS aliases.

Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Reviewed-on: http://gerrit.cloudera.org:8080/8032
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
---
M docs/administration.adoc
1 file changed, 77 insertions(+), 9 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified




Gerrit-MessageType: merged
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
> In this DNS alias approach, though, there is no rewriting of configs any ma
Thanks for running that experiment. What you observed makes sense.



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8032/1/docs/administration.adoc
File docs/administration.adoc:

PS1, Line 376:  has
> I think "been" is fine here; I'm using it here as a verb, as in:
SGTM


http://gerrit.cloudera.org:8080/#/c/8032/2/docs/administration.adoc
File docs/administration.adoc:

PS2, Line 392: masters
nit: master nodes?



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8032/1/docs/administration.adoc
File docs/administration.adoc:

PS1, Line 376: been
nit: 'had' or just drop?


PS1, Line 381: If using CM, remove the unwanted Kudu master roles.
Is it worth adding a note on disabling the kudu-master services for the masters to be removed if not using CM?



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
> The problem is you only updated the config files on disk. You didn't add th
In this DNS alias approach, though, there is no rewriting of configs on any master. We just copy over the WAL and change the DNS aliases to point to the new master. Due to caching, the old masters don't see the new aliases, hence the need to restart.

Perhaps the above applies to the removal steps below, though (which do have the user stop all masters first and rewrite configs).



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for changing master from multi-master deployment

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: clarify steps for changing master from multi-master deployment
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8032/3/docs/administration.adoc
File docs/administration.adoc:

PS3, Line 399: * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
             :   will be unavailable.
> > due to caching
Related to KUDU-1620, we don't rebuild the consensus peer proxies, and we assume that peers will come back at the same locations, which isn't always the case (e.g. if we change the DNS aliases to point to a different location). This "caching" of master locations is why the masters must be restarted to pick up the newly updated DNS alias hostnames.
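A toy model of that caching behavior, assuming a peer proxy resolves its hostname once when it is built and keeps the resulting address until it is rebuilt. The classes, hostname, and addresses are hypothetical stand-ins, not Kudu's consensus code; a dict plays the role of DNS.

```python
# Toy model (an assumption, not Kudu's code): a peer proxy resolves its
# hostname once at construction time and keeps using that address.

class PeerProxy:
    def __init__(self, hostname, dns):
        self.hostname = hostname
        self.address = dns[hostname]  # resolved once, then cached


class Master:
    def __init__(self, peer_hostnames, dns):
        self.dns = dns  # shared mapping, so later alias changes are visible
        self.peer_hostnames = list(peer_hostnames)
        self.proxies = self._build_proxies()

    def _build_proxies(self):
        return {h: PeerProxy(h, self.dns) for h in self.peer_hostnames}

    def restart(self):
        # Rebuilding the proxies picks up whatever DNS currently says.
        self.proxies = self._build_proxies()


dns = {"master-a.example.com": "10.0.0.1"}
b = Master(["master-a.example.com"], dns)

dns["master-a.example.com"] = "10.0.0.4"  # alias now points at replacement D
stale = b.proxies["master-a.example.com"].address
b.restart()
fresh = b.proxies["master-a.example.com"].address
print(stale, fresh)  # 10.0.0.1 10.0.0.4
```

Changing the alias alone does nothing for a running master in this model; only rebuilding the proxies (here, `restart()`) makes it follow the alias to the new location.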



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: clarify steps for removing master from multi-master deployment

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: docs: clarify steps for removing master from multi-master deployment
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8032/1/docs/administration.adoc
File docs/administration.adoc:

Line 374: ==== Removing masters from a Multi-Master Deployment
You should incorporate the warning from earlier about how the number of masters should be odd.

Also, since this workflow brings down the whole cluster, add a step about establishing a maintenance window (see above).

Also, certain kinds of changes (i.e. dropping to a number of masters below the number of masters currently needed for a Raft majority) can incur data loss. We should find some way to incorporate that as a warning.
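Both warnings come down to majority arithmetic. A sketch, assuming Raft's strict-majority rule. It shows why an even master count adds no fault tolerance over the next-lower odd count, and flags a planned count below the current majority. `plan_warnings` and its messages are hypothetical, not part of any Kudu tool.

```python
def majority(n):
    """Votes needed for a strict majority of n voters."""
    return n // 2 + 1


def tolerated_failures(n):
    """Masters that can be lost while a majority still remains."""
    return n - majority(n)


def plan_warnings(current, new):
    """Hypothetical checks for a planned change in the number of masters."""
    warnings = []
    if new % 2 == 0:
        warnings.append("even master count adds no fault tolerance")
    if new < majority(current):
        warnings.append("fewer masters than the current majority: "
                        "committed metadata may be lost")
    return warnings


for n in range(1, 7):
    print(n, majority(n), tolerated_failures(n))
```

Going from 3 to 4 masters raises the majority from 2 to 3 but still tolerates only one failure, which is why the docs recommend an odd number.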


PS1, Line 376: a
that a


Line 383: . Rewrite the Raft configuration on the remaining masters to remove the unwanted masters. See Step
Let's be a little more explicit and say that in order to remove the unwanted masters, you rewrite the raft config to include just the remaining masters.
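For illustration, a sketch that builds the rewrite command containing only the masters to keep. It assumes the `kudu local_replica cmeta rewrite_raft_config` syntax used in the Kudu administration docs' master workflows (`--fs_wal_dir`, optional `--fs_data_dirs`, the all-zeros master tablet ID, then `uuid:hostname:port` per peer); verify this against `--help` for your Kudu version. The helper name, UUIDs, hostnames, and paths are hypothetical.

```python
import shlex

# Assumption: the master's system tablet uses the all-zeros tablet ID, as in
# the Kudu administration docs. Verify for your version.
MASTER_TABLET_ID = "0" * 32


def rewrite_raft_config_argv(wal_dir, remaining, data_dirs=None):
    """Build argv to rewrite a master's Raft config to just `remaining`.

    `remaining` is a list of (uuid, hostname, port) tuples for the masters
    to keep; the unwanted masters are simply left out of the new config.
    """
    argv = ["kudu", "local_replica", "cmeta", "rewrite_raft_config",
            f"--fs_wal_dir={wal_dir}"]
    if data_dirs:
        argv.append(f"--fs_data_dirs={','.join(data_dirs)}")
    argv.append(MASTER_TABLET_ID)
    argv.extend(f"{uuid}:{host}:{port}" for uuid, host, port in remaining)
    return argv


# Hypothetical masters to keep.
keep = [("uuid1", "master-1", 7051), ("uuid2", "master-2", 7051)]
print(shlex.join(rewrite_raft_config_argv("/data/kudu/master/wal", keep)))
```

The same command, with the same list of kept masters, would be run on each remaining master while all masters are stopped.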


Line 385: 
We should add a step to blow away the data/WAL directories of the unwanted masters, to prevent them from coming back to life and interfering with the new Raft configuration.
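A minimal sketch of that cleanup step, with a guard against wiping the wrong machine. It assumes `wal_dir` and `data_dirs` are whatever the removed master's `--fs_wal_dir` and `--fs_data_dirs` flags pointed to, and that it runs on that master's host after the master has been stopped. The function name and hostnames are hypothetical.

```python
import shutil
from pathlib import Path


def wipe_removed_master(host, kept_hosts, wal_dir, data_dirs):
    """Delete a removed master's WAL and data directories.

    Refuses to run for a host that is still in the new master config, so a
    mistyped hostname cannot destroy a surviving master's state.
    """
    if host in kept_hosts:
        raise ValueError(f"{host} is still in the new master config")
    for d in [wal_dir, *data_dirs]:
        path = Path(d)
        if path.exists():
            shutil.rmtree(path)
```

For example, `wipe_removed_master("master-3", {"master-1", "master-2"}, wal, [data])` would remove both directories on master-3, while passing `"master-1"` raises instead of deleting anything.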



Gerrit-MessageType: comment
Gerrit-Change-Id: I4196dbb2f8a185e868a6906c7cf917d79c404c0d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes