Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/09/06 22:07:12 UTC

[kudu-CR] docs: update disk failure recovery notes

Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7984

Change subject: docs: update disk failure recovery notes
......................................................................

docs: update disk failure recovery notes

Adds more context around disk failures and separates out steps to
rebuild a server with a different directory configuration.

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
---
M docs/administration.adoc
1 file changed, 36 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/7984/1
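For readers unfamiliar with the ref in the command above: Gerrit publishes each patch set under refs/changes/<last two digits of the change number>/<change number>/<patch set>. A minimal sketch of that naming scheme follows; the helper name `gerrit_change_ref` is made up for illustration and is not part of Gerrit or Kudu.

```python
def gerrit_change_ref(change_number: int, patch_set: int) -> str:
    """Build the Gerrit ref for one patch set of a change.

    Hypothetical helper: the layout is
    refs/changes/<last two digits>/<change number>/<patch set>.
    """
    # The shard directory is the change number modulo 100, zero-padded to two digits.
    shard = f"{change_number % 100:02d}"
    return f"refs/changes/{shard}/{change_number}/{patch_set}"


# Change 7984, patch set 1, as used in the git pull command above.
print(gerrit_change_ref(7984, 1))
```

Later patch sets of the same change differ only in the final path component.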
-- 
To view, visit http://gerrit.cloudera.org:8080/7984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] docs: update disk failure recovery notes

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: docs: update disk failure recovery notes
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7984/1/docs/administration.adoc
File docs/administration.adoc:

Line 597: empty all of the server's existing directories. For example, if a tablet server
do you think a WARNING bar is appropriate here? I wonder if someone would follow these directions without making sure they have other good replicas or backups of their data?


PS1, Line 608: along with any newly-added data
             : directories
you mean 'and new data directories have been created with appropriate permissions' or something?


PS1, Line 630: suicide_on_eio=false`. When set, tablets with data on a failed disk
             : will not be opened and will be re-replicated as necessary
given that we are currently defaulting to spreading tablets across all disks, is this really relevant? i.e. this would start up with no tablets, but still have a bunch of data on those disks?

I also wonder whether we should be documenting this flag, considering it is tagged experimental. I also wonder if some people may find the name offensive/upsetting and whether we should rename it to 'abort_on_eio'?
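
To make the striping concern above concrete, here is a minimal sketch. It is not Kudu code: the directory paths, tablet names, and the `tablets_on_failed_dirs` helper are all hypothetical, and it assumes only that a tablet is skipped at startup if any of its data lives on a failed directory.

```python
from typing import Dict, List, Set


def tablets_on_failed_dirs(
    tablet_dirs: Dict[str, Set[str]], failed_dirs: Set[str]
) -> List[str]:
    """Return the tablets that have data on at least one failed directory."""
    return sorted(
        tablet for tablet, dirs in tablet_dirs.items() if dirs & failed_dirs
    )


# Hypothetical layout: every tablet is striped across all three data directories.
all_dirs = {"/data/1", "/data/2", "/data/3"}
striped = {f"tablet-{i}": set(all_dirs) for i in range(4)}

# Hypothetical alternative: each tablet is confined to a single directory.
grouped = {f"tablet-{i}": {f"/data/{i % 3 + 1}"} for i in range(4)}

failed = {"/data/2"}
striped_affected = tablets_on_failed_dirs(striped, failed)
grouped_affected = tablets_on_failed_dirs(grouped, failed)

print(len(striped_affected), "of", len(striped), "striped tablets affected")
print(len(grouped_affected), "of", len(grouped), "grouped tablets affected")
```

Under these assumptions, a single failed directory touches every striped tablet, so the server would come up with no tablets at all, while the surviving directories still hold their share of the data.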



Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: docs: split disk failure from disk config changes
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7984/4//COMMIT_MSG
Commit Message:

PS4, Line 9: administartion
administration



Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: omit discussing disk failure in admin docs

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: docs: omit discussing disk failure in admin docs
......................................................................


Patch Set 3:

I'm not in favor of removing the 'Recovering from Disk Failure' section. Disk failure recovery is something we get a _lot_ of questions about, and the section was written directly in response to that. I don't think it will be obvious to administrators to look at the 'Changing Directory Configurations' section when a failure occurs. Perhaps you could have both sections, with one referring to the other? I think it's much more common for administrators to keep their disk configs anyway when a disk goes down (it gets replaced).


Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7984

to look at the new patch set (#4).

Change subject: docs: split disk failure from disk config changes
......................................................................

docs: split disk failure from disk config changes

The administartion notes commented on Kudu's handling of disk failures
with instructions to rebuild a tserver with a new directory
configuration. While related, these two are separate and should be
documented as such.

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
---
M docs/administration.adoc
1 file changed, 32 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/7984/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] docs: update disk failure recovery notes

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: update disk failure recovery notes
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7984/1/docs/administration.adoc
File docs/administration.adoc:

Line 597: empty all of the server's existing directories. For example, if a tablet server
> do you think a WARNING bar is appropriate here? I wonder if someone would f
Done


PS1, Line 608: along with any newly-added data
             : directories
> you mean 'and new data directories have been created with appropriate permi
Done


PS1, Line 630: suicide_on_eio=false`. When set, tablets with data on a failed disk
             : will not be opened and will be re-replicated as necessary
> given that we are currently defaulting to spreading tablets across all disk
Fair point about striping. Made it less of a point.

Also changed the flag to --crash_on_eio, which is hopefully less controversial, and made this patch dependent on the separate patch.



Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: update disk failure recovery notes

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7984

to look at the new patch set (#2).

Change subject: docs: update disk failure recovery notes
......................................................................

docs: update disk failure recovery notes

Adds more context around disk failures and separates out steps to
rebuild a server with a different directory configuration.

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
---
M docs/administration.adoc
1 file changed, 38 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/7984/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7984

to look at the new patch set (#5).

Change subject: docs: split disk failure from disk config changes
......................................................................

docs: split disk failure from disk config changes

The administration notes commented on Kudu's handling of disk failures
with instructions to rebuild a tserver with a new directory
configuration. While related, these two are separate and should be
documented as such.

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
---
M docs/administration.adoc
1 file changed, 32 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/7984/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: split disk failure from disk config changes
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7984/4//COMMIT_MSG
Commit Message:

PS4, Line 9: administartion
> administration
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: docs: split disk failure from disk config changes
......................................................................


Patch Set 4:

Fair point, took your suggestion and kept both.

I've seen a few questions about changing disk configurations (more than I've seen about disk failures, recently anyway), so I think it makes sense to have docs specifically for this (instead of pointing to "disk failure" docs when they're asking about how to add a disk).


Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: docs: split disk failure from disk config changes
......................................................................


Patch Set 4: Code-Review+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: docs: split disk failure from disk config changes
......................................................................


docs: split disk failure from disk config changes

The administration notes commented on Kudu's handling of disk failures
with instructions to rebuild a tserver with a new directory
configuration. While related, these two are separate and should be
documented as such.

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Reviewed-on: http://gerrit.cloudera.org:8080/7984
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M docs/administration.adoc
1 file changed, 32 insertions(+), 19 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified




Gerrit-MessageType: merged
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] docs: omit discussing disk failure in admin docs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7984

to look at the new patch set (#3).

Change subject: docs: omit discussing disk failure in admin docs
......................................................................

docs: omit discussing disk failure in admin docs

The administration notes commented on Kudu's inability to handle disk
failures, which will iteratively be fixed, and is somewhat orthogonal to
what the meat of the section describes (i.e. how to "rebuild" a tserver
with a new directory configuration).

Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
---
M docs/administration.adoc
1 file changed, 17 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/7984/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] docs: split disk failure from disk config changes

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: docs: split disk failure from disk config changes
......................................................................


Patch Set 5: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No