You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2018/01/09 20:12:30 UTC

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Hello Andrew Wong, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8978

to review the following change.


Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. PathInstanceMetadataFile::CheckIntegrity can now be run without checking
   for missing instance files. This is used when removing a data dir (its
   instance file will be missing).
2. The DataDirManager and FsManager may now be constructed with both
   read_only=true and update_on_disk=true. The CLI tool leverages this
   functionality to create a "speculative" FsManager using the new data dir
   configuration that is used to check whether existing tablets depend on
   the data dir to be removed.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 317 insertions(+), 81 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc@699
PS8, Line 699: 
             : 
             :   // All instances are present and accounted for. Time to create the in-memory
             :   // data directory structures.
> Sorry I wasn't explicit, I meant moving this outside of the if block and re
Oh, I understand what you mean now. Yes, I agree this makes more sense. Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 22 Jan 2018 03:44:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#7).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. PathInstanceMetadataFile::CheckIntegrity can now be run without checking
   for missing instance files. This is used when removing a data dir (its
   instance file will be missing).
2. The DataDirManager and FsManager may now be constructed with both
   read_only=true and update_on_disk=true. The CLI tool leverages this
   functionality to create a "speculative" FsManager using the new data dir
   configuration that is used to check whether existing tablets depend on
   the data dir to be removed.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 344 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 22 Jan 2018 03:52:38 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.h@220
PS1, Line 220: the DataDirManager is
             :   // opened as per the new contents of of the provided directories but
> nit: extra "of"
Changed to:

  // If 'update_on_disk' and 'read_only' are both true, the directory manager
  // allows the provided directories to deviate from their on-disk data
  // structures without updating the latter to match the former.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.h@223
PS1, Line 223:   bool update_on_disk;
> nit: should still note the default
Done


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc@657
PS1, Line 657: Note: If 'has_missing_instance' is true, opts_.update_on_disk is
             :   // guaranteed to be true.
> nit: Maybe note that this is due to the ternary parameterization of CheckIn
Hmm, I originally included this so that it isn't confusing that L659-660 omits a check for opts_.update_on_disk.

I'll change it as per your first suggestion.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc@805
PS1, Line 805: // This uuid's directory is missing outright, which can happen when
             :         // opts_.read_only and opts_.update_on_disk are both true.
> After another look, I think this is ok.
OK, I took another stab at this comment. Let me know what you think.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/fs_manager.h@102
PS1, Line 102:   bool update_on_disk;
> nit: note the default
Done


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc@2232
PS1, Line 2232: ASSERT_OK(mts->WaitStarted());
> Could we check that we still have all the tablets we expected?
We could, but is it necessary? We wouldn't expect tablets to just disappear, right? We don't check them after adding a data directory above.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc@2240
PS1, Line 2240:   // Reconfigure the tserver to drop the data directory and restart it.
> nit: mind adding to the comment something along the lines of, "waiting for 
Yeah, it's not intuitive that WaitStarted will return the first bootstrap failure. I'll note that.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/tool_action_fs.cc@854
PS1, Line 854: "Updates the set of data directories in an existing Kudu filesystem")
             :       .ExtraDescription("If a data directory is in use by a tablet and is "
             :           "removed, the operation will fail unless --force is also used")
             :       .AddOptionalParameter("fs_wal_dir")
             :       .AddOptionalParameter("fs_data_dirs")
             :       .AddOptionalParameter("force", boost::none, string("If true, permits "
             :           "the removal of a data directory that is configured for use by "
             :           "existing tablets. Those tablets will fail the next time the server "
             :           "is started"))
> micro-nit: I know in docs we favor being explicit, that local replicas are 
I'm not sure; as you can see, List uses 'tablets' instead of 'local replicas'.



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 00:13:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#4).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. PathInstanceMetadataFile::CheckIntegrity can now be run without checking
   for missing instance files. This is used when removing a data dir (its
   instance file will be missing).
2. The DataDirManager and FsManager may now be constructed with both
   read_only=true and update_on_disk=true. The CLI tool leverages this
   functionality to create a "speculative" FsManager using the new data dir
   configuration that is used to check whether existing tablets depend on
   the data dir to be removed.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 343 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 8: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Jan 2018 00:49:18 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8978/4/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/4/src/kudu/fs/data_dirs.cc@810
PS4, Line 810: list of data directories now
             :         // includes the missing directory.
> nit: If I'm understanding this correctly, the update should have removed th
Ah yes, will update.



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 16 Jan 2018 19:17:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.h@222
PS3, Line 222: without updating the latter
> nit: the naming here gives me pause since the name is no longer accurate, b
Yeah, I couldn't think of a better name either.


http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.cc@810
PS3, Line 810:  the missing directory made its way
             :         // into the list of data directories.
> nit: at first glance, I read this as, "We have a missing directory and it i
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 03:47:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 9: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc@699
PS8, Line 699: RETURN_NOT_OK_PREPEND(
             :         PathInstanceMetadataFile::CheckIntegrity(loaded_instances),
             :         Substitute("could not verify integrity of files after data directory update: $0",
             :                    JoinStrings(GetDataDirs(), ",")));
> Yeah the added encapsulation isn't a bad thing, but bear in mind that we'll
Sorry I wasn't explicit, I meant moving this outside of the if block and removing the check at L651.



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 22 Jan 2018 01:16:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 8: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc@699
PS8, Line 699: RETURN_NOT_OK_PREPEND(
             :         PathInstanceMetadataFile::CheckIntegrity(loaded_instances),
             :         Substitute("could not verify integrity of files after data directory update: $0",
             :                    JoinStrings(GetDataDirs(), ",")));
nit: could merge this CheckIntegrity with the above with:

 if (opts_.consistency_check != ConsistencyCheckBehavior::IGNORE_INCONSISTENCY) {
   RETURN_NOT_OK(CheckIntegrity());
 }

This seems more explicitly aligned with the ConsistencyCheckBehavior enums (ie if we're ignoring inconsistencies, there's no need to check integrity). I guess with the current approach you can distinguish the error messages. Not a critical change at all, so up to you.


http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc@812
PS8, Line 812: IGNORE_CONSISTENCY
nit: IGNORE_INCONSISTENCY



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Jan 2018 01:13:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc@805
PS1, Line 805: // This uuid's directory is missing outright, which can happen when
             :         // opts_.read_only and opts_.update_on_disk are both true.
> So this can only happen when we open our "staging" FsManager and we're miss
After another look, I think this is ok.

Since we're only opening up the speculative FsManager, we shouldn't need to worry about potential mismatched UUIDs (e.g. if there's a missing directory AND a failed directory, we might end up assigning the failed directory the wrong UUID).

When we open the FsManager up for real, this shouldn't be the case because at that point, we'll update the path_sets and shouldn't reach this code block. Can you add a comment explaining why this missing directory doesn't matter?



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Jan 2018 19:57:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 7:

(1 comment)

With the metadata dir patch in, it's probably worth at least discussing (and maybe testing) the ways in which affects the remove tool's behavior. I don't think much changes, but something like:

- Metadata dirs in WAL dir: removing any data dir is fine. No configuration changes necessary.
- Metadata dirs in the first data dir: This is also fine, but further deployments must now specify the metadata dir, as it is no longer referenced in fs_data_dirs.
- Metadata dirs in any other directory: same as the WAL case.

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.cc@347
PS7, Line 347:   if (!missing_roots.empty() && !opts_.read_only) {
> huh, is this an existing bug? if you use an fs_manager tool and pass an ext
missing_roots will only be non-empty if opts.update_on_disk is true (as per L292). This should probably be in a comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 18 Jan 2018 22:44:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.h@104
PS7, Line 104:   // If 'update_on_disk' and 'read_only' are both true, the FsManager allows
             :   // 'data_roots' to deviate from their on-disk data structures without
             :   // updating the latter to match the former.
> I'm not really a fan of these semantics -- it seems somewhat confusing to b
I had been thinking of just renaming update_on_disk to something like ignore_inconsistencies (and doc that it may update the on-disk structures), but I like your suggestion more. Done.


http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.cc@347
PS7, Line 347:   if (!missing_roots.empty() && !opts_.read_only) {
> missing_roots will only be non-empty if opts.update_on_disk is true (as per
Yeah, I had added a DCHECK and comment to that effect to the equivalent change in data_dirs.cc, but not here:

  // Add new or remove existing data directories, if desired.
  if (!opts_.read_only &&
      (!missing_roots.empty() || has_missing_instance)) {
    DCHECK(opts_.update_on_disk); // guaranteed by LoadInstances and
                                  // the ternary in CheckIntegrity above

Anyway, we don't need the read_only check any more because read_only + update_on_disk is forbidden once again. I'll update this.


http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/kudu-tool-test.cc@2251
PS7, Line 2251:   // second table should have all failed.
> can we add one more step in this test which re-adds the force-removed direc
It should work, but the current code fails in the integrity checks when re-adding a previously removed data directory.

So I modified DataDirManager::Open to elide all integrity checks when updating. This also allowed me to get rid of the CheckIntegrity changes which were pretty ugly to begin with. I also beefed up this test and added a couple more fs_manager-test cases.

Note that in real life, although the tserver can unfail these tablets, the master may kick off rereplication for them anyway.


http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/tool_action_fs.cc@330
PS7, Line 330:   // If the user specifies --force, we assume they know what they're doing and
             :   // skip this check.
> would it be possible to do the check in a new function, and something like:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 20 Jan 2018 00:12:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#3).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. PathInstanceMetadataFile::CheckIntegrity can now be run without checking
   for missing instance files. This is used when removing a data dir (its
   instance file will be missing).
2. The DataDirManager and FsManager may now be constructed with both
   read_only=true and update_on_disk=true. The CLI tool leverages this
   functionality to create a "speculative" FsManager using the new data dir
   configuration that is used to check whether existing tablets depend on
   the data dir to be removed.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 343 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#8).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. The DataDirManager and FsManager's update_on_disk boolean option has been
   converted into a tri-state. The new IGNORE_INCONSISTENCY state is used by
   the CLI tool to create a "speculative" FsManager using the new data dir
   configuration. This FsManager is used to check whether existing tablets
   depend on the data dir to be removed without making destructive changes
   to on-disk data structures.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
7 files changed, 443 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.h@220
PS1, Line 220: the DataDirManager is
             :   // opened as per the new contents of of the provided directories but
nit: extra "of"

Also it wasn't immediately clear what "new contents of the provided directories" was referring to. Maybe consider rewording "...the DataDirManager is opened to reflect the provided directories but..." or something? (in fs_manager.cc too)


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.h@223
PS1, Line 223:   bool update_on_disk;
nit: should still note the default


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc@657
PS1, Line 657: Note: If 'has_missing_instance' is true, opts_.update_on_disk is
             :   // guaranteed to be true.
nit: Maybe note that this is due to the ternary parameterization of CheckIntegrity?

Or note why this is important, e.g. "...so we can go ahead an update the instance files".


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc@805
PS1, Line 805: // This uuid's directory is missing outright, which can happen when
             :         // opts_.read_only and opts_.update_on_disk are both true.
So this can only happen when we open our "staging" FsManager and we're missing a directory?

I need to ponder this a bit more, not sure I've fully grokked this snippet.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/fs_manager.h@102
PS1, Line 102:   bool update_on_disk;
nit: note the default


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc@2232
PS1, Line 2232: ASSERT_OK(mts->WaitStarted());
Could we check that we still have all the tablets we expected?


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc@2240
PS1, Line 2240:   // Reconfigure the tserver to drop the data directory and restart it.
nit: mind adding to the comment something along the lines of, "waiting for the tablets to bootstrap should fail"?

WaitStarted() and Start() are so similar, I was a surprised for a second that WaitStarted() would fail until I realized that we're waiting for the bootstraps and not for the FsManager to start up.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/tool_action_fs.cc@854
PS1, Line 854: "Updates the set of data directories in an existing Kudu filesystem")
             :       .ExtraDescription("If a data directory is in use by a tablet and is "
             :           "removed, the operation will fail unless --force is also used")
             :       .AddOptionalParameter("fs_wal_dir")
             :       .AddOptionalParameter("fs_data_dirs")
             :       .AddOptionalParameter("force", boost::none, string("If true, permits "
             :           "the removal of a data directory that is configured for use by "
             :           "existing tablets. Those tablets will fail the next time the server "
             :           "is started"))
micro-nit: I know in docs we favor being explicit, that local replicas are called replicas and tablets are a logical concept. Does the same apply for tooling?



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Jan 2018 03:22:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 2: Verified+1

The test failures were a batch of clock synchronization errors on one distributed slave.


-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Jan 2018 00:12:00 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.h@104
PS7, Line 104:   // If 'update_on_disk' and 'read_only' are both true, the FsManager allows
             :   // 'data_roots' to deviate from their on-disk data structures without
             :   // updating the latter to match the former.
I'm not really a fan of these semantics -- it seems somewhat confusing to both ask to "update" and also to "not write". I prefer read_only to just enforce a sort of "fail any write ops" rather than in some cases also meaning "skip the write op."

Would it be feasible to change this to some kind of enum like:

enum ConsistencyCheckBehavior {
  // If data_roots doesn't match the on-disk path sets, fail.
  ENFORCE_CONSISTENCY,

  // If data_roots doesn't match the on-disk path sets, update the
  // on-disk data to match. Requires 'read_only' to be false.
  UPDATE_ON_DISK,

  // If data_roots doesn't match the on-disk path sets,
  // continue without updating the on-disk data.
  IGNORE_INCONSISTENCY
};


http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/fs/fs_manager.cc@347
PS7, Line 347:   if (!missing_roots.empty() && !opts_.read_only) {
huh, is this an existing bug? if you use an fs_manager tool and pass an extra dir, it gets created?


http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/kudu-tool-test.cc@2251
PS7, Line 2251:   // second table should have all failed.
can we add one more step in this test which re-adds the force-removed directory? Would that properly re-enable the missing tablets? Seems like a useful "oops!" recovery option for an admin who runs the wrong command.


http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/7/src/kudu/tools/tool_action_fs.cc@330
PS7, Line 330:   // If the user specifies --force, we assume they know what they're doing and
             :   // skip this check.
would it be possible to do the check in a new function, and something like:

Status check_result = CheckTabletsWontFailWithUpdate(...);
if (FLAGS_force) {
   WARN_NOT_OK(check_result, "blah blah continuing because you said force");
} else {
  RETURN_NOT_OK_PREPEND(check_result, "blah blah");
}

This way if someone uses --force and then pastes us the logs of what they did, we'll at least see what happened



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 18 Jan 2018 19:35:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc@699
PS8, Line 699: RETURN_NOT_OK_PREPEND(
             :         PathInstanceMetadataFile::CheckIntegrity(loaded_instances),
             :         Substitute("could not verify integrity of files after data directory update: $0",
             :                    JoinStrings(GetDataDirs(), ",")));
> nit: could merge this CheckIntegrity with the above with:
Yeah the added encapsulation isn't a bad thing, but bear in mind that we'll only end up down here if we're in UPDATE_ON_DISK mode (L659), so the consistency_check condition is guaranteed to be true.


http://gerrit.cloudera.org:8080/#/c/8978/8/src/kudu/fs/data_dirs.cc@812
PS8, Line 812: IGNORE_CONSISTENCY
> nit: IGNORE_INCONSISTENCY
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sun, 21 Jan 2018 20:14:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8978/4/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/4/src/kudu/fs/data_dirs.cc@810
PS4, Line 810: list of data directories now
             :         // includes the missing directory.
nit: If I'm understanding this correctly, the update should have removed the UUID, so maybe replace "includes" with "excludes"?



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 13 Jan 2018 03:42:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#10).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. The DataDirManager and FsManager's update_on_disk boolean option has been
   converted into a tri-state. The new IGNORE_INCONSISTENCY state is used by
   the CLI tool to create a "speculative" FsManager using the new data dir
   configuration. This FsManager is used to check whether existing tablets
   depend on the data dir to be removed without making destructive changes
   to on-disk data structures.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
7 files changed, 441 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/10
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. The DataDirManager and FsManager's update_on_disk boolean option has been
   converted into a tri-state. The new IGNORE_INCONSISTENCY state is used by
   the CLI tool to create a "speculative" FsManager using the new data dir
   configuration. This FsManager is used to check whether existing tablets
   depend on the data dir to be removed without making destructive changes
   to on-disk data structures.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Reviewed-on: http://gerrit.cloudera.org:8080/8978
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
7 files changed, 441 insertions(+), 92 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 11
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#5).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. PathInstanceMetadataFile::CheckIntegrity can now be run without checking
   for missing instance files. This is used when removing a data dir (its
   instance file will be missing).
2. The DataDirManager and FsManager may now be constructed with both
   read_only=true and update_on_disk=true. The CLI tool leverages this
   functionality to create a "speculative" FsManager using the new data dir
   configuration that is used to check whether existing tablets depend on
   the data dir to be removed.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
10 files changed, 344 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8978

to look at the new patch set (#9).

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................

KUDU-2202: support for removing data directories (take two)

This commit extends "kudu fs update_dirs" to allow the removal of data
directories. A data dir may be removed provided there are no tablets
configured to use it. The --force flag may be used to override this safety
check; tablets that depend on the removed data dir will fail to bootstrap
the next time the server is started.

I'll be the first to admit that this functionality isn't terribly useful at
the moment because Kudu deployments are configured to stripe data across all
data directories, so removing a data dir will fail all tablets, which is
equivalent to reformatting the node. Nevertheless, it will become more
useful as users customize the new --fs_target_data_dirs_per_tablet gflag.

This is the second attempt at implementing this functionality. The first
used a different interface and was vulnerable to a data loss issue
documented in KUDU-2202. Andrew fixed that in commit 47b81c4, which reopens
the door for safely removing data directories.

I considered a stronger check that only prohibits data dir removal if there
were data blocks on the to-be-removed data dir, but decided against it:
- It's rare to find a tablet configured to use a data dir but without any
  actual blocks on it. In fact, that's only likely for empty tables, or
  tables with empty range partitions.
- We'd still need to rewrite any unaffected tablet superblocks and remove
  the data dir from their data dir groups.

There are several interesting modifications here:
1. The DataDirManager and FsManager's update_on_disk boolean option has been
   converted into a tri-state. The new IGNORE_INCONSISTENCY state is used by
   the CLI tool to create a "speculative" FsManager using the new data dir
   configuration. This FsManager is used to check whether existing tablets
   depend on the data dir to be removed without making destructive changes
   to on-disk data structures.
3. The DataDirManager now checks for integrity after making on-disk changes.
   While not strictly necessary, this seems like a good sanity check to
   avoid introducing bugs.
4. The "kudu fs update_dirs" action was updated for data dir removal. It now
   accepts the --force option, without which removal will only be allowed if
   no tablets are configured to use the removed data dir.

Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
7 files changed, 443 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/8978/9
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202: support for removing data directories (take two)

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8978 )

Change subject: KUDU-2202: support for removing data directories (take two)
......................................................................


Patch Set 3:

(5 comments)

Just a couple more nits from me

http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.h@222
PS3, Line 222: without updating the latter
nit: the naming here gives me pause since the name is no longer accurate, but I can't settle on a more fitting name. Maybe update_instances? update_directories? Though maybe this comment is clear enough to not warrant any changes.


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/fs/data_dirs.cc@805
PS1, Line 805: // happen when opts_.read_only and opts_.update_on_disk are true (i.e.
             :         // if this DataDirManager is trialing a new data dir confi
> OK, I took another stab at this comment. Let me know what you think.
Much clearer, thanks!


http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/3/src/kudu/fs/data_dirs.cc@810
PS3, Line 810:  the missing directory made its way
             :         // into the list of data directories.
nit: at first glance, I read this as, "We have a missing directory and it is added to the list of data directories". Maybe flip it for clarity: "...the list of [available uuids?] accounts for the missing directory".


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/kudu-tool-test.cc@2232
PS1, Line 2232: // Make sure the failure reall
> We could, but is it necessary? We wouldn't expect tablets to just disappear
Ack


http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/tool_action_fs.cc
File src/kudu/tools/tool_action_fs.cc:

http://gerrit.cloudera.org:8080/#/c/8978/1/src/kudu/tools/tool_action_fs.cc@854
PS1, Line 854: "Updates the set of data directories in an existing Kudu filesystem")
             :       .ExtraDescription("If a data directory is in use by a tablet and is "
             :           "removed, the operation will fail unless --force is also used")
             :       .AddOptionalParameter("fs_wal_dir")
             :       .AddOptionalParameter("fs_data_dirs")
             :       .AddOptionalParameter("force", boost::none, string("If true, permits "
             :           "the removal of a data directory that is configured for use by "
             :           "existing tablets. Those tablets will fail the next time the server "
             :           "is started"))
> I'm not sure; as you can see, List uses 'tablets' instead of 'local replica
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/8978
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacb650115bcdf055542ef2777641a3201fc8e30b
Gerrit-Change-Number: 8978
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 00:58:45 +0000
Gerrit-HasComments: Yes