You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/11/03 23:00:12 UTC

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8465


Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but it may not
know about some blocks, and may end up assigning them to a new tablet.
If the blocks ID is used by two tablets, we can get into situations
where deleting one will delete data for the other.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. The largest block ID is kept track of when opening and reading a
tablet's metadata. Once all rowsets and orphaned blocks have been seen,
the block manager is notified of the highest block ID in the tablet's
superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

An integration test is added to ts_recovery-itest to ensure block ID
reuse does not happen.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet_metadata.cc
11 files changed, 210 insertions(+), 58 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#2).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but it may not
know about some blocks, and may end up assigning them to a new tablet.
If the blocks ID is used by two tablets, we can get into situations
where deleting one will delete data for the other.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. The largest block ID is kept track of when opening and reading a
tablet's metadata. Once all rowsets and orphaned blocks have been seen,
the block manager is notified of the highest block ID in the tablet's
superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

An integration test is added to ts_recovery-itest to ensure block ID
reuse does not happen.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
14 files changed, 213 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG@16
PS5, Line 16: for new
> for a new
Missed this, sorry



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 04 Nov 2017 05:01:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/file_block_manager.cc@942
PS1, Line 942: void FileBlockManager::NotifyBlockId(const BlockId& /* block_id */) {
> warning: parameter 'block_id' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@103
PS1, Line 103: using tablet::TabletSuperBlockPB;
> warning: using decl 'Substitute' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@104
PS1, Line 104: 
> warning: using decl 'RowSetDataPB' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/rowset_metadata.cc
File src/kudu/tablet/rowset_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/rowset_metadata.cc@49
PS1, Line 49:   *metadata = std::move(ret);
> warning: prefer ptr1 = std::move(ptr2) over ptr1.reset(ptr2.release()) [mis
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@678
PS1, Line 678:   AtomicWord rowset_idx = Barrier_AtomicIncrement(&next_rowset_idx_, 1) - 1;
> warning: parameter 'schema' is unused [misc-unused-parameters]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:32:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for a new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the new tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. In loading a tablet's metadata, once all rowsets and orphaned
blocks have been seen, the block manager is notified of the highest
block ID in the superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Reviewed-on: http://gerrit.cloudera.org:8080/8465
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 247 insertions(+), 58 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 06 Nov 2017 23:03:37 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h@264
PS5, Line 264: should use
> The new wording isn't quite right; it suggests that the block manager itsel
Done


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc@500
PS5, Line 500:   // Simulate a complete reset of the block manager's block ID record, e.g.
             :   // restarting but with all the blocks gone.
             :   bm_->next_block_id_.Store(1);
> Rather than simulating it in this way, how about deleting the containers fr
I agree that'd be not too complicated: DoGetContainers() and then delete some files.

There's a hack in log_block_manager.cc:1706 that makes this not the case.

I could maybe add a new block manager option to turn of the seeding, but I'll hold off given we get the same behavior reaching in.


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc@149
PS5, Line 149:   // Set up a basic server that flushes often so we create blocks quickly.
> BTW, it might be useful to wait for some compactions too, or some other ope
Done


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@403
PS5, Line 403:     BlockId max_block_id;
> Probably worth adding a comment to this block of code (where you aggregate 
Done


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@405
PS5, Line 405:     if (!block_ids.empty()) {
> Is this check needed?
With some restructuring, no. But there could be no block_ids, and there would be a segfault a L406.


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@418
PS5, Line 418:     fs_manager()->block_manager()->NotifyBlockId(max_block_id);
> What happens if we make it down here and max_block_id is still uninitialize
The default is 0 (invalid), so it should no-op.



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 04 Nov 2017 04:05:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#4).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the second tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. In loading a tablet's metadata, once all rowsets and orphaned
blocks have been seen, the block manager is notified of the highest
block ID in the its superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 240 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 7:

(3 comments)

Looks good, just some nits on comments.

http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG@24
PS7, Line 24: the
Double the


http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h@265
PS7, Line 265:   // externally-referenced ids that they may not have previously found.
Getting better. Perhaps it would be good to add an example that will help people understand how the block manager may have previously not found a block? Like "...avoid reusing externally-referenced ids that they may not have previously found (e.g. because those ids' blocks were on a data directory that failed)".


http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc@404
PS7, Line 404: if a disk
             :     // was not read
Can you reword this example so it's more relatable to the relevant Kudu concepts? Like, it should probably talk about data directories rather than disks, and maybe it should explain why a directory would not be "read" (because it failed).



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 04 Nov 2017 05:16:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc@420
PS8, Line 420:     fs_manager()->block_manager()->NotifyBlockId(max_block_id);
> it seems possible that max_block_id is uninitialized at this point if there
Yeah, it gets initialized to 0 (see fs/block_id.cc:32)



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 06 Nov 2017 22:49:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#3).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the second tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. The largest block ID is kept track of when opening and reading a
tablet's metadata. Once all rowsets and orphaned blocks have been seen,
the block manager is notified of the highest block ID in the tablet's
superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 237 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 1:

(11 comments)

I think it would also be nice to have an LBM-level test that ensures block IDs are not reused. It should be far simpler to write.

http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG@10
PS1, Line 10: If the opening of a directory fails (e.g.
            : due to a disk failure), the server can still start up, but it may not
            : know about some blocks, and may end up assigning them to a new tablet.
            : If the blocks ID is used by two tablets, we can get into situations
            : where deleting one will delete data for the other.
I'd say that the server will still start up but fail any tablet configured to use that directory. Additionally, by not scanning the failed directory, the server will "forget" about some blocks, and since LBM block IDs are sequentially allocated beginning just past the block with the greatest ID, it's possible for a second tablet to create a block and be allocated a block ID assigned to a failed tablet. Then, if the failed tablet is deleted, it'll also delete live data belonging to that second tablet.

You can put this more concisely, but the keys are to talk about how block ID selection works, and to draw a more explicit connection between the reuse by a second tablet and the data corruption when deleting the failed tablet.


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@264
PS1, Line 264: can
should


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@266
PS1, Line 266:   virtual void NotifyBlockId(const BlockId& block_id) = 0;
Since BlockIds are basically uint64_ts, we should just pass them by copy.


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@144
PS1, Line 144:   // Set up a basic multi-directory server with some rows that flushes often so
             :   // we get some blocks.
Where's the multi-directory part?


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@199
PS1, Line 199: 
             :   // Now empty the data directories of blocks. The server will start up, not
             :   // read any blocks, but should still avoid the old tablet's blocks.
This will fail the old tablet, right? Is that worth asserting for?


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@207
PS1, Line 207: "block_manager_instance"
You can get this special string from data_dirs.h (or should be able to, anyway).


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@208
PS1, Line 208: WARN_NOT_OK
ASSERT_OK?


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@233
PS1, Line 233:   LOG(INFO) << "Intersection: " << BlockId::JoinStrings(block_id_intersection);
You could SCOPED_TRACE this (and the first/second sets) to avoid logging this except in the event of failure.


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@233
PS1, Line 233:   LOG(INFO) << "Intersection: " << BlockId::JoinStrings(block_id_intersection);
You could SCOPED_TRACE this (and the first/second sets) to avoid logging this except in the event of failure.


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@399
PS1, Line 399:       RETURN_NOT_OK(RowSetMetadata::Load(this, rowset_pb, &max_block_id, &rowset_meta));
I'm curious whether this plumbing the max "down" like this is more efficient than using RowSetMetadata::GetAllBlocks (or even TabletMetadata::CollectAllBlockIDs) to fetch all the IDs, and selecting the maximum amongst them. If not, the bulk approach is certainly less code.


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@413
PS1, Line 413: This is only relevant for block managers that assign
             :     // sequential block IDs, 
Not really necessary for this comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:32:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#8).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for a new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the new tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. In loading a tablet's metadata, once all rowsets and orphaned
blocks have been seen, the block manager is notified of the highest
block ID in the superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 247 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8465/7//COMMIT_MSG@24
PS7, Line 24: the
> Double the
Done


http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/fs/block_manager.h@265
PS7, Line 265:   // externally-referenced ids that they may not have previously found.
> Getting better. Perhaps it would be good to add an example that will help p
Done


http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/7/src/kudu/tablet/tablet_metadata.cc@404
PS7, Line 404: if a disk
             :     // was not read
> Can you reword this example so it's more relatable to the relevant Kudu con
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 04 Nov 2017 05:37:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8465/5//COMMIT_MSG@16
PS5, Line 16: for new
for a new


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/block_manager.h@264
PS5, Line 264: should use
The new wording isn't quite right; it suggests that the block manager itself "uses" NotifyBlockId, but it's actually other components that do.


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/fs/log_block_manager-test.cc@500
PS5, Line 500:   // Simulate a complete reset of the block manager's block ID record, e.g.
             :   // restarting but with all the blocks gone.
             :   bm_->next_block_id_.Store(1);
Rather than simulating it in this way, how about deleting the containers from disk and reloading the block manager? That's a little bit more "real world" without making the test too complicated, I hope. Would let you avoid direct access to next_block_id_ too.


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/integration-tests/ts_recovery-itest.cc@149
PS5, Line 149:   // Set up a basic server that flushes often so we create blocks quickly.
BTW, it might be useful to wait for some compactions too, or some other operation that would delete some blocks, so that some block IDs make it into the orphaned block list.


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@403
PS5, Line 403:     BlockId max_block_id;
Probably worth adding a comment to this block of code (where you aggregate all the block IDs and compute the max) explaining the intent.


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@405
PS5, Line 405:     if (!block_ids.empty()) {
Is this check needed?


http://gerrit.cloudera.org:8080/#/c/8465/5/src/kudu/tablet/tablet_metadata.cc@418
PS5, Line 418:     fs_manager()->block_manager()->NotifyBlockId(max_block_id);
What happens if we make it down here and max_block_id is still uninitialized? What value is stored by the block manager?



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 04 Nov 2017 02:58:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/8/src/kudu/tablet/tablet_metadata.cc@420
PS8, Line 420:     fs_manager()->block_manager()->NotifyBlockId(max_block_id);
it seems possible that max_block_id is uninitialized at this point if there were no blocks. Does it have a well-defined uninitialized value that makes this safe?



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 06 Nov 2017 22:47:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8465 )

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8465/1//COMMIT_MSG@10
PS1, Line 10: If the opening of a directory fails (e.g.
            : due to a disk failure), the server can still start up, but will fail any
            : tablet configured to use that directory. By not scanning the failed
            : directory, the server will "forget" about some blocks.
            : 
> I'd say that the server will still start up but fail any tablet configured 
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@264
PS1, Line 264:  to
> should
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/fs/block_manager.h@266
PS1, Line 266:   virtual void NotifyBlockId(BlockId block_id) = 0;
> Since BlockIds are basically uint64_ts, we should just pass them by copy.
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@144
PS1, Line 144:   if (GetParam() != "log") {
             :     LOG(INFO) << "Missin
> Where's the multi-directory part?
Was originally structured to fail a directory. No longer!


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@199
PS1, Line 199:   for (const string& dir : ts->data_dirs()) {
             :     const string& data_dir = JoinPathSegments(dir, "data");
             :     vector<string> children;
> This will fail the old tablet, right? Is that worth asserting for?
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@207
PS1, Line 207: 
> You can get this special string from data_dirs.h (or should be able to, any
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@208
PS1, Line 208: 
> ASSERT_OK?
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/integration-tests/ts_recovery-itest.cc@233
PS1, Line 233:                         new_block_ids.begin(), new_block_ids.end(),
> You could SCOPED_TRACE this (and the first/second sets) to avoid logging th
Done


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@399
PS1, Line 399:       next_rowset_idx_ = std::max(next_rowset_idx_, rowset_meta->id() + 1);
> I'm curious whether this plumbing the max "down" like this is more efficien
Ran a few runs of dense_node-itest and it seems like it doesn't make a huge difference.
Good call though, bootstrap is dominated by disk access, the difference in the O(N) copy here is pretty negligible. WiIl go with that approach.


http://gerrit.cloudera.org:8080/#/c/8465/1/src/kudu/tablet/tablet_metadata.cc@413
PS1, Line 413: ager of blocks it may not be aware of (e.g. if a
             :     // disk was not read).
> Not really necessary for this comment.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 04 Nov 2017 01:13:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#5).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the second tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. In loading a tablet's metadata, once all rowsets and orphaned
blocks have been seen, the block manager is notified of the highest
block ID in the its superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 233 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#6).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the second tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. In loading a tablet's metadata, once all rowsets and orphaned
blocks have been seen, the block manager is notified of the highest
block ID in the the superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 246 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2202 avoid block ID reuse for missing dirs

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8465

to look at the new patch set (#7).

Change subject: KUDU-2202 avoid block ID reuse for missing dirs
......................................................................

KUDU-2202 avoid block ID reuse for missing dirs

The block manager is initially only aware of the blocks it reads when
first loading the FS layout. If the opening of a directory fails (e.g.
due to a disk failure), the server can still start up, but will fail any
tablet configured to use that directory. By not scanning the failed
directory, the server will "forget" about some blocks.

Since LBM block IDs are sequentially allocated just past the end of the
greatest known ID, it's possible for a new tablet to create blocks with
IDs already assigned to a failed tablet. If that failed tablet is
deleted, the live data belonging to the new tablet will be deleted.

To prevent this, a new API has been added to the block managers to grant
external components the ability to notify the block manager of block
IDs. In loading a tablet's metadata, once all rowsets and orphaned
blocks have been seen, the block manager is notified of the highest
block ID in the the superblock.

This patch only targets the LBM, which assigns blocks sequentially,
and therefore only needs to know about the largest block ID referenced
by tablets.

A unit test is added to log_block_manager-test and an integration test
is added to ts_recovery-itest to ensure block ID reuse does not happen.
Also includes some C++11/stylistic cleanup.

Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
---
M src/kudu/fs/block_id.h
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-test-util.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
15 files changed, 246 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/8465/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06f10e95278dafdedcab432a7d4d1dc5c59bf4cc
Gerrit-Change-Number: 8465
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot