You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2019/07/13 02:59:45 UTC

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13858


Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................

KUDU-2635: treat failure to delete failed blocks as OK

It was previously possible to leave some orphaned blocks in the
in-memory orphaned blocks list by failing to delete some blocks due to a
disk failure. Upon deleting a tablet with such orphaned blocks, Kudu
could crash.

This patch addresses this by treating such failures to delete blocks
from failed data directories as OK, from the stance of actually deleting
the block. For all intents and purposes, the blocks aren't coming back.

A test is added that passed 1/100 times when run on dist-test in debug
mode. With the patch, it passes 100/100 times.

Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 35 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/13858/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13858

to look at the new patch set (#4).

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................

KUDU-2635: ignore failures to delete orphaned blocks

It was previously possible to leave some orphaned blocks in the
in-memory orphaned blocks list by failing to delete some blocks due to a
disk failure. Upon deleting a tablet with such orphaned blocks, Kudu
could crash, as this breaks the assumption in the TabletMetadata that
once we've made the call to delete orphaned blocks, then the orphaned
blocks list will be empty.

This patch addresses this by removing the blocks from the orphaned
blocks list regardless of whether the blocks were deleted at the block
manager level. At worst, this leaves us with untracked blocks, but it's
better than crashing due to a bogus assumption.

A test is added that passed 1/100 times when run on dist-test in debug
mode with stress. With the patch, it passes 100/100 times.

Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
---
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 31 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/13858/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13858/1/src/kudu/fs/log_block_manager.cc@2273
PS1, Line 2273:       if (!s.ok() && !s.IsNotFound() && !s.IsDiskFailure()) {
              :         if (first_failure.ok()) first_failure = s;
              :       } else if (s.ok()) {
              :         malloc_space += kudu_malloc_usable_size(lb.get());
              :         blocks_length += lb->length();
              :         lbs.emplace_back(std::move(lb));
nit: would it be easier to read if it were written the following way:

if (PREDICT_TRUE(s.ok())) {
  ...
} else if (s.IsNotFound() || s.IsDiskFailure()) {
  deleted->emplace_back(block_id);
} else if (first_failure.ok()) {
  first_failure = s;
}

?



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 17:34:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc@521
PS4, Line 521:     for (const BlockId& b : blocks) {
             :       orphaned_blocks_.erase(b);
             :     }
> Well, the documentation says that the range [first; last) "must be a valid 
Indeed, Google foo page 2 https://en.cppreference.com/w/cpp/container/unordered_set/erase



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 17 Jul 2019 00:01:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13858/2/src/kudu/fs/log_block_manager.cc@2277
PS2, Line 2277: s.IsDiskFailure()
> Ah, it seems I missed just another nit that in prior review iterations: do 
We log in the RemoveLogBlockUnlocked() call that there's a disk error.



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 19:03:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc@521
PS4, Line 521:     for (const BlockId& b : blocks) {
             :       orphaned_blocks_.erase(b);
             :     }
Does this work instead?

  orphaned_blocks_.erase(blocks.begin(), blocks.end());



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 23:18:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13858

to look at the new patch set (#3).

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................

KUDU-2635: ignore failures to delete orphaned blocks

It was previously possible to leave some orphaned blocks in the
in-memory orphaned blocks list by failing to delete some blocks due to a
disk failure. Upon deleting a tablet with such orphaned blocks, Kudu
could crash, as this breaks the assumption in the TabletMetadata that
once we've made the call to delete orphaned blocks, then the orphaned
blocks list will be empty.

This patch addresses this by removing the blocks from the orphaned
blocks list regardless of whether the blocks were deleted at the block
manager level. At worst, this leaves us with untracked blocks, but it's
better than crashing due to a bogus assumption.

A test is added that passed 1/100 times when run on dist-test in debug
mode with stress. With the patch, it passes 100/100 times.

Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
---
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 30 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/13858/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13858/1/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/13858/1/src/kudu/tserver/tablet_server-test.cc@712
PS1, Line 712: WARN_NOT_OK
nit: how much significance is in outputting this warning?  If not much, maybe don't output it at all?


http://gerrit.cloudera.org:8080/#/c/13858/1/src/kudu/tserver/tablet_server-test.cc@725
PS1, Line 725:     SCOPED_TRACE(SecureDebugString(resp));
Is it going to be ever printed?  If not, maybe drop this?



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 17:44:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc@521
PS4, Line 521:     for (const BlockId& b : blocks) {
             :       orphaned_blocks_.erase(b);
             :     }
> Does this work instead?
I don't see why it doesn't, but it doesn't. I see in http://www.cplusplus.com/reference/unordered_set/unordered_set/erase/ it's supported. Maybe it's not a fan of the mismatched types. This works:

    BlockIdSet blocks_set;
    orphaned_blocks_.erase(blocks_set.begin(), blocks_set.end());

but as suggested, I see this error:


/opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/unordered_set.h:450:7: note:   candidate expects 1 argument, 2 provided
/opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/unordered_set.h:455:7: note: std::unordered_set<_Value, _Hash, _Pred, _Alloc>::iterator std::unordered_set<_Value, _Hash, _Pred, _Alloc>::erase(std::unordered_set<_Value, _Hash, _Pred, _Alloc>::iterator) [with _Value = kudu::BlockId; _Hash = kudu::BlockIdHash; _Pred = kudu::BlockIdEqual; _Alloc = std::allocator<kudu::BlockId>; std::unordered_set<_Value, _Hash, _Pred, _Alloc>::iterator = std::__detail::_Node_iterator<kudu::BlockId, true, true>]
       erase(iterator __it)
       ^



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 23:54:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13858/2//COMMIT_MSG@14
PS2, Line 14: This patch addresses this by treating such failures to delete blocks
            : from failed data directories as OK, from the stance of actually deleting
            : the block. For all intents and purposes, the blocks aren't coming back.
Why did you choose to handle this in the LBM vs. the equivalent in TabletMetadata::DeleteOrphanedBlocks? To me it felt more natural for the LBM to "tell the truth" and for the TabletMetadata to ignore what the LBM said and to always remove all orphaned blocks from the superblock.



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 21:48:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13858/2/src/kudu/fs/log_block_manager.cc@2277
PS2, Line 2277: s.IsDiskFailure()
Ah, it seems I missed just another nit that in prior review iterations: do you think we want/need to log any information w.r.t. treating the block deleted in case there was a disk failure error?



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 18:09:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 1: Code-Review+1

Good job, Andrew!
Looks good to me. By the way, a backport to 1.9.x is necessary.


-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Sat, 13 Jul 2019 05:06:15 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................

KUDU-2635: ignore failures to delete orphaned blocks

It was previously possible to leave some orphaned blocks in the
in-memory orphaned blocks list by failing to delete some blocks due to a
disk failure. Upon deleting a tablet with such orphaned blocks, Kudu
could crash, as this breaks the assumption in the TabletMetadata that
once we've made the call to delete orphaned blocks, then the orphaned
blocks list will be empty.

This patch addresses this by removing the blocks from the orphaned
blocks list regardless of whether the blocks were deleted at the block
manager level. At worst, this leaves us with untracked blocks, but it's
better than crashing due to a bogus assumption.

A test is added that passed 1/100 times when run on dist-test in debug
mode with stress. With the patch, it passes 100/100 times.

Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Reviewed-on: http://gerrit.cloudera.org:8080/13858
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 31 insertions(+), 2 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/13858/4/src/kudu/tablet/tablet_metadata.cc@521
PS4, Line 521:     for (const BlockId& b : blocks) {
             :       orphaned_blocks_.erase(b);
             :     }
> I don't see why it doesn't, but it doesn't. I see in http://www.cplusplus.c
Well, the documentation says that the range [first; last) "must be a valid range in *this". So yeah maybe the iterator types need to be the same too.



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 23:59:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13858

to look at the new patch set (#2).

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................

KUDU-2635: treat failure to delete failed blocks as OK

It was previously possible to leave some orphaned blocks in the
in-memory orphaned blocks list by failing to delete some blocks due to a
disk failure. Upon deleting a tablet with such orphaned blocks, Kudu
could crash.

This patch addresses this by treating such failures to delete blocks
from failed data directories as OK, from the stance of actually deleting
the block. For all intents and purposes, the blocks aren't coming back.

A test is added that passed 1/100 times when run on dist-test in debug
mode. With the patch, it passes 100/100 times.

Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/tserver/tablet_server-test.cc
2 files changed, 38 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/13858/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] KUDU-2635: treat failure to delete failed blocks as OK

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: treat failure to delete failed blocks as OK
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

Overall LGTM, just a few nits in the test.

http://gerrit.cloudera.org:8080/#/c/13858/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13858/2/src/kudu/fs/log_block_manager.cc@2277
PS2, Line 2277: s.IsDiskFailure()
> We log in the RemoveLogBlockUnlocked() call that there's a disk error.
Ah, indeed.



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 15 Jul 2019 20:49:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2635: ignore failures to delete orphaned blocks

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13858 )

Change subject: KUDU-2635: ignore failures to delete orphaned blocks
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13858/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13858/2//COMMIT_MSG@14
PS2, Line 14: blocks list will be empty.
            : 
            : This patch addresses this by removing the blocks from the orphaned
> Why did you choose to handle this in the LBM vs. the equivalent in TabletMe
The TabletMetadata code is currently structured so we can easily persist that we've deleted 1) all orphaned blocks that didn't hit an error, or 2) all orphaned blocks.

The code is structured as #1, and it is conservative in terms of error-checking by handling only errors that we might expect (not found, or disk failure). #2 is more along the lines of what you're suggesting; it's less conservative or more permissive because it ignores errors altogether.

Though, the invariant in the TabletMetadata is that once DeleteOrphanedBlocks() is called, the orphaned blocks list is empty. So on second thought, I'm inclined to agree, and let's do this in DeleteOrphanedBlocks() to ensure that is true.

Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice78f41d6d367d42ad31c2127ceb5fc57a244e34
Gerrit-Change-Number: 13858
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 06:43:05 +0000
Gerrit-HasComments: Yes