You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yao Xu (Code Review)" <ge...@cloudera.org> on 2019/08/14 05:24:01 UTC

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Yao Xu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14061


Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS.

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/mt-tablet-test.cc
3 files changed, 14 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu <oc...@gmail.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#7).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 172 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 7
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#3).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 148 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 3
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG@38
PS2, Line 38: This is because there is DeltaTracker lack of lock protection when modify
            : the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
            : This caused deleted_row_count_ to be duplicated when calculating the number
            : of live rows of DRS.
It took me a while to understand the race. Below is a thread interleaving that I sketched which describes it:


T1                               T2
---                              ---
+ In DT::Flush
Take compact_flush_lock_ (excl)
Take component_lock_ (excl)
deleted_row_count_ = ...
Release component_lock_
+ In DT::FlushDMS
Call RSMD::IncrementLiveRows
--> RSMD::live_row_count now
    includes deleted_row_count_
                                 + In DRS::CountLiveRows
                                 Take component_lock_ (shared)
                                 Call RSMD::live_row_count - DT::CountDeletedRows
                                 --> RSMD::live_row_count - deleted_row_count_
                                 --> we double counted deleted_row_count_ !!!
Take component_lock_ (excl)
deleted_row_count_ = 0
Release component_lock_

Maybe you can clean this up and add it to the commit message?


http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746:   RETURN_NOT_OK(rowset_metadata_->CommitRedoDeltaDataBlock(dms->id(), block_id));
Is it problematic that the live row count isn't changed atomically as part of this operation? Consider the following sequence:
1. One MM thread is in the middle of flushing a DMS, and makes it as far as IncrementLiveRows before a context switch. Meaning, the RSMD's live row count has changed, but neither the RSMD's last durable DMS ID nor its list of redo block IDs has been updated.
2. Meanwhile, another MM thread does a compaction, flushing the tablet metadata (and all RSMDs), including the updated live row count from #1.
3. The server crashes before the DMS flushing thread can call CommitRedoDeltaDataBlock and flush the tablet metadata.
4. On restart, the tablet is bootstrapped. Its metadata includes a live row count that reflects the DMS flush, but it's as if the DMS flush "never happened" because a) the new redo block ID wasn't committed, and b) the rowset's last durable DMS ID wasn't updated.

As I see it, it seems like all of the RSMD changes in the DMS flush need to happen atomically; otherwise the live row count may not reflect reality after a crash. Do you think this is a real problem? If so, could we prove it in a unit test in some way?



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 14 Aug 2019 17:23:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc@239
PS9, Line 239: std::
Nit: add a using for this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 9
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 16:30:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#5).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 170 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 5
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc@239
PS9, Line 239: 
> Nit: add a using for this.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 21 Aug 2019 05:19:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#9).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 171 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 9
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#10).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 177 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@969
PS4, Line 969:   // Otherwise, test deleting data on MRS.
             :   int32_t write_interval_millis = 0
> Why are these necessary? Please doc.
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@3139
PS4, Line 3139:     WriteRequestPB w_req;
> Nit: did you mean to add this empty line at the end?
a slip of a pen


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h
File src/kudu/integration-tests/test_workload.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h@152
PS4, Line 152:   enum WritePattern {
> Nit: "Insert random rows, then delete them."
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@140
PS4, Line 140:   while (should_run_.Load()) {
> Since you have to pass 'this', how about you make this a named member funct
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@168
PS4, Line 168:   // Note: overridi
> Nit: the previous style ('KuduPartialRow*' ) is more correct.
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@237
PS4, Line 237: 
             : size_t TestWorkload::GetNumberOfErrors(kudu::client::KuduSession* session) {
> Should note that when INSERT_RANDOM_ROWS_WITH_DELETE is used, ReadThread do
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc@748
PS4, Line 748:     // Merge the deleted row count of the old DMS to the RowSetMetadata
             :     // and reset deleted_row_count_ should be atomic, so we lock the
             :     // component_lock_ in exclusive mode.
             :     std::lock_guard<rw_spinlock> loc
> These comments just duplicate the actual code. It would be more interesting
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h
File src/kudu/tablet/rowset_metadata.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h@120
PS4, Line 120:   // Atomically commit the new redo delta block to RowSetMetadata.
> Could you doc this function?
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc
File src/kudu/tablet/rowset_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc@291
PS4, Line 291: std::lock_guard<LockType> l(lock_);
> Additional locks to old tables that do not support live row
 > counting.

Emm, I think our newly created table will turn this on by default, so this should be acceptable.



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 9
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 11:11:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 4:

I implemented an integration test case that simulated this scenario with high-frequency flush, tablet server crashes, and heartbeats(count live rows).


-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Thu, 15 Aug 2019 15:00:35 +0000
Gerrit-HasComments: No

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

There is a gap between updating "rowset_metadata_" and resetting the "deleted_row_count_" indeed. Yeah, it is a bug! Thanks to Yao Xu.

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@786
PS2, Line 786: rowset_metadata_->IncrementLiveRows(-old_dms->deleted_row_count());
It seems to be the only way although it looks a bit obtrusive.



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 14 Aug 2019 07:59:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Reviewed-on: http://gerrit.cloudera.org:8080/14061
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 177 insertions(+), 80 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 11
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG@38
PS2, Line 38: This is because there is DeltaTracker lack of lock protection when modify
            : the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
            : This caused deleted_row_count_ to be duplicated when calculating the number
            : of live rows of DRS.
> It took me a while to understand the race. Below is a thread interleaving t
Well, it's really clear. Got a new skill !!


http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746:   RETURN_NOT_OK(rowset_metadata_->CommitRedoDeltaDataBlock(dms->id(), block_id));
> Is it problematic that the live row count isn't changed atomically as part 
Mmm, this is a real problem, and I'll try to add some test cases to my integration tests. Great Thanks :)



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Thu, 15 Aug 2019 03:44:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#2).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS.

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/mt-tablet-test.cc
3 files changed, 15 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#8).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 171 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 8
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746: 
> Is it problematic that the live row count isn't changed atomically as part 
Yeah, very good case!!
And I'm curious about the result: what will happen when the tserver crashes between DMS flushing thread calls CommitRedoDeltaDataBlock and flush the tablet metadata? Meaning, crash between Line746 and Line747. Thanks in advance.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc
File src/kudu/tablet/rowset_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc@291
PS4, Line 291: std::lock_guard<LockType> l(lock_);
Additional locks to old tables that do not support live row counting.



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 11:31:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed a vote on this change.

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 10: Verified+1 Code-Review+2

Overriding Jenkins, failure seems unrelated to this fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 21 Aug 2019 17:59:08 +0000
Gerrit-HasComments: No

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#6).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 170 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 6
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Yao Xu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#4).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1                                | T2
|----------                         |----------
|+ In DT::Flush                     |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = ...         |
|  Release component_lock_          |
|  + In DT::FlushDMS                |
|    Call RSMD::IncrementLiveRows   |
|    --> RSMD::live_row_count - deleted_row_count_
|                                   |+ In DRS::CountLiveRows
|                                   |  Take component_lock_ (shared)
|                                   |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                   |  --> RSMD::live_row_count - deleted_row_count_
|                                   |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)      |
|  deleted_row_count_ = 0           |
|  Release component_lock_          |
|  Release compact_flush_lock_      |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 145 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@969
PS4, Line 969:   workload.set_write_pattern(TestWorkload::INSERT_RANDOM_ROWS_WITH_DELETE);
             :   workload.set_write_batch_size(3);
Why are these necessary? Please doc.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@3139
PS4, Line 3139: 
Nit: did you mean to add this empty line at the end?


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h
File src/kudu/integration-tests/test_workload.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h@152
PS4, Line 152:     // Insert random row keys and delete them.
Nit: "Insert random rows, then delete them."


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@140
PS4, Line 140:   auto check_error_func = [](KuduSession* session, TestWorkload* workload) {
Since you have to pass 'this', how about you make this a named member function instead?


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@168
PS4, Line 168: KuduPartialRow *row
Nit: the previous style ('KuduPartialRow*' ) is more correct.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@237
PS4, Line 237:     int64_t expected_row_count =
             :         write_pattern_ == INSERT_RANDOM_ROWS_WITH_DELETE ? 0 : rows_inserted_.Load();
Should note that when INSERT_RANDOM_ROWS_WITH_DELETE is used, ReadThread doesn't really verify anything except that a scan works.


http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746: 
> Yeah, very good case!!
L746 updates the in-memory tablet superblock to reflect the DMS flush.
L747 atomically writes that new tablet superblock to disk.

If the tserver crashes in between, the changes made by L746 are lost. But that's OK; the post-crash state is correct as long as it reflects either the entirety of the DMS flush, or none of it.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc@748
PS4, Line 748:     // Lock the component_lock_ in exclusive mode.
             :     std::lock_guard<rw_spinlock> lock(component_lock_);
             :     // Merge the deleted row count of the old DMS to the RowSetMetadata
             :     // and reset deleted_row_count_.
These comments just duplicate the actual code. It would be more interesting to instead describe _why_ this is necessary. i.e. why do we need to take component_lock_ in exclusive mode before doing this?


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h
File src/kudu/tablet/rowset_metadata.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h@120
PS4, Line 120:   Status CommitRedoDeltaDataBlock(int64_t dms_id,
Could you doc this function?



-- 
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <oc...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 19:19:38 +0000
Gerrit-HasComments: Yes