Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2018/01/04 18:19:20 UTC

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Dan Burkert has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8941 )


Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets whose rowsets have more than 2GiB of REDO deltafiles.
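To illustrate the arithmetic behind the 2GiB threshold, here is a minimal sketch, assuming a 32-bit int (as on the usual 64-bit Linux targets). The helper names are hypothetical and are not taken from rowset_info.h; the thread does not show the actual fields.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a byte count can be stored in a plain int
// without loss. With a 32-bit int the limit is 2^31 - 1, just under 2GiB.
bool FitsInInt(uint64_t bytes) {
  return bytes <= static_cast<uint64_t>(std::numeric_limits<int>::max());
}

// Hypothetical helper: narrows a byte count to int, asserting (in debug
// builds) that no information is lost.
int NarrowToIntChecked(uint64_t bytes) {
  assert(FitsInInt(bytes));
  return static_cast<int>(bytes);
}

// Hypothetical helper: converts bytes to whole MiB using 64-bit arithmetic
// throughout, so large sizes are never squeezed through an int.
uint64_t BytesToMiB(uint64_t bytes) {
  return bytes / (1024ULL * 1024ULL);
}
```

Under these assumptions a 1GiB size still fits in an int, while a 3GiB size does not and has to be carried in a 64-bit type.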

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/tablet/rowset_info.h
1 file changed, 3 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#4).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

Since we appear to have poor coverage of the update-heavy workload
required to produce this bug, I opted to structure the test as an
integration test, instead of a targeted unit test.

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/heavy-update-compaction-test.cc
M src/kudu/tablet/rowset_info.h
3 files changed, 191 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/4

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 8: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 8
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 22:15:38 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 2: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 18:42:51 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8941/2/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:

http://gerrit.cloudera.org:8080/#/c/8941/2/src/kudu/tablet/rowset_info.h@86
PS2, Line 86:   // Cached version of rowset_->OnDiskDataSizeNoUndos().
This is caching rs->OnDiskBaseDataSizeWithRedos(), right?




Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 18:31:49 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#8).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

Testing: included is a targeted unit-test which reproduces the overflow
quickly and deterministically. I also reproduced the issue using an
integration test; however, that test exposed other issues which need to
be addressed before it can land (KUDU-2253). I'll be working on that in
a follow-up commit.
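As a rough sketch of what a targeted, deterministic repro can look like: a fake rowset reports a size above 2GiB without touching disk, and the check confirms that the cached copy keeps the full value. All class and function names here are hypothetical stand-ins, not the contents of mock-rowsets.h or compaction_policy-test.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a rowset; it only reports a configured size.
class FakeRowSet {
 public:
  explicit FakeRowSet(uint64_t size_bytes) : size_bytes_(size_bytes) {}
  // Reports the on-disk size without doing any I/O.
  uint64_t OnDiskSize() const { return size_bytes_; }

 private:
  uint64_t size_bytes_;
};

// Hypothetical stand-in for the object that caches a rowset's size.
class FakeRowSetInfo {
 public:
  explicit FakeRowSetInfo(const FakeRowSet& rs)
      : size_bytes_(rs.OnDiskSize()) {}
  uint64_t size_bytes() const { return size_bytes_; }

 private:
  uint64_t size_bytes_;  // 64-bit, so sizes past 2GiB are kept intact.
};

// Returns true if the cached size equals what the rowset reports.
bool CachedSizeMatches(uint64_t size_bytes) {
  FakeRowSet rs(size_bytes);
  FakeRowSetInfo info(rs);
  return info.size_bytes() == rs.OnDiskSize();
}

// Example check with a 32GiB rowset, far beyond the range of a 32-bit int.
void CheckLargeRowSetSizeIsPreserved() {
  assert(CachedSizeMatches(32ULL << 30));
}
```

Because the fake rowset needs no real data, a check like this runs in microseconds and does not depend on disk throughput or compaction timing.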

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset_info.h
3 files changed, 29 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/8

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 8
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 8:

Two straight failures of what appears to be KUDU-1736.  Bad luck, or related?



Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 8
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 21:30:30 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 2:

Hm, I thought you also had a test that reproduced this. If so, it's worth backporting as well, unless it has a lot of test dependencies that aren't available in this branch.



Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 21:27:50 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#3).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

Since we appear to have poor coverage of the update-heavy workload
required to produce this bug, I opted to structure the test as an
integration test, instead of a targeted unit test.

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/heavy-update-compaction-test.cc
M src/kudu/tablet/rowset_info.h
3 files changed, 191 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/3

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#5).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

Since we appear to have poor coverage of the update-heavy workload
required to produce this bug, I opted to structure the test as an
integration test, instead of a targeted unit test.

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/heavy-update-compaction-test.cc
M src/kudu/tablet/rowset_info.h
3 files changed, 191 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/5

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 5
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#7).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

This commit contains two repro tests: one targeted unit-test which
reproduces the overflow quickly and deterministically, and an
integration test that gives us more coverage of the update-heavy write
workloads required to hit this bug. The integration test has mixed
success at triggering the overflow, depending on the speed of the
machine it runs on (particularly disk throughput), and thus on how fast
the maintenance manager (MM) can run compactions. On my laptop it
triggered the overflow nearly 100% of the time, but on dist-test it
triggered nearly 0% of the time.

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/heavy-update-compaction-itest.cc
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset_info.h
5 files changed, 264 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/7

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 7
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

Testing: included is a targeted unit-test which reproduces the overflow
quickly and deterministically. I also reproduced the issue using an
integration test; however, that test exposed other issues which need to
be addressed before it can land (KUDU-2253). I'll be working on that in
a follow-up commit.

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Reviewed-on: http://gerrit.cloudera.org:8080/8941
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans <jd...@apache.org>
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset_info.h
3 files changed, 29 insertions(+), 4 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Jean-Daniel Cryans: Looks good to me, approved


Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 9
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 2:

Yep, I'll be adding the repro to this patch shortly.  Just giving it a few runs on dist-test.



Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 21:33:09 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8941 )

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8941/2/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:

http://gerrit.cloudera.org:8080/#/c/8941/2/src/kudu/tablet/rowset_info.h@86
PS2, Line 86:   // Cached version of rowset_->OnDiskDataSizeNoUndos().
> This is caching rs->OnDiskBaseDataSizeWithRedos(), right?
No, on the 1.5.x branch it's called OnDiskDataSizeNoUndos.  It's been renamed on master.




Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 18:42:32 +0000
Gerrit-HasComments: Yes

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#6).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

This commit contains two repro tests: one targeted unit-test which
reproduces the overflow quickly and deterministically, and an
integration test that gives us more coverage of the update-heavy write
workloads required to hit this bug. The integration test has mixed
success at triggering the overflow, depending on the speed of the
machine it runs on (particularly disk throughput), and thus on how fast
the maintenance manager (MM) can run compactions. On my laptop it
triggered the overflow nearly 100% of the time, but on dist-test it
triggered nearly 0% of the time.

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/heavy-update-compaction-test.cc
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset_info.h
5 files changed, 263 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/6

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 6
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2251: rowset size can overflow int in RowSetInfo

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Jean-Daniel Cryans, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8941

to look at the new patch set (#2).

Change subject: KUDU-2251: rowset size can overflow int in RowSetInfo
......................................................................

KUDU-2251: rowset size can overflow int in RowSetInfo

This overflow causes a CHECK failure from rowset compaction planning in
tablets with rowsets with more than 2GiB of REDO deltafiles:

*** SIGABRT (@0x3ce00007614) received by PID 30228 (TID 0x7fbb52a5e700) from PID 30228; stack trace: ***
    @     0x7fbb977cb100 (unknown)
    @     0x7fbb95a985f7 __GI_raise
    @     0x7fbb95a99ce8 __GI_abort
    @          0x1af56d9 (unknown)
    @           0x8baf3d google::LogMessage::Fail()
    @           0x8bce93 google::LogMessage::SendToLog()
    @           0x8baa99 google::LogMessage::Flush()
    @           0x8bd81f google::LogMessageFatal::~LogMessageFatal()
    @           0x9f71d6 kudu::tablet::RowSetInfo::CollectOrdered()
    @           0x9d42d9 kudu::tablet::BudgetedCompactionPolicy::SetupKnapsackInput()
    @           0x9d5a3a kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
    @           0x98e28f kudu::tablet::Tablet::UpdateCompactionStats()
    @           0x9aff08 kudu::tablet::CompactRowSetsOp::UpdateStats()
    @          0x1ae02b5 kudu::MaintenanceManager::FindBestOp()
    @          0x1ae2bce kudu::MaintenanceManager::RunSchedulerThread()
    @          0x1b27eda kudu::Thread::SuperviseThread()
    @     0x7fbb977c3dc5 start_thread
    @     0x7fbb95b5921d __clone
    @                0x0 (unknown)

Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
---
M src/kudu/tablet/rowset_info.h
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8941/2

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I598344e22fc1ecbb482bfe85ea3867ddf63963b4
Gerrit-Change-Number: 8941
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins