You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2016/11/08 08:49:29 UTC

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

David Ribeiro Alves has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4995

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 355 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#11).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the newest one will suvive by the time the
older one is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 405 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 13:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 349
> hmm, why remove these assertions? they've been quite useful in catching bug
oh, I guess this is leftover from the version of the patch that didn't have ghosts in multiple DRSs, put them back.


PS12, Line 309: 
> I guess this function is no longer meant to be update-specific, given you r
Done


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

Line 528:             "Undos: [@1(DELETE)] "
> I think it's worth extending this test or adding a new one which does somet
I added a new test that covers this stuff more thoroughly.


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 90:       row_block_.reset(new RowBlock(iter_->schema(), num_in_block, &arena_));
> hm, was this a bug?
I guess not really since we don't do flushing compactions (where we would need the MRS's arena in MergeCompactionInput::PrepareBlock()) but I figured it couldn't hurt, right? seems silly that we're putting the mutations in the arena but not the copy of the row.


Line 255: // Compares two duplicate rows before compaction (and before the REDO->UNDO transformation).
> At first before reading the code, I expected there was a precondition here 
yeah this happen before the transformation and serves the sole purpose of comparing two rows with the same key from two row sets to set 'previous_ghost' appropriately. One of the rows might have a log redo history.
Made this more explicit.


PS12, Line 256: -1 if 'left' is less recent than 'r
> this phrasing is a bit odd. Is it equivalent to say: Returns -1 if 'left' i
Done


Line 261:     DCHECK(right.redo_head != nullptr);
> for the purposes of this DCHECK, seems like it's worth actually doing Advan
Done


PS12, Line 284:     return ret;
              :   }
              : 
> can you give the specific scenario here? iirc it's:
Done


PS12, Line 295: 
> its
Done


Line 299:   // Update row 'a' @ 10
> nit: maybe more intuitive to write this as
Done


PS12, Line 453: :OK();
> this sounds like you're talking about a pre-existing state, but really you 
Done


Line 589:       return false;
> is this recursion inherently depth-limited by the number of input rowsets? 
yeah, there can be at most n-1 ghosts per row where n is the number of rowsets in the compaction input


Line 667:   Arena* prepared_block_arena_;
> doc
Done


Line 679: };
> doing this recursively seems dangerous, since it's bounded only by the numb
it can, you're right, I was lazy. Done


PS12, Line 698: 
> typo
Done


PS12, Line 708: Bi
> typo: is
Done


Line 969: 
> nit: indentation seems off on this comment block.
Done


Line 988: 
> yea, this seems like a potentially serious issue. Mind filing a critical JI
created KUDU-1760


Line 991:       }
> I know you didn't write this but is it right not to use an arena here?
I guess the point here is that there is no advantage in copying the row to the arena since dst_row comes from a row_block on the stack and it's lifetime is already controlled.


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.h
File src/kudu/tablet/compaction.h:

PS12, Line 167: row is found in mu
> or "when a row appears in multiple RowSets"
Done


PS12, Line 167: row is found in mu
> I think the term 'version' here is a bit confusing, because even without th
Done


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

Line 687: // Visitor which establishes the liveness of a row by applying deletes and reinserts.
> +1 for LivenessVisitor
Done


Line 687: // Visitor which establishes the liveness of a row by applying deletes and reinserts.
> we should rename this visitor to LivenessVisitor or DeleteReinsertVisitor o
Done


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/tablet-test.cc
File src/kudu/tablet/tablet-test.cc:

PS12, Line 228: row
> typo
Done


PS12, Line 229: ful
> s/all/full/
Done


Line 237:   // Insert one row.
> punctuation here and below in the comments
Done


Line 238:   ASSERT_OK(this->InsertTestRow(&writer, 1, 0));
> ASSERT_OK() ?
Done


Line 239: 
> No need to check last_op_result(), InsertTestRow() already returns non-OK v
Done


PS12, Line 255: delet
> typo: fourth
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#23).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Ran 500 loops of mt-tablet-test and compaction-test
in dist-test with "KUDU_ALLOW_SLOW_TESTS=1" and
"--stress_cpu_threads=4". All tests passed. Results:

compaction-test: http://dist-test.cloudera.org//job?job_id=david.alves.1480028902.31041
mt-tablet-test:  http://dist-test.cloudera.org//job?job_id=david.alves.1480028908.31125

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 634 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/23
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 23
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#10).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 331 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#15).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 662 insertions(+), 160 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#5).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 392 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 90:       row_block_.reset(new RowBlock(iter_->schema(), num_in_block, &arena_));
> well, it avoids an extra bit of copying, right?
removed


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#9).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 331 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 12:

(17 comments)

Any chance you've looked at coverage of the new code paths? I suggested one test addition that I think will be helpful for verifying the 'non-disjoint histories' case which I'm not seeing a lot of coverage for.

Also would be great to loop some of the randomized/stress tests with this change (eg the mt-tablet-test delete/reinsert test)

http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 349
hmm, why remove these assertions? they've been quite useful in catching bugs in the past in the various stress tests where we'd accidentally end up mis-ordering deltas and get two DELETEs in a row


PS12, Line 309: updated
I guess this function is no longer meant to be update-specific, given you removed the DCHECK? update the comment? Perhaps there should still be a DCHECK that the type is update/reinsert?


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

Line 528: TEST_F(TestCompaction, TestDuplicatedGhostRowsMerging) {
I think it's worth extending this test or adding a new one which does something like:
- create 10 rowsets, each with a disjoint history of insert/update/delete
- while there is more than one rowset, merge a random set of 3 rowsets
- once you've fully compacted down, verify the whole history is still there

This will trigger the case of interleaved histories that isn't necessarily triggered in the case you have here.


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 90:       row_block_.reset(new RowBlock(iter_->schema(), num_in_block, &arena_));
hm, was this a bug?


Line 255: // Compares two duplicate rows.
At first before reading the code, I expected there was a precondition here that we've already processed/collapsed the REDO list such that redo_head on both rows is either a single DELETE or nullptr. but, looking at the code I see that's not the case.

Would be good to be explicit about that.


PS12, Line 256: 1 if left is more recent than right
this phrasing is a bit odd. Is it equivalent to say: Returns -1 if 'left' is less recent than 'right', 1 otherwise.

Since that's the normal way in which comparison functions are phrased?


Line 261:     DCHECK(right.redo_head != nullptr);
for the purposes of this DCHECK, seems like it's worth actually doing AdvanceToLastInList() here to make sure the _last_ REDO is a DELETE.


PS12, Line 284: 
              :   // In case the histories aren't disjoint it must be because one row was deleted in the
              :   // the same operation that inserted the other one, which was then mutated.
can you give the specific scenario here? iirc it's:

- row exists in DRS 1
- a single operation does 'DELETE row ; INSERT row ; UPDATE row'

so this results in DRS 1 with a REDO DELETE at the same time as the MRS (and its resulting flush) has a REDO UPDATE


PS12, Line 295: it'
its


PS12, Line 453:  The newest one will point to the oldest.
this sounds like you're talking about a pre-existing state, but really you are saying that we will set the newer one to point to the older one


Line 589:     // Copy previous versions recursively.
is this recursion inherently depth-limited by the number of input rowsets? want to make sure it doesn't risk a stack overflow


Line 667: void MergeUndoHistories(Mutation* left, Mutation* right, Mutation** head) {
doc


Line 679:     MergeUndoHistories(left->next(), right, head);
doing this recursively seems dangerous, since it's bounded only by the number of mutations, and not by the number of input rowsets. The mutation history is user-controlled so seems like a plausible crash. I'd think this can be converted to an iterative algorithm.


PS12, Line 698: includive
typo


Line 988:         // TODO(dralves) Make Reinserts set defaults on the dest row.
yea, this seems like a potentially serious issue. Mind filing a critical JIRA against 1.2 just so we don't lose track?


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.h
File src/kudu/tablet/compaction.h:

PS12, Line 167:  multiple versions
I think the term 'version' here is a bit confusing, because even without the delete/reinsert case, one might consider an updated row to have multiple "versions" when updated.

Mind switching to use another term, like 'histories' perhaps? or multiple lifetimes/lives? multiple existences?


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

Line 687: // Visitor which applies deletes to the selection vector.
we should rename this visitor to LivenessVisitor or DeleteReinsertVisitor or somesuch and update the docs accordingly


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#4).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 376 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 19:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4995/19/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

Line 556: 
spurious newline


Line 618:   ts = ts == Timestamp::kInvalidTimestamp ? Timestamp(clock_->GetCurrentTime()) : ts;
how about:

  if (ts == Timestamp::kInvalidTimestamp) ts = Timestamp(clock_->GetCurrentTime());


PS19, Line 670: In the end make sure that the output is correct.
How about: Repeatedly merge all the generated RowSets until we are left with a single RowSet, then make sure that its history matches our expected history.


Line 699:     for (int j = 0; j < num_rowsets_in_layer; ++j) {
Mind adding a comment about the purpose of this "layers" logic?


Line 715:       // For odd rows, update them and delete them in the drs
Is there any particular reason why the odd rows in the DRS are updated with null values in AddExpectedUpdate() ?

Also, need punctuation in the comment.


PS19, Line 717: row_idx++
Increment row_idx on a separate line for readability


Line 724:     // For the next layer remove one rowset worth or rows at random
needs period


PS19, Line 724: or
of


Line 727:       row_ids.erase(row_ids.begin() + to_remove);
expensive operation, isn't it?


Line 762:     // Merge between 2 and 4 row sets
and here


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 19:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4995/19/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

Line 556: 
> spurious newline
Done


Line 618:   ts = ts == Timestamp::kInvalidTimestamp ? Timestamp(clock_->GetCurrentTime()) : ts;
> how about:
Done


PS19, Line 670: In the end make sure that the output is correct.
> How about: Repeatedly merge all the generated RowSets until we are left wit
Done


Line 699:     for (int j = 0; j < num_rowsets_in_layer; ++j) {
> Mind adding a comment about the purpose of this "layers" logic?
I did to the header. having overlapping layers is so that we have stuff to compact together. having one less row set per layer is so that we'll likely always have rows that are unique in a compaction input which is more "normal" than the artificial scenario of having all rowsets but the bottom one have only duplicated rows.


Line 715:       // For odd rows, update them and delete them in the drs
> Is there any particular reason why the odd rows in the DRS are updated with
no just the way this test worked previously (look at insert/update row). Done


PS19, Line 717: row_idx++
> Increment row_idx on a separate line for readability
Done


Line 724:     // For the next layer remove one rowset worth or rows at random
> needs period
Done


PS19, Line 724: or
> of
Done


Line 727:       row_ids.erase(row_ids.begin() + to_remove);
> expensive operation, isn't it?
this test runs in 374 ms total, my guess is that 90% of that time is spent on writes, i don't think that's an issue.


Line 762:     // Merge between 2 and 4 row sets
> and here
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#20).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Ran 500 loops of mt-tablet-test and compaction-test
in dist-test with "KUDU_ALLOW_SLOW_TESTS=1" and
"--stress_cpu_threads=4". All tests passed. Results:

compaction-test: http://dist-test.cloudera.org//job?job_id=david.alves.1480028902.31041
mt-tablet-test:  http://dist-test.cloudera.org//job?job_id=david.alves.1480028908.31125

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 736 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/20
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#16).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 661 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4995/14/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

PS14, Line 261: DCHECK(right.redo_head != nullptr);
              :     const Mutation* right_last = left.redo_head;
              :     AdvanceToLastInList(&right_last);
no need to have the advance code twice with two different vars


PS14, Line 671: live
the most recent version might not be 'live' either.


PS14, Line 688: se
typo


PS14, Line 1021: TODO(dralves) Make Reinserts set defaults on the dest row.
point to the jira number


PS14, Line 1093: versions
remove reference to "versions" here and elsewhere


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 12:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 299:   CHECK_NE(left_latest->changelist().is_delete(), right_latest->changelist().is_delete());
nit: maybe more intuitive to write this as

  CHECK(!(left_latest->changelist().is_delete() && right_latest->changelist().is_delete()));


PS12, Line 708: if
typo: is


Line 969:         // When we see a reinsert REDO we do the following:
nit: indentation seems off on this comment block.


Line 991:             dst_row, static_cast<Arena*>(nullptr), &undo_encoder), ERROR,
I know you didn't write this but is it right not to use an arena here?


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.h
File src/kudu/tablet/compaction.h:

PS12, Line 167:  multiple versions
> I think the term 'version' here is a bit confusing, because even without th
or "when a row appears in multiple RowSets"


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

Line 687: // Visitor which applies deletes to the selection vector.
> we should rename this visitor to LivenessVisitor or DeleteReinsertVisitor o
+1 for LivenessVisitor


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/tablet-test.cc
File src/kudu/tablet/tablet-test.cc:

PS12, Line 228: tow
typo


PS12, Line 229: all
s/all/full/


Line 237:   // Insert one row
punctuation here and below in the comments


Line 238:   CHECK_OK(this->InsertTestRow(&writer, 1, 0));
ASSERT_OK() ?


Line 239:   ASSERT_FALSE(writer.last_op_result().has_failed_status());
No need to check last_op_result(), InsertTestRow() already returns non-OK via LocalTabletWriter::WriteBatch() if any of the ops failed. Also superfluous below for DeleteTestRow().


PS12, Line 255: forth
typo: fourth


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 22: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 20:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4995/20/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

PS20, Line 625: int
this should be a sized int type, no?


Line 664:   // Expecte an UNDO reinsert for the delete.
typo


http://gerrit.cloudera.org:8080/#/c/4995/12/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 90:       row_block_.reset(new RowBlock(iter_->schema(), num_in_block, &arena_));
> I guess not really since we don't do flushing compactions (where we would n
well, it avoids an extra bit of copying, right?


http://gerrit.cloudera.org:8080/#/c/4995/20/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

PS20, Line 233: 0;
nit: should be a '.'


PS20, Line 658: OrNull(
I think 'OrNull' is confusing here, because you're not advancing "while it's null". I think you can just remove it, since it's kind of implicit that we won't advance past the end of the linked list.


PS20, Line 669:    head = right;
              :     return head;
why not just 'return right;' (same below)


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#8).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 334 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#6).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 391 insertions(+), 132 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4995/14/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

PS14, Line 261: DCHECK(right.redo_head != nullptr);
              :     const Mutation* right_last = left.redo_head;
              :     AdvanceToLastInList(&right_last);
> no need to have the advance code twice with two different vars
Done


PS14, Line 671: live
> the most recent version might not be 'live' either.
Done


PS14, Line 688: se
> typo
Done


PS14, Line 1021: TODO(dralves) Make Reinserts set defaults on the dest row.
> point to the jira number
Done


PS14, Line 1093: versions
> remove reference to "versions" here and elsewhere
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#14).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 661 insertions(+), 160 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#17).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Ran 500 loops of mt-tablet-test and compaction-test
in dist-test with "KUDU_ALLOW_SLOW_TESTS=1" and
"--stress_cpu_threads=4". All tests passed. Results:

compaction-test: http://dist-test.cloudera.org//job?job_id=david.alves.1480028902.31041
mt-tablet-test:  http://dist-test.cloudera.org//job?job_id=david.alves.1480028908.31125

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 663 insertions(+), 160 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/17
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has submitted this change and it was merged.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Ran 500 loops of mt-tablet-test and compaction-test
in dist-test with "KUDU_ALLOW_SLOW_TESTS=1" and
"--stress_cpu_threads=4". All tests passed. Results:

compaction-test: http://dist-test.cloudera.org//job?job_id=david.alves.1480028902.31041
mt-tablet-test:  http://dist-test.cloudera.org//job?job_id=david.alves.1480028908.31125

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Reviewed-on: http://gerrit.cloudera.org:8080/4995
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 634 insertions(+), 134 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 25
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 16:

(7 comments)

Overall LGTM, I didn't review the new tests yet though

http://gerrit.cloudera.org:8080/#/c/4995/16/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 266:     DCHECK(right.redo_head != nullptr);
nit: can remove this or I guess should be DCHECK(right_last != nullptr)


Line 272:     DCHECK(left.redo_head != nullptr);
nit: could remove or I guess should be DCHECK(left_last != nullptr)


PS16, Line 463:  want
want to make


Line 689: void MergeUndoHistories(Mutation* left, Mutation* right, Mutation** head) {
How about return the head instead of having to pass a Mutation**


Line 1012:                                         << ERROR_LOG_CONTEXT;
nit: looks like an extra space


http://gerrit.cloudera.org:8080/#/c/4995/16/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

PS16, Line 749: }
nit: inconsistent missing space before brace


Line 753:   LivenessVisitor<UNDO> visitor = { this, sel_vec};
here too


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4995/20/src/kudu/tablet/compaction-test.cc
File src/kudu/tablet/compaction-test.cc:

PS20, Line 625: int
> this should be a sized int type, no?
Done


Line 664:   // Expecte an UNDO reinsert for the delete.
> typo
Done


http://gerrit.cloudera.org:8080/#/c/4995/20/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

PS20, Line 233: 0;
> nit: should be a '.'
Done


PS20, Line 658: OrNull(
> I think 'OrNull' is confusing here, because you're not advancing "while it'
Done


PS20, Line 669:    head = right;
              :     return head;
> why not just 'return right;' (same below)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 24: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 24
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#13).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 661 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#22).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Ran 500 loops of mt-tablet-test and compaction-test
in dist-test with "KUDU_ALLOW_SLOW_TESTS=1" and
"--stress_cpu_threads=4". All tests passed. Results:

compaction-test: http://dist-test.cloudera.org//job?job_id=david.alves.1480028902.31041
mt-tablet-test:  http://dist-test.cloudera.org//job?job_id=david.alves.1480028908.31125

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 734 insertions(+), 193 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/22
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#2).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_version'. This can happen
for an arbitrary number of ghost rows. Of interest here is that
this process requires a copy. Since the oldest ghost row is
discarded from the compaction input we must make sure that its
data survives until the most recent is compacted. This is done
by copying the oldest to the youngest's arena.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

Follow up patches will add new itests that test this functionality
more broadly.

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/tablet-test.cc
7 files changed, 374 insertions(+), 129 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4995

to look at the new patch set (#21).

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................

KUDU-237 (part 2) - Add support for REINSERT in delta files

This patch goes the final mile in adding support for REINSERTs in
delta files. It does various things to achieve this:

- Transforms REDO DELETE/REINSERT mutation pairs into UNDO
REINSERT/DELETE mutation pairs, leaving at the most a REDO
delete.

- Merges ghost rows on compaction: When duplicated rows are
found a new algorithm finds out which one is the most recent
and adds the other one as a 'previous_ghost'. This can happen
for an arbitrary number of ghost rows. Noteworthy is that setting
previous versions requires a copy. The two rows are in different
RowBlocks (for row data) and Arenas (for mutations) and it is
not guaranteed that the previous ghost will suvive by the time the
row that points to it is processed.

- Adds new test to tablet-test and changes a test in
compaction-test proving that this works.

- Adds a new test to compaction-test that creates several layers
of overlapping rowsets where each layer has one rowset less than
the previous one. This creates a mix of duplicated (up to 10
different ghosts) and unique rows. The test then compacts the
rowsets a few at a time and makes sure the histories are correct.

Follow up patches will add new itests that test this functionality
even more broadly.

Ran 500 loops of mt-tablet-test and compaction-test
in dist-test with "KUDU_ALLOW_SLOW_TESTS=1" and
"--stress_cpu_threads=4". All tests passed. Results:

compaction-test: http://dist-test.cloudera.org//job?job_id=david.alves.1480028902.31041
mt-tablet-test:  http://dist-test.cloudera.org//job?job_id=david.alves.1480028908.31125

Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
---
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/tablet-test.cc
8 files changed, 735 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/4995/21
-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4995/16/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 266:     DCHECK(right.redo_head != nullptr);
> nit: can remove this or I guess should be DCHECK(right_last != nullptr)
Done


Line 272:     DCHECK(left.redo_head != nullptr);
> nit: could remove or I guess should be DCHECK(left_last != nullptr)
Done


PS16, Line 463:  want
> want to make
Done


Line 689: void MergeUndoHistories(Mutation* left, Mutation* right, Mutation** head) {
> How about return the head instead of having to pass a Mutation**
Done


Line 1012:                                         << ERROR_LOG_CONTEXT;
> nit: looks like an extra space
Done


http://gerrit.cloudera.org:8080/#/c/4995/16/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

PS16, Line 749: }
> nit: inconsistent missing space before brace
Done


Line 753:   LivenessVisitor<UNDO> visitor = { this, sel_vec};
> here too
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 2) - Add support for REINSERT in delta files

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT in delta files
......................................................................


Patch Set 24: Code-Review+2

just a rebase keeping todd's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1173b2bea721b376f2b6049be20f57307582c47
Gerrit-PatchSet: 24
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No