You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2016/10/22 00:03:11 UTC

[kudu-CR] Support proper mutation encoding for reinserts

David Ribeiro Alves has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4791

Change subject: Support proper mutation encoding for reinserts
......................................................................

Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 126 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#17).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 295 insertions(+), 238 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/17
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#3).

Change subject: Support proper mutation encoding for reinserts
......................................................................

Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 166 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#10).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 303 insertions(+), 240 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> Yea, I made some comments in the other patch too. I think this is a case he
I see the clean aspect of the approach and I considered it, but the reasoning here was to avoid traversing the full set of columns twice, once to build the UNDO reinsert and then again to apply the REDO reinsert.


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#9).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 303 insertions(+), 240 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> What if we added a 'bool* row_exists' out-param to ApplyRowChanges, so it t
But in that case we would be generating a REINSERT UNDO for all deletes, which I was trying to avoid and would likely be unnecessary in large majority of cases we see a delete (as you know REINSERTS are the biggest and most expensive UNDOs to build). Plus we'd have to keep that "on the side" just in case we needed it for the following mutation and when we did see a REINSERT we'd go through row again.

This seems a steep price to pay for the advantage of having a single method that returns an UNDO that matches precisely the encoded REDO.

I think a bit of this problem stems from our naming choices, after all ApplyRowChanges feels like it should return an UNDO because of the argument naming choice. What if we broke that connection and just returned an 'out' like in RemoveColumnIdsFromChangeList. We could also change the name of the method to something like MutateRowAndCaptureChanges. With the "undo" connection broken the purpose of the method is clearer. What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Reviewed-on: http://gerrit.cloudera.org:8080/4791
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 295 insertions(+), 238 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> Given that this is a fairly rare case, I think we should go for the cleaner
Just some final thoughts on the alternate implementation to make sure we're on the same page:
We currently keep DELETEs as REDOs and we likely don't want to generate a REINSERT for all deletes and keep it on the side in case it's needed, right? (just for the ones that are followed by a REINSERT).

So, in this if block, we would run ApplyRowChanges on the REDO delete (which would be a special case because we would then discard it) to get the REINSERT and then run it again on the REINSERT to apply the new state to the row and get the new DELETE. We would then still do the dance where we discard the REDO delete and assemble the UNDO list with the new UNDO REINSERT and UNDO DELETE.

Maybe we should change the name of ApplyRowChanges or add an alternate method then? Feels unnatural to ApplyRowChanges on a delete (not applying anything to 'dst_row').


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> Thinking about the above a bit more, it feels artificial to have to call Ap
What if we added a 'bool* row_exists' out-param to ApplyRowChanges, so it that it also took care of what TwiddleDeleteStatus is doing? Then it's very consistent:
- for updates, it just pulls in the changed cols
- for deletes, it twiddles *row_exists to false (and CHECKs that it was formerly true), plus generates a REINSERT undo
- for reinserts, it twiddles *row_exists to true, and generates a DELETE undo


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> ApplyRowChanges needs to capture the state of the old (now deleted) base + 
Yea, I made some comments in the other patch too. I think this is a case here of two wrongs making a right, but perhaps we could just do the right thing in both places? In other words, applying the DELETE should generate the REINSERT undo, and then applying the REINSERT should generate a DELETE undo. But the code in the next patch is calling the apply stuff in the wrong order (see comments there)


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 176
no equivalent test of an invalid reinsert delta?


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist.cc
File src/kudu/common/row_changelist.cc:

Line 44:   string ret;
move this variable down below the DELETE case


Line 190:     encoder.SetType(decoder.type_);
should this be outside the loop? what if, in the projected result, none of the columns are still there. Do we expect it to be empty or to be an empty update/reinsert? wonder if we have a test for this case.


Line 222:     undo_encoder->SetType(type_);
shouldn't the UNDO of a REINSERT be a DELETE?


Line 278:       out->SetType(decoder.type_);
same note as above. I dont think it's a change, but the docs should probably say that the resulting RCL is uninitialized if the removal of columns yields a no-op RCL


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 156:   // The change list includes direct and indirect data.
can you be more explicit here that the data is copied?


Line 239:     Slice val_slice;
since this val_slice is never actually set to anything, I think it's clearer to just use a temporary 'Slice()' on line 241


Line 309:   // state of the row into the undo_encoder.
can you amend this doc to explain what happens with reinsert?


Line 313:   // Apply this RowChangeList to row number 'row_idx' in 'dst_col', but only
same


Line 391:   Status DecodeNext(DecodedUpdate* update);
should this now say REQUIRES is_update or is_reinsert?


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
why disregard instead of having ApplyRowChanges create the right undo?


PS5, Line 734:  undo_encoder.Reset();
             :         undo_encoder.SetToDelete();
this should probably be done by the ApplyRowChanges itself


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#4).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 166 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#2).

Change subject: Support proper mutation encoding for reinserts
......................................................................

Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 163 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4791/16/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 150:   EXPECT_EQ(string("REINSERT col2=world, col3=12345, col4=NULL"),
I'm surprised not to see col1=hello here. Can you add a comment why it's not listed in the string representation even though it's added above? I guess maybe because it's part of the PK?


http://gerrit.cloudera.org:8080/#/c/4791/16/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 221:   // it to work for REINSERT and UPDATE.
nit: can you say 'both REINSERT and UPDATE'?


Line 250:     const ColumnSchema& col_schema = schema->column_by_id(col_id);
this is producing another map lookup, I think better to just use schema->column(i)


http://gerrit.cloudera.org:8080/#/c/4791/16/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 634: namespace { // anonymous namespace
this '// anonymous namespace' comment belongs with the closing brace below


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#12).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 292 insertions(+), 235 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#6).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 216 insertions(+), 175 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> I see the clean aspect of the approach and I considered it, but the reasoni
Given that this is a fairly rare case, I think we should go for the cleaner approach instead of the one that saves a pass over the columns. In the very worst case, it's twice as slow, but usually we'd expect only a small fraction of rows to hit this scenario, and from the context of looking at the RCL code it is just "not right"


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#7).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 293 insertions(+), 236 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#13).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 292 insertions(+), 235 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> Just some final thoughts on the alternate implementation to make sure we're
Thinking about the above a bit more, it feels artificial to have to call ApplyRowChanges twice, particularly since on the delete would not be applying anything.

Maybe what we want to do is to keep ApplyRowChanges just for updates and add a new ApplyReinsertAndEncodeOldRow ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4791/15/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 201:   RowChangeListDecoder decoder(RowChangeList(Slice("\x03")));
> I know you didn't do this but instead of Slice("\x03") it would be nicer to
this doesn't actually work. the string in your suggestion ends up with the incorrect type (i tried).


http://gerrit.cloudera.org:8080/#/c/4791/15/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 163:     SetType(RowChangeList::kReinsert);
> Does anything in the encoder protect us from having an incomplete but nonem
not really, but rlcs for reinserts are always built from full rows.


Line 169:   void SetToReinsert(const RowType& src_row);
> How about EncodeReinsert()
would prefer to avoid externally visible renaming refactorings since that makes it harder for patches down the line. happy to rename after the series if you fell strongly about it.


Line 222:   void EncodeUpdate(const ColumnSchema& col_schema,
> How about EncodeColumnMutation()
Done


Line 235:   void EncodeUpdateRaw(int col_id, bool is_null, Slice new_val);
> How about EncodeColumnMutationRaw()
Done


Line 332:                                     RowChangeListEncoder* out);
> nit: why change the variable name from undo_encoder to 'out'? The previous 
because what's in 'out' is the old state of the row and not an undo. for instance if you do this on a delete you don't get a reinsert encoded back, so undo is a misnomer (see compaction.cc ApplyMutationsAndGenerateUndos)


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 176
> no equivalent test of an invalid reinsert delta?
added that and also extended the "normal" test quite a bit.


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist.cc
File src/kudu/common/row_changelist.cc:

Line 44:   string ret;
> move this variable down below the DELETE case
Done


Line 190:     encoder.SetType(decoder.type_);
> should this be outside the loop? what if, in the projected result, none of 
this is done inside the loop on purpose (i've made SetType() idempotent) precisely so that we get an empty 'buf' if we end up with an empty 'buf' if no columns are found. The behavior was the same before. and, in the only place where we call this (memrowset.cc line 601) we discard the mutation entirely if that is the case.

I moved the DCHECK out of the loop though.


Line 222:     undo_encoder->SetType(type_);
> shouldn't the UNDO of a REINSERT be a DELETE?
see my comment and explanation on compaction.cc. Having the UNDO be a REINSERT (which we would need to build anyway) allows us to go through the row only once (otherwise we'd need to build a REINSERT prior to running ApplyRowChanges to capture the old state, and then go through the row again to apply the new REINSERT).


Line 278:       out->SetType(decoder.type_);
> same note as above. I dont think it's a change, but the docs should probabl
I've changed the docs to reflect that the UPDATE and REINSERT cases are now the same

also moved the DCHECK outside the loop


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 156:   // The change list includes direct and indirect data.
> can you be more explicit here that the data is copied?
Done


Line 239:     Slice val_slice;
> since this val_slice is never actually set to anything, I think it's cleare
Done


Line 309:   // state of the row into the undo_encoder.
> can you amend this doc to explain what happens with reinsert?
Done


Line 313:   // Apply this RowChangeList to row number 'row_idx' in 'dst_col', but only
> same
Done


Line 391:   Status DecodeNext(DecodedUpdate* update);
> should this now say REQUIRES is_update or is_reinsert?
Done


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> why disregard instead of having ApplyRowChanges create the right undo?
ApplyRowChanges needs to capture the state of the old (now deleted) base + whatever updates happened prior to the delete. So "undo_encoder" will be a reinsert that will allow us to recover that state. 

For example say we have this schema (string a, int b) and the following row and mutation sequence (the number after @ is the timestamp):

DELETE@10 <- {BASE ("dummy",1)} -> UPDATE@20 (SET b = 2) -> DELETE@30 -> REINSERT@40 ("dummy", 3)

When we run ApplyRowChanges below 'undo_encoder' will encode: REINSERT("dummy",2) so that eventually we can have:

DELETE@10 <- UPDATE@20(SET b = 1) <- REINSERT@30("dummy", 2) <- DELETE@40 <- {BASE ("dummy",3)}

But this is only completed in a followup patch. In this patch we still build REINSERT@30 but then discard it to get the (still truncated history) of:

DELETE@40 <- {BASE ("dummy",3)}

The full logic in the next patch puts this in actual code.


PS5, Line 734:  undo_encoder.Reset();
             :         undo_encoder.SetToDelete();
> this should probably be done by the ApplyRowChanges itself
see my comment above.


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#16).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 293 insertions(+), 238 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4791/16/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 150:   EXPECT_EQ(string("REINSERT col2=world, col3=12345, col4=NULL"),
> I'm surprised not to see col1=hello here. Can you add a comment why it's no
yeah, added the note you suggested.


http://gerrit.cloudera.org:8080/#/c/4791/16/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 221:   // it to work for REINSERT and UPDATE.
> nit: can you say 'both REINSERT and UPDATE'?
Done


Line 250:     const ColumnSchema& col_schema = schema->column_by_id(col_id);
> this is producing another map lookup, I think better to just use schema->co
Done


http://gerrit.cloudera.org:8080/#/c/4791/16/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 634: namespace { // anonymous namespace
> this '// anonymous namespace' comment belongs with the closing brace below
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 17: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4791

to look at the new patch set (#8).

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................

KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Currently reinserts are not properly encoded like the other mutations
and instead refer to an in-memory row, meaning we can't flush them
as they don't include the indirect data.

This patch makes it so that reinserts are properly encoded, like other
mutations but does not yet include the step of flushing them to disk,
which will be done in a follow-up patch.

Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
---
M src/kudu/common/row_changelist-test.cc
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
6 files changed, 302 insertions(+), 240 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4791/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-237 (part 1) - Support proper mutation encoding for reinserts

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for reinserts
......................................................................


Patch Set 15:

(6 comments)

Overall looks good, mostly just had some naming suggestions

http://gerrit.cloudera.org:8080/#/c/4791/15/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 201:   RowChangeListDecoder decoder(RowChangeList(Slice("\x03")));
I know you didn't do this but instead of Slice("\x03") it would be nicer to see Substitute("$0", kReinsert) even though it requires another variable.


http://gerrit.cloudera.org:8080/#/c/4791/15/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 163:     SetType(RowChangeList::kReinsert);
Does anything in the encoder protect us from having an incomplete but nonempty RCL on a reinsert with non-nullable columns?


Line 169:   void SetToReinsert(const RowType& src_row);
How about EncodeReinsert()


Line 222:   void EncodeUpdate(const ColumnSchema& col_schema,
How about EncodeColumnMutation()


Line 235:   void EncodeUpdateRaw(int col_id, bool is_null, Slice new_val);
How about EncodeColumnMutationRaw()


Line 332:                                     RowChangeListEncoder* out);
nit: why change the variable name from undo_encoder to 'out'? The previous one seemed more descriptive.


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes