You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/04/25 23:29:15 UTC

[kudu-CR] Fast path scanning blocks of deleted rows

Hello Will Berkeley, Mostafa Mokhtar,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10213

to review the following change.


Change subject: Fast path scanning blocks of deleted rows
......................................................................

Fast path scanning blocks of deleted rows

This adds some very simple fast-paths in the case that an entire row
block consists of rows that were deleted.

The first is in the block materialization code: if delta application
results in all rows being deleted, we don't need to move forward any
materialize any columns.

The second is before serializing scan responses to the client. In this
case we don't need to loop over each column and read the selection
vector for each column. Instead we can just return immediately.

I tested this on a table where I'd inserted a few billion rows, deleted
them all, and then re-inserted them. Before the patch, a simple 'SELECT
* FROM t LIMIT 10' took 306sec. With the first optimization only, it
took about 10 seconds. With the second one as well, it took about 2.2
seconds.

There are probably more general optimizations that could be done for
sparsely-populated RowBlocks (eg where just a few rows are selected) but
they are much more complex, and it's relatively common for users to
delete large consecutive runs of rows. For example, users may use
'DELETE' all rows in a table or partition before re-adding them, or they
may delete all data corresponding to some prefix of the PK.

Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
---
M src/kudu/common/generic_iterators.cc
M src/kudu/tserver/tablet_service.cc
2 files changed, 11 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/13/10213/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10213 )

Change subject: Fast path scanning blocks of deleted rows
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10213/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10213/1//COMMIT_MSG@13
PS1, Line 13: move forward any
            : materialize any columns.
> Looks like two thoughts collided here.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Apr 2018 20:01:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10213 )

Change subject: Fast path scanning blocks of deleted rows
......................................................................


Patch Set 2: Verified+1

Some unrelated flake... trying to udnerstand it


-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Apr 2018 22:00:59 +0000
Gerrit-HasComments: No

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10213 )

Change subject: Fast path scanning blocks of deleted rows
......................................................................

Fast path scanning blocks of deleted rows

This adds some very simple fast-paths in the case that an entire row
block consists of rows that were deleted.

The first is in the block materialization code: if delta application
results in all rows being deleted, we don't need to move forward and
materialize any columns.

The second is before serializing scan responses to the client. In this
case we don't need to loop over each column and read the selection
vector for each column. Instead we can just return immediately.

I tested this on a table where I'd inserted a few billion rows, deleted
them all, and then re-inserted them. Before the patch, a simple 'SELECT
* FROM t LIMIT 10' took 306sec. With the first optimization only, it
took about 10 seconds. With the second one as well, it took about 2.2
seconds.

There are probably more general optimizations that could be done for
sparsely-populated RowBlocks (eg where just a few rows are selected) but
they are much more complex, and it's relatively common for users to
delete large consecutive runs of rows. For example, users may use
'DELETE' all rows in a table or partition before re-adding them, or they
may delete all data corresponding to some prefix of the PK.

Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Reviewed-on: http://gerrit.cloudera.org:8080/10213
Tested-by: Todd Lipcon <to...@apache.org>
Reviewed-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/common/generic_iterators.cc
M src/kudu/tserver/tablet_service.cc
2 files changed, 11 insertions(+), 0 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Will Berkeley: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10213 )

Change subject: Fast path scanning blocks of deleted rows
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10213/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10213/1//COMMIT_MSG@13
PS1, Line 13: move forward any
            : materialize any columns.
Looks like two thoughts collided here.



-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 25 Apr 2018 23:55:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10213 )

Change subject: Fast path scanning blocks of deleted rows
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 02 May 2018 05:29:30 +0000
Gerrit-HasComments: No

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: Fast path scanning blocks of deleted rows
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] Fast path scanning blocks of deleted rows

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, Mostafa Mokhtar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10213

to look at the new patch set (#2).

Change subject: Fast path scanning blocks of deleted rows
......................................................................

Fast path scanning blocks of deleted rows

This adds some very simple fast-paths in the case that an entire row
block consists of rows that were deleted.

The first is in the block materialization code: if delta application
results in all rows being deleted, we don't need to move forward and
materialize any columns.

The second is before serializing scan responses to the client. In this
case we don't need to loop over each column and read the selection
vector for each column. Instead we can just return immediately.

I tested this on a table where I'd inserted a few billion rows, deleted
them all, and then re-inserted them. Before the patch, a simple 'SELECT
* FROM t LIMIT 10' took 306sec. With the first optimization only, it
took about 10 seconds. With the second one as well, it took about 2.2
seconds.

There are probably more general optimizations that could be done for
sparsely-populated RowBlocks (eg where just a few rows are selected) but
they are much more complex, and it's relatively common for users to
delete large consecutive runs of rows. For example, users may use
'DELETE' all rows in a table or partition before re-adding them, or they
may delete all data corresponding to some prefix of the PK.

Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
---
M src/kudu/common/generic_iterators.cc
M src/kudu/tserver/tablet_service.cc
2 files changed, 11 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/13/10213/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Gerrit-Change-Number: 10213
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>