Posted to dev@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/05/05 23:27:41 UTC

[kudu-CR] KUDU-815. Improve performance of first scan following restart

Hello Jean-Daniel Cryans, Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/2974

to review the following change.

Change subject: KUDU-815. Improve performance of first scan following restart
......................................................................

KUDU-815. Improve performance of first scan following restart

On the first scan following a tablet server restart, the TS has not yet read
the stats for any delta files. This means that, when we construct
DeltaFileIterators to service a scan, we don't yet know whether the files
are even relevant to the MVCC snapshot being scanned.

Prior to this patch, we only attempted to cull irrelevant DeltaFiles
at iterator construction time and, without stats, were unable to do so.
With this patch, we check again when the iterator is seeked and, if the
file is irrelevant, preemptively mark it as "exhausted", which prevents
any needless IO.

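The seek-time check described above can be sketched roughly as follows. This
is a simplified model rather than the code in deltafile.cc: every type and
member name here is hypothetical, the snapshot is reduced to a single upper
bound on visible timestamps, and a counter stands in for real block IO. With
a file whose earliest delta is at timestamp 100 and a snapshot bound of 50,
the seek marks the iterator exhausted and performs no block reads.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: these types are hypothetical stand-ins, not the
// classes in src/kudu/tablet/deltafile.cc.

// Timestamp range of the deltas stored in one delta file.
struct DeltaStatsSketch {
  uint64_t min_timestamp;
  uint64_t max_timestamp;
};

// Simplifying assumption: a snapshot sees every timestamp strictly below
// `upper_bound` and nothing at or above it.
struct SnapshotSketch {
  uint64_t upper_bound;
};

// A file is irrelevant when even its earliest delta is invisible.
inline bool IsRelevant(const DeltaStatsSketch& stats,
                       const SnapshotSketch& snap) {
  return stats.min_timestamp < snap.upper_bound;
}

class DeltaFileIteratorSketch {
 public:
  DeltaFileIteratorSketch(DeltaStatsSketch stats, SnapshotSketch snap)
      : stats_(stats), snap_(snap) {}

  // Models lazy initialization, the point at which stats become available.
  void Init() { initialized_ = true; }

  // Re-check relevance at seek time; skip all block reads if the file
  // cannot contribute anything to this snapshot.
  void SeekToOrdinal(uint64_t /*row_idx*/) {
    assert(initialized_);
    if (!IsRelevant(stats_, snap_)) {
      exhausted_ = true;
      return;
    }
    exhausted_ = false;
    ++block_reads_;  // stand-in for index and data block IO
  }

  bool exhausted() const { return exhausted_; }
  int block_reads() const { return block_reads_; }

 private:
  DeltaStatsSketch stats_;
  SnapshotSketch snap_;
  bool initialized_ = false;
  bool exhausted_ = false;
  int block_reads_ = 0;
};
```
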
To benchmark, I loaded a 1GB TPC-H lineitem table on a local tserver and
looked at the performance of the first scan.

without patch:
todd@todd-ThinkPad-T540p:~/git/kudu$ ./build/release/bin/tpch_real_world  --tpch_load_data=0  --tpch_use_mini_cluster=0
I0505 16:15:28.855382 32209 tpch_real_world.cc:307] Time spent querying data in cluster: real 1.966s    user 0.112s     sys 0.000s
I0505 16:15:29.598799 32209 tpch_real_world.cc:307] Time spent querying data in cluster: real 0.743s    user 0.100s     sys 0.000s

with patch:
todd@todd-ThinkPad-T540p:~/git/kudu$ ./build/release/bin/tpch_real_world  --tpch_load_data=0  --tpch_use_mini_cluster=0
I0505 16:14:31.102988 31545 tpch_real_world.cc:307] Time spent querying data in cluster: real 0.924s    user 0.096s     sys 0.008s

There is still a slight performance difference between the first scan after a
restart and the second due to cold caches, but the difference is much less
dramatic.

Change-Id: Icd01302723430e5b06308256bbbbb790aee096fc
---
M src/kudu/tablet/deltafile.cc
1 file changed, 11 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/2974/1
-- 
To view, visit http://gerrit.cloudera.org:8080/2974

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd01302723430e5b06308256bbbbb790aee096fc
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans