You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "helifu (Code Review)" <ge...@cloudera.org> on 2018/09/03 11:15:28 UTC

[kudu-CR] KUDU-2566:

helifu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11381


Change subject: KUDU-2566:
......................................................................

KUDU-2566:

two improvements:
1) support open-ended intervals;
2) end up fetching one more rowset for the upper bound which is exclusive;

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 256 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <hz...@corp.netease.com>

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#4).

Change subject: KUDU-2566: Enhance rowset tree pruning and discard string copying while querying
......................................................................

KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Three improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a closed and bounded
   interval. This changeset adds the ability to find overlap for unbounded intervals
   as well. As a result, tablet iterators with a single primary key bound are more
   efficient because they do not iterate over rowsets that don't overlap with the
   key bound.
2) End up fetching one more rowset for the upper bound which is exclusive:
   This changeset also adds the ability to query the rowset tree using an exclusive
   upper bound, whereas previously only inclusive intervals were supported.
   This makes scans more efficient since primary key upper bounds are passed as
   exclusive bounds, so now rowsets that overlap only in the upper bound are excluded.
3) Perf improvement: using raw slices instead of copying to strings while querying
   The copying from slices to string is discarded, which could help to increase the
   query performance.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 294 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 4
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#5).

Change subject: KUDU-2566: Enhance rowset tree pruning and discard string copying while querying
......................................................................

KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Three improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a closed and bounded
   interval. This changeset adds the ability to find overlap for unbounded intervals
   as well. As a result, tablet iterators with a single primary key bound are more
   efficient because they do not iterate over rowsets that don't overlap with the
   key bound.
2) End up fetching one more rowset for the upper bound which is exclusive:
   This changeset also adds the ability to query the rowset tree using an exclusive
   upper bound, whereas previously only inclusive intervals were supported.
   This makes scans more efficient since primary key upper bounds are passed as
   exclusive bounds, so now rowsets that overlap only in the upper bound are excluded.
3) Perf improvement: using raw slices instead of copying to strings while querying
   The copying from slices to string is discarded, which could help to increase the
   query performance.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 295 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 5
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop copying strings

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings
......................................................................


Patch Set 7:

(3 comments)

Couple more nits and this is good.

http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc@1793
PS7, Line 1793: // It's a little bit difficult to merge the logic below to upper,
              :   // because some test cases reply on it :(
Can remove this.


http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc@2362
PS7, Line 2362: 
nit: extra line


http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/util/interval_tree-test.cc@187
PS7, Line 187:  &
nit: Here and below, & with type.



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 7
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 11 Sep 2018 03:05:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and discard string copying while querying
......................................................................


Patch Set 5:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG
Commit Message:

PS5: 
Thanks for rewriting the commit message. It's really nice now.


http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG@7
PS5, Line 7: discard string copying
nit: "stop copying strings"


http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG@11
PS5, Line 11: and bounded
nit: Could you wrap commit message lines at 72 characters?


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@211
PS5, Line 211: string&
By the convention of the Google style guide, we use references arguments only when they are const. Output parameters should be passed as pointers.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@218
PS5, Line 218: if (r == 0) continue
I think we want coverage of the degenerate case when s1 == s2 (so the interval [s1, s2) is empty).


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@217
PS5, Line 217: int r = strcmp(s1.c_str(), s2.c_str());
             :       if (r == 0) continue;
             :       left = r < 0 ? s1 : s2;
             :       right = r < 0 ? s2 : s1;
             :       break;
You can simplify this a little bit to

*left = std::min(s1, s2);
*right = std::max(s1, s2);

where I'm assuming left, right have been changed to be pointers and s1 == s2 is permissible.

It'd also be fine to take no parameters and return a std::pair.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@90
PS5, Line 90: FindRowSetsIntersectingInterval
Can you add a comment explaining what it means to pass boost::none for 'lower_bound' and 'upper_bound'?


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@90
PS5, Line 90:  &
style nit: We put the &'s and *'s for refs and pointers by the type, not the name, so

 const boost::optional<Slice>& lower_bound


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@91
PS5, Line 91:  &
Likewise.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc
File src/kudu/tablet/rowset_tree.cc:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@107
PS5, Line 107:  &
Likewise with the & placement nit.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@108
PS5, Line 108:  &
Likewise with the & placement nit.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@109
PS5, Line 109: const
I don't think the const specifier is meaningful on a bool passed by value.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@117
PS5, Line 117:  &
Likewise with the & placement nit.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@118
PS5, Line 118:  &
Likewise with the & placement nit.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@119
PS5, Line 119: const
Ditto on the const specifier.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@194
PS5, Line 194:  &
Likewise with the & placement nit.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@195
PS5, Line 195:  &
Likewise with the & placement nit.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@196
PS5, Line 196:  *
* next to type, not parameter name.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h
File src/kudu/util/interval_tree-inl.h:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h@402
PS5, Line 402: intervals
nit: interval


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h@409
PS5, Line 409: true
To make this easier to understand locally, could you do one of two things:

1. Add a comment naming the parameter corresponding to true or false in the calls to Traits::compare. E.g.

 return Traits::compare(Traits::get_left(interval), upper_bound, /*positive_direction=*/true);

2. Add an enum with descriptive names for the two cases:

 enum class EndpointIfNone {
   POSITIVE_INFINITY,
   NEGATIVE_INFINITY,
 };

 return Traits::compare(Traits::get_left(interval), upper_bound, POSITIVE_INFINITY);

See also https://abseil.io/tips/94.


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-test.cc@64
PS5, Line 64: *upper <= this->left
I'm getting confused now about which intervals are inclusive of the right endpoint and which aren't. This Intersects method is checking for intersection of a closed interval [left, right] with a half-open interval [lower, upper) (when lower and upper are provided), right? Can we make this clearer? Maybe with different parameter names like `upper_excl` or `upper_exclusive`?



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 5
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Sep 2018 17:27:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: fix two todo list

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#2).

Change subject: KUDU-2566: fix two todo list
......................................................................

KUDU-2566: fix two todo list

two improvements:
1) support open-ended intervals;
2) end up fetching one more rowset for the upper bound which is exclusive;

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 256 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 2
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop coping string while querying

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and stop coping string            while querying
......................................................................


Patch Set 6:

(6 comments)

I'll defer to Will; didn't review in detail.

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc@251
PS6, Line 251:   SeedRandom();
Don't need to SeedRandom() twice in a test.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc@283
PS6, Line 283:   /*
Should either fix this test case, or remove it.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree.h@96
PS6, Line 96:   //	[lower_bound, upper_bound)
Nit: got a tab here.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/tablet.cc@1774
PS6, Line 1774:     boost::optional<Slice> lower_bound = boost::none;
              :     boost::optional<Slice> upper_bound = boost::none;
              :     if (spec->lower_bound_key()) {
              :       lower_bound = spec->lower_bound_key()->encoded_key();
              :     }
              :     if (spec->exclusive_upper_bound_key()) {
              :       upper_bound = spec->exclusive_upper_bound_key()->encoded_key();
              :     }
Nit: combine using ternaries:

  boost::optional<Slice> lower_bound = spec->lower_bound_key() ? spec->lower_bound_key()->encoded_key() : boost::none;


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h
File src/kudu/util/interval_tree-inl.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h@402
PS6, Line 402: interval
Nit: 'intervals' was correct; this deals with multiple intervals.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree.h
File src/kudu/util/interval_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree.h@145
PS6, Line 145:   void FindIntersectingInterval(const QueryPointType &lower_bound,
             :                                 const QueryPointType &upper_bound,
Nit: const QueryPointType&

(no space between the type and the ampersand)



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 6
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 10 Sep 2018 20:54:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop coping string while querying

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and stop coping string            while querying
......................................................................


Patch Set 6:

(5 comments)

LGTM once Adar's and my nits are addressed.

http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG@7
PS6, Line 7: string
strings


http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG@7
PS6, Line 7: coping
copying


http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG@8
PS6, Line 8: while querying
I think git only uses the first line as the subject so inserting a line break will truncate the subject as displayed in e.g. git log --oneline.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h
File src/kudu/util/interval_tree-inl.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h@402
PS6, Line 402: interval
> Nit: 'intervals' was correct; this deals with multiple intervals.
I think 'interval' can be singular or plural as long as the rest of the sentence agrees:

"Any interval whose left edge is < the query interval's right edge intersects the query interval"

or

"Any intervals whose left edges are < the query interval's right edge intersect the query interval"

but the former sounds better to me.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-test.cc@59
PS6, Line 59: // boost::none means infinity.
            :   // [left,  right] is closed interval.
            :   // [lower, upper) is half-open interval, so the upper is exclusive.
This and the pictures are great. Thanks for adding them!



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 6
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 10 Sep 2018 22:47:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#3).

Change subject: KUDU-2566: Enhance rowset tree pruning and discard string copying while querying
......................................................................

KUDU-2566: Enhance rowset tree pruning and discard string copying while querying

Four improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a closed and bounded
   interval. This changeset adds the ability to find overlap for unbounded intervals
   as well. As a result, tablet iterators with a single primary key bound are more
   efficient because they do not iterate over rowsets that don't overlap with the
   key bound.
2) End up fetching one more rowset for the upper bound which is exclusive:
   This changeset also adds the ability to query the rowset tree using an exclusive
   upper bound, whereas previously only inclusive intervals were supported.
   This makes scans more efficient since primary key upper bounds are passed as
   exclusive bounds, so now rowsets that overlap only in the upper bound are excluded.
3) Perf improvement: using raw slices instead of copying to strings while querying
   The copying from slices to string is discarded, which could help to increase the
   query performance.
4) Simplify the code logic:
   The logic of the function 'CaptureConsistentIterators' is simpler after supporting
   unbounded intervals.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 234 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 3
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2566: fix two todo list

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: fix two todo list
......................................................................


Patch Set 2:

(2 comments)

The formatting in this patch got kind of messed up; there's a lot of empty space between each new line of code.

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88
PS2, Line 88:   void FindRowSetsIntersectingInterval(const Slice &lower_bound,
            :                                        const Slice &upper_bound,
            :                                        std::vector<RowSet *> *rowsets) const;
            :   void FindRowSetsIntersectingIntervalGE(const Slice& lower_bound,
            :                                          std::vector<RowSet*> *rowsets) const;
            :   void FindRowSetsIntersectingIntervalLT(const Slice& upper_bound,
            :                                          std::vector<RowSet*> *rowsets) const;
Rather than introducing new FindRowSetsIntersectingInterval... methods, could we rewrite the existing method like this:

  void FindRowSetsIntersectingInterval(std::vector<RowSet*>* rowsets,
                                       boost::optional<Slice> lower_bound,
                                       boost::optional<Slice> upper_bound) const;

Then if you wanted to omit one of the bounds, it's a simple matter of passing boost::none in for it. The function could check that at least one bound was provided.


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h
File src/kudu/util/interval_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140
PS2, Line 140:   // Greater than or equal.
             :   // The 'query' is the lower_bound.
             :   template<class QueryPointType>
             :   void FindIntersectingIntervalGE(const QueryPointType& query,
             :                                   IntervalVector *results) const;
             :   // Less than or equal.
             :   // The 'query' is the upper_bound.
             :   template<class QueryPointType>
             :   void FindIntersectingIntervalLT(const QueryPointType &query,
             :                                   IntervalVector *results) const;
             : 
Rather than introduce these new methods, would it be possible to modify interval_type and to allow infinity and negative infinity as end points? Then you could reuse FindIntersectingInterval for one-sided interval queries.



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 2
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 04 Sep 2018 18:41:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop copying strings

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#7).

Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings
......................................................................

KUDU-2566: Enhance rowset tree pruning and stop copying strings

Three improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a
   closed and bounded interval. This changeset adds the ability to
   find overlap for unbounded intervals as well. As a result, tablet
   iterators with a single primary key bound are more efficient
   because they do not iterate over rowsets that don't overlap with
   the key bound.
2) End up fetching one more rowset for the upper bound which is
   exclusive:
   This changeset also adds the ability to query the rowset tree
   using an exclusive upper bound, whereas previously only inclusive
   intervals were supported. This makes scans more efficient since
   primary key upper bounds are passed as exclusive bounds, so now
   rowsets that overlap only in the upper bound are excluded.
3) Perf improvement:
   Using raw slices instead of copying to strings while querying.
   The copying from slices to string is discarded, which could help
   to increase the query performance.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 343 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 7
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop coping string while querying

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#6).

Change subject: KUDU-2566: Enhance rowset tree pruning and stop coping string            while querying
......................................................................

KUDU-2566: Enhance rowset tree pruning and stop coping string
           while querying

Three improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a
   closed and bounded interval. This changeset adds the ability to
   find overlap for unbounded intervals as well. As a result, tablet
   iterators with a single primary key bound are more efficient
   because they do not iterate over rowsets that don't overlap with
   the key bound.
2) End up fetching one more rowset for the upper bound which is
   exclusive:
   This changeset also adds the ability to query the rowset tree
   using an exclusive upper bound, whereas previously only inclusive
   intervals were supported. This makes scans more efficient since
   primary key upper bounds are passed as exclusive bounds, so now
   rowsets that overlap only in the upper bound are excluded.
3) Perf improvement:
   using raw slices instead of copying to strings while querying.
   The copying from slices to string is discarded, which could help
   to increase the query performance.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 369 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 6
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop copying strings

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11381

to look at the new patch set (#8).

Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings
......................................................................

KUDU-2566: Enhance rowset tree pruning and stop copying strings

Three improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a
   closed and bounded interval. This changeset adds the ability to
   find overlap for unbounded intervals as well. As a result, tablet
   iterators with a single primary key bound are more efficient
   because they do not iterate over rowsets that don't overlap with
   the key bound.
2) End up fetching one more rowset for the upper bound which is
   exclusive:
   This changeset also adds the ability to query the rowset tree
   using an exclusive upper bound, whereas previously only inclusive
   intervals were supported. This makes scans more efficient since
   primary key upper bounds are passed as exclusive bounds, so now
   rowsets that overlap only in the upper bound are excluded.
3) Perf improvement:
   Using raw slices instead of copying to strings while querying.
   The copying from slices to string is discarded, which could help
   to increase the query performance.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 352 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11381/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 8
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop copying strings

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings
......................................................................


Patch Set 9:

(47 comments)

thanks to Adar and Will.

http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@7
PS2, Line 7: Enhance rowset tr
> Could you make the subject more descriptive? Howabout "[tablet] KUDU-2566: 
Done


http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@10
PS2, Line 10: Support open-ended intervals
> Following good commit message guidelines, could you explain what the previo
Done


http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@11
PS2, Line 11: Previously, the rowset tree could only compute overlap with a
> Again, let's fill out the explanation here a bit. My suggestion (adding on 
Done


http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG
Commit Message:

PS5: 
> Thanks for rewriting the commit message. It's really nice now.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG@7
PS5, Line 7: stop copying strings
> nit: "stop copying strings"
Done


http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG@11
PS5, Line 11: 
> nit: Could you wrap commit message lines at 72 characters?
Done


http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG@8
PS6, Line 8: 
> I think git only uses the first line as the subject so inserting a line bre
Done


http://gerrit.cloudera.org:8080/#/c/11381/4/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/4/src/kudu/tablet/rowset_tree-test.cc@18
PS4, Line 18: #include <algorithm>
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@211
PS5, Line 211: domized
> By the convention of the Google style guide, we use references arguments on
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@218
PS5, Line 218: 
> I think we want coverage of the degenerate case when s1 == s2 (so the inter
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@217
PS5, Line 217: UND_EQUAL
             :   };
             :   const auto& GetStringPair = [] (const BoundOperator op) -> std::pair<string, string> {
             :     while (true) {
             :       string
> You can simplify this a little bit to
Done


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc@251
PS6, Line 251:   vector<RowSet*> out;
> Don't need to SeedRandom() twice in a test.
Done


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc@283
PS6, Line 283:         testing::Values(10, 100, 250, 500),
> Should either fix this test case, or remove it.
Done


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree.h@96
PS6, Line 96:   //  [lower_bound, upper_bound)
> Nit: got a tab here.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@90
PS5, Line 90: en 'lower_bound' is boost::none
> Can you add a comment explaining what it means to pass boost::none for 'low
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@90
PS5, Line 90: 
> style nit: We put the &'s and *'s for refs and pointers by the type, not th
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@91
PS5, Line 91: 
> Likewise.
Done


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88
PS2, Line 88:                                    const std::function<void(RowSet*, int)>& cb) const;
            : 
            :   // When 'lower_bound' is boost::none, it means negative infinity.
            :   // When 'upper_bound' is boost::none, it means positive infinity.
            :   // So the query interval can be one of below:
            :   //  [-OO, +OO)
            :   //  [-OO, upper_bound)
> +1. We could also accept boost::none for both arguments and short-circuit r
some tests, example TestTablet.TestRowIteratorSimple in tablet-test.cc, will rely on rowsets's sequence, so we need keep the original logic.


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88
PS2, Line 88:                                    const std::function<void(RowSet*, int)>& cb) const;
            : 
            :   // When 'lower_bound' is boost::none, it means negative infinity.
            :   // When 'upper_bound' is boost::none, it means positive infinity.
            :   // So the query interval can be one of below:
            :   //  [-OO, +OO)
            :   //  [-OO, upper_bound)
> Rather than introducing new FindRowSetsIntersectingInterval... methods, cou
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc
File src/kudu/tablet/rowset_tree.cc:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@107
PS5, Line 107: & 
> Likewise with the & placement nit.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@108
PS5, Line 108: & 
> Likewise with the & placement nit.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@109
PS5, Line 109: const
> I don't think the const specifier is meaningful on a bool passed by value.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@117
PS5, Line 117: & 
> Likewise with the & placement nit.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@118
PS5, Line 118: & 
> Likewise with the & placement nit.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@119
PS5, Line 119: const
> Ditto on the const specifier.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@194
PS5, Line 194: & 
> Likewise with the & placement nit.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@195
PS5, Line 195: & 
> Likewise with the & placement nit.
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@196
PS5, Line 196: * 
> * next to type, not parameter name.
Done


http://gerrit.cloudera.org:8080/#/c/11381/3/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11381/3/src/kudu/tablet/tablet.cc@1773
PS3, Line 1773: 
              :   // Grab the memrowset iterator.
              :   gscoped_ptr<RowwiseIterator> ms_iter;
              :   RETURN_NOT_OK(components_->memrowset->NewRowIterator(opts, &ms_iter));
              :   ret.emplace_back(ms_iter.release());
              : 
              :   opts.io_context = io_context;
              : 
should check spec != nullptr first, because some test cases's spec is nullptr.


http://gerrit.cloudera.org:8080/#/c/11381/3/src/kudu/tablet/tablet.cc@1782
PS3, Line 1782: if (spec != nullptr && (spec->lower_bound_key() || spec->exclusive_upper_bound_key())) {
              :     boost::optional<Slice> lower_bound = spec->lower_bound_key() ? \
              :         boost::optional<Slice>(spec->lower_bound_key()->encoded_key()) : boost::none;
              :     boost::optional<Slice> upper_bound = spec->exclusive_upper_bound_key() ? \
              :         boost::optional<Slice>(spec->exclusive_upper_bound_key()->encoded_key()) : boost::none;
              :     vector<RowSet*> interval_sets;
              :     components_->rowsets->FindRowSetsIntersectingInterval(lower_bound, upper_bound, &interval_sets);
              :    
some tests, example TestTablet.TestRowIteratorSimple in tablet-test.cc, will rely on rowsets's sequence, so we need keep the original logic.


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/tablet.cc@1774
PS6, Line 1774:   // Grab the memrowset iterator.
              :   gscoped_ptr<RowwiseIterator> ms_iter;
              :   RETURN_NOT_OK(components_->memrowset->NewRowIterator(opts, &ms_iter));
              :   ret.emplace_back(ms_iter.release());
              : 
              :   opts.io_context = io_context;
              : 
              :   // 
> Nit: combine using ternaries:
Done


http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc@1793
PS7, Line 1793:                                      rs->ToString()));
              :       ret.emplace_back(row_it.release());
> Can remove this.
Done


http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc@2362
PS7, Line 2362: 
> nit: extra line
Done


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h
File src/kudu/util/interval_tree-inl.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h@402
PS6, Line 402: interval
> Nit: 'intervals' was correct; this deals with multiple intervals.
"interval ... is ... "  | "intervals ... are ... ", maybe the former is simple :)


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h@402
PS6, Line 402: interval
> I think 'interval' can be singular or plural as long as the rest of the sen
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h
File src/kudu/util/interval_tree-inl.h:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h@402
PS5, Line 402: interval 
> nit: interval
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h@409
PS5, Line 409: POSI
> To make this easier to understand locally, could you do one of two things:
Done


http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@66
PS1, Line 66:       //            |     |
> warning: redundant boolean literal in conditional return statement [readabi
Done


http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@71
PS1, Line 71:       // [-OO,    upper)
> warning: redundant boolean literal in conditional return statement [readabi
Done


http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@223
PS1, Line 223:     tree.FindIntersectingInterval(lower, upper, &results);
> warning: the const qualified parameter 'all_intervals' is copied for each i
Done


http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@238
PS1, Line 238:   }
> warning: the const qualified parameter 'all_intervals' is copied for each i
Done


http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-test.cc@64
PS5, Line 64: wer == boost::none &
> I'm getting confused now about which intervals are inclusive of the right e
Done


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-test.cc@59
PS6, Line 59: // boost::none means infinity.
            :   // [left,  right] is closed interval.
            :   // [lower, upper) is half-open interval, so the upper is exclusive.
> This and the pictures are great. Thanks for adding them!
Done


http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/util/interval_tree-test.cc
File src/kudu/util/interval_tree-test.cc:

http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/util/interval_tree-test.cc@187
PS7, Line 187: & 
> nit: Here and below, & with type.
Done


http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree.h
File src/kudu/util/interval_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree.h@145
PS6, Line 145:   void FindIntersectingInterval(const QueryPointType& lower_bound,
             :                                 const QueryPointType& upper_bound,
> Nit: const QueryPointType&
Done


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h
File src/kudu/util/interval_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140
PS2, Line 140: 
             :   // Find all intervals in the tree which intersect the given interval.
             :   // The resulting intervals are added to the 'results' vector.
             :   // The vector is not cleared first.
             :   template<class QueryPointType>
             :   void FindIntersectingInterval(const QueryPointType& lower_bound,
             :                                 const QueryPointType& upper_bound,
             :                                 IntervalVector* results) const;
             :  private:
             :   static void Partition(const IntervalVector &in,
             : 
> Rather than introduce these new methods, would it be possible to modify int
Done


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140
PS2, Line 140: 
             :   // Find all intervals in the tree which intersect the given interval.
             :   // The resulting intervals are added to the 'results' vector.
             :   // The vector is not cleared first.
             :   template<class QueryPointType>
             :   void FindIntersectingInterval(const QueryPointType& lower_bound,
             :                                 const QueryPointType& upper_bound,
             :                                 IntervalVector* results) const;
             :  private:
             :   static void Partition(const IntervalVector &in,
             : 
> +1 to the idea though I think it might suffice to add an additional QueryIn
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 9
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Wed, 31 Oct 2018 07:23:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop copying strings

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 8
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 11 Sep 2018 16:24:08 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2566: fix two todo list

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: fix two todo list
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@7
PS2, Line 7: fix two todo list
Could you make the subject more descriptive? Howabout "[tablet] KUDU-2566:  Enhance rowset tree pruning"?


http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@10
PS2, Line 10: support open-ended intervals
Following good commit message guidelines, could you explain what the previous behavior was, and how this patch changes it? Maybe:

Previously, the rowset tree only could only compute overlap with a closed and bounded interval. This changeset adds the ability to find overlap for unbounded intervals as well. As a result, tablet iterators with a single primary key bound are more efficient because they do not iterate over rowsets that don't overlap with the key bound.


http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@11
PS2, Line 11: end up fetching one more rowset for the upper bound which is exclusive
Again, let's fill out the explanation here a bit. My suggestion (adding on to the previous suggestion):

This changeset also adds the ability to query the rowset tree using an exclusive upper bound, whereas previously only inclusive intervals were supported. This makes scans more efficient since primary key upper bounds are passed as exclusive bounds, so now rowsets that overlap only in the upper bound are excluded.


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88
PS2, Line 88:   void FindRowSetsIntersectingInterval(const Slice &lower_bound,
            :                                        const Slice &upper_bound,
            :                                        std::vector<RowSet *> *rowsets) const;
            :   void FindRowSetsIntersectingIntervalGE(const Slice& lower_bound,
            :                                          std::vector<RowSet*> *rowsets) const;
            :   void FindRowSetsIntersectingIntervalLT(const Slice& upper_bound,
            :                                          std::vector<RowSet*> *rowsets) const;
> Rather than introducing new FindRowSetsIntersectingInterval... methods, cou
+1. We could also accept boost::none for both arguments and short-circuit return all rowsets, which would allow the logic in tablet.cc to be very simple (because it's pushed into the rowset tree).


http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h
File src/kudu/util/interval_tree.h:

http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140
PS2, Line 140:   // Greater than or equal.
             :   // The 'query' is the lower_bound.
             :   template<class QueryPointType>
             :   void FindIntersectingIntervalGE(const QueryPointType& query,
             :                                   IntervalVector *results) const;
             :   // Less than or equal.
             :   // The 'query' is the upper_bound.
             :   template<class QueryPointType>
             :   void FindIntersectingIntervalLT(const QueryPointType &query,
             :                                   IntervalVector *results) const;
             : 
> Rather than introduce these new methods, would it be possible to modify int
+1 to the idea though I think it might suffice to add an additional QueryIntervalType analogous to QueryPointType, and have the QueryIntervalType be able to be unbounded. The interval tree stores objects of type interval_type, so generalizing this type to support open-ended ranges means the interval tree needs to support storing open ended intervals, which is not necessary for the rowset tree.

In particular, the rowset tree's interval_type is RowsetWithBounds, which appears ot have been optimized for the rowset compaction algorithm, so there's other perf implications for changing it to be able to describe open-ended intervals..



-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 2
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 05 Sep 2018 15:30:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2566: Enhance rowset tree pruning and stop copying strings

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11381 )

Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings
......................................................................

KUDU-2566: Enhance rowset tree pruning and stop copying strings

Three improvements:
1) Support open-ended intervals:
   Previously, the rowset tree could only compute overlap with a
   closed and bounded interval. This changeset adds the ability to
   find overlap for unbounded intervals as well. As a result, tablet
   iterators with a single primary key bound are more efficient
   because they do not iterate over rowsets that don't overlap with
   the key bound.
2) End up fetching one more rowset for the upper bound which is
   exclusive:
   This changeset also adds the ability to query the rowset tree
   using an exclusive upper bound, whereas previously only inclusive
   intervals were supported. This makes scans more efficient since
   primary key upper bounds are passed as exclusive bounds, so now
   rowsets that overlap only in the upper bound are excluded.
3) Perf improvement:
   Using raw slices instead of copying to strings while querying.
   The copying from slices to string is discarded, which could help
   to increase the query performance.

Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Reviewed-on: http://gerrit.cloudera.org:8080/11381
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/util/interval_tree-inl.h
M src/kudu/util/interval_tree-test.cc
M src/kudu/util/interval_tree.h
7 files changed, 352 insertions(+), 85 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79
Gerrit-Change-Number: 11381
Gerrit-PatchSet: 9
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>