Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2016/08/17 02:23:20 UTC

[kudu-CR](gh-pages) new range partitioning features blog post

Dan Burkert has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4012

Change subject: new range partitioning features blog post
......................................................................

new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
---
A _posts/2016-08-18-new-range-partitioning-features.md
A img/new-range-partitioning-features/range-partitioning-on-time.png
2 files changed, 59 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4012/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has uploaded a new patch set (#3).

Change subject: new range partitioning features blog post
......................................................................

new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
---
A _posts/2016-08-18-new-range-partitioning-features.md
A img/2016-08-18-new-range-partitioning-features/range-and-hash-partitioning.png
A img/2016-08-18-new-range-partitioning-features/range-partitioning-on-time.png
3 files changed, 97 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4012/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Misty Stanley-Jones (Code Review)" <ge...@cloudera.org>.
Misty Stanley-Jones has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 5: Code-Review+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4012/3/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS3, Line 28: the split is not a clean break in the middle of the tablet.
Can you explain this a little more? I think it's confusing to someone newer to Kudu because they may not realize that "middle" here refers to the middle by primary key, as the data is stored, and instead think middle as in the partition key. Perhaps just mention that the storage is sorted by primary key so a split on range key means potentially cherry-picking rows and re-compacting.
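
To make the point concrete, here is a minimal Python sketch. It is an illustration only, not Kudu's implementation. It assumes a hypothetical table whose primary key is (host, ts) and whose range partitioning is on ts alone; the names rows, split_by_range_key, and is_contiguous are invented for the example. Rows are held in primary-key order, and a split on ts picks storage positions that are interleaved instead of forming one contiguous half.

```python
# Hypothetical rows, kept in primary-key order: (host, ts).
rows = sorted((host, ts) for host in ("a", "b", "c") for ts in range(4))


def split_by_range_key(rows, split_ts):
    """Return the storage positions falling on each side of a split on ts."""
    left = [i for i, (_, ts) in enumerate(rows) if ts < split_ts]
    right = [i for i, (_, ts) in enumerate(rows) if ts >= split_ts]
    return left, right


def is_contiguous(positions):
    """True if the positions form one unbroken run in storage order."""
    return not positions or positions == list(
        range(positions[0], positions[-1] + 1)
    )


left, right = split_by_range_key(rows, split_ts=2)
print(left, is_contiguous(left))
print(right, is_contiguous(right))
```

With split_ts=2, the left side lands on positions 0, 1, 4, 5, 8, 9, so neither side is a clean break in storage order; the rows for each side would have to be picked out and rewritten.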


PS3, Line 37: should not
does not -- we're sure it doesn't preclude it, the question is just whether we'll do it


PS3, Line 44: the first and last
            : partition
'the first and last partitions', or 'the first partition and the last partition'


PS3, Line 46: range partitioned
nit: range-partitioned


PS3, Line 48: any other
s/any other/in any other


PS3, Line 48: Unbalanced partitions are commonly
            : referred to as hotspotting
Hmm this sounds off to me. I think it needs to be 'unbalanced partitions are commonly referred to as hotspots' or something like 'the occurrence of unbalanced partitions is commonly referred to as hotspotting'


PS3, Line 50: timeseries
s/timeseries/time series (consistent with elsewhere)


PS3, Line 66: lazily adding range partitions on the
            : fly
'lazily' + 'on the fly' is paradoxically redundant...I'm split over which I prefer though. 'lazily' seems more accurate but 'on-the-fly' sounds better.


PS3, Line 80: a
s/a/one


PS3, Line 94: uesers
typo: users



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4012/2/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS2, Line 8: work loads
workloads?


PS2, Line 49: is
typo: a stray 'is'


PS2, Line 49: difficult
typo: an extra 'difficult'


PS2, Line 66: into
into --> in ?



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 1:

(15 comments)

I mostly proofread for style.

http://gerrit.cloudera.org:8080/#/c/4012/1/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

Line 14: Since Kudu's initial release way back with version 0.5, tables have had the
"Since Kudu's initial release, tables..."


Line 15: constraint that once created, a table's partitioning and tablets are static. The
"table's partitioning and tablets" is odd. Aren't tables partitioned into tablets? So aren't they kind of one and the same?


Line 16: range partitions that the table is created with are permanent, and can not be
Perhaps join this sentence to the previous one with a semi-colon and reword as "that is, a table's range partitions are permanent and cannot be changed".


Line 19: other partitions. In order to support adding and dropping range partitions, we
"To support..., we removed an even more fundamental range partition restriction."


PS1, Line 31: time
            : series
Here you're calling it "time series", not "timeseries". I don't have a preference but could you use the same term throughout?


PS1, Line 32: to range partition a
            : table
Maybe "to partition a table by range on..."


Line 42: Now that tables are no longer forced to have range partitions cover all possible
"Now that tables must no longer cover all possible rows". Hmm, that's not great either. Can you reword?


PS1, Line 45: keep on
"continue"


PS1, Line 46: the hotspotting problem
I assume you're referring to the "This is a big problem for timeseries data" above. If so, perhaps you can introduce the term "hotspotting problem" up there, to help when you refer to it here?


PS1, Line 48: old range partitions for time periods which are no longer needed
            : can be efficiently deleted by dropping the entire range partition.
There's singular/plural confusion here. Perhaps "an old range partition for a time period that is no longer needed can be efficiently deleted by dropping the entire range partition"? Still seems a little verbose, maybe you can reword differently.


PS1, Line 56: mixed
"combined"


PS1, Line 57: make up
"comprise"


PS1, Line 58: hash hash
"hash"


PS1, Line 58: droping
"dropping"


PS1, Line 59: adding or dropping a tablet per hash bucket.
"the addition or removal of one tablet per hash bucket."



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 5:

@aserbin, that restriction pretty much still stands; to alter the table schema would be more akin to changing the set of columns in the range partitioning, or changing the number of buckets in a hash partition.  It's definitely a subtle difference though, and the schema design guide needs to be updated to take these new features into account.


Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 5: Code-Review+1

This is not related directly to the blog post, but I found that at least in docs/schema_design.adoc the following is mentioned:

line 332:
Non-alterable Partition Schema:: Kudu does not allow you to alter the
  partition schema after table creation.

Probably, there are more files under the docs subdir which mention that restriction.  Does it make sense to update that somehow?


Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Todd Lipcon,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4012

to look at the new patch set (#5).

Change subject: new range partitioning features blog post
......................................................................

new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
---
A _posts/2016-08-18-new-range-partitioning-features.md
A img/2016-08-18-new-range-partitioning-features/range-and-hash-partitioning.png
A img/2016-08-18-new-range-partitioning-features/range-partitioning-on-time.png
3 files changed, 99 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4012/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 3:

(18 comments)

updated rendering: http://danburkert.github.io/kudu/2016/08/18/new-range-partitioning-features.html

http://gerrit.cloudera.org:8080/#/c/4012/2/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS2, Line 8: workloads,
> workloads?
Done


PS2, Line 43: divide a
> maybe 'divide'?
Done


PS2, Line 47: incre
> increases in value
Done


PS2, Line 49: nt
> typo: a stray 'is'
Done


PS2, Line 49: 10 it has
> typo: an extra 'difficult'
Done


Line 63: unoccupied space. Dropping a range partition will result in unoccupied space
> Just wanted to clarify on some imaginary example.
No, that would be the range split feature described earlier.  What you can do is make narrower partitions going forward.
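
As a rough illustration of "narrower partitions going forward", here is a toy Python model. It is a sketch under stated assumptions, not the Kudu client API: keys are hypothetical integer day numbers, partitions are half-open ranges [lower, upper), and the class name RangePartitions and its methods are invented for the example. The existing month-wide partition is left untouched, day-wide partitions are added after it, and a key that falls in unoccupied space finds no partition.

```python
import bisect


class RangePartitions:
    """Toy model of non-covering, non-overlapping range partitions."""

    def __init__(self):
        self._ranges = []  # sorted list of (lower, upper) pairs

    def add(self, lower, upper):
        """Add the half-open range [lower, upper); reject overlaps."""
        if lower >= upper:
            raise ValueError("empty range")
        i = bisect.bisect_left(self._ranges, (lower, upper))
        # Only the immediate neighbours can overlap the new range.
        if i > 0 and self._ranges[i - 1][1] > lower:
            raise ValueError("overlaps an existing partition")
        if i < len(self._ranges) and self._ranges[i][0] < upper:
            raise ValueError("overlaps an existing partition")
        self._ranges.insert(i, (lower, upper))

    def find(self, key):
        """Return the partition covering key, or None if it is uncovered."""
        i = bisect.bisect_right(self._ranges, (key, float("inf"))) - 1
        if i >= 0 and self._ranges[i][0] <= key < self._ranges[i][1]:
            return self._ranges[i]
        return None


table = RangePartitions()
table.add(0, 30)         # the original month-wide partition, data untouched
for day in range(30, 33):
    table.add(day, day + 1)  # narrower, day-wide partitions going forward
```

In this model table.find(12) still resolves to the month-wide partition (0, 30), table.find(30) resolves to the day-wide partition (30, 31), and table.find(40) returns None because no partition covers that key yet. Trying table.add(10, 20) raises ValueError, which mirrors the point that already-covered data is not re-split by adding partitions.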


PS2, Line 66: in t
> into --> in ?
Done


PS2, Line 97: 
            : 
> Not sure I buy this example -- wouldn't different devices likely generate d
K, I've removed this section.  If you think I should rewrite it to be something like geographic areas I can do that.


http://gerrit.cloudera.org:8080/#/c/4012/3/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS3, Line 28: the split is not a clean break in the middle of the tablet.
> Can you explain this a little more? I think it's confusing to someone newer
Good idea, I've tried to make this better.  Let me know what you think.


PS3, Line 37: should not
> does not -- we're sure it doesn't preclude it, the question is just whether
Done


PS3, Line 44: the first and last
            : partition
> 'the first and last partitions', or 'the first partition and the last parti
Done


PS3, Line 46: range partitioned
> nit: range-partitioned
Done


PS3, Line 48: any other
> s/any other/in any other
Done


PS3, Line 48: Unbalanced partitions are commonly
            : referred to as hotspotting
> Hmm this sounds off to me. I think it needs to be 'unbalanced partitions ar
Done


PS3, Line 50: timeseries
> s/timeseries/time series (consistent with elsewhere)
Done


PS3, Line 66: lazily adding range partitions on the
            : fly
> 'lazily' + 'on the fly' is paradoxically redundant...I'm split over which I
Done


PS3, Line 80: a
> s/a/one
Done


PS3, Line 94: uesers
> typo: users
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4012/2/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS2, Line 43: separate
maybe 'divide'?


PS2, Line 47: grows
increases in value


PS2, Line 97: A table is created to store the data from the
            : devices with a range partition per device category.
Not sure I buy this example -- wouldn't different devices likely generate different types of data?

Why not just use different tables for this kind of use case?


PS2, Line 105: ategory.
categorically?


PS2, Line 109: tables
users?



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Misty Stanley-Jones, Will Berkeley, Alexey Serbin,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4012

to look at the new patch set (#6).

Change subject: new range partitioning features blog post
......................................................................

new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
---
A _posts/2016-08-23-new-range-partitioning-features.md
A img/2016-08-23-new-range-partitioning-features/range-and-hash-partitioning.png
A img/2016-08-23-new-range-partitioning-features/range-partitioning-on-time.png
3 files changed, 99 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4012/6

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has uploaded a new patch set (#4).

Change subject: new range partitioning features blog post
......................................................................

new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
---
A _posts/2016-08-18-new-range-partitioning-features.md
A img/2016-08-18-new-range-partitioning-features/range-and-hash-partitioning.png
A img/2016-08-18-new-range-partitioning-features/range-partitioning-on-time.png
3 files changed, 99 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4012/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 6: Code-Review+2 Verified+1

Going to push this tonight and schedule some tweets to go out in the morning


Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4012/2/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

Line 63: unoccupied space. Dropping a range partition will result in unoccupied space
Just wanted to clarify on some imaginary example.

Dropping a partition involves dropping the real data that was 'bucketed' into the partition, doesn't it?  If so, what would be the right procedure for 'sub-partitioning'?  Suppose I have a table with a timestamp column, and the original partitioning was range-based on that column with one tablet per month.  Later on, when data began to accumulate, it became clear there should have been one tablet per day instead.  Is it possible to re-partition the table without dropping the data already accumulated for the last month, so that data arriving tomorrow and in the future uses the new tablet-per-day partitioning?



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has uploaded a new patch set (#2).

Change subject: new range partitioning features blog post
......................................................................

new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
---
A _posts/2016-08-18-new-range-partitioning-features.md
A img/2016-08-18-new-range-partitioning-features/range-and-hash-partitioning.png
A img/2016-08-18-new-range-partitioning-features/range-partitioning-on-time.png
A img/2016-08-18-new-range-partitioning-features/range-partitioning-with-categories.png
4 files changed, 112 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4012/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4012/4/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS4, Line 27: store
> typo: stored
Done


PS4, Line 51: it has been a difficult problem to
            : avoid
> instead "they have been difficult to avoid"
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 2:

(20 comments)

New rendered version: https://github.com/danburkert/kudu/blob/gh-pages/_posts/2016-08-18-new-range-partitioning-features.md

http://gerrit.cloudera.org:8080/#/c/4012/1/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

Line 14: Since Kudu's initial release, tables have had the constraint that once created,
> "Since Kudu's initial release, tables..."
Done


Line 15: the set of partitions is static. This forces users to plan ahead and create
> "table's partitioning and tablets" is odd. Aren't tables partitioned into t
No, many tablets can belong to a single range partition.
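
A small Python sketch of why the two are not the same thing, assuming a hypothetical table that combines range partitioning with 4 hash buckets. The function name tablets and the year labels are invented for the example, and this is a counting model only, not Kudu code. Each range partition is made up of one tablet per hash bucket, so adding a range partition adds that many tablets.

```python
from itertools import product


def tablets(range_partitions, num_hash_buckets):
    """Enumerate tablets as (range partition, hash bucket) pairs."""
    return list(product(range_partitions, range(num_hash_buckets)))


before = tablets(["2015", "2016"], num_hash_buckets=4)
after = tablets(["2015", "2016", "2017"], num_hash_buckets=4)
print(len(before), len(after))
```

Under these assumptions two range partitions yield 8 tablets, and adding a third range partition adds exactly 4 more, one per hash bucket.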


Line 16: enough partitions for the expected size of the table, because once the table is
> Perhaps join this sentence to the previous one with a semi-colon and reword
Done


Line 17: created no further partitions can be added. When using hash partitioning,
> s/is adding/provides
Done


Line 19: range partitioning, however, knowing where to put the extra partitions ahead of
> "To support..., we removed an even more fundamental range partition restric
Done


Line 26: remote server. Range splitting is particularly thorny with Kudu, because Kudu
> I know "disjoint" is probably accepted jargon, but it is jargon nonetheless.
I ended up making it even more jargony, but I'm not sure what else to do.


PS1, Line 31: 
            : As an 
> Here you're calling it "time series", not "timeseries". I don't have a pref
Done


PS1, Line 32: lows range partitions
            : to be
> Maybe "to partition a table by range on..."
Done


Line 42: Previously, range partitions could only be created by specifying split points.
> "Now that tables must no longer cover all possible rows". Hmm, that's not g
Done


Line 42: Previously, range partitions could only be created by specifying split points.
> s/forced/required
Done


Line 43: Split points separate an implicit partition covering the entire range into
> s/we/Kudu
Done


PS1, Line 45: unded b
> "continue"
Done


Line 46: the final partition being unbounded is that datasets which are range partitioned
> on the fly,
Done


PS1, Line 46: nded is that datasets w
> I assume you're referring to the "This is a big problem for timeseries data
Done


PS1, Line 48:  any other. Unbalanced partitions are commonly referred to as
            : hotspotting, and until Kudu 0.10 is has been difficult a difficult
> There's singular/plural confusion here. Perhaps "an old range partition for
Done


PS1, Line 56:  poin
> "combined"
Done


PS1, Line 57: specifi
> "comprise"
Done


PS1, Line 58: uarantee 
> "hash"
Done


PS1, Line 58: corresp
> "dropping"
Done


PS1, Line 59: t, Kudu will now reject writes which fall in
> "the addition or removal of one tablet per hash bucket."
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: new range partitioning features blog post
......................................................................


new range partitioning features blog post

Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Reviewed-on: http://gerrit.cloudera.org:8080/4012
Reviewed-by: Todd Lipcon <to...@apache.org>
Tested-by: Todd Lipcon <to...@apache.org>
---
A _posts/2016-08-23-new-range-partitioning-features.md
A img/2016-08-23-new-range-partitioning-features/range-and-hash-partitioning.png
A img/2016-08-23-new-range-partitioning-features/range-partitioning-on-time.png
3 files changed, 99 insertions(+), 0 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved; Verified



Gerrit-MessageType: merged
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Misty Stanley-Jones (Code Review)" <ge...@cloudera.org>.
Misty Stanley-Jones has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4012/1/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

Line 14: Since Kudu's initial release way back with version 0.5, tables have had the
s/way back with/in


Line 16: range partitions that the table is created with are permanent, and can not be
s/The range partitions that the table is created with are/A table's initial range partitions are


Line 17: changed. Kudu 0.10 is adding the ability to add and drop range partitions on the
s/is adding/provides


Line 26: multiple, disjoint partitions. When using split points, the first and last range
I know "disjoint" is probably accepted jargon, but it is jargon nonetheless. Is there a clearer, less niche word?


Line 42: Now that tables are no longer forced to have range partitions cover all possible
s/forced/required
s/cover/covering


Line 43: rows, we can also support adding and dropping range partitions. In the example
s/we/Kudu


Line 46: partitions on the fly we avoid the hotspotting problem, and avoid having to
on the fly,

(and lose the comma after problem)

s/avoid having to/avoid the need to


Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4012/4/_posts/2016-08-18-new-range-partitioning-features.md
File _posts/2016-08-18-new-range-partitioning-features.md:

PS4, Line 27: store
typo: stored


PS4, Line 30: instead of splitting the tablet in half
Nice. I think this will help newer users understand.


PS4, Line 51: it has been a difficult problem to
            : avoid
Suggest instead: "they have been difficult to avoid"


Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR](gh-pages) new range partitioning features blog post

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: new range partitioning features blog post
......................................................................


Patch Set 5: Code-Review+1

Gerrit-MessageType: comment
Gerrit-Change-Id: I53504d849c2aca9ff613b11e67d1533536283931
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Misty Stanley-Jones <mi...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No