You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2019/01/25 18:43:41 UTC

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12275


Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 279 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2167
PS1, Line 2167:                                             long deadline) {
> Would the plumbing be a little cleaner if this were a DeadLineTracker rathe
I think it makes calling these methods a bit more of a pain plumbing the long deadline from the public methods as far down as possible is easier.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@17
PS1, Line 17: it will not reflect any metadata changes to the table that have occurred
> duplicated
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@32
PS1, Line 32: they
> Nit: "the user" or "the client"
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@37
PS1, Line 37:     while (true) {
> I don't think this should be done here; we don't typically do blocking oper
I can change over to a builder pattern.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@47
PS1, Line 47: catch (KuduException ex) {
            :         throw new IllegalStateException(ex);
            :       }
> Let's do this in a different method where we could throw the original KuduE
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@76
PS1, Line 76:    * @return The resulting partition index, or -1 if the row falls into a
            :    *         non-covered range. The result will be less than numPartitions().
> Wouldn't it be more Java-rific to throw an exception if the row falls into 
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@87
PS1, Line 87:     if (entry == null) {
            :       throw new IllegalArgumentException();
            :     }
> Given the presence of the sentinel, how is this case even possible? If it's
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@51
PS1, Line 51:   public void testPartitioner() throws Exception {
> I think it'd also be good to test the partitioner in a situation where it's
I added a test to check for expected NonCoveredRangeExceptions. Beyond that a partitioner that fails cannot be built.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@95
PS1, Line 95: 
            :     // We don't expect a completely even division of rows into partitions, but
            :     // we should be within 10% of that.
> Couldn't we just calculate the exact number of rows to fall into each parti
I primarily wrote the test this way to match the c++ tests and verify the Java implementation behaves the same way. 

We could in hindsight with knowledge of the hash function used, but I didn't want this test to do that because "knowing" where the partition lands is basically what the partitioner code does. My test would become a re-implementation of the partitioner in a way.



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 06 Feb 2019 19:40:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2167
PS1, Line 2167:                                             long deadline) {
Would the plumbing be a little cleaner if this were a DeadLineTracker rather than a raw deadline value?


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

PS1: 
License header.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@17
PS1, Line 17: it will not reflect any metadata changes to the table that have occurred
duplicated


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@32
PS1, Line 32: they
Nit: "the user" or "the client"


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@37
PS1, Line 37:     while (true) {
I don't think this should be done here; we don't typically do blocking operations in constructors.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@47
PS1, Line 47: catch (KuduException ex) {
            :         throw new IllegalStateException(ex);
            :       }
Let's do this in a different method where we could throw the original KuduException.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@76
PS1, Line 76:    * @return The resulting partition index, or -1 if the row falls into a
            :    *         non-covered range. The result will be less than numPartitions().
Wouldn't it be more Java-rific to throw an exception if the row falls into a non-covered range?


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@87
PS1, Line 87:     if (entry == null) {
            :       throw new IllegalArgumentException();
            :     }
Given the presence of the sentinel, how is this case even possible? If it's not, I'd just combine the whole thing:

  return partitionByStartKey.floorEntry(partitionKey).getValue();


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@51
PS1, Line 51:   public void testPartitioner() throws Exception {
I think it'd also be good to test the partitioner in a situation where it's not possible to perform the partitioning (i.e. masters down or something).


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@95
PS1, Line 95: 
            :     // We don't expect a completely even division of rows into partitions, but
            :     // we should be within 10% of that.
Couldn't we just calculate the exact number of rows to fall into each partition? There's no randomness here so the assertion fuzziness seems unwarranted.



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sun, 27 Jan 2019 19:22:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#2).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 363 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#5).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 408 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Reviewed-on: http://gerrit.cloudera.org:8080/12275
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 430 insertions(+), 8 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 8
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1825
PS6, Line 1825: propogte
propagate


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2163
PS6, Line 2163:    * @param table the table
Update this list.


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2212
PS6, Line 2212:                 if (lookupType == LookupType.POINT || entry.getUpperBoundPartitionKey().length == 0) {
Too long; wrap?


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@178
PS6, Line 178:     // Ensure the table information can't be found to force a timeout.
             :     harness.killAllMasterServers();
Hmm, surprised the partitioner use below doesn't hit the table locations cache.


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@182
PS6, Line 182:     long now = System.currentTimeMillis();
Nit: probably more useful as 'start'.



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 13 Feb 2019 00:16:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#7).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 430 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/7
-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@133
PS4, Line 133:      * Defaults to the {@link AsyncKuduClient#getDefaultAdminOperationTimeoutMs()}.
> seems to be missing the equivalent of SetBuildTimeout() in the C++ impl
Done


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@137
PS4, Line 137: li
> NON_COVERED_RANGE_INDEX
Done


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@146
PS4, Line 146: neTracker deadlineTracker = new DeadlineTrac
> This times the number of partitions could take arbitrarily long so it seems
Done


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@155
PS4, Line 155: 
> NON_COVERED_RANGE_INDEX
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 11 Feb 2019 18:39:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@133
PS4, Line 133:     public KuduPartitioner build() throws KuduException {
seems to be missing the equivalent of SetBuildTimeout() in the C++ impl


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@137
PS4, Line 137: -1
NON_COVERED_RANGE_INDEX


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@146
PS4, Line 146: AsyncKuduClient.DEFAULT_OPERATION_TIMEOUT_MS
This times the number of partitions could take arbitrarily long so it seems useful to be able to set a timeout on the while series of operations like I mentioned in another comment (since this is intended to be a general-purpose API)


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@155
PS4, Line 155: -1
NON_COVERED_RANGE_INDEX

This is a bit tricky and I think is worth commenting that if there are non-covered ranges, the start of the non-covered range will have the value of NON_COVERED_RANGE_INDEX while if the range is covered then the start key for the next partition will overwrite the NON_COVERED_RANGE_INDEX value written here.



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 07 Feb 2019 10:07:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 13 Feb 2019 16:52:58 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 6: Code-Review+1

(1 comment)

looks good just a minor test comment

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@192
PS6, Line 192:     assertTrue(elapsed <= timeoutMs);
Why would this hold? Seems potentially flaky. How about < timeoutMs * 2 or something like that instead?



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 11 Feb 2019 18:59:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1825
PS6, Line 1825: propogte
> propagate
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2163
PS6, Line 2163:    * @param table the table
> Update this list.
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2212
PS6, Line 2212:                 if (lookupType == LookupType.POINT || entry.getUpperBoundPartitionKey().length == 0) {
> Too long; wrap?
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@178
PS6, Line 178:     // Ensure the table information can't be found to force a timeout.
             :     harness.killAllMasterServers();
> Hmm, surprised the partitioner use below doesn't hit the table locations ca
client.createTable doesn't populate the cache with all the partition information. I will add a test that proves the cache is being used.


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@182
PS6, Line 182:     long now = System.currentTimeMillis();
> Nit: probably more useful as 'start'.
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@192
PS6, Line 192:     assertTrue(elapsed <= timeoutMs);
> Why would this hold? Seems potentially flaky. How about < timeoutMs * 2 or 
I haven't seen flakiness. I can add a small amount of time but I don't want to double the timeout since that isn't representative of validating the timeout.



-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 13 Feb 2019 16:17:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#6).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 408 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#3).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
......................................................................

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 357 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>