You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2016/09/02 03:07:23 UTC

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Hello Adar Dembo, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4299

to review the following change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................

KUDU-1065: [java client] Flexible Partition Pruning

This commit introduces an internal utility ByteVec class which is a
mashup of the C++ std::string / Rust Vec<u8> types. KeyEncoder has been
transitioned to use this type instead of ByteArrayOutputStream. The
partition pruning algorithm incrementally builds up partition keys from
predicates, and requires cloning the keys as they are being built in
order to multiply over hash partition buckets. ByteArrayOutputStream
doesn't have a clone method. ByteArrayOutputStream is also synchronized
internally, which is dumb. Thus begat ByteVec.

This version of partition pruning only looks at predicates when
determining which partitions to prune. Constraints in the primary key
bounds are not considered, unless the table is range partitioned over
the primary key columns and not hash partitioned (simple partitioning).
This limits the pruned partitions in some pretty rare cases, but the
workaround of explicitly setting the predicate is not too onerous.

Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduTable.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartialRow.java
A java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
A java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
A java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
M java/kudu-client/src/test/resources/flags
M src/kudu/common/partition_pruner-test.cc
15 files changed, 1,745 insertions(+), 177 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/4299/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4299/3//COMMIT_MSG
Commit Message:

PS3, Line 16: hus begat 
any commit message including the phrase "Thus begat" wins extra points from me!


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

Line 311:    * Not for public consumption: use either predicates or primary key bounds instead.
given it's package-private, do you need this comment?


Line 325:    * Not for public consumption: use either predicates or primary key bounds instead.
same


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java:

Line 197:     if (len > 1) {
is this 'if' really necessary? seems like if len <= 1, the len - 2 would be negative and the for loop woudl already skip itself


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

PS3, Line 171:       // The range component is constrained.
             :       constrainedIndex = hashBuckets.size();
not 100% following here, can you explain further?


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
File java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java:

Line 116:   public void reserve(int additional) {
I find it surprising that 'reserve' here is reserving _additional_ bytes, vs the std::string::reserve() function which is the absolute value. Mind renaming to reserveAdditional() or reserveForAppend() or something of that nature? Or change it to be the same as std::string?


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java:

PS3, Line 97:     for (Partition partition : partitions) {
            :       if (!pruner.shouldPrune(partition)) scannedPartitions++;
            :     }
do we expect loops like this in real life? this seems o(n^2). Should we offer a 'prunePartitionList(List<Partition> partitions)' function?


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


KUDU-1065: [java client] Flexible Partition Pruning

This commit introduces an internal utility ByteVec class which is a
mashup of the C++ std::string / Rust Vec<u8> types. KeyEncoder has been
transitioned to use this type instead of ByteArrayOutputStream. The
partition pruning algorithm incrementally builds up partition keys from
predicates, and requires cloning the keys as they are being built in
order to multiply over hash partition buckets. ByteArrayOutputStream
doesn't have a clone method. ByteArrayOutputStream is also synchronized
internally, which is dumb. Thus begat ByteVec.

This version of partition pruning only looks at predicates when
determining which partitions to prune. Constraints in the primary key
bounds are not considered, unless the table is range partitioned over
the primary key columns and not hash partitioned (simple partitioning).
This limits the pruned partitions in some pretty rare cases, but the
workaround of explicitly setting the predicate is not too onerous.

Finally, this commit changes the default test flags to remove mini
cluster verbose logging, since it is extremely noisy.

Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Reviewed-on: http://gerrit.cloudera.org:8080/4299
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduTable.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartialRow.java
A java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionSchema.java
A java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKeyEncoding.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
A java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
M java/kudu-client/src/test/resources/flags
M src/kudu/common/partition_pruner-test.cc
17 files changed, 1,722 insertions(+), 198 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4299

to look at the new patch set (#3).

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................

KUDU-1065: [java client] Flexible Partition Pruning

This commit introduces an internal utility ByteVec class which is a
mashup of the C++ std::string / Rust Vec<u8> types. KeyEncoder has been
transitioned to use this type instead of ByteArrayOutputStream. The
partition pruning algorithm incrementally builds up partition keys from
predicates, and requires cloning the keys as they are being built in
order to multiply over hash partition buckets. ByteArrayOutputStream
doesn't have a clone method. ByteArrayOutputStream is also synchronized
internally, which is dumb. Thus begat ByteVec.

This version of partition pruning only looks at predicates when
determining which partitions to prune. Constraints in the primary key
bounds are not considered, unless the table is range partitioned over
the primary key columns and not hash partitioned (simple partitioning).
This limits the pruned partitions in some pretty rare cases, but the
workaround of explicitly setting the predicate is not too onerous.

Finally, this commit changes the default test flags to remove mini
cluster verbose logging, since it is extremely noisy.

Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduTable.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartialRow.java
A java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionSchema.java
A java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKeyEncoding.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
A java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
M java/kudu-client/src/test/resources/flags
M src/kudu/common/partition_pruner-test.cc
17 files changed, 1,722 insertions(+), 198 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/4299/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/3285/

-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4299

to look at the new patch set (#2).

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................

KUDU-1065: [java client] Flexible Partition Pruning

This commit introduces an internal utility ByteVec class which is a
mashup of the C++ std::string / Rust Vec<u8> types. KeyEncoder has been
transitioned to use this type instead of ByteArrayOutputStream. The
partition pruning algorithm incrementally builds up partition keys from
predicates, and requires cloning the keys as they are being built in
order to multiply over hash partition buckets. ByteArrayOutputStream
doesn't have a clone method. ByteArrayOutputStream is also synchronized
internally, which is dumb. Thus begat ByteVec.

This version of partition pruning only looks at predicates when
determining which partitions to prune. Constraints in the primary key
bounds are not considered, unless the table is range partitioned over
the primary key columns and not hash partitioned (simple partitioning).
This limits the pruned partitions in some pretty rare cases, but the
workaround of explicitly setting the predicate is not too onerous.

Finally, this commit changes the default test flags to remove mini
cluster verbose logging, since it is extremely noisy.

Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduTable.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartialRow.java
A java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionSchema.java
A java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKeyEncoding.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
A java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
M java/kudu-client/src/test/resources/flags
M src/kudu/common/partition_pruner-test.cc
17 files changed, 1,722 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/4299/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 1:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

Line 311:    * Not for public consumption: use either predicates or primary key bounds instead.
> If Impala is no longer using these two, should we just remove them?
I tried that, but they are used internally.  I did move them from public to package private, so they shouldn't be accessible.


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 142:   /** The partition pruner */
> Not a useful comment, just drop it.
Done


PS1, Line 260:     if (!table.getPartitionSchema().isSimpleRangePartitioning() &&
             :         (startPrimaryKey.length != 0 ||
             :          endPrimaryKey.length != 0) &&
             :         PARTITION_PRUNE_WARN.getAndSet(false)) {
             :       LOG.warn("Starting full table scan. " +
             :                "In the future this scan may be automatically optimized with partition pruning.");
> Is this still relevant?
Nope!


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java:

Line 348
> This is still a parameter though.
Done


Line 31: import java.io.ByteArrayOutputStream;
> Not used anymore?
Done


Line 45:   private final ByteVec buf = ByteVec.create();
> Not really seeing the point of this class member given that the first thing
I agree, I was just trying to preserve the existing structure of KeyEncoder.  I'll switch it over to be a non-instantiable utility class.  I don't think preserving ByteVec's is particularly important for performance, it should be a very shortlived object.  In fact there are a lot of cases where a KeyEncoder was getting constructed just to call one of these methods once, so I think it will be a total wash.


Line 63:   public static int getHashBucket(PartialRow row, HashBucketSchema hashSchema) {
> Javadoc?
Done


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

Line 328:           List <LocatedTablet> newTablets = table.getTabletsLocations(
> Nit: List<LocatedTablet>
Done


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

PS1, Line 292: { break; }
> Nit: why the braces if the break is on the same line as the condition?
Done


PS1, Line 360: Step 1:
> In this function there's only Step 1.
Done


Line 398:   private static byte[] pushPredicatesIntoUpperBoundRangeKey(Schema schema,
> I wonder if these push methods would be clearer if combined with a bool (or
I don't disagree, but these are pretty straight translations of existing methods in the C++ side.  I think keeping the methods in-sync is more important than anything, since this is some pretty dense code.  They are different in pretty subtle ways.  It is true that they are always called as a pair, though.


Line 433:         throw new IllegalArgumentException("unexpected predicate type can not be pushed into key");
> Nit: can you add the predicate as you did in pushPredicatesIntoLowerBoundRa
Done


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
File java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java:

PS1, Line 60:   /**
            :    * Creates a new vector.
            :    * @return the new vector.
            :    */
            :   public static ByteVec create() {
            :     return new ByteVec(DEFAULT_CAPACITY);
            :   }
            : 
            :   /**
            :    * Creates a new vector with the specified capacity.
            :    * @param capacity the initial capacity of the vector
            :    * @return a new vector with the specified capacity
            :    */
            :   public static ByteVec withCapacity(int capacity) {
            :     return new ByteVec(capacity);
            :   }
            : 
            :   /**
            :    * Wrap an existing array with a vector.
            :    * The array should not be modified after this call.
            :    * @param data the initial data for the vector
            :    * @return a vector wrapping the data
            :    */
            :   public static ByteVec wrap(byte[] data) {
            :     return new ByteVec(data);
            :   }
> You find these more useful than exposing the constructors directly? You cou
Yah I'm a fan of static ctor functions.  I can change it if you prefer proper ctors, though.


Line 115:   public void reserve(int additional) {
> Seems like the first two checks can be removed; they are duplicated in rese
The checks are necessary here because 'additional' is transformed before being passed in.  I've changed this to not call into reserveExact.


Line 116:     if (additional < 0) throw new IllegalArgumentException("negative additional");
> Nit: how about Preconditions.checkArgument(additional > 0) instead? Elsewhe
Done


PS1, Line 130: len
> Shouldn't this be 'additional'?
Done


PS1, Line 131:  >
> Shouldn't this be >= ?
Done


Line 177:     if (index >= len) throw new IndexOutOfBoundsException();
> May want to include a helpful message in the exception.
Done


PS1, Line 178: data[index] = value;
> Won't this do the bounds check for you?
no, the bounds check is against len, not data.length.


PS1, Line 181:   /**
             :    * Appends the bytes from another byte array to this vec.
             :    * @param values the values to append
             :    * @param offset the offset to append from
             :    * @param len the number of bytes to append
             :    */
> Nit: reword a bit to make it clear that we're appending some subset of the 
Done


Line 194:    * Appends the bytes from another byte array to this vec.
> Nit: likewise, here you should say something like "Appends all the bytes...
Done


PS1, Line 216:     if (index >= len) throw new IndexOutOfBoundsException();
             :     return data[index];
> Nit: add a helpful message, or let Java do the bounds check for you.
Done


PS1, Line 229:    * The vector should not be concurrently modified while the iterator is in use.
> Will a CME be thrown if someone modifies during iteration?
Done


Line 232:   public Iterator iterator() {
> How about having ByteVec inherit Iterable<Byte> and then have the nested It
The whole point of this was that it's an iterator of primitive bytes, not Bytes (and asList could be used to get that functionality).  I'm actually going to remove this since it's not used, and I can't see why it would ever be.  It was important in kudu-ts where this class was originally lifted from.


Line 281:   public int hashCode() {
> Maybe use Guava's Objects.hashCode(...) to simplify?
that doesn't handle arrays properly.


PS1, Line 290:     ByteVec clone = new ByteVec(0);
             :     clone.data = Arrays.copyOf(data, data.length);
             :     clone.len = len;
             :     return clone;
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java:

> Copyright
Done


Line 17:   @Test
> Can you reorder the contents of this class so that the "helper" methods pre
Done


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
File java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java:

Line 47:   private void checkByteVec(List<Byte> vals) {
> How about testing wrap or withCapacity?
Done


PS1, Line 55: assertEquals(vals, vec.asList());
> This is going to rely on the correctness of ByteVec.equals(), but presumabl
No, this is actually calling into the equals of whatever type of list asList returns.


PS1, Line 66: copy.truncate(vals.size() / 2);
> For symmetry with the assertEquals() you just called, perhaps assert that v
Done


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/resources/flags
File java/kudu-client/src/test/resources/flags:

Line 1
> Sneaky sneaky. At least mention it in your commit description?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4299/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 48: import java.util.concurrent.atomic.AtomicBoolean;
Now unused?


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/3258/

-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4299/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 48: import java.util.concurrent.atomic.AtomicBoolean;
> Now unused?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/3204/

-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 1:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

Line 311:    * Not for public consumption: use either predicates or primary key bounds instead.
If Impala is no longer using these two, should we just remove them?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 142:   /** The partition pruner */
Not a useful comment, just drop it.


PS1, Line 260:     if (!table.getPartitionSchema().isSimpleRangePartitioning() &&
             :         (startPrimaryKey.length != 0 ||
             :          endPrimaryKey.length != 0) &&
             :         PARTITION_PRUNE_WARN.getAndSet(false)) {
             :       LOG.warn("Starting full table scan. " +
             :                "In the future this scan may be automatically optimized with partition pruning.");
Is this still relevant?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java:

Line 348
This is still a parameter though.


Line 31: import java.io.ByteArrayOutputStream;
Not used anymore?


Line 45:   private final ByteVec buf = ByteVec.create();
Not really seeing the point of this class member given that the first thing every non-static method does is clear() it.

If the goal is to avoid allocation new ByteVecs with each call, maybe make it possible for those methods to pass in their own ByteVec?


Line 63:   public static int getHashBucket(PartialRow row, HashBucketSchema hashSchema) {
Javadoc?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

Line 328:           List <LocatedTablet> newTablets = table.getTabletsLocations(
Nit: List<LocatedTablet>


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

PS1, Line 292: { break; }
Nit: why the braces if the break is on the same line as the condition?


PS1, Line 360: Step 1:
In this function there's only Step 1.


Line 398:   private static byte[] pushPredicatesIntoUpperBoundRangeKey(Schema schema,
I wonder if these push methods would be clearer if combined with a bool (or enum) to control whether they're upper or lower bounds. There's a lot of commonality.

Might even be interesting to handle both in a loop of size 2 (and return a pair); then you could reuse the product of idsToIndexes. If you go down that route, I'd look into combining with pushPredicatesIntoHashBucket() too, that way there's one logical method that returns everything that's been pushed.


Line 433:         throw new IllegalArgumentException("unexpected predicate type can not be pushed into key");
Nit: can you add the predicate as you did in pushPredicatesIntoLowerBoundRangeKey()?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
File java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java:

PS1, Line 60:   /**
            :    * Creates a new vector.
            :    * @return the new vector.
            :    */
            :   public static ByteVec create() {
            :     return new ByteVec(DEFAULT_CAPACITY);
            :   }
            : 
            :   /**
            :    * Creates a new vector with the specified capacity.
            :    * @param capacity the initial capacity of the vector
            :    * @return a new vector with the specified capacity
            :    */
            :   public static ByteVec withCapacity(int capacity) {
            :     return new ByteVec(capacity);
            :   }
            : 
            :   /**
            :    * Wrap an existing array with a vector.
            :    * The array should not be modified after this call.
            :    * @param data the initial data for the vector
            :    * @return a vector wrapping the data
            :    */
            :   public static ByteVec wrap(byte[] data) {
            :     return new ByteVec(data);
            :   }
You find these more useful than exposing the constructors directly? You could add a no-arg variant that uses DEFAULT_CAPACITY.


Line 115:   public void reserve(int additional) {
Seems like the first two checks can be removed; they are duplicated in reserveExact() anyway.


Line 116:     if (additional < 0) throw new IllegalArgumentException("negative additional");
Nit: how about Preconditions.checkArgument(additional > 0) instead? Elsewhere too, use Preconditions; it's a little more succinct and clear.


PS1, Line 130: len
Shouldn't this be 'additional'?


PS1, Line 131:  >
Shouldn't this be >= ?


Line 177:     if (index >= len) throw new IndexOutOfBoundsException();
May want to include a helpful message in the exception.


PS1, Line 178: data[index] = value;
Won't this do the bounds check for you?


PS1, Line 181:   /**
             :    * Appends the bytes from another byte array to this vec.
             :    * @param values the values to append
             :    * @param offset the offset to append from
             :    * @param len the number of bytes to append
             :    */
Nit: reword a bit to make it clear that we're appending some subset of the bytes from 'values', and that 'offset'/'len' are the base and bounds of that subset. Right now it's not clear that 'offset' refers to 'values' (and not to the offset in this.data).


Line 194:    * Appends the bytes from another byte array to this vec.
Nit: likewise, here you should say something like "Appends all the bytes..."


PS1, Line 216:     if (index >= len) throw new IndexOutOfBoundsException();
             :     return data[index];
Nit: add a helpful message, or let Java do the bounds check for you.


PS1, Line 229:    * The vector should not be concurrently modified while the iterator is in use.
Will a CME be thrown if someone modifies during iteration?


Line 232:   public Iterator iterator() {
How about having ByteVec inherit Iterable<Byte> and then have the nested Iterator class implement Iterator<Byte>? Then you could use ByteVec in enhanced for-loops.


Line 281:   public int hashCode() {
Maybe use Guava's Objects.hashCode(...) to simplify?


PS1, Line 290:     ByteVec clone = new ByteVec(0);
             :     clone.data = Arrays.copyOf(data, data.length);
             :     clone.len = len;
             :     return clone;
How about:

  ByteVec clone = ByteVec.withCapacity(data.length);
  clone.append(this);
  return clone;


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java:

Copyright


Line 17:   @Test
Can you reorder the contents of this class so that the "helper" methods precede the tests?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
File java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java:

Line 47:   private void checkByteVec(List<Byte> vals) {
How about testing wrap or withCapacity?


PS1, Line 55: assertEquals(vals, vec.asList());
This is going to rely on the correctness of ByteVec.equals(), but presumably that may be buggy, so you may want a variant that checks byte by byte too. Of course, that could be buggy as well, so you can't win everywhere.


PS1, Line 66: copy.truncate(vals.size() / 2);
For symmetry with the assertEquals() you just called, perhaps assert that vals is NOT equals to copy.asList()?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/resources/flags
File java/kudu-client/src/test/resources/flags:

Line 1
Sneaky sneaky. At least mention it in your commit description?


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1065: [java client] Flexible Partition Pruning

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

Line 311:    * Not for public consumption: use either predicates or primary key bounds instead.
> given it's package-private, do you need this comment?
Done


Line 325:    * Not for public consumption: use either predicates or primary key bounds instead.
> same
Done


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java:

Line 197:     if (len > 1) {
> is this 'if' really necessary? seems like if len <= 1, the len - 2 would be
Done


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

PS3, Line 171:       // The range component is constrained.
             :       constrainedIndex = hashBuckets.size();
> not 100% following here, can you explain further?
Done


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
File java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java:

Line 116:   public void reserve(int additional) {
> I find it surprising that 'reserve' here is reserving _additional_ bytes, v
Done


http://gerrit.cloudera.org:8080/#/c/4299/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java:

PS3, Line 97:     for (Partition partition : partitions) {
            :       if (!pruner.shouldPrune(partition)) scannedPartitions++;
            :     }
> do we expect loops like this in real life? this seems o(n^2). Should we off
This is true, but currently shouldPrune is only called in these tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes