Posted to commits@kudu.apache.org by da...@apache.org on 2016/04/20 02:37:30 UTC

incubator-kudu git commit: kudu 0.8 predicate improvements blog post

Repository: incubator-kudu
Updated Branches:
  refs/heads/gh-pages 65ff8b29a -> 45544f55b


kudu 0.8 predicate improvements blog post

Change-Id: I0e413adbd04ad5f8a181645a7b078f4ea5b5a522
Reviewed-on: http://gerrit.cloudera.org:8080/2809
Reviewed-by: Jean-Daniel Cryans
Tested-by: Dan Burkert <da...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/45544f55
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/45544f55
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/45544f55

Branch: refs/heads/gh-pages
Commit: 45544f55b6d0ee99d79c4ba274dea462911e6a9c
Parents: 65ff8b2
Author: Dan Burkert <da...@cloudera.com>
Authored: Mon Apr 18 12:11:53 2016 -0700
Committer: Dan Burkert <da...@cloudera.com>
Committed: Wed Apr 20 00:30:35 2016 +0000

----------------------------------------------------------------------
 ...6-04-19-kudu-0-8-0-predicate-improvements.md | 78 ++++++++++++++++++++
 1 file changed, 78 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/45544f55/_posts/2016-04-19-kudu-0-8-0-predicate-improvements.md
----------------------------------------------------------------------
diff --git a/_posts/2016-04-19-kudu-0-8-0-predicate-improvements.md b/_posts/2016-04-19-kudu-0-8-0-predicate-improvements.md
new file mode 100644
index 0000000..2d86467
--- /dev/null
+++ b/_posts/2016-04-19-kudu-0-8-0-predicate-improvements.md
@@ -0,0 +1,78 @@
+---
+layout: post
+title: Predicate Improvements in Kudu 0.8
+author: Dan Burkert
+---
+
+The recently released Kudu version 0.8 ships with a host of new improvements to
+scan predicates. Performance and usability have been improved, especially for
+tables taking advantage of [advanced partitioning
+options](http://getkudu.io/docs/schema_design.html#data-distribution).
+
+<!--more-->
+
+## Scan Optimizations in the Server and C++ Client
+
+The server and C++ client have become more sophisticated in how they handle and
+optimize scan constraints. Column predicates and a scan's lower and upper
+primary key bounds are now better unified, so more predicates are pushed into
+the primary key bounds. This can turn a full table scan with predicates into a
+much more efficient bounded scan.
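To make the idea concrete, here is a minimal Java sketch. It assumes a table whose primary key is a single `INT64` column, and it folds comparison predicates on that column into one key range. A scan that would otherwise read every row and filter it can then be bounded, or skipped entirely when the predicates contradict each other. `KeyBoundsSketch`, `KeyPredicate`, `ComparisonOp` and `toKeyRange` are hypothetical names for illustration. For simplicity the sketch uses an inclusive `[lower, upper]` range. It is not Kudu's implementation, which also handles composite keys and other column types.

```java
import java.util.Arrays;
import java.util.List;

final class KeyBoundsSketch {
    // Hypothetical comparison operators for predicates on the primary key column.
    enum ComparisonOp { GREATER, GREATER_EQUAL, EQUAL, LESS, LESS_EQUAL }

    // Hypothetical predicate of the form `key <op> value` on an INT64 primary key.
    static final class KeyPredicate {
        final ComparisonOp op;
        final long value;

        KeyPredicate(ComparisonOp op, long value) {
            this.op = op;
            this.value = value;
        }
    }

    // Folds all predicates into one inclusive key range {lower, upper}.
    // Returns null when no key can satisfy every predicate, so the scan reads nothing.
    static long[] toKeyRange(List<KeyPredicate> predicates) {
        long lower = Long.MIN_VALUE; // with no predicates this is a full table scan
        long upper = Long.MAX_VALUE;
        for (KeyPredicate p : predicates) {
            switch (p.op) {
                case GREATER:
                    if (p.value == Long.MAX_VALUE) return null; // no key is greater
                    lower = Math.max(lower, p.value + 1);
                    break;
                case GREATER_EQUAL:
                    lower = Math.max(lower, p.value);
                    break;
                case EQUAL:
                    lower = Math.max(lower, p.value);
                    upper = Math.min(upper, p.value);
                    break;
                case LESS:
                    if (p.value == Long.MIN_VALUE) return null; // no key is smaller
                    upper = Math.min(upper, p.value - 1);
                    break;
                case LESS_EQUAL:
                    upper = Math.min(upper, p.value);
                    break;
            }
        }
        return lower <= upper ? new long[] {lower, upper} : null;
    }

    public static void main(String[] args) {
        // WHERE id > 100 AND id <= 200 AND id < 150
        long[] range = toKeyRange(Arrays.asList(
                new KeyPredicate(ComparisonOp.GREATER, 100),
                new KeyPredicate(ComparisonOp.LESS_EQUAL, 200),
                new KeyPredicate(ComparisonOp.LESS, 150)));
        System.out.println(Arrays.toString(range)); // prints [101, 149]

        // WHERE id > 10 AND id < 5 cannot match any row.
        System.out.println(Arrays.toString(toKeyRange(Arrays.asList(
                new KeyPredicate(ComparisonOp.GREATER, 10),
                new KeyPredicate(ComparisonOp.LESS, 5))))); // prints null
    }
}
```

Intersecting bounds this way is what lets several independent predicates collapse into a single contiguous key range.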
+
+Additionally, the server and C++ client now recognize more opportunities to
+prune entire tablets during scans. For example, with the following schema and
+query, Kudu can now skip scanning 15 of the 16 tablets in the table:
+
+```SQL
+-- create a table with 16 tablets
+CREATE TABLE users (id INT64, name STRING, address STRING)
+DISTRIBUTE BY HASH (id) INTO 16 BUCKETS;
+
+-- scan over a single tablet
+SELECT id, name, address FROM users
+WHERE id = 876932;
+```
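Pruning works in this example because an equality predicate on the hash-partitioned column identifies exactly one hash bucket. The Java sketch below illustrates that reasoning under a stated simplification: it uses `Long.hashCode` as a stand-in hash function. Kudu uses its own hash function, so the bucket numbers differ from a real table. `HashPruningSketch`, `bucketFor` and `bucketsToScan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

final class HashPruningSketch {
    // Stand-in hash: maps a key to a bucket in [0, numBuckets). Kudu uses its own
    // hash function, so real bucket assignments will differ.
    static int bucketFor(long id, int numBuckets) {
        return Math.floorMod(Long.hashCode(id), numBuckets);
    }

    // Returns the buckets a scan must visit. In the example table every bucket is one
    // tablet. An equality predicate on the hash column narrows the scan to one bucket;
    // without one, every bucket has to be scanned.
    static List<Integer> bucketsToScan(OptionalLong idEquals, int numBuckets) {
        List<Integer> buckets = new ArrayList<>();
        if (idEquals.isPresent()) {
            buckets.add(bucketFor(idEquals.getAsLong(), numBuckets));
        } else {
            for (int b = 0; b < numBuckets; b++) {
                buckets.add(b);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        int numBuckets = 16; // DISTRIBUTE BY HASH (id) INTO 16 BUCKETS
        // WHERE id = 876932: 15 of the 16 tablets are pruned.
        System.out.println(bucketsToScan(OptionalLong.of(876932L), numBuckets).size()); // prints 1
        // No predicate on id: all 16 tablets are scanned.
        System.out.println(bucketsToScan(OptionalLong.empty(), numBuckets).size()); // prints 16
    }
}
```

Whatever the actual hash function, a deterministic one maps a single value to a single bucket. That property is why the other buckets can be skipped.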
+
+For a deeper look at the newly implemented scan and partition pruning
+optimizations, see the associated [design
+document](https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/scan-optimization-partition-pruning.md).
+These optimizations will eventually be incorporated into the Java client as
+well. Until then, they are still applied on the server side for scans initiated
+by Java clients. If you would like to help with this effort, let us know on the
+[JIRA issue](https://issues.apache.org/jira/browse/KUDU-1065).
+
+## Redesigned Predicate API in the Java Client
+
+The Java client has a new way to express scan predicates: the
+[`KuduPredicate`](http://getkudu.io/apidocs/org/kududb/client/KuduPredicate.html)
+class. The API matches the corresponding C++ API more closely and adds support
+for specifying exclusive, as well as inclusive, range predicates. The existing
+[`ColumnRangePredicate`](http://getkudu.io/apidocs/org/kududb/client/ColumnRangePredicate.html)
+API has been deprecated and will be removed soon. The following example shows
+how to transition from the old API to the new one:
+
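+```java
+import org.kududb.ColumnSchema;
+import org.kududb.client.ColumnRangePredicate;
+import org.kududb.client.KuduPredicate;
+import org.kududb.client.KuduPredicate.ComparisonOp;
+import org.kududb.client.KuduScanner;
+
+ColumnSchema myIntColumnSchema = ...;
+KuduScanner.KuduScannerBuilder scannerBuilder = ...;
+
+// Old predicate API (deprecated): inclusive lower bound, i.e. value >= 20
+ColumnRangePredicate predicate = new ColumnRangePredicate(myIntColumnSchema);
+predicate.setLowerBound(20);
+scannerBuilder.addColumnRangePredicate(predicate);
+
+// New predicate API: the same condition expressed as a comparison predicate
+scannerBuilder.addPredicate(
+    KuduPredicate.newComparisonPredicate(myIntColumnSchema, ComparisonOp.GREATER_EQUAL, 20));
+```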
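The post says the new API adds exclusive range predicates. That implies that with the deprecated `ColumnRangePredicate`, an exclusive condition such as `> 20` had to be rewritten by hand as an inclusive bound on the next larger value. This is easy for integers but less obvious for other types. The self-contained sketch below shows such hand-written rewrites for a few Java types. `ExclusiveBoundSketch` and its `inclusiveLowerFor` overloads are hypothetical helpers and are not part of the Kudu client. With `KuduPredicate`, passing `ComparisonOp.GREATER` states the condition directly.

```java
final class ExclusiveBoundSketch {
    // Smallest long strictly greater than v, or null if v is already the maximum.
    static Long inclusiveLowerFor(long v) {
        return v == Long.MAX_VALUE ? null : v + 1;
    }

    // Smallest double strictly greater than v (NaN and +Infinity are not handled here).
    static double inclusiveLowerFor(double v) {
        return Math.nextUp(v);
    }

    // Smallest String strictly greater than s in lexicographic order: s followed by U+0000.
    static String inclusiveLowerFor(String s) {
        return s + '\0';
    }

    public static void main(String[] args) {
        // "id > 20" on an integer column becomes the inclusive bound "id >= 21".
        System.out.println(inclusiveLowerFor(20L)); // prints 21
        // "score > 1.5" becomes "score >= Math.nextUp(1.5)".
        System.out.println(inclusiveLowerFor(1.5) > 1.5); // prints true
        // "name > 'abc'" becomes "name >= 'abc' followed by U+0000", which sorts before "abd".
        String bound = inclusiveLowerFor("abc");
        System.out.println("abc".compareTo(bound) < 0 && bound.compareTo("abd") < 0); // prints true
    }
}
```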
+
+## Under-the-Covers Changes
+
+The scan optimizations in the server and C++ client, and the new `KuduPredicate`
+API in the Java client, are made possible by an overhaul of how predicates are
+handled internally. A new protobuf message type,
+[`ColumnPredicatePB`](https://github.com/apache/incubator-kudu/blob/master/src/kudu/common/common.proto#L273),
+has been introduced; it will make it possible to add more column predicate types
+in the future. If you are interested in contributing to Kudu but don't know
+where to start, consider adding a new predicate type; for example, the `IS NULL`,
+`IS NOT NULL`, `IN`, and `LIKE` predicate types are currently not implemented.
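The following hypothetical Java model shows why a dedicated predicate representation makes new predicate kinds easier to add. Each column predicate is one variant of a small type hierarchy, loosely analogous to one case of a tagged message. The `IsNotNull` and `InList` variants correspond to two of the predicate types listed above as not yet implemented. All names here are illustrative and do not reflect the actual fields of `ColumnPredicatePB`. In this sketch, a `NULL` cell is modelled as a Java `null` and never matches a range, equality, or `IN` predicate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PredicateTypesSketch {
    // Each predicate kind is one implementation, much like one variant of a tagged message.
    // A null Long stands for a NULL cell in a nullable INT64 column.
    interface ColumnPredicate {
        boolean matches(Long cell);
    }

    // Inclusive lower bound, exclusive upper bound; NULL never matches a range.
    static final class Range implements ColumnPredicate {
        final long lower;
        final long upper;
        Range(long lower, long upper) { this.lower = lower; this.upper = upper; }
        public boolean matches(Long cell) {
            return cell != null && cell >= lower && cell < upper;
        }
    }

    static final class Equality implements ColumnPredicate {
        final long value;
        Equality(long value) { this.value = value; }
        public boolean matches(Long cell) {
            return cell != null && cell == value;
        }
    }

    // Hypothetical new kinds: supporting one means adding one more variant.
    static final class IsNotNull implements ColumnPredicate {
        public boolean matches(Long cell) { return cell != null; }
    }

    static final class InList implements ColumnPredicate {
        final Set<Long> values;
        InList(Long... values) { this.values = new HashSet<>(Arrays.asList(values)); }
        public boolean matches(Long cell) { return cell != null && values.contains(cell); }
    }

    public static void main(String[] args) {
        ColumnPredicate in = new InList(1L, 5L, 9L);
        System.out.println(in.matches(5L));                 // prints true
        System.out.println(in.matches(null));               // prints false
        System.out.println(new IsNotNull().matches(null));  // prints false
        System.out.println(new Range(10, 20).matches(20L)); // prints false
    }
}
```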