You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2016/08/09 18:02:59 UTC

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Will Berkeley has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3871

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................

KUDU-1533 Spark Kudu Rdd/Dataframe upsert

This patch adds support for upsert in Kudu dataframes. The SaveMode
parameter to createRelation does not map to Kudu very well: Append
is fine, but Overwrite is supposed to mean truncate and then insert.
Kudu does not (currently) support truncation of tables, but it does
support updates and upserts, so Overwrite is taken to mean "upsert
new rows".

Circumventing the limitations of the datasource API, users can still
restore the old behavior of mode Overwrite (update, no insert) by
setting kudu.upsert = false in the options. I think users
prefer to have upsert semantics by default.

Additionally, Ignore was previously given the same meaning as Append,
contrary to its intended meaning. It's been re-categorized as
unsupported along with ErrorIfExisting.

Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
3 files changed, 62 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/3871/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Ram Mettu (Code Review)" <ge...@cloudera.org>.
Ram Mettu has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 4:

Changes and the documentation looks good.

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 2:

(8 comments)

This is looking great.   Sorry I've been so slow on reviews.  I'd love to get this in for the 0.10 release which is branching on Friday.  I'll try to be much faster on reviews, will you have time to work on it before then?

http://gerrit.cloudera.org:8080/#/c/3871/2//COMMIT_MSG
Commit Message:

Line 13: for Kudu without support for table truncation.
My understanding is that it's more fundamental than that: it requires table truncation *and* table creation if the table does not exist.  Doing the table creation without creation args (replication, partitioning, etc.) is not something I think we can support.


Line 19: The change is backwards-incompatible. Syntax like
Could you add this or something like it to docs/release_notes.adoc?


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

PS2, Line 172: insert
I liked your idea of defaulting to upsert, but allowing insert via a special configuration flag.  Can you bring that back from the previous patch?  I think in the long run most people will want upsert by default.


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 132:     writeRows(data, tableName, table => table.newInsert())
this is a fantastic way to reduce the boilerplate.


PS2, Line 179: writeRowPartition
how about 'writePartitionRows'


Line 180:                 schema: StructType,
indent


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala:

Line 35
I thought this would still work because of the RelationProvider impl?  Or is it tied to CreatableRelationProvider?


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

Line 75:     val tableName = "testcreatetable"
thank you!


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/2831/

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


KUDU-1533 Spark Kudu Rdd/Dataframe upsert

This patch improves the Kudu SparkSQL integration in two ways:

1) Removed support for all SaveMode's except Append for the
creatableRelationProvider trait of DefaultSource. This is an
improvement because the other modes cannot be correctly implemented
for Kudu without support for table truncation and because some modes
require auto-table creation and, in that case, there's no
satisfactory mechanism to specify things like the partition schema.

2) Added {insert, update, upsert, delete}Rows methods to KuduContext.
This is the now preferred way to write to Kudu tables.

Additionally, inserts to Kudu tables from Spark SQL using
DefaultSource are now upserts by default. They can be returned to
being strict inserts with the operation parameter.

These changes may break some existing clients, so they have been
documented in the release notes. Additionally, the enhancements to
the KuduContext API, and its preferred status over using the
DefaultSource to write to Kudu tables, have been documented in the
examples and the release notes.

Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Reviewed-on: http://gerrit.cloudera.org:8080/3871
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@cloudera.com>
---
M docs/developing.adoc
M docs/release_notes.adoc
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 239 insertions(+), 103 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3871/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 43:   val UPSERT = "kudu.upsert"
> Could you change this key to "kudu.operation" and have it take values of ei
Done


Line 83:     val upsert = parameters.getOrElse(UPSERT, "true").toLowerCase match {
> Instead of duplicating the parsing, you can call the other createRelation m
Done


http://gerrit.cloudera.org:8080/#/c/3871/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 190:     session.setIgnoreAllDuplicateRows(true)
> Please remove this setIgnoreAllDuplicateRows call.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/2785/

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala:

Line 35
> So I'm not quite following, it sounds like using this will cause a runtime 
Sorry, bad wording. I meant, while developing, I tried taking out the CreatableRelationProvider impl and leaving the above and it breaks at runtime. I don't want to put anything back in for the above or CreatableRelationProvider for the patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Chris George (Code Review)" <ge...@cloudera.org>.
Chris George has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 3:

Specifically this documentation: http://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Chris George, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3871

to look at the new patch set (#4).

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................

KUDU-1533 Spark Kudu Rdd/Dataframe upsert

This patch improves the Kudu SparkSQL integration in two ways:

1) Removed support for all SaveMode's except Append for the
creatableRelationProvider trait of DefaultSource. This is an
improvement because the other modes cannot be correctly implemented
for Kudu without support for table truncation and because some modes
require auto-table creation and, in that case, there's no
satisfactory mechanism to specify things like the partition schema.

2) Added {insert, update, upsert, delete}Rows methods to KuduContext.
This is the now preferred way to write to Kudu tables.

Additionally, inserts to Kudu tables from Spark SQL using
DefaultSource are now upserts by default. They can be returned to
being strict inserts with the upsert parameter.

These changes may break some existing clients, so they have been
documented in the release notes. Additionally, the enhancements to
the KuduContext API, and its preferred status over using the
DefaultSource to write to Kudu tables, have been documented in the
examples.

Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
---
M docs/developing.adoc
M docs/release_notes.adoc
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 242 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/3871/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2761/

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Chris George (Code Review)" <ge...@cloudera.org>.
Chris George has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 3: Code-Review-1

I would prefer that we not drop CreatableRelationProvider. I'm using cassandra spark connector as a precedent for this. They error out when attempting to append to a non existent table. I think there is a valid case for not allowing overwrite for the reasons stated previously about not having enough information. However it would be possible to add this if we put additional options, but I'm ok with not allowing this support unless we supported backend truncates. I think implementing CreateableRelation allows newer spark users to have less of a learning curve when adopting the new datasource. However I think the documentation should be updated to use the prefered approach. I think append should default to upsert and then we should throw an error back for overwrite. Also a -1 because the documentation needs to be updated with example usages.

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3871

to look at the new patch set (#2).

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................

KUDU-1533 Spark Kudu Rdd/Dataframe upsert

This patch improves the Kudu SparkSQL integration in two ways:

1) Removed trait creatableRelationProvider from DefaultSource. This
is an improvement because this trait cannot be correctly implemented
for Kudu without support for table truncation.

2) Added {insert, update, upsert, delete}Rows methods to KuduContext.
This replaces and expands on the removed functionality of
DefaultSource.

The change is backwards-incompatible. Syntax like

df.write.options(newOptions).mode("append").kudu

no longer works, and now needs to be written like

kuduContext.insertRows(df, tableName)

Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
4 files changed, 100 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/3871/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Chris George, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3871

to look at the new patch set (#5).

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................

KUDU-1533 Spark Kudu Rdd/Dataframe upsert

This patch improves the Kudu SparkSQL integration in two ways:

1) Removed support for all SaveMode's except Append for the
creatableRelationProvider trait of DefaultSource. This is an
improvement because the other modes cannot be correctly implemented
for Kudu without support for table truncation and because some modes
require auto-table creation and, in that case, there's no
satisfactory mechanism to specify things like the partition schema.

2) Added {insert, update, upsert, delete}Rows methods to KuduContext.
This is the now preferred way to write to Kudu tables.

Additionally, inserts to Kudu tables from Spark SQL using
DefaultSource are now upserts by default. They can be returned to
being strict inserts with the operation parameter.

These changes may break some existing clients, so they have been
documented in the release notes. Additionally, the enhancements to
the KuduContext API, and its preferred status over using the
DefaultSource to write to Kudu tables, have been documented in the
examples and the release notes.

Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
---
M docs/developing.adoc
M docs/release_notes.adoc
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 239 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/3871/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/2839/

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 61:     new KuduRelation(tableName, kuduMaster)(sqlContext)
Should the upsert flag get passed here as well?


Line 78:   override def createRelation(sqlContext: SQLContext, mode: SaveMode,
How would you feel about removing the method and not implementing CreatableRelationProvider?  As far as I can tell we are not, and can not, implement it correctly, since the main point of it is to create a new table when it doesn't exist.


Line 94:       case Overwrite => kuduRelation.insert(data, overwrite = true)
Throw Unsupported here as well for the reasons below.


Line 218:     context.writeRows(data, tableName, overwrite, upsert)
Could you actually change this to throw a NotSupported exception when overwrite = true?  I think the Spark semantics are supposed to be that overwrite will truncate the dataset then do inserts, which we can't yet support.  Changing it to throw will allow us to backwards compatibly support it later when we have lightweight table truncation.


http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 132:   def writeRows(data: DataFrame, tableName: String, overwrite: Boolean, upsert: Boolean) {
Instead of writeRows taking two booleans, could you split this into insertRows, updateRows, and upsertRows?  I think that will be a lot more clear.  The boolean was already really confusing.


Line 151:   def writeRows(rows: Iterator[Row],
Same here.  There shouldn't be too much code duplication if you pull out the row building into a helper function.

Should this be private?


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 61:     new KuduRelation(tableName, kuduMaster)(sqlContext)
> Should the upsert flag get passed here as well?
N/A. See below.


Line 78:   override def createRelation(sqlContext: SQLContext, mode: SaveMode,
> How would you feel about removing the method and not implementing Creatable
You're right we shouldn't implement CreatableRelationProvider if we can't do it right. Alternatively, couldn't createRelation take a dataframe, construct a Kudu schema from its schema, and then allow it to be persisted to Kudu? Working this way, we could support everything except Overwrite right now...buuuuut it's going to be really awkward to specify the key and the partition schema. Awkward enough I think we should remove creatableRelationProvider until we can implement it correctly as it is now (based on an existing table).

All the operations will still be available on a dataframe via KuduContext.


Line 94:       case Overwrite => kuduRelation.insert(data, overwrite = true)
> Throw Unsupported here as well for the reasons below.
N/A. See above.


Line 218:     context.writeRows(data, tableName, overwrite, upsert)
> Could you actually change this to throw a NotSupported exception when overw
Done


http://gerrit.cloudera.org:8080/#/c/3871/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 132:   def writeRows(data: DataFrame, tableName: String, overwrite: Boolean, upsert: Boolean) {
> Instead of writeRows taking two booleans, could you split this into insertR
Done


Line 151:   def writeRows(rows: Iterator[Row],
> Same here.  There shouldn't be too much code duplication if you pull out th
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 6:

Thanks, Will!

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/2816/

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3871

to look at the new patch set (#3).

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................

KUDU-1533 Spark Kudu Rdd/Dataframe upsert

This patch improves the Kudu SparkSQL integration in two ways:

1) Removed trait creatableRelationProvider from DefaultSource. This
is an improvement because this trait cannot be correctly implemented
for Kudu without support for table truncation and because some modes
require auto-table creation but there's no satisfactory mechanism to
specify things like the partition schema in that case.

2) Added {insert, update, upsert, delete}Rows methods to KuduContext.
This replaces and expands on the removed functionality of
DefaultSource.

The change is backwards-incompatible. Syntax like

df.write.options(newOptions).mode("append").kudu

no longer works, and now needs to be written like

kuduContext.insertRows(df, tableName)

Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
4 files changed, 160 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/3871/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3871/2//COMMIT_MSG
Commit Message:

Line 19: The change is backwards-incompatible. Syntax like
> Maybe a silly question: should I add it as part of this patch or get it add
That one's been committed, so you won't need to worry about a merge conflict.  You can follow the lead set out by Todd in formatting, though.


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

PS2, Line 172: insert
> I agree people will want upsert by default, but it makes me a little nervou
I think that's what we want (A sparksql INSERT INTO defaulting to a Kudu upsert), but maybe we should bring it up on the mailing list, or at least pull in some of the other known kudu-spark users into this discussion.  I'll ping some people on slack, but if you want to start a discuss thread on the mailing list that would probably be good too.  Wish we had more time before 0.10 :)


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala:

Line 35
> It's tied to CreatableRelationProvider. I crossed my fingers and tried it w
So I'm not quite following, it sounds like using this will cause a runtime exception?  Why add it back in that case?


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

Line 75:     val tableName = "testcreatetable"
> np. Actually, I'd also like to do a follow up patch just to clean up the co
That would be greatly appreciated.


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 4:

(3 comments)

Looking really good now, sorry I sent you down a rabbit hole with the CreatableRelation stuff.

http://gerrit.cloudera.org:8080/#/c/3871/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

Line 43:   val UPSERT = "kudu.upsert"
Could you change this key to "kudu.operation" and have it take values of either "insert" or "upsert"?  I'd like to avoid boolean flags if possible, and it leaves the door open for other operation types in the future.

Don't worry about changing any of the private APIs like KuduRelation.


Line 83:     val upsert = parameters.getOrElse(UPSERT, "true").toLowerCase match {
Instead of duplicating the parsing, you can call the other createRelation method, and then do the inserts/writing to that.


http://gerrit.cloudera.org:8080/#/c/3871/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 190:     session.setIgnoreAllDuplicateRows(true)
Please remove this setIgnoreAllDuplicateRows call.


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Ram Mettu <ra...@rms.com>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1533 Spark Kudu Rdd/Dataframe upsert

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: KUDU-1533 Spark Kudu Rdd/Dataframe upsert
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/3871/2//COMMIT_MSG
Commit Message:

Line 13: for Kudu without support for table truncation.
> My understanding is that it's more fundamental than that: it requires table
Done


Line 19: The change is backwards-incompatible. Syntax like
> Could you add this or something like it to docs/release_notes.adoc?
Maybe a silly question: should I add it as part of this patch or get it added to Todd's release notes patch (https://gerrit.cloudera.org/#/c/3802/)?


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

PS2, Line 172: insert
> I liked your idea of defaulting to upsert, but allowing insert via a specia
I agree people will want upsert by default, but it makes me a little nervous because the method name is insert (from the trait) and Kudu has otherwise been punctilious about the distinction between insert and upsert. This function is used for INSERT INTO, btw, so INSERT INTO will by default mean UPSERT INTO.


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

Line 132:     writeRows(data, tableName, table => table.newInsert())
> this is a fantastic way to reduce the boilerplate.
:D


PS2, Line 179: writeRowPartition
> how about 'writePartitionRows'
Done


Line 180:                 schema: StructType,
> indent
Done


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/package.scala:

Line 35
> I thought this would still work because of the RelationProvider impl?  Or i
It's tied to CreatableRelationProvider. I crossed my fingers and tried it without the impl and got a runtime exception. As I understand it, CreatableRelationProvider lets one do things like CTAS, while InsertableRelation lets one do INSERT INTO. Also, I noticed there's no test for INSERT INTO, so added that.


http://gerrit.cloudera.org:8080/#/c/3871/2/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

Line 75:     val tableName = "testcreatetable"
> thank you!
np. Actually, I'd also like to do a follow up patch just to clean up the code and comments a bit, after this one's done.


-- 
To view, visit http://gerrit.cloudera.org:8080/3871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8e0d50fb74dc2ce5e757e8a56fc1e863f699822
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Chris George <ch...@rms.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes