You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Fengling Wang (Code Review)" <ge...@cloudera.org> on 2018/03/28 00:00:21 UTC

[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

Fengling Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9834


Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................

KUDU-2303: Add ignoreNull option to upsertRows

This patch adds the ignoreNull option to upsertRows so that users can upsert only non-Null
columns and leave the rest of columns unchanged.

This feature is useful when users use Spark streaming to process JSON and upsert to Kudu,
where missing column values from JSON are set to NULL, resulting in some existing row
values being upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduContextTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
4 files changed, 44 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 1
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@78
PS8, Line 78:  || Try(parameters(OPERATION) == "insert-ignore").getOrElse(false)
> Done. Could you tell me why writing like this?
The number of combinator/fallback method calls is confusing, and using Try without the getOrElse calls keeps it to a minimum.

I think you missed the fix to the first clause to make it consistent.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 9
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 18:09:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@125
PS6, Line 125:       case "upsert" => Upsert
> This is part of our public API, so it can't be removed.  I think what you w
In the previous patch, will commented that maybe it's ok to remove InsertIgnore in operationType.scala cuz it's private. By removing that, i also had to remove this line cuz InsertIgnore didn't exist anymore. 

Could you tell me how this line affects our public API? Thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 7
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 19:46:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@118
PS3, Line 118:       case "insert" => Insert
This is a backwards incompatible change.  Would it be possible to retain this functionality?


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@a250
PS3, Line 250: 
Likewise, removing this will break applications.  Is it possible to keep the API but transition it to use the proper write options?


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@311
PS3, Line 311:                                  KuduWriteOptions = new KuduWriteOptions()): RowErrorsAndOverflowStatus = {
I think you can skip the default value here, since it's a private method (so the API can change), and it should always be passed in.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@21
PS3, Line 21:   private var _ignoreDuplicateRowErrors: Boolean = false
This class is a very different style from the rest of the scala code we have.  I think something along the lines of the following would be more idiomatic:

    class KuduWriteOptions(
        val ignoreDuplicateRowErrors: Boolean = false,
        val ignoreNull: Boolean = false,
    )



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 3
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 02 Apr 2018 22:27:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................


Patch Set 2:

(1 comment)

> Patch Set 2:
> 
> (5 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType {
> If we keep adding options to modify the behavior of these operations we're 
On second thought it may be cleaner to just introduce a WriteOptions class and allow extensibility via adding fields.  This could be passed to the insert/upsert/etc calls in KuduContext.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 14:45:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#4).

Change subject: [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
5 files changed, 121 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 4
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@78
PS8, Line 78: ERATION) == "insert-ignore").getOrElse(false)
> The number of combinator/fallback method calls is confusing, and using Try 
Done. Thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 10
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 18:23:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows
......................................................................


Patch Set 3:

(9 comments)

It looks pretty good on the KuduContextSide but I think we should also support the ignoreNull options on the DefaultSource side, and for that I think we will need to support new parameters. See createRelation in DefaultSource.scala, and also DefaultSourceTest, since the way DefaultSource works is kinda magic.

http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7
PS3, Line 7: KUDU-2371
nit: Could you also add a [spark] tag at the beginning of the subject line?


http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7
PS3, Line 7: WriteOptions
nit: KuduWriteOptions.


http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@10
PS3, Line 10: writes to the Kudu table
It's unclear what this applies to. Could you mention this is referring to writing with Spark? See also my comment about tagging the subject line with [spark].


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@310
PS3, Line 310: writeOptions:
             :                                  KuduWriteOptions = new KuduWriteOptions()):
We don't have a Scala style guide, perse, but I think we should keep the full parameter (name + type) on the same line if we can. Howabout an indentation strategy like

  private def foo(writeOptions: KuduWriteOptions = new KuduWriteOptions)
    : RowErrorsAndOverflowStatus = {

where the : <return type> is on a separate line and indented 2 spaces from the method name. See https://github.com/databricks/scala-style-guide#spacing-and-indentation.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@328
PS3, Line 328: Can't set non-nullable column to null
You mean "Can't set primary key column to null". The message should also include the name of the primary key column, e.g. "Can't set primary key column `key` to null".

Depending on the write op and options a non-nullable column might be set to null and then the null ignored, so it's specifically the key column situation that we're calling out.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@20
PS3, Line 20: case
This isn't a case class. A case class is meant as a immutable sum type (with pattern matching); the KuduWriteOptions() class is designed to be mutated to set options.

See https://docs.scala-lang.org/tour/case-classes.html if you are curious about case classes.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@27
PS3, Line 27: def ignoreDuplicateRowErrors_
Looks like you may have been inspired by https://www.dustinmartin.net/getters-and-setters-in-scala/? So that means the name of the setter method is actually ignoreDuplicateRowErrors_=, and the "assignment"

  wo.ignoreDuplicateRowErrors = true

is actually a method call

  wo.ignoreDuplicateRowErrors_=(true)

Looking around more, it appears this is a standard recommendation of the Scala people? It seems too clever by half for me and I'd prefer

  w.setIgnoreDuplicateRows(true)

but if others are ok with it then I'd be ok with it.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@a32
PS3, Line 32: 
            : 
            : 
            : 
I think we need to leave InsertIgnore in for API compatibility.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@174
PS3, Line 174: kuduWriteOptions.ignoreDuplicateRowErrors = true
             :     kuduContext.insertRows(updateDF, tableName, kuduWriteOptions)
We need to leave InsertIgnore in, but it would be good to do this test both with InsertIgnore and with KuduWriteOptions. We can mark InsertIgnore as deprecated in favor of KuduWriteOptions.ignoreDuplicateRowErrors.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 3
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 02 Apr 2018 22:36:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7
PS2, Line 7: ignoreNull
Curious to hear others' opinions, but to me a name like 'treatNullAsUnset' or 'nullIsUnset' is more clear than 'ignoreNull'. Thoughts?



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 16:56:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 10
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 19:55:55 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#9).

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

KuduWriteOptions is also supported in DefaultSource APIs. Clients can
set up write options in the parameters from SparkSQL.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 237 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/9
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 9
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

KuduWriteOptions is also supported in DefaultSource APIs. Clients can
set up write options in the parameters from SparkSQL.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Reviewed-on: http://gerrit.cloudera.org:8080/9834
Reviewed-by: Dan Burkert <da...@apache.org>
Tested-by: Kudu Jenkins
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 237 insertions(+), 26 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 11
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@125
PS6, Line 125:       case "upsert" => Upsert
This is part of our public API, so it can't be removed.  I think what you want to do is translate "insert-ignore" into an Insert operation, with the ignore option enabled.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 7
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 19:31:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@125
PS6, Line 125:       case "upsert" => Upsert
> In the previous patch, will commented that maybe it's ok to remove InsertIg
We allow applications to set the 'kudu.operation'='insert-ignore' option.  If we stopped allowing that, it would break existing applications when they upgrade to the new version. It's fine to remove the InsertIgnore type, since it's private, but you'll need to still figure out a way to provide this option.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 7
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 19:48:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@78
PS8, Line 78:  || (parameters.getOrElse(OPERATION, "upsert") == "insert-ignore");
> Hi dan, i'm thinking of working around with the "insert-ignore" by doing th
Strategy looks good, but I think it will be a easier to follow the logic as:

    Try(parameters(FAULT_TOLERANT_SCANNER).toBoolean).getOrElse(false) ||
      Try(parameters(OPERATION) == "insert-ignore").getOrElse(false)



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 8
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 13 Apr 2018 17:48:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 6:

Nice!  I like where this has ended up, it will be much easier to add new options in the future.


-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 6
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 10 Apr 2018 22:02:19 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 7:

(8 comments)

Thanks everyone!

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@250
PS6, Line 250: preferred:
> nit: s/more suggested/preferred
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@253
PS6, Line 253:     * }}}
> Here and below, this can be done more concisely, and I think more clearly, 
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@260
PS6, Line 260:                        tableName: String): Unit = {
> Add a message like @Deprecated("Use KuduContext.insert(.., new KuduWriteOpt
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350
PS6, Line 350:             if (!writeOptions.ignoreNull) operation.getRow.setNull(kuduIdx)
> nit: this line is too long.
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350
PS6, Line 350:             if (!writeOptions.ignoreNull) operation.getRow.setNull(kuduIdx)
> interpolated strings (s"") don't need to be concatenated, use:
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@23
PS6, Line 23:   * KuduWriteOptions holds configuration of writes to Kudu tables.
> nit: Can you add a comment about what kind of options it is holding and wha
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@23
PS6, Line 23:   * KuduWriteOptions holds configuration of writes to Kudu tables.
> Yeah, I think a quick description on the fields themselves would be good.
Done


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@31
PS6, Line 31: Update exten
> I think it's safe to get rid of this now.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 7
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 00:55:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: upsertRows
Ignoring nulls can apply equally well to any write type, not just upserts.  How do you feel about making this a general option for all writes?


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@11
PS2, Line 11: 
This patch doesn't expose this functionality through DefaultSource.createRelation, which is another important way that Spark can do writes to Kudu.


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@259
PS2, Line 259: value
s/value/values


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@326
PS2, Line 326:             if (!operationType.ignoreNull) operation.getRow.setNull(kuduIdx)
It's probably worth adding a check that the column isn't in the primary key.  Even though it's an error to set a primary key to null (since PK columns are always non-nullable), I think the error message will be better as 'can't set non-nullable column to null' than 'primary key column not set'.


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType {
If we keep adding options to modify the behavior of these operations we're quickly going to have an explosion of operation types.  Might be time to refactor along these lines:

private[kudu] case class Insert(val ignoreDuplicateRowErrors: Boolean) extends OperationType {
  override def operation(table: KuduTable): Operation = table.newInsert()
}
private[kudu] case object Update extends OperationType {
  override def operation(table: KuduTable): Operation = table.newUpdate()
}
private[kudu] case class Upsert(val ignoreNull: Boolean) extends OperationType {
  override def operation(table: KuduTable): Operation = table.newUpsert()
}
private[kudu] case object Delete extends OperationType {
  override def operation(table: KuduTable): Operation = table.newDelete()
}



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 03:12:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 6: Code-Review+1

(1 comment)

LGTM besides the one small thing...would like Dan or Hao to look too.

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@31
PS6, Line 31: InsertIgnore
I think it's safe to get rid of this now.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 6
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 09 Apr 2018 08:10:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 4:

(4 comments)

Looking pretty good on the KuduContext side. I think we want support for this in the DefaultSource as well, though.

http://gerrit.cloudera.org:8080/#/c/9834/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/4//COMMIT_MSG@7
PS4, Line 7: [spark]KUDU-2371
nit: Separate with a space: [spark] KUDU-2371


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@247
PS4, Line 247: ignoring any new
             :     * rows that have a primary key conflict with existing rows.
Please add a second bit to the docs here stating that insertIgnoreRows(df, tableName) is equivalent to

  val wo = new KuduWriteOptions
  KuduWriteOptions.ignoreDuplicateRowErrors = true
  insertRows(data, tableName, wo)


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@253
PS4, Line 253: insertIgnoreRows
I think we also want to mark this as deprecated in favor of using KuduWriteOptions. Dan, do you agree?


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@644
PS4, Line 644: val kuduWriteOptions = new KuduWriteOptions
             :     kuduWriteOptions.ignoreDuplicateRowErrors = true
Isn't this unnecessary?



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 4
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 04 Apr 2018 20:37:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#10).

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

KuduWriteOptions is also supported in DefaultSource APIs. Clients can
set up write options in the parameters from SparkSQL.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 237 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/10
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 10
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 6: Code-Review+1

(2 comments)

Overall it looks good to me, just some minor nits.

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350
PS6, Line 350:               throw new IllegalArgumentException(s"Can't set primary key column \'" + key_name + "\' to null")
nit: this line is too long.


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@23
PS6, Line 23:   * KuduWriteOptions holds configuration of writes to Kudu tables.
nit: Can you add a comment about what kind of options it is holding and what are the options for?



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 6
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 10 Apr 2018 19:48:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7
PS2, Line 7: WriteOptio
> Curious to hear others' opinions, but to me a name like 'treatNullAsUnset' 
Or maybe 'unsetNulls'?


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7
PS2, Line 7: KUDU-2371
> I think you mean 2250?
Done


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: 
> nit: Try to wrap commit message lines at about 80 chars.
Done


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: s to allow
> Ignoring nulls can apply equally well to any write type, not just upserts. 
i think it's good to also apply to updates but it doesn't make sense to me for inserts / deletes?


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@10
PS2, Line 10: s
> nit:  Add "the".
Done


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@11
PS2, Line 11: more fields in the future. The instance of this class is passed to
> This patch doesn't expose this functionality through DefaultSource.createRe
I see.


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@12
PS2, Line 12: func
> nit: "For example, this feature is useful..."
Done


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@13
PS2, Line 13: 
> nit: s/where/because
Done


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@259
PS2, Line 259: 
> s/value/values
Done


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@326
PS2, Line 326:           if (row.isNullAt(sparkIdx)) {
> It's probably worth adding a check that the column isn't in the primary key
Added. Not sure if it's the most effective way to check?


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: 
> If we keep adding options to modify the behavior of these operations we're 
Done


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: 
> On second thought it may be cleaner to just introduce a WriteOptions class 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 3
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 01 Apr 2018 17:26:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 5:

As Will suggested, I am going to also support defaultSource with writeOptions by passing in some more params in the params map.


-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 5
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 19:50:21 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: upsert only non-Null
nit: Try to wrap commit message lines at about 80 chars.


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@10
PS2, Line 10:  
nit:  Add "the".


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@12
PS2, Line 12: This
nit: "For example, this feature is useful..."


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@13
PS2, Line 13: where
nit: s/where/because



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 08:14:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9834/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/4//COMMIT_MSG@7
PS4, Line 7: [spark] KUDU-237
> nit: Separate with a space: [spark] KUDU-2371
Done


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@247
PS4, Line 247: ignoring any new
             :     * rows that have a primary key conflict with existing rows.
> Please add a second bit to the docs here stating that insertIgnoreRows(df, 
Done


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@247
PS4, Line 247: ignoring any new
             :     * rows that have a primary key conflict with existing rows.
> don't you mean "wo.ignoreDuplicateRowErrors = true" above?
Done


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@253
PS4, Line 253: wo.ignoreDuplica
> I think we also want to mark this as deprecated in favor of using KuduWrite
Done


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@20
PS4, Line 20: import org.apache.yetus.audience.InterfaceStability
> if this is part of the public API shouldn't it have an interface audience a
Added stability.
I wonder if i need really to add the audience one? I notice other public classes like KuduContext and DefaultSourced only have stability annotation.


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@21
PS4, Line 21: 
> if public, probably should have javadoc (or scaladoc? whatever the equivale
Done


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@644
PS4, Line 644: val kuduWriteOptions = new KuduWriteOptions
             :     kuduWriteOptions.ignoreDuplicateRowErrors = true
> Isn't this unnecessary?
test failed if removed. seems necessary.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 5
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 19:32:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@78
PS8, Line 78:  || (parameters.getOrElse(OPERATION, "upsert") == "insert-ignore");
Hi dan, i'm thinking of working around with the "insert-ignore" by doing this.


http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@125
PS8, Line 125: Insert
Treat it as Insert here.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 8
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 22:29:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7
PS2, Line 7: KUDU-2303
I think you mean 2250?



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 16:48:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#6).

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

KuduWriteOptions is also supported in DefaultSource APIs. Clients can
set up write options in the parameters from SparkSQL.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 208 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/6
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 6
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@78
PS8, Line 78:  || Try(parameters(OPERATION) == "insert-ignore").getOrElse(false)
> Strategy looks good, but I think it will be a easier to follow the logic as
Done. Could you tell me why writing like this?



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 9
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 13 Apr 2018 22:31:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@250
PS6, Line 250: more suggested
nit: s/more suggested/preferred


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@253
PS6, Line 253:     * wo.ignoreDuplicateRowErrors = true
Here and below, this can be done more concisely, and I think more clearly, as:

    insertRows(data, tableName, new KuduWriteOptions(ignoreDuplicateRowErrors = true))


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@260
PS6, Line 260:   @Deprecated()
Add a message like @Deprecated("Use KuduContext.insert(.., new KuduWriteOptions(ignoreDuplicateRowErrors = true))")


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350
PS6, Line 350:               throw new IllegalArgumentException(s"Can't set primary key column \'" + key_name + "\' to null")
> nit: this line is too long.
interpolated strings (s"") don't need to be concatenated, use:

    s"Can't set primary key column '$key_name' to null"


http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@23
PS6, Line 23:   * KuduWriteOptions holds configuration of writes to Kudu tables.
> nit: Can you add a comment about what kind of options it is holding and wha
Yeah, I think a quick description on the fields themselves would be good.



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 6
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 10 Apr 2018 22:01:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#7).

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

KuduWriteOptions is also supported in DefaultSource APIs. Clients can
set up write options in the parameters from SparkSQL.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 209 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/7
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 7
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#3).

Change subject: KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows
......................................................................

KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table. This allows extensibility via adding
more fields in the future. The instance of this class is passed to
functions (insert/delete/upsert/update) in KuduContext.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 105 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 3
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#8).

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

KuduWriteOptions is also supported in DefaultSource APIs. Clients can
set up write options in the parameters from SparkSQL.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
6 files changed, 237 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/8
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 8
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@247
PS4, Line 247: ignoring any new
             :     * rows that have a primary key conflict with existing rows.
> Please add a second bit to the docs here stating that insertIgnoreRows(df, 
don't you mean "wo.ignoreDuplicateRowErrors = true" above?


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@20
PS4, Line 20: class KuduWriteOptions(
if this is part of the public API shouldn't it have an interface audience and stability annotation?


http://gerrit.cloudera.org:8080/#/c/9834/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@21
PS4, Line 21:   var ignoreDuplicateRowErrors: Boolean = false,
if public, probably should have javadoc (or scaladoc? whatever the equivalent is in Scala)



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 4
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 05:33:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Dan Burkert, Kudu Jenkins, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9834

to look at the new patch set (#5).

Change subject: [spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................

[spark] KUDU-2371: Add KuduWriteOptions class and ignoreNull option

This patch adds the KuduWriteOptions class to allow configuration of
writes to the Kudu table when writing with Spark. This allows
extensibility via adding more fields in the future. The instance of
this class is passed to functions (insert/delete/upsert/update) in
KuduContext.

This patch also adds the ignoreNull write option so that users can
upsert/update only non-Null columns and leave the rest of the columns
unchanged.

For example, this feature is useful when users use Spark streaming to
process JSON and upsert to Kudu, because missing column values from
JSON are set to NULL, resulting in some existing row values being
upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
A java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
5 files changed, 139 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/5
-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 5
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option

Posted by "Fengling Wang (Code Review)" <ge...@cloudera.org>.
Fengling Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )

Change subject: [spark]KUDU-2371: Add KuduWriteOptions class and ignoreNull option
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7
PS3, Line 7: 1: Add KuduW
> nit: KuduWriteOptions.
Done


http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7
PS3, Line 7: [spark]KU
> nit: Could you also add a [spark] tag at the beginning of the subject line?
Done


http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@10
PS3, Line 10: writes to the Kudu table
> It's unclear what this applies to. Could you mention this is referring to w
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@118
PS3, Line 118:       case "insert" => Insert
> This is a backwards incompatible change.  Would it be possible to retain th
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@a250
PS3, Line 250: 
> Likewise, removing this will break applications.  Is it possible to keep th
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@311
PS3, Line 311:       if (errorCount > 0) {
> I think you can skip the default value here, since it's a private method (s
I can skip the default value in writePartitionRows but not writeRows since writeRows is called by 
insert(data: DataFrame, overwrite: Boolean) 
in DefaultSource.scala. 
Would you recommend to skip or not skip for writeRows?


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@310
PS3, Line 310: ors.getRowErrors.length
             :       if (errorCount > 0) {
> We don't have a Scala style guide, perse, but I think we should keep the fu
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@328
PS3, Line 328: leName)
> You mean "Can't set primary key column to null". The message should also in
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@20
PS3, Line 20: clas
> This isn't a case class. A case class is meant as a immutable sum type (wit
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@21
PS3, Line 21:   var ignoreDuplicateRowErrors: Boolean = false,
> This class is a very different style from the rest of the scala code we hav
Done. Why using val here by the way?


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@27
PS3, Line 27: 
> Looks like you may have been inspired by https://www.dustinmartin.net/gette
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@a32
PS3, Line 32: 
            : 
            : 
            : 
> I think we need to leave InsertIgnore in for API compatibility.
Done


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@174
PS3, Line 174: kuduWriteOptions.ignoreDuplicateRowErrors = true
             :     kuduContext.insertRows(updateDF, tableName, kuduWriteOptions)
> We need to leave InsertIgnore in, but it would be good to do this test both
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 4
Gerrit-Owner: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Apr 2018 18:22:07 +0000
Gerrit-HasComments: Yes