Posted to issues@kudu.apache.org by "Dan Burkert (JIRA)" <ji...@apache.org> on 2018/03/23 15:21:00 UTC
[jira] [Commented] (KUDU-2250) Document odd interaction between upserts and Spark Datasets
[ https://issues.apache.org/jira/browse/KUDU-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411550#comment-16411550 ]
Dan Burkert commented on KUDU-2250:
-----------------------------------
This issue comes up quite a bit. We could make this easier by adding a flag to the Kudu/Spark integration that makes UPSERT/UPDATE operations skip null values.
> Document odd interaction between upserts and Spark Datasets
> -----------------------------------------------------------
>
> Key: KUDU-2250
> URL: https://issues.apache.org/jira/browse/KUDU-2250
> Project: Kudu
> Issue Type: Task
> Components: spark
> Affects Versions: 1.6.0
> Reporter: Jean-Daniel Cryans
> Assignee: Fengling Wang
> Priority: Major
> Labels: newbie
>
> We need to document a specific behavior of Spark Datasets that runs contrary to how Kudu works.
> Say you have 3 columns "k, x, y" where k is the primary key.
> You run a first insert on a row "k=1, x=2, y=3".
> Now you upsert "k=1, y=4".
> Using any Kudu API, the full row would now be "k=1, x=2, y=4", but with Datasets you get "k=1, x=*NULL*, y=4". Datasets fill every unspecified column with null, and the upsert then writes those nulls over the existing values.
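
The behavior described above, and the skip-null flag suggested in the comment, can be illustrated with a small in-memory model. This is only a sketch: it uses plain Python dicts rather than the Kudu client or the Kudu/Spark integration, and the names `upsert`, `dataset_row`, and `skip_nulls` are hypothetical, chosen for illustration.

```python
# Toy in-memory model of a table with primary key "k".
# This is NOT the Kudu client API; all names here are hypothetical.
COLUMNS = ("k", "x", "y")


def upsert(table, row, skip_nulls=False):
    """Upsert `row` (a dict) into `table` (a dict keyed by primary key).

    Only the columns present in `row` are written. With skip_nulls=True
    (an assumed flag, modeled on the suggestion in the comment), columns
    whose value is None are left untouched.
    """
    stored = table.setdefault(row["k"], {c: None for c in COLUMNS})
    for col, value in row.items():
        if skip_nulls and value is None:
            continue
        stored[col] = value
    return stored


def dataset_row(**values):
    """Mimic a Dataset row: every column is present, unspecified ones are None."""
    return {c: values.get(c) for c in COLUMNS}


# 1. Partial upsert, as with a Kudu API: x is not mentioned, so it is preserved.
api_table = {}
upsert(api_table, {"k": 1, "x": 2, "y": 3})
upsert(api_table, {"k": 1, "y": 4})
api_result = dict(api_table[1])

# 2. Dataset-style upsert: the row carries x=None, which overwrites x.
ds_table = {}
upsert(ds_table, dataset_row(k=1, x=2, y=3))
upsert(ds_table, dataset_row(k=1, y=4))
ds_result = dict(ds_table[1])

# 3. Dataset-style upsert with the assumed skip-null flag: x is preserved.
skip_table = {}
upsert(skip_table, dataset_row(k=1, x=2, y=3))
upsert(skip_table, dataset_row(k=1, y=4), skip_nulls=True)
skip_result = dict(skip_table[1])

print(api_result)
print(ds_result)
print(skip_result)
```

One trade-off of a skip-null mode in this model: a row can no longer be used to set a column to null on purpose, since every null is treated as "not specified".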