Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2016/08/09 18:03:23 UTC

[jira] [Assigned] (KUDU-1533) Spark Kudu Rdd/Dataframe upsert

     [ https://issues.apache.org/jira/browse/KUDU-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley reassigned KUDU-1533:
-----------------------------------

    Assignee: Will Berkeley

> Spark Kudu Rdd/Dataframe upsert 
> --------------------------------
>
>                 Key: KUDU-1533
>                 URL: https://issues.apache.org/jira/browse/KUDU-1533
>             Project: Kudu
>          Issue Type: Bug
>         Environment: Spark
>            Reporter: Qutiba
>            Assignee: Will Berkeley
>
> It is not clear how to upsert a kuduRdd into an existing Kudu table.
> The documentation, under "Kudu integration with Spark", lists
> some possible operations:
> ***********************************************
> // then we can insert data into the kudu table
> df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).mode("append").kudu
> // to update existing data change the mode to 'overwrite'
> df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" -> "your.kudu.table.here")).mode("overwrite").kudu
> ****************************************************************
> But there is no way to do the following:
> kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master, "kudu.table" -> TargetTable)).mode("upsert").kudu
> ***************************************************************
> The current workaround, which is quite slow, is:
> Call DataFrame.foreachPartition and, in each partition:
> - open the table
> - create a session
> - for each row in this partition:
>     - create an upsert operation
>     - get the row from the operation
>     - add all fields and values to this row
>     - apply the operation
> ----------------------------------
> This workaround is quite slow. Adding an upsert mode to the DataFrame write function for Kudu tables would be better than opening sessions and creating operations by hand, as in the workaround above:
> kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master, "kudu.table" -> TargetTable)).mode("upsert").kudu
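
For reference, the per-partition workaround described in the report might look roughly like the sketch below. It is not taken from the issue. It assumes the org.apache.kudu package names used by Kudu 1.0 and later, a hypothetical helper name (upsertPartitions), and a hypothetical table with a string column "id" and a long column "count". Adjust the column handling to the real schema.

```scala
import org.apache.kudu.client.{KuduClient, SessionConfiguration}
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: upserts every row of df into an existing Kudu table.
// The column names "id" (string) and "count" (long) are assumptions for illustration.
def upsertPartitions(df: DataFrame, kuduMaster: String, tableName: String): Unit = {
  df.rdd.foreachPartition { rows: Iterator[Row] =>
    // One client, table handle, and session per partition, created on the executor.
    val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
    try {
      val table = client.openTable(tableName)
      val session = client.newSession()
      // Let the client batch operations in the background instead of one round trip per row.
      session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)
      rows.foreach { r =>
        val upsert = table.newUpsert()
        val row = upsert.getRow
        row.addString("id", r.getAs[String]("id"))
        row.addLong("count", r.getAs[Long]("count"))
        session.apply(upsert)
      }
      // close() flushes any operations that are still buffered.
      session.close()
    } finally {
      client.close()
    }
  }
}
```

A call such as upsertPartitions(kuduDataFrame, Kudu_Master, TargetTable) would stand in for the requested mode("upsert") write. Running it requires a reachable Kudu master and an existing table.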



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)