Posted to issues@kudu.apache.org by "Qutiba (JIRA)" <ji...@apache.org> on 2016/07/15 11:00:26 UTC

[jira] [Created] (KUDU-1533) Spark Kudu Rdd/Dataframe upsert

Qutiba created KUDU-1533:
----------------------------

             Summary: Spark Kudu Rdd/Dataframe upsert 
                 Key: KUDU-1533
                 URL: https://issues.apache.org/jira/browse/KUDU-1533
             Project: Kudu
          Issue Type: Bug
         Environment: Spark
            Reporter: Qutiba


It is not clear how to upsert a Kudu RDD/DataFrame into an existing Kudu table.
The documentation, under "Kudu integration with Spark", lists
some possible operations:
***********************************************
// then we can insert data into the kudu table
df.write.options(Map("kudu.master"-> "your.kudu.master.here","kudu.table"-> "your.kudu.table.here")).mode("append").kudu

// to update existing data change the mode to 'overwrite'
df.write.options(Map("kudu.master"-> "your.kudu.master.here","kudu.table"-> "your.kudu.table.here")).mode("overwrite").kudu
****************************************************************
But there is no way to perform an upsert in the same style, for example:
kuduDataFrame.write.options(Map("kudu.master"-> Kudu_Master,"kudu.table"-> TargetTable)).mode("upsert").kudu
***************************************************************
The current workaround, which is quite slow, is:
Call DataFrame.foreachPartition
- open the table
- create a session
    -- for each row in this partition
          --- create an upsert operation
          --- get the row from the operation
          --- add all fields and values to this row
          --- apply this operation
----------------------------------
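For reference, the steps above can be sketched roughly as follows. This is only an illustration, not the exact code in use: it assumes the Kudu Java client under the org.apache.kudu.client package (older releases used a different package name), and the object name KuduUpsertWorkaround, the helper copyRow, and the small set of handled column types are made up for this example.

```scala
import org.apache.kudu.client.{KuduClient, PartialRow}
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper object; the name is not part of Kudu or Spark.
object KuduUpsertWorkaround {

  // Copy one Spark Row into the PartialRow of an upsert operation.
  // Only a handful of column types are handled in this sketch.
  private def copyRow(columns: Array[String], src: Row, dst: PartialRow): Unit =
    columns.zipWithIndex.foreach { case (name, i) =>
      if (src.isNullAt(i)) dst.setNull(name)
      else src.get(i) match {
        case v: String  => dst.addString(name, v)
        case v: Int     => dst.addInt(name, v)
        case v: Long    => dst.addLong(name, v)
        case v: Double  => dst.addDouble(name, v)
        case v: Boolean => dst.addBoolean(name, v)
        case other =>
          throw new IllegalArgumentException(
            s"Unsupported type for column $name: ${other.getClass}")
      }
    }

  def upsert(df: DataFrame, kuduMaster: String, tableName: String): Unit = {
    // Column names are captured on the driver and shipped to the executors.
    val columns = df.schema.fieldNames
    df.rdd.foreachPartition { rows =>
      // One client, table handle and session per partition.
      val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
      try {
        val table = client.openTable(tableName)
        val session = client.newSession()
        try {
          rows.foreach { r =>
            val op = table.newUpsert()      // create upsert operation
            copyRow(columns, r, op.getRow)  // fill its row from the Spark row
            // With the session's default flush mode (AUTO_FLUSH_SYNC),
            // each apply is sent to the server before returning.
            session.apply(op)
          }
        } finally session.close()           // flushes anything still buffered
      } finally client.close()
    }
  }
}
```

It would be called as KuduUpsertWorkaround.upsert(kuduDataFrame, Kudu_Master, TargetTable). Row-level errors returned by session.apply are ignored here for brevity.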
This workaround is quite slow. Adding an upsert mode to the DataFrame writer for Kudu tables would be better than opening sessions and creating operations by hand for every partition, for example:
kuduDataFrame.write.options(Map("kudu.master"-> Kudu_Master,"kudu.table"-> TargetTable)).mode("upsert").kudu
