Posted to issues@spark.apache.org by "Liang-Chi Hsieh (JIRA)" <ji...@apache.org> on 2019/05/20 04:38:00 UTC
[jira] [Commented] (SPARK-27716) Complete the transactions support
for part of jdbc datasource operations.
[ https://issues.apache.org/jira/browse/SPARK-27716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843649#comment-16843649 ]
Liang-Chi Hsieh commented on SPARK-27716:
-----------------------------------------
If the added support doesn't cover all cases, won't it make things more confusing for users?
> Complete the transactions support for part of jdbc datasource operations.
> -------------------------------------------------------------------------
>
> Key: SPARK-27716
> URL: https://issues.apache.org/jira/browse/SPARK-27716
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: feiwang
> Priority: Major
> Labels: pull-request-available
>
> With the JDBC data source, we can save an RDD to a database.
> The comment on the saveTable function says:
> {code:java}
> /**
>  * Saves the RDD to the database in a single transaction.
>  */
> def saveTable(
>     df: DataFrame,
>     tableSchema: Option[StructType],
>     isCaseSensitive: Boolean,
>     options: JdbcOptionsInWrite)
> {code}
> In fact, this is not true.
> Each savePartition operation runs in a single transaction, but the saveTable operation as a whole does not.
> There are several cases of data transmission:
> Case 1: Append data to an existing table.
> Case 2: Overwrite an existing table that is a cascadingTruncateTable; we cannot drop it, so we truncate it and then append the data.
> Case 3: Overwrite an existing table that is not a cascadingTruncateTable, so we can drop it first.
> Case 4: The table does not exist; create it and then transmit the data.
> In this PR, I add transaction support for case 3 and case 4.
> For these cases, we first transmit the RDD to a temporary table.
> We use an accumulator to record the number of successful savePartition operations.
> Finally, we compare the accumulator's value with the DataFrame's number of partitions.
> If all the savePartition operations succeeded, we drop the original table if it exists, then rename the temporary table to the original table name.
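The scheme quoted above could be sketched roughly as follows. This is not the actual patch: the helper names (saveTableAtomically, execute), the temp-table naming, and the single-column INSERT are all assumptions made for illustration, and the exact DROP/RENAME statements vary by database. The per-partition part mirrors what savePartition already does today (commit only if every row succeeds); the accumulator-and-rename part is the proposed addition.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, Row}

// Rough sketch only; none of these helpers are existing Spark APIs.
object AtomicJdbcSave {

  // Per-partition write in a single JDBC transaction, as savePartition
  // already does: commit only if the whole partition was written.
  def savePartition(rows: Iterator[Row], url: String, table: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try {
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement(s"INSERT INTO $table VALUES (?)")
      rows.foreach { row =>
        stmt.setObject(1, row.get(0)) // simplified: single-column rows
        stmt.executeUpdate()
      }
      conn.commit() // the partition's transaction boundary
    } catch {
      case e: Exception =>
        conn.rollback()
        throw e
    } finally {
      conn.close()
    }
  }

  def saveTableAtomically(df: DataFrame, url: String, table: String): Unit = {
    val tempTable = table + "_spark_tmp" // assumed temp-table naming
    val success = df.sparkSession.sparkContext.longAccumulator("savedPartitions")

    // Write everything to the temp table first, counting committed partitions.
    df.foreachPartition { rows: Iterator[Row] =>
      savePartition(rows, url, tempTable)
      success.add(1L)
    }

    // Swap the temp table in only if every partition committed.
    if (success.value == df.rdd.getNumPartitions) {
      execute(url, s"DROP TABLE IF EXISTS $table")
      execute(url, s"ALTER TABLE $tempTable RENAME TO $table")
    } else {
      execute(url, s"DROP TABLE IF EXISTS $tempTable") // clean up partial data
    }
  }

  private def execute(url: String, sql: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try conn.createStatement().execute(sql) finally conn.close()
  }
}
```

Note that the accumulator check makes the swap conditional on all partitions succeeding, but it does not make the final DROP-plus-RENAME pair itself atomic; that still depends on the target database.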
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org