Posted to issues@spark.apache.org by "Liang-Chi Hsieh (JIRA)" <ji...@apache.org> on 2019/05/20 04:38:00 UTC
[jira] [Commented] (SPARK-27716) Complete the transactions support
for part of jdbc datasource operations.
[ https://issues.apache.org/jira/browse/SPARK-27716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843649#comment-16843649 ]
Liang-Chi Hsieh commented on SPARK-27716:
-----------------------------------------
If the added support doesn't cover all cases, won't it make things more confusing for users?
> Complete the transactions support for part of jdbc datasource operations.
> -------------------------------------------------------------------------
>
> Key: SPARK-27716
> URL: https://issues.apache.org/jira/browse/SPARK-27716
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: feiwang
> Priority: Major
> Labels: pull-request-available
>
> With the JDBC data source, we can save an RDD to a database.
> The comment on the saveTable function says:
> {code:java}
> /**
>  * Saves the RDD to the database in a single transaction.
>  */
> def saveTable(
>     df: DataFrame,
>     tableSchema: Option[StructType],
>     isCaseSensitive: Boolean,
>     options: JdbcOptionsInWrite)
> {code}
> In fact, this is not true.
> Each savePartition operation runs in a single transaction, but the saveTable operation as a whole does not.
> There are several cases of data transmission:
> Case 1: Append data to an existing table.
> Case 2: Overwrite an existing table that is a cascadingTruncateTable; we cannot drop it, so we truncate it and then append the data.
> Case 3: Overwrite an existing table that is not a cascadingTruncateTable, so we can drop it first.
> Case 4: The table does not exist; create it and then transmit the data.
> In this PR, I add transaction support for case 3 and case 4.
> For these cases, we first transmit the RDD to a temporary table.
> We use an accumulator to record the number of successful savePartition operations.
> Finally, we compare the accumulator's value with the DataFrame's number of partitions.
> If all the savePartition operations succeeded, we drop the original table if it exists, then rename the temporary table to the original table name.
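The scheme quoted above could be sketched roughly as follows. This is not the actual patch: the helper names (saveTableAtomically, execute), the temp-table naming, and the single-column INSERT are all assumptions made for illustration, and the exact DROP/RENAME statements vary by database. The per-partition part mirrors what savePartition already does today (commit only if every row succeeds); the accumulator-and-rename part is the proposed addition.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, Row}

// Rough sketch only; none of these helpers are existing Spark APIs.
object AtomicJdbcSave {

  // Per-partition write in a single JDBC transaction, as savePartition
  // already does: commit only if the whole partition was written.
  def savePartition(rows: Iterator[Row], url: String, table: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try {
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement(s"INSERT INTO $table VALUES (?)")
      rows.foreach { row =>
        stmt.setObject(1, row.get(0)) // simplified: single-column rows
        stmt.executeUpdate()
      }
      conn.commit() // the partition's transaction boundary
    } catch {
      case e: Exception =>
        conn.rollback()
        throw e
    } finally {
      conn.close()
    }
  }

  def saveTableAtomically(df: DataFrame, url: String, table: String): Unit = {
    val tempTable = table + "_spark_tmp" // assumed temp-table naming
    val success = df.sparkSession.sparkContext.longAccumulator("savedPartitions")

    // Write everything to the temp table first, counting committed partitions.
    df.foreachPartition { rows: Iterator[Row] =>
      savePartition(rows, url, tempTable)
      success.add(1L)
    }

    // Swap the temp table in only if every partition committed.
    if (success.value == df.rdd.getNumPartitions) {
      execute(url, s"DROP TABLE IF EXISTS $table")
      execute(url, s"ALTER TABLE $tempTable RENAME TO $table")
    } else {
      execute(url, s"DROP TABLE IF EXISTS $tempTable") // clean up partial data
    }
  }

  private def execute(url: String, sql: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try conn.createStatement().execute(sql) finally conn.close()
  }
}
```

Note that the accumulator check makes the swap conditional on all partitions succeeding, but it does not make the final DROP-plus-RENAME pair itself atomic; that still depends on the target database.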
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org