Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/10 12:11:48 UTC

[GitHub] [spark] ebyhr opened a new pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

ebyhr opened a new pull request #30001:
URL: https://github.com/apache/spark/pull/30001


   ### What changes were proposed in this pull request?
   Allow saving a DataFrame even when the JDBC driver doesn't support `executeBatch`.
   
   
   ### Why are the changes needed?
   Some databases and query engines don't support `executeBatch` in their JDBC drivers.
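
A minimal sketch of how a writer could detect this, assuming the standard JDBC metadata call `DatabaseMetaData.supportsBatchUpdates()` is what feeds the flag. The object name `BatchSupport`, the helper `driverSupportsBatch`, and the choice to treat a metadata failure as "unsupported" are illustrative assumptions, not code taken from this PR.

```scala
import java.sql.{Connection, SQLException}

object BatchSupport {
  // Hypothetical helper: asks the driver's metadata whether batch updates are supported.
  // Falling back to `false` when the metadata call fails is an assumption of this sketch.
  def driverSupportsBatch(conn: Connection): Boolean =
    try conn.getMetaData.supportsBatchUpdates()
    catch { case _: SQLException => false }
}
```

The result is meant to be computed once per connection and then reused for every row written through it.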
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unfortunately, I couldn't find a simple way to cover this in the existing tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


















[GitHub] [spark] maropu commented on a change in pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30001:
URL: https://github.com/apache/spark/pull/30001#discussion_r505932800



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##########
@@ -675,15 +676,19 @@ object JdbcUtils extends Logging {
             }
             i = i + 1
           }
-          stmt.addBatch()
-          rowCount += 1
-          totalRowCount += 1
-          if (rowCount % batchSize == 0) {
-            stmt.executeBatch()
-            rowCount = 0
+          if (supportsBatchUpdates) {

Review comment:
       Could you pull this check out of the loop? I don't think we need to check it on every iteration.
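
One way to read this suggestion, sketched under assumptions: pick the write strategy once, before iterating, so the per-row loop body contains no branch on the flag. The `RowSink` trait and the `writeRows` helper are hypothetical stand-ins so the example runs without a database; only the method names `addBatch`, `executeBatch`, and `executeUpdate` mirror `java.sql.PreparedStatement`. This is not the code from the PR.

```scala
object HoistedBatchCheck {
  // Hypothetical stand-in for the subset of java.sql.PreparedStatement used here.
  trait RowSink {
    def bind(row: Seq[Any]): Unit
    def addBatch(): Unit
    def executeBatch(): Unit
    def executeUpdate(): Unit
  }

  // Writes all rows and returns how many were written.
  // The supportsBatchUpdates check happens once, outside both loops.
  def writeRows(rows: Iterator[Seq[Any]], stmt: RowSink,
                supportsBatchUpdates: Boolean, batchSize: Int): Long = {
    require(batchSize > 0, "batchSize must be positive")
    var total = 0L
    if (supportsBatchUpdates) {
      var pending = 0
      rows.foreach { row =>
        stmt.bind(row)
        stmt.addBatch()
        pending += 1
        total += 1
        // Flush whenever a full batch has accumulated.
        if (pending == batchSize) { stmt.executeBatch(); pending = 0 }
      }
      // Flush the final, partially filled batch.
      if (pending > 0) stmt.executeBatch()
    } else {
      // Fallback for drivers without batch support: one statement per row.
      rows.foreach { row =>
        stmt.bind(row)
        stmt.executeUpdate()
        total += 1
      }
    }
    total
  }
}
```

The cost of the per-iteration check is likely small next to a JDBC round trip, so the main gain from hoisting it is readability: each loop shows exactly one write path.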






[GitHub] [spark] maropu commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709653955


   ok to test












[GitHub] [spark] github-actions[bot] closed pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #30001:
URL: https://github.com/apache/spark/pull/30001


   










[GitHub] [spark] SparkQA commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709667914


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34468/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-706540187


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709672587


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34468/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709836666












[GitHub] [spark] maropu commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709654568


   > Some databases and query engines don't support `executeBatch` in their JDBC drivers.
   
   Just a question: do you know which systems do not support `executeBatch`?




[GitHub] [spark] github-actions[bot] commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-803216102


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] AmplabJenkins commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-706540059


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709834570


   **[Test build #129862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129862/testReport)** for PR 30001 at commit [`0ebceb0`](https://github.com/apache/spark/commit/0ebceb01d1bbd30345f4d0a3662f34a51bc965d7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709656628


   **[Test build #129862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129862/testReport)** for PR 30001 at commit [`0ebceb0`](https://github.com/apache/spark/commit/0ebceb01d1bbd30345f4d0a3662f34a51bc965d7).




[GitHub] [spark] AmplabJenkins commented on pull request #30001: [SPARK-33112][SQL] Avoid executeBatch when JDBC does not support batch updates

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30001:
URL: https://github.com/apache/spark/pull/30001#issuecomment-709672597





