You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2016/07/05 08:13:23 UTC

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/14054

    [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid locking when writing partitions

    ## What changes were proposed in this pull request?
    
    Saving partitions to JDBC in transaction can use a weaker transaction isolation level to reduce locking. Use better method to check if transactions are supported.
    
    ## How was this patch tested?
    
    Existing Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-16226

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14054
    
----
commit 1dd8495b0dc13e8c3ee1350c975b3a598b80d6f2
Author: Sean Owen <so...@cloudera.com>
Date:   2016-07-05T08:11:49Z

    Saving partitions to JDBC in transaction can use a weaker transaction isolation level to reduce locking. Use better method to check if transactions are supported.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    I am not aware of any restriction on the type of modification that is possible in any transaction isolation level. It should be orthogonal. Insert, update and delete is everything right -- what would this level be for, then?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    I don't think it's worth configuring. Put it this way, why is SERIALIZABLE the right isolation level (this is the default)? I am not sure it is. The purpose of the transaction is rollback here, not isolation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    @rxin can I get an opinion on which one you think works better -- before, with no configurability, or this take on configuring the isolation level? I can go either way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by maver1ck <gi...@git.apache.org>.

Github user maver1ck commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    @srowen 
    Maybe we can add this as a configuration option ?
    
    I'm not sure how this affects performance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14054#discussion_r71265164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -158,25 +159,41 @@ object JdbcUtils extends Logging {
           rddSchema: StructType,
           nullTypes: Array[Int],
           batchSize: Int,
    -      dialect: JdbcDialect): Iterator[Byte] = {
    +      dialect: JdbcDialect,
    +      isolationLevel: Int): Iterator[Byte] = {
         require(batchSize >= 1,
           s"Invalid value `${batchSize.toString}` for parameter " +
           s"`${JdbcUtils.JDBC_BATCH_INSERT_SIZE}`. The minimum value is 1.")
     
         val conn = getConnection()
         var committed = false
    -    val supportsTransactions = try {
    -      conn.getMetaData().supportsDataManipulationTransactionsOnly() ||
    -      conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions()
    -    } catch {
    -      case NonFatal(e) =>
    -        logWarning("Exception while detecting transaction support", e)
    -        true
    +
    +    var finalIsolationLevel = Connection.TRANSACTION_NONE
    +    if (isolationLevel != Connection.TRANSACTION_NONE) {
    +      try {
    +        val metadata = conn.getMetaData
    +        if (metadata.supportsTransactions()) {
    +          if (metadata.supportsTransactionIsolationLevel(isolationLevel))  {
    +            finalIsolationLevel = isolationLevel
    +          } else {
    +            val defaultIsolation = metadata.getDefaultTransactionIsolation
    +            logWarning(s"Requested isolation level $isolationLevel is not supported; " +
    +                s"falling back to isolation level $defaultIsolation")
    +            finalIsolationLevel = defaultIsolation
    +          }
    +        } else {
    +          logWarning(s"Requested isolation level $isolationLevel, but transactions are unsupported")
    +        }
    +      } catch {
    +        case NonFatal(e) => logWarning("Exception while detecting transaction support", e)
    +      }
         }
    +    val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE
    --- End diff --
    
    Yeah, if possible, the default isolation needs to be consistent. Otherwise, it might be hard to debug if users hit the issue in the production environment. Sometimes, the problem is hard to reproduce it especially for the isolation-related issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #62524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62524/consoleFull)** for PR 14054 at commit [`8e6bf81`](https://github.com/apache/spark/commit/8e6bf816d0d79e588f8a53c8364975b9fcf52cbf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    I'm going to merge this to `master` only if I don't hear back soon. however if anyone feels moderately strongly about adding a config param I can do that. It could even be done later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14054#discussion_r71093448
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -284,9 +286,17 @@ object JdbcUtils extends Logging {
         val rddSchema = df.schema
         val getConnection: () => Connection = createConnectionFactory(url, properties)
         val batchSize = properties.getProperty(JDBC_BATCH_INSERT_SIZE, "1000").toInt
    -    df.foreachPartition { iterator =>
    -      savePartition(getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect)
    -    }
    +    val isolationLevel =
    +      properties.getProperty(JDBC_TXN_ISOLATION_LEVEL, "READ_UNCOMMITTED") match {
    --- End diff --
    
    If not specified, should we use the default transaction isolation level set by the data sources? 
    
    My major concern is that the default level `READ_UNCOMMITTED` might not be supported by the underlying JDBC sources. In this case, we might get a `SQLException`? 
    
    Thus, maybe we should check whether the isolation we specified is supported or not by using `supportsTransactionIsolationLevel`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    @rxin I added configurability; does that look reasonable? there are several ways to play this one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    `READ_UNCOMMITTED` is applicable to read-only transactions. Normally, the RDBMS will promote it to the higher/restricter isolation level if users try to apply `READ_UNCOMMITTED` to insert, delete or updates. This is what happens inside RDBMS. Thus, I think we should remove this option for our insert-only workloads.
    
    In addition, I am not sure whether we should call `supportsTransactionIsolationLevel` before using the isolation levels.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14054#discussion_r71179560
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -158,25 +159,41 @@ object JdbcUtils extends Logging {
           rddSchema: StructType,
           nullTypes: Array[Int],
           batchSize: Int,
    -      dialect: JdbcDialect): Iterator[Byte] = {
    +      dialect: JdbcDialect,
    +      isolationLevel: Int): Iterator[Byte] = {
         require(batchSize >= 1,
           s"Invalid value `${batchSize.toString}` for parameter " +
           s"`${JdbcUtils.JDBC_BATCH_INSERT_SIZE}`. The minimum value is 1.")
     
         val conn = getConnection()
         var committed = false
    -    val supportsTransactions = try {
    -      conn.getMetaData().supportsDataManipulationTransactionsOnly() ||
    -      conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions()
    -    } catch {
    -      case NonFatal(e) =>
    -        logWarning("Exception while detecting transaction support", e)
    -        true
    +
    +    var finalIsolationLevel = Connection.TRANSACTION_NONE
    +    if (isolationLevel != Connection.TRANSACTION_NONE) {
    +      try {
    +        val metadata = conn.getMetaData
    +        if (metadata.supportsTransactions()) {
    +          if (metadata.supportsTransactionIsolationLevel(isolationLevel))  {
    +            finalIsolationLevel = isolationLevel
    +          } else {
    +            val defaultIsolation = metadata.getDefaultTransactionIsolation
    +            logWarning(s"Requested isolation level $isolationLevel is not supported; " +
    +                s"falling back to isolation level $defaultIsolation")
    +            finalIsolationLevel = defaultIsolation
    +          }
    +        } else {
    +          logWarning(s"Requested isolation level $isolationLevel, but transactions are unsupported")
    +        }
    +      } catch {
    +        case NonFatal(e) => logWarning("Exception while detecting transaction support", e)
    +      }
         }
    +    val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE
    --- End diff --
    
    If we got an exception in the above try-catch block, `finalIsolationLevel` might be still `TRANSACTION_NONE`. The worst case could be `TRANSACTION_NONE`, instead of our default value `READ_UNCOMMITTED`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    I'm reluctant to add yet another (undocumented) config for another behavior. Still, it's not that big a deal. My question is: is there a good reason to employ `SERIALIZABLE`? it means only one partition can be written at once. We have other destinations that don't have these semantics; even this output is happy to proceed with no transaction at all. The downside is that partial writes are observable, but that didn't seem to be the motivation for using transactions in the first place.
    
    If people are pretty neutral I'd say leave it; if anyone feels fairly strongly about it I'll just make up some suitable config but leave this new transaction isolation level as the default.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Interesting, well maybe `READ_UNCOMMITTED` is still a pretty good setting, because the transaction here doesn't even read anything and doesn't care about what it might read, but sounds like it could be promoted anyway.
    
    Yes, how about using the specified isolation level if available, and falling back to just leaving the default if not available, but transactions are supported in general. I like that as it is more robust.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    hm it does seem to me this should be an option in the data source itself.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62098/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    I do feel it'd be useful to be configurable, but go ahead. As you said we can make it configurable later.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    @gatorsmile or @zsxwing do you have any quick reaction to this change? reasonable or needs adjustment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #61749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61749/consoleFull)** for PR 14054 at commit [`1dd8495`](https://github.com/apache/spark/commit/1dd8495b0dc13e8c3ee1350c975b3a598b80d6f2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    JDBC data sources could be any source that might have different behaviors. Thus, the above feedback is only for RDBMS. I think we do not need to add any restriction on the existing isolation levels.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14054


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #62457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62457/consoleFull)** for PR 14054 at commit [`12a3803`](https://github.com/apache/spark/commit/12a38032e84a673bec6591003b28fafa3c68daaf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61749/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #62457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62457/consoleFull)** for PR 14054 at commit [`12a3803`](https://github.com/apache/spark/commit/12a38032e84a673bec6591003b28fafa3c68daaf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #62098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62098/consoleFull)** for PR 14054 at commit [`772dcc3`](https://github.com/apache/spark/commit/772dcc32d637f8790325b86161cc343d42497c76).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    `READ_UNCOMMITTED` is impossible for the transactions that contain inserts or updates, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #62524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62524/consoleFull)** for PR 14054 at commit [`8e6bf81`](https://github.com/apache/spark/commit/8e6bf816d0d79e588f8a53c8364975b9fcf52cbf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62457/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14054#discussion_r71093530
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -284,9 +286,17 @@ object JdbcUtils extends Logging {
         val rddSchema = df.schema
         val getConnection: () => Connection = createConnectionFactory(url, properties)
         val batchSize = properties.getProperty(JDBC_BATCH_INSERT_SIZE, "1000").toInt
    -    df.foreachPartition { iterator =>
    -      savePartition(getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect)
    -    }
    +    val isolationLevel =
    +      properties.getProperty(JDBC_TXN_ISOLATION_LEVEL, "READ_UNCOMMITTED") match {
    --- End diff --
    
    Maybe here, we can use a best-effort heuristics. If the `supportsTransactionIsolationLevel` calls for all the isolation levels return `false` (ideally, we should not hit this scenario), we can still keep the existing approach (i.e., do not set the isolation level).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #62098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62098/consoleFull)** for PR 14054 at commit [`772dcc3`](https://github.com/apache/spark/commit/772dcc32d637f8790325b86161cc343d42497c76).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    **[Test build #61749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61749/consoleFull)** for PR 14054 at commit [`1dd8495`](https://github.com/apache/spark/commit/1dd8495b0dc13e8c3ee1350c975b3a598b80d6f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62524/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14054#discussion_r71213696
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -158,25 +159,41 @@ object JdbcUtils extends Logging {
           rddSchema: StructType,
           nullTypes: Array[Int],
           batchSize: Int,
    -      dialect: JdbcDialect): Iterator[Byte] = {
    +      dialect: JdbcDialect,
    +      isolationLevel: Int): Iterator[Byte] = {
         require(batchSize >= 1,
           s"Invalid value `${batchSize.toString}` for parameter " +
           s"`${JdbcUtils.JDBC_BATCH_INSERT_SIZE}`. The minimum value is 1.")
     
         val conn = getConnection()
         var committed = false
    -    val supportsTransactions = try {
    -      conn.getMetaData().supportsDataManipulationTransactionsOnly() ||
    -      conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions()
    -    } catch {
    -      case NonFatal(e) =>
    -        logWarning("Exception while detecting transaction support", e)
    -        true
    +
    +    var finalIsolationLevel = Connection.TRANSACTION_NONE
    +    if (isolationLevel != Connection.TRANSACTION_NONE) {
    +      try {
    +        val metadata = conn.getMetaData
    +        if (metadata.supportsTransactions()) {
    +          if (metadata.supportsTransactionIsolationLevel(isolationLevel))  {
    +            finalIsolationLevel = isolationLevel
    +          } else {
    +            val defaultIsolation = metadata.getDefaultTransactionIsolation
    +            logWarning(s"Requested isolation level $isolationLevel is not supported; " +
    +                s"falling back to isolation level $defaultIsolation")
    +            finalIsolationLevel = defaultIsolation
    +          }
    +        } else {
    +          logWarning(s"Requested isolation level $isolationLevel, but transactions are unsupported")
    +        }
    +      } catch {
    +        case NonFatal(e) => logWarning("Exception while detecting transaction support", e)
    +      }
         }
    +    val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE
    --- End diff --
    
    Yeah, that was sort of on purpose, in case an exception happened in `supportsTransactions`. But if that succeeds but for some reason `supportsTransactionIsolationLevel` fails, something's odd, but perhaps it's better to make sure it uses the default isolation in that case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14054
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org