You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/28 14:31:00 UTC

[GitHub] [spark] LantaoJin opened a new pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

LantaoJin opened a new pull request #28938:
URL: https://github.com/apache/spark/pull/28938


   ### What changes were proposed in this pull request?
   
   Use `ReadWriteLock` for each database instead of one synchronized block to improve the performance.
   
   
   ### Why are the changes needed?
   In `HiveExternalCatalog`, all metastore operations are synchronized by a same object lock. In a heavy traffic Spark thriftserver or Spark Driver, users's queries may be stuck by any a long operation.
   
   For example, if a user is accessing a table which contains mass partitions, the operation `loadDynamicPartitions()` holds the object lock for a long time. All queries are blocking to wait for the lock. From the thread dump stack, `Thread-61500` was holding the object lock with a high frequency as mass partitions table access, this lead to many queries stuck.
   ```
   61500 HiveServer2-Background-Pool: Thread-61500
   
   java.lang.Object.wait(Native Method)
   org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1542)
   org.apache.hadoop.ipc.Client.call(Client.java:1498)
   org.apache.hadoop.ipc.Client.call(Client.java:1398)
   org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
   com.sun.proxy.$Proxy10.getEZForPath(Unknown Source)
   org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEZForPath(ClientNamenodeProtocolTranslatorPB.java:1448)
   sun.reflect.GeneratedMethodAccessor292.invoke(Unknown Source)
   sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   java.lang.reflect.Method.invoke(Method.java:498)
   org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
   org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
   org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
   com.sun.proxy.$Proxy11.getEZForPath(Unknown Source)
   org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3408)
   org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2259)
   org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:339)
   org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
   org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:2687)
   org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2621)
   org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2748)
   org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
   org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1593)
   sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   java.lang.reflect.Method.invoke(Method.java:498)
   org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:1001)
   org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:961)
   org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:959)
   org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:959)
   org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1$$anonfun$apply$2.apply(HiveClientImpl.scala:326)
   org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$retryLocked(HiveClientImpl.scala:255)
   org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:309)
   org.apache.spark.sql.hive.client.HiveClientImpl.updateCallMetrics(HiveClientImpl.scala:339)
   org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:308)
   org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:959)
   org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:993)
   org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:981)
   org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:981)
   org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:127)
   org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152)
   org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:981)
   org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:262)
   org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:111)
   org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:111) => holding Monitor(org.apache.spark.sql.execution.command.DataWritingCommandExec@708550291})
   org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:109)
   org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:126)
   org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:137) => holding Monitor(java.lang.Object@1464134318})
   org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:197)
   org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:197)
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Exists UTs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709776746


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129874/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] github-actions[bot] commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-773702470


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709199189


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129829/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709898787


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r508276917



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
##########
@@ -103,22 +103,24 @@ private[hive] class HiveClientImpl(
   // Circular buffer to hold what hive prints to STDOUT and ERR.  Only printed when failures occur.
   private val outputBuffer = new CircularBuffer()
 
-  private val shim = version match {
-    case hive.v12 => new Shim_v0_12()
-    case hive.v13 => new Shim_v0_13()
-    case hive.v14 => new Shim_v0_14()
-    case hive.v1_0 => new Shim_v1_0()
-    case hive.v1_1 => new Shim_v1_1()
-    case hive.v1_2 => new Shim_v1_2()
-    case hive.v2_0 => new Shim_v2_0()
-    case hive.v2_1 => new Shim_v2_1()
-    case hive.v2_2 => new Shim_v2_2()
-    case hive.v2_3 => new Shim_v2_3()
-    case hive.v3_0 => new Shim_v3_0()
-    case hive.v3_1 => new Shim_v3_1()
-  }
-
-  // Create an internal session state for this HiveClientImpl.
+  private val shim: ThreadLocal[Shim] = new ThreadLocal[Shim]() {
+    override def initialValue(): Shim = version match {
+      case hive.v12 => new Shim_v0_12()
+      case hive.v13 => new Shim_v0_13()
+      case hive.v14 => new Shim_v0_14()
+      case hive.v1_0 => new Shim_v1_0()
+      case hive.v1_1 => new Shim_v1_1()
+      case hive.v1_2 => new Shim_v1_2()
+      case hive.v2_0 => new Shim_v2_0()
+      case hive.v2_1 => new Shim_v2_1()
+      case hive.v2_2 => new Shim_v2_2()
+      case hive.v2_3 => new Shim_v2_3()
+      case hive.v3_0 => new Shim_v3_0()
+      case hive.v3_1 => new Shim_v3_1()
+    }
+  }
+
+  // Since HiveClientImpl is thread local, one state per HiveClientImpl

Review comment:
       Oh, this comment should be reverted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709225088


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129830/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712738632






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709208893


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34433/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709224716


   **[Test build #129830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129830/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709903663


   **[Test build #129887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129887/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709928778






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709239523


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34435/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709776736


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654796439


   I think it's safe to use database level R/W locks here with thread-unsafe hive-client. Use one object lock is over kill in `HiveExternalCatalog`. You never know how slow a RPC in hive-metastore could be in production. In the underlay storage in hive-metastore, MySQL help to provide a transaction lock. So the metadata tables in MySQL is safe. Using a lock in client side is to prevent from the cases such as two clients creating a same table at the same time. But with the DB level R/W locks, no conflicted operations in the same database. Creating tables in different DBs concurrently with thread-unsafe hive-client is fine. No conflicts or dirty read/write in hive and MySQL. Could you give a case about conflict with this kind of locks?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650868622






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509954044



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       @cloud-fan The hive client is not thread-safe.
   1. If there are two threads are creating two tables in the same database. For example, thread-1 is creating db1.table1 and thread-2 is creating db1.table2. This db-level lock in `withClient` will block one thread until the other one finished.
   2. If there are two threads are creating two tables in the different database. For example, thread-1 is creating db1.table1 and thread-2 is creating db2.table2.  There operations could be executed concurrently since the hive clients are thread-local. The client in thread-1 is not the same instance with the client in thread-2:
   ```scala
     private def client: Hive = {
       // get the Hive and set to thread local
       val c = Hive.get(conf)
       Hive.set(c)
       c
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650913642


   **[Test build #124619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124619/testReport)** for PR 28938 at commit [`96dfd17`](https://github.com/apache/spark/commit/96dfd17c631e2d8ef2e330dbe2cbe63fd4e7a262).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712737511


   **[Test build #130038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130038/testReport)** for PR 28938 at commit [`a7a72ce`](https://github.com/apache/spark/commit/a7a72ceb294b9a16cd1a34b36a36f966d4b3b7a0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650971407


   Merged build finished. Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712690950


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34645/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712708387






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650870128


   **[Test build #124619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124619/testReport)** for PR 28938 at commit [`96dfd17`](https://github.com/apache/spark/commit/96dfd17c631e2d8ef2e330dbe2cbe63fd4e7a262).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650770193






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712156116






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842475


   **[Test build #129877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650795068


   **[Test build #124604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124604/testReport)** for PR 28938 at commit [`1cb5d2f`](https://github.com/apache/spark/commit/1cb5d2fcd7dce6ef1cd3198a9ca94f1bd2bd6060).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r448086615



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -213,23 +234,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     client.alterDatabase(dbDefinition)
   }
 
-  override def getDatabase(db: String): CatalogDatabase = withClient {
+  override def getDatabase(db: String): CatalogDatabase = withClient(db) {
     client.getDatabase(db)
   }
 
-  override def databaseExists(db: String): Boolean = withClient {
+  override def databaseExists(db: String): Boolean = withClient(db) {
     client.databaseExists(db)
   }
 
-  override def listDatabases(): Seq[String] = withClient {
+  override def listDatabases(): Seq[String] = withClient() {

Review comment:
       I think we don't need a global lock for `listDatabases`. There are two reasons. First is these are just read operations, no critical problems except getting dirty databases list. Second, the underlay store of HiveMetastore such as MYSQL has locks itself.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709208903






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709727243


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650870128


   **[Test build #124619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124619/testReport)** for PR 28938 at commit [`96dfd17`](https://github.com/apache/spark/commit/96dfd17c631e2d8ef2e330dbe2cbe63fd4e7a262).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842821






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714421333


   **[Test build #130154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130154/testReport)** for PR 28938 at commit [`e1ae2e7`](https://github.com/apache/spark/commit/e1ae2e7a552a0f29366fadb820095d34ad28de0e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509927098



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -78,6 +80,9 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     classOf[TException].getCanonicalName,
     classOf[InvocationTargetException].getCanonicalName)
 
+  // Locks used to synchronized read or write operations per database
+  private val clientLocks = new ConcurrentHashMap[String, ReadWriteLock]()

Review comment:
       Can we use a fixed number of lock objects? e.g. `hash(db_name) % num_lock_objs`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714459867






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712106731


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34610/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712156116






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650866245






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712667936


   **[Test build #130038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130038/testReport)** for PR 28938 at commit [`a7a72ce`](https://github.com/apache/spark/commit/a7a72ceb294b9a16cd1a34b36a36f966d4b3b7a0).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650866245






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716949776






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650971407


   Merged build finished. Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650769940


   **[Test build #124604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124604/testReport)** for PR 28938 at commit [`1cb5d2f`](https://github.com/apache/spark/commit/1cb5d2fcd7dce6ef1cd3198a9ca94f1bd2bd6060).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709935310


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129887/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709776321


   **[Test build #129874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129874/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709065662


   **[Test build #129830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129830/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654836968


   AFAIK, hive client (metastore clien) is not thread-safe. I don't find any documentation but I find a patch [HIVE-13753](https://issues.apache.org/jira/browse/HIVE-13753) to provide a thread-saft client `SynchronizedMetaStoreClient`. So current implementation of `HiveClientImpl` still uses `getMSC()` to operate with metastore.
   
   But I think it's fine even the metastore client is not thread-safe but with R/W locks per DB. "conflicting operations like creating the same table at the same time." is just an example. With the locks, we still can prevent from the conflict operations in client side such as `tableExists("db1.table1")` reture `false`, but another thread is `createTable("db1.table1")`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654706001


   Can you reference some hive doc? IIRC we use a global lock because the hive client is not thread-safe. I'm not convinced that the hive client is thread-safe when operating on different databases.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709226967


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34435/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709919948


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34491/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509954044



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       @cloud-fan The hive client is not thread-safe. But it's thread local.
   1. If there are two threads are creating two tables in the same database. For example, thread-1 is creating db1.table1 and thread-2 is creating db1.table2. This db-level lock in `withClient` will block one thread until the other one finished.
   2. If there are two threads are creating two tables in the different database. For example, thread-1 is creating db1.table1 and thread-2 is creating db2.table2.  There operations could be executed concurrently since the hive clients are thread-local. The client in thread-1 is not the same instance with the client in thread-2:
   ```scala
     private def client: Hive = {
       // get the Hive and set to thread local
       val c = Hive.get(conf)
       Hive.set(c)
       c
     }
   ```
   
   Because `Hive.get(conf)` will get a Hive client instance for each thread.
   ```java
     private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean isFastCheck,
         boolean doRegisterAllFns) throws HiveException {
       Hive db = hiveDB.get();
       if (db == null || !db.isCurrentUserOwner() || needsRefresh
           || (c != null && db.metaStoreClient != null && !isCompatible(db, c, isFastCheck))) {
         db = create(c, false, db, doRegisterAllFns);
       }
       if (c != null) {
         db.conf = c;
       }
       return db;
     }
   ```
   and
   ```java
   private static ThreadLocal<Hive> hiveDB = new ThreadLocal<Hive>() {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714459603


   **[Test build #130154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130154/testReport)** for PR 28938 at commit [`e1ae2e7`](https://github.com/apache/spark/commit/e1ae2e7a552a0f29366fadb820095d34ad28de0e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509954044



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       @cloud-fan The hive client is not thread-safe. But it's thread local.
   1. If there are two threads are creating two tables in the same database. For example, thread-1 is creating db1.table1 and thread-2 is creating db1.table2. This db-level lock in `withClient` will block one thread until the other one finished.
   2. If there are two threads are creating two tables in the different database. For example, thread-1 is creating db1.table1 and thread-2 is creating db2.table2.  There operations could be executed concurrently since the hive clients are thread-local. The client in thread-1 is not the same instance with the client in thread-2:
   ```scala
     private def client: Hive = {
       // get the Hive and set to thread local
       val c = Hive.get(conf)
       Hive.set(c)
       c
     }
   ```
   
   Because `Hive.get(conf)` will get a Hive client instance for each thread.
   ```java
     private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean isFastCheck,
         boolean doRegisterAllFns) throws HiveException {
       Hive db = hiveDB.get();
       if (db == null || !db.isCurrentUserOwner() || needsRefresh
           || (c != null && db.metaStoreClient != null && !isCompatible(db, c, isFastCheck))) {
         db = create(c, false, db, doRegisterAllFns);
       }
       if (c != null) {
         db.conf = c;
       }
       return db;
     }
   ```
   ```scala
     public static void set(Hive hive) {
       hiveDB.set(hive);
     }
   ```
   and
   ```java
   private static ThreadLocal<Hive> hiveDB = new ThreadLocal<Hive>() {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714421333


   **[Test build #130154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130154/testReport)** for PR 28938 at commit [`e1ae2e7`](https://github.com/apache/spark/commit/e1ae2e7a552a0f29366fadb820095d34ad28de0e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709798588


   **[Test build #129877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650971900


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124619/
   Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r448086615



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -213,23 +234,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     client.alterDatabase(dbDefinition)
   }
 
-  override def getDatabase(db: String): CatalogDatabase = withClient {
+  override def getDatabase(db: String): CatalogDatabase = withClient(db) {
     client.getDatabase(db)
   }
 
-  override def databaseExists(db: String): Boolean = withClient {
+  override def databaseExists(db: String): Boolean = withClient(db) {
     client.databaseExists(db)
   }
 
-  override def listDatabases(): Seq[String] = withClient {
+  override def listDatabases(): Seq[String] = withClient() {

Review comment:
       I think we don't need that. There are two reasons. First is these are just read operations, no critical problems except getting dirty databases list. Second, the underlay store of HiveMetastore such as MYSQL has locks itself.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712124058


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34610/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712658826


   Hi @cloud-fan , in the previous discussion, we are confused about whether or not the hive client is thread-safe. Actually, I missed a change in HiveClientImpl -- changing the shim to thread-local. Now the `Hive` object is totally thread-safe. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709782115






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709782115


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714461354


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34760/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654796439


   I think it's safe to use database level R/W locks here with thread-unsafe hive-client. Use one object lock is over kill in `HiveExternalCatalog`. You never know how slow a RPC in hive-metastore could be in production. In the underlay storage in hive-metastore, MySQL help to provide a transaction lock. So the metadata tables in MySQL is safe. Using a lock in client side is to prevent from the cases such as two clients creating a same table at the same time. But with the DB level R/W locks, no conflicted operations in the same database. Could you give a case about conflict with this kind of locks?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654809755


   Your comments seem to rely on Hive internal details, it's better to have official document to prove it.
   
   BTW I'm not sure the hive client thread-unsafety comes from conflicting operations like creating the same table at the same time. Hive metastore must handle conflicts as it needs to serve multiple clients.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714445981


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34760/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712124025


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34610/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712124044






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650770193






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509927849



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       can you refer to some hive documents? I'm not sure it's safe to assume that we can lock on the database-level. I thought the hive client itself is not thread-safe.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709239554






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654836968


   In my opinion, hive client (metastore clien) is not thread-safe. I don't find any documentation but I find a patch [HIVE-13753](https://issues.apache.org/jira/browse/HIVE-13753) to provide a thread-saft client `SynchronizedMetaStoreClient`. So current implementation of `HiveClientImpl` still uses `getMSC()` to operate with metastore.
   
   But I think it's fine even the metastore client is not thread-safe but with R/W locks per DB. "conflicting operations like creating the same table at the same time." is just an example. With the locks, we still can prevent from the conflict operations in client side such as `tableExists("db1.table1")` reture `false`, but another thread is `createTable("db1.table1")`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509954044



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       @cloud-fan The hive client is not thread-safe. But it's thread local.
   1. If there are two threads are creating two tables in the same database. For example, thread-1 is creating db1.table1 and thread-2 is creating db1.table2. This db-level lock in `withClient` will block one thread until the other one finished.
   2. If there are two threads are creating two tables in the different database. For example, thread-1 is creating db1.table1 and thread-2 is creating db2.table2.  There operations could be executed concurrently since the hive clients are thread-local. The client in thread-1 is not the same instance with the client in thread-2:
   ```scala
     private def client: Hive = {
       // get the Hive and set to thread local
       val c = Hive.get(conf)
       Hive.set(c)
       c
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716928333


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650795296






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712067038


   **[Test build #130003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130003/testReport)** for PR 28938 at commit [`cf345f4`](https://github.com/apache/spark/commit/cf345f4cfcaff9d0c28ec699c2e9108c964d77e9).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709161390


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34433/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716950585


   Merged build finished. Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709935305


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716950585






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716949776


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714461368






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] github-actions[bot] commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-773702470


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654881394


   The problem is we don't know why hive client is not thread-safe. You need to investigate and explain the internal details (what are the internal states of the hive client). Otherwise, reviewers don't have enough context to understand your solution.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-651006218


   > Please put that on the PR description. That will be a commit log permanantly.
   
   Updated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709729676


   **[Test build #129874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129874/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842833


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129877/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650868622

> Hi, @LantaoJin . Since this PR aims to use multiple locks instead of a global one, could you add some explanation briefly about how this PR is deadlock-free? Thanks.

Sure. The multiple read-write reentrant locks are divided by different databases. Only operations applied in the same database name could share a same lock. I checked all operations, there is no one operation and its callers need two write locks. Since it's a reentrant lock, it's ok that the method `renamePartitions` which calls `alterPartitions` will re-enter the same write lock.
Besides, this patch has been online for a long time in our production (over 1 year moved object lock to read-write lock and over 5 months separated to multiple locks). Firstly, we changed the single object lock to a single read-write lock since we found the contention from the old lock is very heavy. After moved to read-write lock, it became much better. But we still found a heavy holding in the write lock would impact the operations even through these queries were submitted by different users. So we decided to split the single read-write lock to one lock per database.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650971900


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124619/
   Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650868622

> Hi, @LantaoJin . Since this PR aims to use multiple locks instead of a global one, could you add some explanation briefly about how this PR is deadlock-free? Thanks.

Sure. The multiple read-write reentrant locks are divided by different databases. Only operations applied in the same database name could share a same lock. I checked all operations, there is no one operation and its callers need two write locks. Since it's a reentrant lock, it's ok that the method `renamePartitions` which calls `alterPartitions` will re-enter the same write lock.
Besides, this patch has been online for a long time in our production (over 1 year moved object lock to read-write lock and over 5 months separated to multiple locks per database). Firstly, we changed the single object lock to a single read-write lock since we found the contention from the old lock is very heavy. After moved to read-write lock, it became much better. But we still found a heavy holding in the write lock would impact the operations even through these queries were submitted in different users. So we decided to split the single read-write lock to one lock per database.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709844264






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709935087


   **[Test build #129887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129887/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650898036


   Please put that on the PR description. That will be a commit log permanantly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] ulysses-you commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r448055608



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -213,23 +234,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     client.alterDatabase(dbDefinition)
   }
 
-  override def getDatabase(db: String): CatalogDatabase = withClient {
+  override def getDatabase(db: String): CatalogDatabase = withClient(db) {
     client.getDatabase(db)
   }
 
-  override def databaseExists(db: String): Boolean = withClient {
+  override def databaseExists(db: String): Boolean = withClient(db) {
     client.databaseExists(db)
   }
 
-  override def listDatabases(): Seq[String] = withClient {
+  override def listDatabases(): Seq[String] = withClient() {

Review comment:
       Do we need a global lock for these method since they use all database ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712067038


   **[Test build #130003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130003/testReport)** for PR 28938 at commit [`cf345f4`](https://github.com/apache/spark/commit/cf345f4cfcaff9d0c28ec699c2e9108c964d77e9).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712124044


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709199150


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650868622

> Hi, @LantaoJin . Since this PR aims to use multiple locks instead of a global one, could you add some explanation briefly about how this PR is deadlock-free? Thanks.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r510009386



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       You are right, you remind me to check the diff between our code and spark3 again. We changed many objects in `HiveExternalCatalog.client` non-shared. For example,
   In spark3
   ```scala
     val state: SessionState = ...
   
     override val userName = UserGroupInformation.getCurrentUser.getShortUserName
   ```
   and our code
   ```scala
     // get the state from thread-local
     def state: SessionState = ...
   
     // This should be def instead of val. Since it may be invoked in different ugi
     // in thrift server, we need re-evaluate it every time it is invoked, and internally
     // it obtains the user name via the current ugi.
     private def userName = conf.getUser
   ```
   
   Em, let me remark it to WIP.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509998660



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       Interesting. If it's thread-local, then `HiveExternalCatalog.client` should be thread-safe? Or these hive clients that in different threads still share something?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709050036


   **[Test build #129829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129829/testReport)** for PR 28938 at commit [`04f1f2d`](https://github.com/apache/spark/commit/04f1f2d955d05ef67ae4444d6125042c025e4e34).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709225078


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716949632


   **[Test build #130307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130307/testReport)** for PR 28938 at commit [`e1ae2e7`](https://github.com/apache/spark/commit/e1ae2e7a552a0f29366fadb820095d34ad28de0e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716929003


   **[Test build #130307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130307/testReport)** for PR 28938 at commit [`e1ae2e7`](https://github.com/apache/spark/commit/e1ae2e7a552a0f29366fadb820095d34ad28de0e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716950572


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34909/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709198539


   **[Test build #129829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129829/testReport)** for PR 28938 at commit [`04f1f2d`](https://github.com/apache/spark/commit/04f1f2d955d05ef67ae4444d6125042c025e4e34).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709842821


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650769940


   **[Test build #124604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124604/testReport)** for PR 28938 at commit [`1cb5d2f`](https://github.com/apache/spark/commit/1cb5d2fcd7dce6ef1cd3198a9ca94f1bd2bd6060).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650795296






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709796207


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712708387






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712708365


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34645/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654796439


   I think it's safe to use database level R/W locks here with thread-unsafe hive-client. Use one object lock is over kill in `HiveExternalCatalog`. You never know how slow a RPC in hive-metastore could be in production. In the underlay storage in hive-metastore, MySQL help to provide a transaction lock. So the metadata tables in MySQL is safe. Using a lock in client side is to prevent from the cases such as two clients creating a same table at the same time. But with the DB level R/W locks, no conflicted operations in the same database. Creating tables in different DBs concurrently with thread-unsafe hive-client is fine. No conflicts or dirty read/write in hive and MySQL. Could you give a case about conflict ops with this kind of locks if I'm missing?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709928746


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34491/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709065662


   **[Test build #129830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129830/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709844239


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34482/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709239554






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716950593


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34909/
   Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712667936


   **[Test build #130038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130038/testReport)** for PR 28938 at commit [`a7a72ce`](https://github.com/apache/spark/commit/a7a72ceb294b9a16cd1a34b36a36f966d4b3b7a0).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709729676


   **[Test build #129874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129874/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709935305






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650868622


   > Hi, @LantaoJin . Since this PR aims to use multiple locks instead of a global one, could you add some explanation briefly about how this PR is deadlock-free? Thanks.
   
   Sure. The multiple read-write reentrant locks are divided by different databases. Only operations applied in the same database name could share a same lock. I checked all operations, there is no one operation and its callers need two write locks. Since it's a reentrant lock, it's ok that the method `renamePartitions` which calls `alterPartitions` will re-enter the same write lock.
   Besides, this patch has been online for a long time in our production (over 1 year moved object lock to read-write lock and over 5 months separated to multiple locks per database).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709928778






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r510012214



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
+   * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
    * exceptions thrown in the process in [[AnalysisException]].
+   *
+   * @param db to specify the place of the operation act on.
+   * @param writeLock to specify it is a write lock.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](db: String = "", writeLock: Boolean = false) (body: => T): T = {

Review comment:
       We use the spark thrift server only, so the `HiveExternalCatalog.client.state` always thread-local. But for the open source spark, it may be a little bit tricky to change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716949779


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130307/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709798588


   **[Test build #129877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129877/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709199150






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712154968


   **[Test build #130003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130003/testReport)** for PR 28938 at commit [`cf345f4`](https://github.com/apache/spark/commit/cf345f4cfcaff9d0c28ec699c2e9108c964d77e9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709776736






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709903663


   **[Test build #129887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129887/testReport)** for PR 28938 at commit [`3fbfd5d`](https://github.com/apache/spark/commit/3fbfd5d5edc52519dea3e7958ee0b4d64ff930fa).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716929003


   **[Test build #130307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130307/testReport)** for PR 28938 at commit [`e1ae2e7`](https://github.com/apache/spark/commit/e1ae2e7a552a0f29366fadb820095d34ad28de0e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin edited a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin edited a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650868622

> Hi, @LantaoJin . Since this PR aims to use multiple locks instead of a global one, could you add some explanation briefly about how this PR is deadlock-free? Thanks.

Sure. The multiple read-write reentrant locks are divided by different databases. Only operations applied in the same database name could share a same lock. I checked all operations, there is no one operation and its callers need two write locks. Since it's a reentrant lock, it's ok that the method `renamePartitions` which calls `alterPartitions` will re-enter the same write lock.
Besides, this patch has been online for a long time in our production (over 1 year moved object lock to read-write lock and over 5 months separated to multiple locks). Firstly, we changed the single object lock to a single read-write lock since we found the contention from the old lock is very heavy. After moved to read-write lock, it became much better. But we still found a heavy holding in the write lock would impact the operations even through these queries were submitted in different users. So we decided to split the single read-write lock to one lock per database.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on a change in pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r448086615



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -213,23 +234,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     client.alterDatabase(dbDefinition)
   }
 
-  override def getDatabase(db: String): CatalogDatabase = withClient {
+  override def getDatabase(db: String): CatalogDatabase = withClient(db) {
     client.getDatabase(db)
   }
 
-  override def databaseExists(db: String): Boolean = withClient {
+  override def databaseExists(db: String): Boolean = withClient(db) {
     client.databaseExists(db)
   }
 
-  override def listDatabases(): Seq[String] = withClient {
+  override def listDatabases(): Seq[String] = withClient() {

Review comment:
       I think we don't need a global lock for `listDatabases`. There are two reasons. First is it is just a read operation, no critical problems except getting dirty databases list. Second, the underlay store of HiveMetastore such as MYSQL has locks itself.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-712738632






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709844264






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709050036


   **[Test build #129829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129829/testReport)** for PR 28938 at commit [`04f1f2d`](https://github.com/apache/spark/commit/04f1f2d955d05ef67ae4444d6125042c025e4e34).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LantaoJin commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-654693688


   Gentle ping @dongjoon-hyun @cloud-fan @wangyum @rxin @maropu 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709208903






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709782124


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34479/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-714459867






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709831123


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34482/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-650803341


   Hi, @LantaoJin . Since this PR aims to use multiple locks instead of a global one, could you add some explanation briefly about how this PR is deadlock-free? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-716944717


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34909/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28938: [SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28938:
URL: https://github.com/apache/spark/pull/28938#issuecomment-709225078






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] github-actions[bot] closed pull request #28938: [WIP][SPARK-32118][SQL] Use fine-grained read write lock for each database in HiveExternalCatalog

Posted by GitBox <gi...@apache.org>.

github-actions[bot] closed pull request #28938:
URL: https://github.com/apache/spark/pull/28938


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org