You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/08/25 13:48:18 UTC

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22233

    [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITIONS

    ## What changes were proposed in this pull request?
    
    In the PR, I propose to not perform recursive parallel listening of files in the `scanPartitions` method because it can cause a deadlock. Instead of that I propose to do `scanPartitions` in parallel for top level partitions only.
    
    ## How was this patch tested?
    
    I extended an existing test to trigger the deadlock.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 fix-recover-partitions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22233
    
----
commit 189e14d98d9317acc41b46e5c6a4fe1f867174d3
Author: Maxim Gekk <ma...@...>
Date:   2018-08-25T12:36:57Z

    Extending tests to catch a dead-lock

commit bc660a09faed9472c0492ac33c183f6bf5248048
Author: Maxim Gekk <ma...@...>
Date:   2018-08-25T12:43:35Z

    Enable the tests which catches a dead-lock

commit 59a376d324a6cfbaffba40e7b14e48023513eeac
Author: Maxim Gekk <ma...@...>
Date:   2018-08-25T13:19:42Z

    List file in parallel on top level only

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213252905
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    @kiszk Right, all `Future`s do the same - trying to execute another `Future` on the same fixed thread pool. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213371098
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    Got it. sorry for my overlooking.
    
    Are other places safe where parallelism would not reach the fixed thread pool size?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213423805
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -60,7 +60,8 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
       protected override def generateTable(
           catalog: SessionCatalog,
           name: TableIdentifier,
    -      isDataSource: Boolean): CatalogTable = {
    +      isDataSource: Boolean,
    +      partitionCols: Seq[String] = Seq("a", "b")): CatalogTable = {
    --- End diff --
    
    The interface of this function looks strange. The original one is also hacky. We should refine them later.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r214203842
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -60,7 +60,8 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
       protected override def generateTable(
           catalog: SessionCatalog,
           name: TableIdentifier,
    -      isDataSource: Boolean): CatalogTable = {
    +      isDataSource: Boolean,
    +      partitionCols: Seq[String] = Seq("a", "b")): CatalogTable = {
    --- End diff --
    
    Yeah, please don't overwrite a method with a default parameter. It's very easy to use different default values then the value to pick up will depend on the type you are using...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213009058
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    Thank you attaching the stack trace. I have just looked at it. It looks strange to me. Every thread is `waiting for`. No blocker is there, only one `locked` exists.
    In typical case, a deadlock occurs due to existence of blocker as attached stack trace in #22221
    
    I will investigate it furthermore tomorrow if we need to use this implementation instead of reverting it to the original implementation to use Scala parallel collection.
    
    ```
    ...
            - parking to wait for  <0x0000000793c0d610> (a scala.concurrent.impl.Promise$CompletionLatch)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
            at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:206)
            at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222)
            at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
            at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
            at org.apache.spark.util.ThreadUtils$.parmap(ThreadUtils.scala:317)
            at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.scanPartitions(ddl.scala:690)
            at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.run(ddl.scala:626)
            at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
            - locked <0x0000000793b04e88> (a org.apache.spark.sql.execution.command.ExecutedCommandExec)
            at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
            at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    ...
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r212813730
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    This change might introduce performance regression. Do you know why it works when using `.par` previously? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95343/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213050406
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    I think the root cause is clear - fixed thread pool + submitting and waiting a future inside of another future from the the same thread pool. @gatorsmile I will revert parallel collection back here if you don't mind since there is no reasons for `parmap` in this place.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    **[Test build #95343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95343/testReport)** for PR 22233 at commit [`071de47`](https://github.com/apache/spark/commit/071de4767d762247751f530be78d6b99758bd95c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213057049
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    cc @zsxwing had a few offline comments about the original PR for `parmap`. He will post them soon. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r212817243
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    > This change might introduce performance regression.
    
    Right, if there is significant disbalance of sub-folders, scanning will be slower probably.
    
    > Do you know why it works when using .par previously?
    
    Scala parallel collections can cope with nested calls. See this from slide 12: https://www.slideshare.net/AleksandarProkopec/scala-parallel-collections
    
    @gatorsmile I can revert Scala parallel collections here since we use them on the driver, and `parmap` is not not necessary here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213322084
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    I am worry whether the similar deadlock may occur in other places due to
    * larger parallelism than the fixed thread pool
    * nested parallelism like this
    
    I also realized there is another `parmap` implementation uses thread pool. Can we use  another implemetation?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213419455
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -52,23 +52,24 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
       protected override def generateTable(
           catalog: SessionCatalog,
           name: TableIdentifier,
    -      isDataSource: Boolean = true): CatalogTable = {
    +      isDataSource: Boolean = true,
    +      partitionCols: Seq[String] = Seq("a", "b")): CatalogTable = {
    --- End diff --
    
    @gatorsmile Yes, the changes are related to an existing test which was modified to reproduce the issue. In particular, this line is related to support of any number of partition columns. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213406078
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -52,23 +52,24 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
       protected override def generateTable(
           catalog: SessionCatalog,
           name: TableIdentifier,
    -      isDataSource: Boolean = true): CatalogTable = {
    +      isDataSource: Boolean = true,
    +      partitionCols: Seq[String] = Seq("a", "b")): CatalogTable = {
    --- End diff --
    
    A question about the changes in this file. Are they related to the work of this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213327251
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    > Can we use another implemetation?
    
    This is what @zsxwing proposed. Please, look at my comment https://github.com/apache/spark/pull/22233#discussion_r213075357


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213041684
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    @kiszk Thanks for the investigation! Please take a look at the root cause? If unable to figure it out, we need to revert it back to `.par`. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213075357
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    > The parmap overload without the ec parameter is fine ...
    
    I didn't use it because each recursive call creates new fork join pool. This can create significant amount of threads - `n` ^ `d` threads where `n` - max size of fork join pool, `d` - average depth.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95251/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r212831774
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    @MaxGekk Do you have a stack trace on each thread when a dead lock occurs?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22233


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r212834324
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    Yes, I do [jstack.txt](https://github.com/apache/spark/files/2321770/jstack.txt)



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    **[Test build #95251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95251/testReport)** for PR 22233 at commit [`59a376d`](https://github.com/apache/spark/commit/59a376d324a6cfbaffba40e7b14e48023513eeac).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r212802305
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -1131,7 +1135,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       }
     
       test("alter table: recover partition (parallel)") {
    -    withSQLConf("spark.rdd.parallelListingThreshold" -> "1") {
    +    withSQLConf("spark.rdd.parallelListingThreshold" -> "0") {
    --- End diff --
    
    @MaxGekk, out of curiosity, why does this have to be 0?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    **[Test build #95343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95343/testReport)** for PR 22233 at commit [`071de47`](https://github.com/apache/spark/commit/071de4767d762247751f530be78d6b99758bd95c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r212804252
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -1131,7 +1135,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       }
     
       test("alter table: recover partition (parallel)") {
    -    withSQLConf("spark.rdd.parallelListingThreshold" -> "1") {
    +    withSQLConf("spark.rdd.parallelListingThreshold" -> "0") {
    --- End diff --
    
    On the recursive calls this condition https://github.com/apache/spark/blob/131ca146ed390cd0109cd6e8c95b61e418507080/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L685-L686 is false because `statuses.length` is `1` and `threshold` is `1`. So, it leads to sequential listening of files. I just enforce parallel scanning even for 1 sub-folder.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213137139
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    @MaxGekk could you revert to use Scala `par`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213063623
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    @MaxGekk can we remove this `parmap` overload? It's pretty easy to cause deadlock. The `parmap` overload without the `ec` parameter is fine since it doesn't need a user specified thread pool.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Basically, this PR is to revert the code to the original .par -based solution. 
    
    LGTM Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22233#discussion_r213138024
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand(
             val value = ExternalCatalogUtils.unescapePathName(ps(1))
             if (resolver(columnName, partitionNames.head)) {
               scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value),
    -            partitionNames.drop(1), threshold, resolver)
    +            partitionNames.drop(1), threshold, resolver, listFilesInParallel = false)
    --- End diff --
    
    Does it mean there is no avaiable thread in a given thread pool when a problem try to execute a new `Future`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITI...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22233
  
    **[Test build #95251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95251/testReport)** for PR 22233 at commit [`59a376d`](https://github.com/apache/spark/commit/59a376d324a6cfbaffba40e7b14e48023513eeac).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org