You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Barry Becker (JIRA)" <ji...@apache.org> on 2017/03/22 19:49:41 UTC
[jira] [Commented] (SPARK-13747) Concurrent execution in SQL doesn't work with Scala ForkJoinPool

    [ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936997#comment-15936997 ] 

Barry Becker commented on SPARK-13747:
--------------------------------------

We have hit this on rare instances in our production environment when calling tableNames on SQLContext. We are using Spark 2.1.0. Are there any possible workarounds that we might try? What is the ETA of spark 2.2?

{code}
spark.sql.execution.id is already set
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:81)
org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2375)
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2375)
org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2375)
org.apache.spark.sql.Dataset.collect(Dataset.scala:2351)
org.apache.spark.sql.SQLContext.tableNames(SQLContext.scala:750)
{code}

> Concurrent execution in SQL doesn't work with Scala ForkJoinPool
> ----------------------------------------------------------------
>
>                 Key: SPARK-13747
>                 URL: https://issues.apache.org/jira/browse/SPARK-13747
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>             Fix For: 2.2.0
>
>
> Run the following codes may fail
> {code}
> (1 to 100).par.foreach { _ =>
>   println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count())
> }
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) 
>         at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
>         at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> This is because SparkContext.runJob can be suspended when using a ForkJoinPool (e.g.,scala.concurrent.ExecutionContext.Implicits.global) as it calls Await.ready (introduced by https://github.com/apache/spark/pull/9264).
> So when SparkContext.runJob is suspended, ForkJoinPool will run another task in the same thread, however, the local properties has been polluted.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org