Posted to issues@spark.apache.org by "nicerobot (JIRA)" <ji...@apache.org> on 2016/03/06 19:47:40 UTC

[jira] [Comment Edited] (SPARK-10548) Concurrent execution in SQL does not work

    [ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182265#comment-15182265 ] 

nicerobot edited comment on SPARK-10548 at 3/6/16 6:47 PM:
-----------------------------------------------------------

I might be misunderstanding the solution, but I'm not clear on how the implementation addresses the problem. The issue appears to be that the ThreadLocal property {{"spark.sql.execution.id"}} is not handled properly (i.e., not cleaned up) in thread-pooled environments. The implemented solution is essentially this, in {{SparkContext}}:

{code}
      // Note: make a clone such that changes in the parent properties aren't reflected in
      // those of the children threads, which has confusing semantics (SPARK-10563).
      SerializationUtils.clone(parent).asInstanceOf[Properties]
{code}
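To make the parent/child distinction concrete, here is a minimal, self-contained sketch that uses only {{java.util.Properties}} and no Spark. It compares two children:
* one built with {{new Properties(parent)}}, which reads through to the parent as its defaults;
* one that is a snapshot copy taken at creation time.

The object name {{ChildPropertiesDemo}} is hypothetical. {{Properties.clone}} stands in for {{SerializationUtils.clone}} only to avoid the commons-lang3 dependency. I'm assuming this illustrates the kind of "changes in the parent properties" the quoted comment refers to; it is not the exact pre-fix Spark code.

{code}
import java.util.Properties

object ChildPropertiesDemo {
  // Returns what each kind of child sees for key "k" after the parent changes it.
  def compare(): (String, String) = {
    val parent = new Properties()
    parent.setProperty("k", "before")

    // Child backed by the parent as its defaults: lookups fall through to the parent.
    val linked = new Properties(parent)
    // Child holding an independent copy taken at creation time.
    val cloned = parent.clone().asInstanceOf[Properties]

    parent.setProperty("k", "after") // the parent changes its value later

    (linked.getProperty("k"), cloned.getProperty("k"))
  }

  def main(args: Array[String]): Unit = println(compare()) // (after,before)
}
{code}

A snapshot only stops later parent changes from reaching the child. Whatever the parent held when the child was created, including an execution id, is still copied.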

But from what I can tell, the problem isn't related to parent/child threads. It's that the {{"spark.sql.execution.id"}} key in {{localProperties}} is retained after the thread finishes its work. When that thread is returned to the pool and reused by another execution, the execution id remains, because it is stored in the SparkContext's {{localProperties}}. It seems like {{"spark.sql.execution.id"}} should be local to an execution context instance (a {{QueryExecution}}?), neither global to a thread nor specifically a property of a SQLContext/SparkContext.
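Here is a rough sketch of the pooled-thread problem, again without Spark. A plain {{ThreadLocal[Properties]}} stands in for {{localProperties}}, and a single-thread executor guarantees that the second task reuses the first task's thread. All names ({{PooledThreadLeakDemo}}, {{runLeaky}}, {{withExecutionId}}, {{observedAfter}}) are hypothetical. {{withExecutionId}} shows one way to scope the id to a single execution by restoring the previous value in a {{finally}}. It illustrates the idea only; it is not how Spark implements it.

{code}
import java.util.Properties
import java.util.concurrent.{Callable, Executors, TimeUnit}

object PooledThreadLeakDemo {
  val ExecutionIdKey = "spark.sql.execution.id"

  // Hypothetical stand-in for per-thread local properties (not Spark's actual field).
  val localProps: ThreadLocal[Properties] =
    ThreadLocal.withInitial[Properties](() => new Properties())

  // Sets the id and never removes it: the leak described above.
  def runLeaky(id: String): Unit =
    localProps.get().setProperty(ExecutionIdKey, id)

  // Scopes the id to one execution, restoring the previous value afterwards.
  def withExecutionId[T](id: String)(body: => T): T = {
    val props = localProps.get()
    val previous = props.getProperty(ExecutionIdKey) // null if unset
    props.setProperty(ExecutionIdKey, id)
    try body
    finally {
      if (previous == null) props.remove(ExecutionIdKey)
      else props.setProperty(ExecutionIdKey, previous)
    }
  }

  // Runs `first`, then a probe task on the same pooled thread, and returns
  // the execution id the probe observes.
  def observedAfter(first: () => Unit): Option[String] = {
    val pool = Executors.newSingleThreadExecutor()
    try {
      pool.submit(new Runnable { def run(): Unit = first() }).get()
      val probe = pool.submit(new Callable[String] {
        def call(): String = localProps.get().getProperty(ExecutionIdKey)
      })
      Option(probe.get())
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }

  def main(args: Array[String]): Unit = {
    println(observedAfter(() => runLeaky("1")))               // Some(1)
    println(observedAfter(() => withExecutionId("2") { () })) // None
  }
}
{code}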


> Concurrent execution in SQL does not work
> -----------------------------------------
>
>                 Key: SPARK-10548
>                 URL: https://issues.apache.org/jira/browse/SPARK-10548
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>             Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) 
>         at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
>         at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}
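The same reasoning applies to the error in the report. This hypothetical sketch models the "already set" guard from the stack trace. It uses an {{InheritableThreadLocal}} whose {{childValue}} returns a clone, as in the quoted fix. A thread started while its parent is inside an execution still inherits the parent's id, and it fails when it tries to start its own. {{AlreadySetDemo}} and its {{withNewExecutionId}} are illustrative stand-ins, not {{SQLExecution}}'s actual code.

{code}
import java.util.Properties

object AlreadySetDemo {
  val ExecutionIdKey = "spark.sql.execution.id"

  // Hypothetical inheritable per-thread properties; each child gets a snapshot clone.
  val localProps: InheritableThreadLocal[Properties] = new InheritableThreadLocal[Properties] {
    override protected def initialValue(): Properties = new Properties()
    override protected def childValue(parent: Properties): Properties =
      parent.clone().asInstanceOf[Properties]
  }

  // Hypothetical guard modeled on the error message in the stack trace.
  def withNewExecutionId[T](id: String)(body: => T): T = {
    val props = localProps.get()
    if (props.getProperty(ExecutionIdKey) != null)
      throw new IllegalArgumentException(s"$ExecutionIdKey is already set")
    props.setProperty(ExecutionIdKey, id)
    try body finally props.remove(ExecutionIdKey)
  }

  // Starts a thread while the parent is inside an execution. The child inherits
  // the id (via the clone) and then tries to start its own execution.
  def childOutcome(): String = {
    var outcome = ""
    withNewExecutionId("1") {
      val child = new Thread(new Runnable {
        def run(): Unit = {
          outcome = try { withNewExecutionId("2")("ok") } catch {
            case e: IllegalArgumentException => e.getMessage
          }
        }
      })
      child.start()
      child.join() // join makes the child's write to `outcome` visible here
    }
    outcome
  }
}
{code}

Calling {{AlreadySetDemo.childOutcome()}} returns the guard's message rather than {{"ok"}}. The clone keeps the parent and child from sharing later changes, but the child still starts with the parent's id.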


