You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2015/09/11 02:50:14 UTC

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/8710

    [SPARK-10548] [SQL] Fix concurrent SQL executions

    The query execution ID is currently passed from a thread to its children, which is not the intended behavior. This led to `IllegalArgumentException` when running queries in parallel, e.g.:
    ```
    (1 to 100).par.foreach { _ =>
      sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
    }
    ```
    The cause is `SparkContext`'s local properties are inherited by default. This patch adds a way to exclude keys we don't want to be inherited, and makes SQL go through that code path.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark concurrent-sql-executions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8710
    
----
commit 8ceae42fbb5c08a1e5d801b1bc4beceab9f03142
Author: Andrew Or <an...@databricks.com>
Date:   2015-09-10T23:37:55Z

    Exclude certain local properties from being inherited
    
    such as, cough cough, the SQL execution ID. This was a problem
    because scala's parallel collections spawns threads as children
    of the existing threads, causing the execution ID to be inherited
    when it shouldn't be.

commit 3ec715c4e5af5fa8d5e58c7aa93d01cd09970ae7
Author: Andrew Or <an...@databricks.com>
Date:   2015-09-11T00:45:30Z

    Fix remove from Properties + add tests
    
    Because java.util.Properties' remove method takes in an Any
    instead of a String, there were some issues with matching the
    key's hashCode, so removing was not successful in unit tests.
    
    Instead, this commit fixes it by manually filtering out the keys
    and adding them to the child thread's properties.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-220408309
  
    Yes, unfortunately that is only available in the upcoming 2.0 so you will have to upgrade to fix the problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39235399
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class SQLExecutionSuite extends SharedSQLContext {
    +  import testImplicits._
    +
    +  test("query execution IDs are not inherited across threads") {
    +    sparkContext.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, "123")
    +    sparkContext.setLocalProperty("do-inherit-me", "some-value")
    +    var throwable: Option[Throwable] = None
    +    val thread = new Thread {
    +      override def run(): Unit = {
    +        try {
    +          assert(sparkContext.getLocalProperty("do-inherit-me") === "some-value")
    +          assert(sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) === null)
    +        } catch {
    +          case t: Throwable =>
    +            throwable = Some(t)
    +        }
    +      }
    +    }
    +    thread.start()
    +    thread.join()
    +    throwable.foreach { t => throw t }
    +  }
    +
    +  // This is the end-to-end version of the previous test.
    +  test("parallel query execution (SPARK-10548)") {
    +    (1 to 5).foreach { i =>
    +      // Scala's parallel collections spawns new threads as children of the existing threads.
    +      // We need to run this multiple times to ensure new threads are spawned. Without the fix
    +      // for SPARK-10548, this usually fails on the second try.
    +      val df = sparkContext.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b")
    +      (1 to 10).par.foreach { _ => df.count() }
    --- End diff --
    
    Ah, I think we have fixed that, and if not I would also consider that a bug :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139424590
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139910807
  
      [Test build #42379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42379/console) for   PR 8710 at commit [`35bb6f0`](https://github.com/apache/spark/commit/35bb6f08befdab26edf5535f27f0b6a405f142f2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39296051
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    I agree, I was actually going to do it in a separate patch. Incidentally @tdas @JoshRosen and I just talked about this last night and we all agreed to make it do a clone instead so the semantics are simpler.
    
    However, my one concern is that doing so will change semantics for non-SQL users in 1.5.1, so my proposal is the following: I will make the changes in this patch and merge this patch ONLY into master. Then I'll create a new patch for branch 1.5 that will have the current changes (the ones where we don't clone except for SQL). I think that's the safest way forward.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139891837
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8710


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39234870
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class SQLExecutionSuite extends SharedSQLContext {
    +  import testImplicits._
    +
    +  test("query execution IDs are not inherited across threads") {
    +    sparkContext.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, "123")
    +    sparkContext.setLocalProperty("do-inherit-me", "some-value")
    +    var throwable: Option[Throwable] = None
    +    val thread = new Thread {
    +      override def run(): Unit = {
    +        try {
    +          assert(sparkContext.getLocalProperty("do-inherit-me") === "some-value")
    +          assert(sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) === null)
    +        } catch {
    +          case t: Throwable =>
    +            throwable = Some(t)
    +        }
    +      }
    +    }
    +    thread.start()
    +    thread.join()
    +    throwable.foreach { t => throw t }
    +  }
    +
    +  // This is the end-to-end version of the previous test.
    +  test("parallel query execution (SPARK-10548)") {
    +    (1 to 5).foreach { i =>
    +      // Scala's parallel collections spawns new threads as children of the existing threads.
    +      // We need to run this multiple times to ensure new threads are spawned. Without the fix
    +      // for SPARK-10548, this usually fails on the second try.
    +      val df = sparkContext.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b")
    +      (1 to 10).par.foreach { _ => df.count() }
    --- End diff --
    
    What makes you think that?  SparkContext/SQLContext/DataFrames should be threadsafe.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139703821
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42361/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139910836
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139607592
  
    > Ah, I see. So the issue is:
    >
    > Thread A creates a new Thread B
    > Thread A starts to run a query (set the execution id property)
    > Thread A is running a query
    > Thread B sees the execution id in Thread A's properties set by step 2, then it will throw an exception.
    
    correct!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139892400
  
      [Test build #42379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42379/consoleFull) for   PR 8710 at commit [`35bb6f0`](https://github.com/apache/spark/commit/35bb6f08befdab26edf5535f27f0b6a405f142f2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139848194
  
    @zsxwing I just noticed a potential source of confusion. If I understand correctly your view is that we should just clone the properties instead of having the `nonInheritedLocalProperties`. However, just cloning the properties won't fix SPARK-10548, because the issue is that the execution ID is passed directly to the child thread, NOT that it is mutated after the child thread is spawned.
    
    Does that make sense? Please let me know if I'm missing something.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-220396923
  
    @ljwagerfield it should be fixed in https://github.com/apache/spark/pull/11586


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139445058
  
      [Test build #42308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42308/console) for   PR 8710 at commit [`d48c114`](https://github.com/apache/spark/commit/d48c11400b11e58d6ee3a4f5d9c330dd1736f25c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ExecutorLostFailure(execId: String, isNormalExit: Boolean = false)`
      * `class ExecutorLossReason(val message: String) extends Serializable `
      * `case class ExecutorExited(exitCode: Int, isNormalExit: Boolean, reason: String)`
      * `  case class RemoveExecutor(executorId: String, reason: ExecutorLossReason)`
      * `  case class GetExecutorLossReason(executorId: String) extends CoarseGrainedClusterMessage`
      * `case class ConvertToSafeNode(conf: SQLConf, child: LocalNode) extends UnaryLocalNode(conf) `
      * `case class ConvertToUnsafeNode(conf: SQLConf, child: LocalNode) extends UnaryLocalNode(conf) `
      * `case class FilterNode(conf: SQLConf, condition: Expression, child: LocalNode)`
      * `case class HashJoinNode(`
      * `case class LimitNode(conf: SQLConf, limit: Int, child: LocalNode) extends UnaryLocalNode(conf) `
      * `abstract class LocalNode(conf: SQLConf) extends TreeNode[LocalNode] with Logging `
      * `abstract class LeafLocalNode(conf: SQLConf) extends LocalNode(conf) `
      * `abstract class UnaryLocalNode(conf: SQLConf) extends LocalNode(conf) `
      * `abstract class BinaryLocalNode(conf: SQLConf) extends LocalNode(conf) `
      * `case class ProjectNode(conf: SQLConf, projectList: Seq[NamedExpression], child: LocalNode)`
      * `case class SeqScanNode(conf: SQLConf, output: Seq[Attribute], data: Seq[InternalRow])`
      * `case class UnionNode(conf: SQLConf, children: Seq[LocalNode]) extends LocalNode(conf) `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140233125
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39345154
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,24 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    +    override protected def childValue(parent: Properties): Properties = {
    +      // Note: make a clone such that changes in the parent properties aren't reflected in
    +      // the those of the children threads, which has confusing semantics (SPARK-10564).
    +      val p = new Properties
    +      val filtered = parent.asScala.filter { case (k, _) =>
    +        !nonInheritedLocalProperties.contains(k)
    +      }
    +      p.putAll(filtered.asJava)
    +      p
    +    }
         override protected def initialValue(): Properties = new Properties()
       }
     
    +  /**
    +   * Keys of local properties that should not be inherited by children threads.
    +   */
    +  private[spark] val nonInheritedLocalProperties: HashSet[String] = new HashSet[String]
    --- End diff --
    
    I made this private in the latest commit and added a setter method for it. Does this address your concern?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140235789
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42447/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140242351
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42452/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140241862
  
      [Test build #1752 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1752/consoleFull) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139613347
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140233081
  
      [Test build #42441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42441/console) for   PR 8710 at commit [`b4bcc3c`](https://github.com/apache/spark/commit/b4bcc3c76b0cea33cbc38997f160f1e128d66a45).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39345043
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,24 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    +    override protected def childValue(parent: Properties): Properties = {
    +      // Note: make a clone such that changes in the parent properties aren't reflected in
    +      // the those of the children threads, which has confusing semantics (SPARK-10564).
    +      val p = new Properties
    +      val filtered = parent.asScala.filter { case (k, _) =>
    +        !nonInheritedLocalProperties.contains(k)
    +      }
    +      p.putAll(filtered.asJava)
    +      p
    +    }
         override protected def initialValue(): Properties = new Properties()
       }
     
    +  /**
    +   * Keys of local properties that should not be inherited by children threads.
    +   */
    +  private[spark] val nonInheritedLocalProperties: HashSet[String] = new HashSet[String]
    --- End diff --
    
    the whole point of SPARK-10548 is to avoid inheriting the SQL execution ID. How can we avoid that with just cloning?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by ljwagerfield <gi...@git.apache.org>.

Github user ljwagerfield commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-220395710
  
    We're seeing this exception too. We're also running our operations in serial (at least on the surface it seems as if we are). If we execute a `df.save` operation in a `Future` and wait for that `Future` to complete, then all `df.save` operations we perform within subsequent `Future`s will fail.
    
    This specifically happens when we load Avro files from S3 and save them as Parquet back to S3. The loading works fine but the saving fails on 2nd attempt. Furthermore, if we simply generate a `DataFrame` from an in-memory list (so we're not loading from S3 - only saving to S3) then the error goes away... I'm not sure how helpful this is.
    
    We're using Java 1.8, Scala 2.10.5, with our Spark codebase at commit https://github.com/apache/spark/commit/15de51c238a7340fa81cb0b80d029a05d97bfc5c.
    
    Our exact reproduction steps are:
    
    **1. Run a Spark Shell with appropriate dependencies**
    ```
    ./spark-shell --packages com.amazonaws:aws-java-sdk:1.10.75,org.apache.hadoop:hadoop-aws:2.7.2,com.databricks:spark-avro_2.10:2.0.1
    ```
    
    **2. Run the following setup code within the shell**
    ```
    import scala.concurrent.{ExecutionContext, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import sqlContext.implicits._
    import org.apache.spark.sql._
    implicit val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    
    val hadoopConf = sc.hadoopConfiguration;
    hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3.awsAccessKeyId", "...")
    hadoopConf.set("fs.s3.awsSecretAccessKey", "...")
    
    val df = sqlContext.read.format("com.databricks.spark.avro").load("s3://bucket/input.avro")
    
    def doWrite() {
        df.write.format("org.apache.spark.sql.parquet").mode(SaveMode.Overwrite).save("s3://bucket/output")
    }
    ```
    
    **3. Run this _twice_ - but leaving time for the first execution to finish (so the operations are serialised)**
    ```
    Future { doWrite(); println("SUCCEEDED") }.recover { case e: Throwable => println("FAILED: " + e.getMessage()); e.printStackTrace() }
    ```
    
    **Result:**
    ```
    spark.sql.execution.id is already set
    java.lang.IllegalArgumentException: spark.sql.execution.id is already set
    	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
    	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
    	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    	at $line38.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.doWrite(<console>:41)
    	at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply$mcV$sp(<console>:43)
    	at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:43)
    	at $line40.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:43)
    	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
    	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140263014
  
      [Test build #1751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1751/console) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139421019
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140235733
  
      [Test build #42447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42447/console) for   PR 8710 at commit [`fce3819`](https://github.com/apache/spark/commit/fce38190f1252ddc8ac20777d121a290a0155234).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
      * `class MultilayerPerceptronClassificationModel(JavaModel):`
      * `class MinMaxScaler(JavaEstimator, HasInputCol, HasOutputCol):`
      * `class MinMaxScalerModel(JavaModel):`
      * `        ("thresholds", "Thresholds in multi-class classification to adjust the probability of " +`
      * `class HasElasticNetParam(Params):`
      * `class HasFitIntercept(Params):`
      * `class HasStandardization(Params):`
      * `class HasThresholds(Params):`
      * `    thresholds = Param(Params._dummy(), "thresholds", "Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.")`
      * `        self.thresholds = Param(self, "thresholds", "Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.")`
      * `case class Stddev(child: Expression) extends StddevAgg(child) `
      * `case class StddevPop(child: Expression) extends StddevAgg(child) `
      * `case class StddevSamp(child: Expression) extends StddevAgg(child) `
      * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate `
      * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1 `
      * `case class Stddev(child: Expression) extends StddevAgg1(child) `
      * `case class StddevPop(child: Expression) extends StddevAgg1(child) `
      * `case class StddevSamp(child: Expression) extends StddevAgg1(child) `
      * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1 `
      * `case class ComputePartialStdFunction (`
      * `case class MergePartialStd(`
      * `case class MergePartialStdFunction(`
      * `case class StddevFunction(`
      * `case class IntersectNode(conf: SQLConf, left: LocalNode, right: LocalNode)`
      * `case class SampleNode(`
      * `case class TakeOrderedAndProjectNode(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39235319
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class SQLExecutionSuite extends SharedSQLContext {
    +  import testImplicits._
    +
    +  test("query execution IDs are not inherited across threads") {
    +    sparkContext.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, "123")
    +    sparkContext.setLocalProperty("do-inherit-me", "some-value")
    +    var throwable: Option[Throwable] = None
    +    val thread = new Thread {
    +      override def run(): Unit = {
    +        try {
    +          assert(sparkContext.getLocalProperty("do-inherit-me") === "some-value")
    +          assert(sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) === null)
    +        } catch {
    +          case t: Throwable =>
    +            throwable = Some(t)
    +        }
    +      }
    +    }
    +    thread.start()
    +    thread.join()
    +    throwable.foreach { t => throw t }
    +  }
    +
    +  // This is the end-to-end version of the previous test.
    +  test("parallel query execution (SPARK-10548)") {
    +    (1 to 5).foreach { i =>
    +      // Scala's parallel collections spawns new threads as children of the existing threads.
    +      // We need to run this multiple times to ensure new threads are spawned. Without the fix
    +      // for SPARK-10548, this usually fails on the second try.
    +      val df = sparkContext.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b")
    +      (1 to 10).par.foreach { _ => df.count() }
    --- End diff --
    
    I saw your reply here: http://mail-archives.us.apache.org/mod_mbox/spark-user/201504.mbox/%3CCAAswR-5m2EGZZU0_Lns2=tT9k+ydR4AJ9c1TNb7Cs31nbDzsEw@mail.gmail.com%3E


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140202638
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139422013
  
      [Test build #42307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42307/console) for   PR 8710 at commit [`3ec715c`](https://github.com/apache/spark/commit/3ec715c4e5af5fa8d5e58c7aa93d01cd09970ae7).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139424528
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140210442
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139437487
  
    Ah, I see. So the issue is:
    
    1. Thread A creates a new Thread B
    2. Thread A starts to run a query
    3. Thread A is running a query
    4. Thread B saw the execution id in Thread A's properties, then it will throw an exception.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140210426
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139683670
  
      [Test build #42357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42357/console) for   PR 8710 at commit [`5297f79`](https://github.com/apache/spark/commit/5297f795e4e0ad5cf3bb6ff520312c2b82684f86).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139436365
  
    > The cause is SparkContext's local properties are inherited by default.
    
    I just realized this is not the cause. There is no new thread between `sc.setLocalProperty(EXECUTION_ID_KEY, executionId.toString)` and `sc.setLocalProperty(EXECUTION_ID_KEY, null)` in your example. Actually, I found `sc.setLocalProperty(EXECUTION_ID_KEY, null)` sometimes cannot clear `EXECUTION_ID_KEY`. I will investigate it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140233126
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42441/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by nicerobot <gi...@git.apache.org>.

Github user nicerobot commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-193436138
  
    We are still experiencing this. See [SPARK-10548](https://issues.apache.org/jira/browse/SPARK-10548).
    
    I've verified that we are indeed using a version of Spark with [SPARK-10548](https://issues.apache.org/jira/browse/SPARK-10548) implementation yet the issue is still reproducible. In fact, if in the test case, you:
    
        println(null != sc.getLocalProperties("spark.sql.execution.id"))
        df.count()
    
    you can anticipate when your thread will throw the exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139683720
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42357/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139657519
  
      [Test build #42342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42342/console) for   PR 8710 at commit [`5297f79`](https://github.com/apache/spark/commit/5297f795e4e0ad5cf3bb6ff520312c2b82684f86).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
      * `class MultilayerPerceptronClassificationModel(JavaModel):`
      * `class MinMaxScaler(JavaEstimator, HasInputCol, HasOutputCol):`
      * `class MinMaxScalerModel(JavaModel):`
      * `        ("thresholds", "Thresholds in multi-class classification to adjust the probability of " +`
      * `class HasElasticNetParam(Params):`
      * `class HasFitIntercept(Params):`
      * `class HasStandardization(Params):`
      * `class HasThresholds(Params):`
      * `    thresholds = Param(Params._dummy(), "thresholds", "Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.")`
      * `        self.thresholds = Param(self, "thresholds", "Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.")`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139849953
  
    > the execution ID is passed directly to the child thread
    
    I believe that this is not the cause of `SPARK-10548`. The cause of `SPARK-10548` is the child thread can see the execution id that is set by the parent thread after the child thread is spawned.
    
    If there is no execution id in the local properties when creating a child thread and we change it to clone the properties, then the child thread won't see the execution id that is set by the parent thread.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139688084
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by d-ee <gi...@git.apache.org>.

Github user d-ee commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-210439407
  
    This still seems to be around.
    We're using Spark 1.5.2.
    
    java.lang.IllegalArgumentException: "spark.sql.execution.id is already set"
    
    Trace:
    org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
    org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903)
    org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384)
    org.apache.spark.sql.DataFrame.head(DataFrame.scala:1314)
    org.apache.spark.sql.DataFrame.head(DataFrame.scala:1321)
    org.apache.spark.sql.DataFrame.first(DataFrame.scala:1328)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39337844
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,24 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    +    override protected def childValue(parent: Properties): Properties = {
    +      // Note: make a clone such that changes in the parent properties aren't reflected in
    +      // the those of the children threads, which has confusing semantics (SPARK-10564).
    +      val p = new Properties
    +      val filtered = parent.asScala.filter { case (k, _) =>
    +        !nonInheritedLocalProperties.contains(k)
    +      }
    +      p.putAll(filtered.asJava)
    +      p
    +    }
         override protected def initialValue(): Properties = new Properties()
       }
     
    +  /**
    +   * Keys of local properties that should not be inherited by children threads.
    +   */
    +  private[spark] val nonInheritedLocalProperties: HashSet[String] = new HashSet[String]
    --- End diff --
    
    Exposing a mutable `HashSet` in the thread-safe `SparkContext` looks dangerous. Actually, I suggest we don't need to add `nonInheritedLocalProperties` in the master branch. How about just cloning the parent properties without adding the `nonInheritedLocalProperties` logic?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140210223
  
    @zsxwing Alright I have updated it. Please have another look. I also updated the one for branch-1.5 (#8721), which has all of the changes here except the new behavior is triggered only in SQL.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139688072
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140210859
  
      [Test build #42447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42447/consoleFull) for   PR 8710 at commit [`fce3819`](https://github.com/apache/spark/commit/fce38190f1252ddc8ac20777d121a290a0155234).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139656159
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139657637
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139422018
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42307/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139425790
  
      [Test build #42308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42308/consoleFull) for   PR 8710 at commit [`d48c114`](https://github.com/apache/spark/commit/d48c11400b11e58d6ee3a4f5d9c330dd1736f25c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140235788
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140241253
  
      [Test build #1751 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1751/consoleFull) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140242350
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-210541878
  
    @d-ee do you have a reproducer? Let's move the discussion to JIRA instead of here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140262892
  
      [Test build #1753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1753/console) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139421798
  
      [Test build #42307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42307/consoleFull) for   PR 8710 at commit [`3ec715c`](https://github.com/apache/spark/commit/3ec715c4e5af5fa8d5e58c7aa93d01cd09970ae7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139614917
  
      [Test build #42342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42342/consoleFull) for   PR 8710 at commit [`5297f79`](https://github.com/apache/spark/commit/5297f795e4e0ad5cf3bb6ff520312c2b82684f86).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140221278
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140221863
  
      [Test build #42452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42452/consoleFull) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139656140
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139689485
  
      [Test build #42361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42361/consoleFull) for   PR 8710 at commit [`3c00cc6`](https://github.com/apache/spark/commit/3c00cc64938d365abb783bb4041f6fdeff48d83f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by jerryshao <gi...@git.apache.org>.

Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39241760
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    Hi @zsxwing , reasons to use `InheritableThreadLocal` can be seen here (https://github.com/mesos/spark/pull/937). mainly it is used for Spark Streaming with FIFO scheduling strategy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139848070
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139657638
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42342/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139656574
  
      [Test build #42357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42357/consoleFull) for   PR 8710 at commit [`5297f79`](https://github.com/apache/spark/commit/5297f795e4e0ad5cf3bb6ff520312c2b82684f86).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140221309
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39297788
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    OK, I have updated this in the latest commit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139683718
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139420986
  
    @davies @zsxwing FYI


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139655823
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139445094
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139421027
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140283134
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140241529
  
      [Test build #1753 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1753/consoleFull) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39238759
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    @andrewor14 I'm thinking maybe we should not use `new Properties(parent)` here. Instead, always copy the parent's Properties to the child's Properties. Do you think if the child thread needs to see the further changes to the parent thread's Properties after creating?
    
    This is really confusing when using `Executor` like `ForkJoinPool`, in which thread A creates thread B but thread B is not a child of thread A. But thread B still can see the changes in thread A.
    
    /cc @jerryshao since you added this line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139910837
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42379/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139613452
  
    As of the latest commit this patch should only be merged into master. I consider the fix for SPARK-10563 a little too risky for 1.5.1, so I will open a separate patch for branch-1.5 without that fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139445096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42308/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39233675
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class SQLExecutionSuite extends SharedSQLContext {
    +  import testImplicits._
    +
    +  test("query execution IDs are not inherited across threads") {
    +    sparkContext.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, "123")
    +    sparkContext.setLocalProperty("do-inherit-me", "some-value")
    +    var throwable: Option[Throwable] = None
    +    val thread = new Thread {
    +      override def run(): Unit = {
    +        try {
    +          assert(sparkContext.getLocalProperty("do-inherit-me") === "some-value")
    +          assert(sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) === null)
    +        } catch {
    +          case t: Throwable =>
    +            throwable = Some(t)
    +        }
    +      }
    +    }
    +    thread.start()
    +    thread.join()
    +    throwable.foreach { t => throw t }
    +  }
    +
    +  // This is the end-to-end version of the previous test.
    +  test("parallel query execution (SPARK-10548)") {
    +    (1 to 5).foreach { i =>
    +      // Scala's parallel collections spawns new threads as children of the existing threads.
    +      // We need to run this multiple times to ensure new threads are spawned. Without the fix
    +      // for SPARK-10548, this usually fails on the second try.
    +      val df = sparkContext.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b")
    +      (1 to 10).par.foreach { _ => df.count() }
    --- End diff --
    
    I thought DataFrame is not thread-safe and should not be used like this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139703819
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39242886
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    @jerryshao Thanks :)
    
    @andrewor14 how about just copying the parent properties rather than adding `nonInheritedLocalProperties`? It looks simpler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139422017
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139703794
  
      [Test build #42361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42361/console) for   PR 8710 at commit [`3c00cc6`](https://github.com/apache/spark/commit/3c00cc64938d365abb783bb4041f6fdeff48d83f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
      * `class MultilayerPerceptronClassificationModel(JavaModel):`
      * `class MinMaxScaler(JavaEstimator, HasInputCol, HasOutputCol):`
      * `class MinMaxScalerModel(JavaModel):`
      * `        ("thresholds", "Thresholds in multi-class classification to adjust the probability of " +`
      * `class HasElasticNetParam(Params):`
      * `class HasFitIntercept(Params):`
      * `class HasStandardization(Params):`
      * `class HasThresholds(Params):`
      * `    thresholds = Param(Params._dummy(), "thresholds", "Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.")`
      * `        self.thresholds = Param(self, "thresholds", "Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.")`
      * `case class IntersectNode(conf: SQLConf, left: LocalNode, right: LocalNode)`
      * `case class SampleNode(`
      * `case class TakeOrderedAndProjectNode(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140203217
  
    > If there is no execution id in the local properties when creating a child thread and we change it to clone the properties, then the child thread won't see the execution id that is set by the parent thread.
    
    Ah, I see. You're saying the child thread must be created *before* the query is run, not while it's running. That makes sense. Previously I accounted for the case where the child thread is created in the middle of the query, which I suppose is not possible. I have updated the code based on your suggestion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by jerryshao <gi...@git.apache.org>.

Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39242707
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    Yes I agree with you that copying is more reasonable, for now I cannot image any scenario which requires to keep track of parent's properties, I think it is OK for me to change it, we can always fix this if there's any special scenario.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140204367
  
      [Test build #42441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42441/consoleFull) for   PR 8710 at commit [`b4bcc3c`](https://github.com/apache/spark/commit/b4bcc3c76b0cea33cbc38997f160f1e128d66a45).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140242310
  
      [Test build #42452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42452/console) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-139613307
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SQL] Fix concurrent SQL executi...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8710#discussion_r39242391
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,10 +348,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     
       // Thread Local variable that can be used by users to pass information down the stack
       private val localProperties = new InheritableThreadLocal[Properties] {
    -    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    --- End diff --
    
    I see. However, `new Properties(parent)` keeps a reference to parent rather than copying them. So If we make any change to the parent thread's properties after creating the child thread, the child thread will see them.
    
    Is it necessary that the child thread keeps track of the following updates of the parent thread's properties? I think copying them would be more reasonable.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140267623
  
      [Test build #1752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1752/console) for   PR 8710 at commit [`75a8d90`](https://github.com/apache/spark/commit/75a8d904f9935226df6872d15f99c8344615213c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140582020
  
    Thanks, I'm merging this into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10548] [SPARK-10563] [SQL] Fix concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8710#issuecomment-140202665
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org