You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/05/20 11:21:32 UTC

[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21376

    [SPARK-24250][SQL] support accessing SQLConf inside tasks

    re-submit https://github.com/apache/spark/pull/21299 which breaks build.
    
    A new commit is added to fix the SQLConf problem in `JsonSchemaInference.infer`.
    
    ## What changes were proposed in this pull request?
    
    Previously in #20136 we decided to forbid tasks to access `SQLConf`, because it doesn't work and always give you the default conf value. In #21190 we fixed the check and all the places that violate it.
    
    Currently the pattern of accessing configs at the executor side is: read the configs at the driver side, then access the variables holding the config values in the RDD closure, so that they will be serialized to the executor side. Something like
    ```
    val someConf = conf.getXXX
    child.execute().mapPartitions {
      if (someConf == ...) ...
      ...
    }
    ```
    
    However, this pattern is hard to apply if the config needs to be propagated via a long call stack. An example is `DataType.sameType`, and see how many changes were made in #21190 .
    
    When it comes to code generation, it's even worse. I tried it locally and we need to change a ton of files to propagate configs to code generators.
    
    This PR proposes to allow tasks to access `SQLConf`. The idea is, we can save all the SQL configs to job properties when an SQL execution is triggered. At executor side we rebuild the `SQLConf` from job properties.
    
    ## How was this patch tested?
    
    a new test suite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark config

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21376.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21376
    
----
commit ba467036fdd2e6efe3ef2be66f378da341c73423
Author: Wenchen Fan <we...@...>
Date:   2018-05-19T10:51:02Z

    support accessing SQLConf at executor side

commit a1519d4aa692adceef1f3878a2ccd1715bf6175a
Author: Wenchen Fan <we...@...>
Date:   2018-05-20T10:33:00Z

    fix json schema inference

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189472746
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -107,7 +107,20 @@ object SQLConf {
        * run tests in parallel. At the time this feature was implemented, this was a no-op since we
        * run unit tests (that does not involve SparkSession) in serial order.
        */
    -  def get: SQLConf = confGetter.get()()
    +  def get: SQLConf = {
    +    if (TaskContext.get != null) {
    +      new ReadOnlySQLConf(TaskContext.get())
    +    } else {
    +      if (Utils.isTesting && SparkContext.getActive.isDefined) {
    +        val schedulerEventLoopThread =
    +          SparkContext.getActive.get.dagScheduler.eventProcessLoop.eventThread
    +        if (schedulerEventLoopThread.getId == Thread.currentThread().getId) {
    +          throw new RuntimeException("Cannot get SQLConf inside scheduler event loop thread.")
    --- End diff --
    
    Not sure if we need to add a short comment for the reason. WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90862/testReport)** for PR 21376 at commit [`d25e846`](https://github.com/apache/spark/commit/d25e8467aac66cbe6f7a2b2b038d2450a37e19c9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90864/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3395/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3393/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90864/testReport)** for PR 21376 at commit [`d25e846`](https://github.com/apache/spark/commit/d25e8467aac66cbe6f7a2b2b038d2450a37e19c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189458452
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala ---
    @@ -0,0 +1,66 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.internal
    +
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.test.SQLTestUtils
    +
    +class ExecutorSideSQLConfSuite extends SparkFunSuite with SQLTestUtils {
    +  import testImplicits._
    +
    +  protected var spark: SparkSession = null
    +
    +  // Create a new [[SparkSession]] running in local-cluster mode.
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +    spark = SparkSession.builder()
    +      .master("local-cluster[2,1,1024]")
    +      .appName("testing")
    +      .getOrCreate()
    +  }
    +
    +  override def afterAll(): Unit = {
    +    spark.stop()
    +    spark = null
    +  }
    +
    +  test("ReadonlySQLConf is correctly created at the executor side") {
    --- End diff --
    
    nit: `ReadOnlySQLConf `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90856/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90886/testReport)** for PR 21376 at commit [`f7a6299`](https://github.com/apache/spark/commit/f7a629906f1ba15b663eb8ab1c6b49daa48c34d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21376


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3411/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189457296
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
    @@ -66,9 +67,13 @@ private[sql] object JsonInferSchema {
                     s"Parse Mode: ${FailFastMode.name}.", e)
               }
             }
    -      }
    -    }.fold(StructType(Nil))(
    -      compatibleRootType(columnNameOfCorruptRecord, parseMode))
    +      }.reduceOption(typeMerger).toIterator
    +    }
    +
    +    // Here we get RDD local iterator then fold, instead of calling `RDD.fold` directly, because
    +    // `RDD.fold` will run the fold function in DAGScheduler event loop thread, which may not have
    +    // active SparkSession and `SQLConf.get` may point to the wrong configs.
    +    val rootType = mergedTypesFromPartitions.toLocalIterator.fold(StructType(Nil))(typeMerger)
    --- End diff --
    
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189457193
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
    @@ -66,9 +67,13 @@ private[sql] object JsonInferSchema {
                     s"Parse Mode: ${FailFastMode.name}.", e)
               }
             }
    -      }
    -    }.fold(StructType(Nil))(
    -      compatibleRootType(columnNameOfCorruptRecord, parseMode))
    +      }.reduceOption(typeMerger).toIterator
    +    }
    +
    +    // Here we get RDD local iterator then fold, instead of calling `RDD.fold` directly, because
    +    // `RDD.fold` will run the fold function in DAGScheduler event loop thread, which may not have
    +    // active SparkSession and `SQLConf.get` may point to the wrong configs.
    +    val rootType = mergedTypesFromPartitions.toLocalIterator.fold(StructType(Nil))(typeMerger)
    --- End diff --
    
    I re-run the `JsonBenmark` and no performance regression is observed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90864/testReport)** for PR 21376 at commit [`d25e846`](https://github.com/apache/spark/commit/d25e8467aac66cbe6f7a2b2b038d2450a37e19c9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90886/testReport)** for PR 21376 at commit [`f7a6299`](https://github.com/apache/spark/commit/f7a629906f1ba15b663eb8ab1c6b49daa48c34d2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3387/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90856/testReport)** for PR 21376 at commit [`a1519d4`](https://github.com/apache/spark/commit/a1519d4aa692adceef1f3878a2ccd1715bf6175a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90862/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    I'm still seeing a lot of build failures which seem to be related to this (accessing a conf in a task in turn accesses the LiveListenerBus).  Is this something new?  Or related to this change?  eg.
    https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4193/testReport/org.apache.spark.sql.execution/UnsafeRowSerializerSuite/toUnsafeRow___test_helper_method/
    
    ```
    sbt.ForkMain$ForkError: java.lang.IllegalStateException: LiveListenerBus is stopped.
    	at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
    	at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
    	at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
    	at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120)
    	at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120)
    	at scala.Option.getOrElse(Option.scala:121)
    	at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:120)
    	at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:119)
    	at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
    	at org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
    	at org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
    	at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95)
    	at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95)
    	at scala.Option.map(Option.scala:146)
    	at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:95)
    	at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:94)
    	at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:126)
    	at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:54)
    	at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:157)
    	at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:150)
    	at org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$unsafeRowConverter(UnsafeRowSerializerSuite.scala:54)
    	at org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$toUnsafeRow(UnsafeRowSerializerSuite.scala:49)
    	at org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:63)
    	at org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:60)
    ...
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    LGTM, thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90886/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189472699
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -107,7 +107,20 @@ object SQLConf {
        * run tests in parallel. At the time this feature was implemented, this was a no-op since we
        * run unit tests (that does not involve SparkSession) in serial order.
        */
    -  def get: SQLConf = confGetter.get()()
    +  def get: SQLConf = {
    +    if (TaskContext.get != null) {
    +      new ReadOnlySQLConf(TaskContext.get())
    +    } else {
    +      if (Utils.isTesting && SparkContext.getActive.isDefined) {
    +        val schedulerEventLoopThread =
    +          SparkContext.getActive.get.dagScheduler.eventProcessLoop.eventThread
    +        if (schedulerEventLoopThread.getId == Thread.currentThread().getId) {
    +          throw new RuntimeException("Cannot get SQLConf inside scheduler event loop thread.")
    +        }
    --- End diff --
    
    nit: Should we also add a note in PR description that we prevent to access SQLConf in scheduler event loop?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189459239
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
    @@ -66,9 +67,13 @@ private[sql] object JsonInferSchema {
                     s"Parse Mode: ${FailFastMode.name}.", e)
               }
             }
    -      }
    -    }.fold(StructType(Nil))(
    -      compatibleRootType(columnNameOfCorruptRecord, parseMode))
    +      }.reduceOption(typeMerger).toIterator
    +    }
    +
    +    // Here we get RDD local iterator then fold, instead of calling `RDD.fold` directly, because
    +    // `RDD.fold` will run the fold function in DAGScheduler event loop thread, which may not have
    +    // active SparkSession and `SQLConf.get` may point to the wrong configs.
    +    val rootType = mergedTypesFromPartitions.toLocalIterator.fold(StructType(Nil))(typeMerger)
    --- End diff --
    
    can the same problem happen also in other places? this seems to be quite a tricky issue which may happen in general. Can we avoid it somehow?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90862/testReport)** for PR 21376 at commit [`d25e846`](https://github.com/apache/spark/commit/d25e8467aac66cbe6f7a2b2b038d2450a37e19c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    cc @viirya @mgaido91 @dongjoon-hyun @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189465135
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -107,7 +106,13 @@ object SQLConf {
        * run tests in parallel. At the time this feature was implemented, this was a no-op since we
        * run unit tests (that does not involve SparkSession) in serial order.
    --- End diff --
    
    Need to update the above comments.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189512968
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
    @@ -90,13 +92,37 @@ object SQLExecution {
        * thread from the original one, this method can be used to connect the Spark jobs in this action
        * with the known executionId, e.g., `BroadcastExchangeExec.relationFuture`.
        */
    -  def withExecutionId[T](sc: SparkContext, executionId: String)(body: => T): T = {
    +  def withExecutionId[T](sparkSession: SparkSession, executionId: String)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
         val oldExecutionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
    +    withSQLConfPropagated(sparkSession) {
    +      try {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
    +        body
    +      } finally {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
    +      }
    +    }
    +  }
    +
    +  def withSQLConfPropagated[T](sparkSession: SparkSession)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
    +    // Set all the specified SQL configs to local properties, so that they can be available at
    +    // the executor side.
    +    val allConfigs = sparkSession.sessionState.conf.getAllConfs
    +    val originalLocalProps = allConfigs.collect {
    +      case (key, value) if key.startsWith("spark") =>
    +        val originalValue = sc.getLocalProperty(key)
    +        sc.setLocalProperty(key, value)
    +        (key, originalValue)
    +    }
    +
         try {
    -      sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
           body
         } finally {
    -      sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
    +      for ((key, value) <- originalLocalProps) {
    --- End diff --
    
    `originalLocalProps` already contains entries with null value.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189465227
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
    @@ -90,13 +92,37 @@ object SQLExecution {
        * thread from the original one, this method can be used to connect the Spark jobs in this action
        * with the known executionId, e.g., `BroadcastExchangeExec.relationFuture`.
        */
    -  def withExecutionId[T](sc: SparkContext, executionId: String)(body: => T): T = {
    +  def withExecutionId[T](sparkSession: SparkSession, executionId: String)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
         val oldExecutionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
    +    withSQLConfPropagated(sparkSession) {
    +      try {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
    +        body
    +      } finally {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
    +      }
    +    }
    +  }
    +
    +  def withSQLConfPropagated[T](sparkSession: SparkSession)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
    +    // Set all the specified SQL configs to local properties, so that they can be available at
    +    // the executor side.
    +    val allConfigs = sparkSession.sessionState.conf.getAllConfs
    +    val originalLocalProps = allConfigs.collect {
    +      case (key, value) if key.startsWith("spark") =>
    +        val originalValue = sc.getLocalProperty(key)
    +        sc.setLocalProperty(key, value)
    --- End diff --
    
    Is that possible the same key already exists?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189513129
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
    @@ -90,13 +92,37 @@ object SQLExecution {
        * thread from the original one, this method can be used to connect the Spark jobs in this action
        * with the known executionId, e.g., `BroadcastExchangeExec.relationFuture`.
        */
    -  def withExecutionId[T](sc: SparkContext, executionId: String)(body: => T): T = {
    +  def withExecutionId[T](sparkSession: SparkSession, executionId: String)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
         val oldExecutionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
    +    withSQLConfPropagated(sparkSession) {
    +      try {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
    +        body
    +      } finally {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
    +      }
    +    }
    +  }
    +
    +  def withSQLConfPropagated[T](sparkSession: SparkSession)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
    +    // Set all the specified SQL configs to local properties, so that they can be available at
    +    // the executor side.
    +    val allConfigs = sparkSession.sessionState.conf.getAllConfs
    +    val originalLocalProps = allConfigs.collect {
    +      case (key, value) if key.startsWith("spark") =>
    +        val originalValue = sc.getLocalProperty(key)
    +        sc.setLocalProperty(key, value)
    --- End diff --
    
    If users happen to set the same key in the local properties and want to access them in tasks, we will break it. It's very unlikely to happen and I'd say SQL config keys should be reserved for internal usage only.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189465557
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -107,7 +107,20 @@ object SQLConf {
        * run tests in parallel. At the time this feature was implemented, this was a no-op since we
        * run unit tests (that does not involve SparkSession) in serial order.
        */
    -  def get: SQLConf = confGetter.get()()
    +  def get: SQLConf = {
    +    if (TaskContext.get != null) {
    +      new ReadOnlySQLConf(TaskContext.get())
    +    } else {
    +      if (Utils.isTesting && SparkContext.getActive.isDefined) {
    --- End diff --
    
    good check!!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21376: [SPARK-24250][SQL] support accessing SQLConf insi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189465307
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
    @@ -90,13 +92,37 @@ object SQLExecution {
        * thread from the original one, this method can be used to connect the Spark jobs in this action
        * with the known executionId, e.g., `BroadcastExchangeExec.relationFuture`.
        */
    -  def withExecutionId[T](sc: SparkContext, executionId: String)(body: => T): T = {
    +  def withExecutionId[T](sparkSession: SparkSession, executionId: String)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
         val oldExecutionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
    +    withSQLConfPropagated(sparkSession) {
    +      try {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
    +        body
    +      } finally {
    +        sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
    +      }
    +    }
    +  }
    +
    +  def withSQLConfPropagated[T](sparkSession: SparkSession)(body: => T): T = {
    +    val sc = sparkSession.sparkContext
    +    // Set all the specified SQL configs to local properties, so that they can be available at
    +    // the executor side.
    +    val allConfigs = sparkSession.sessionState.conf.getAllConfs
    +    val originalLocalProps = allConfigs.collect {
    +      case (key, value) if key.startsWith("spark") =>
    +        val originalValue = sc.getLocalProperty(key)
    +        sc.setLocalProperty(key, value)
    +        (key, originalValue)
    +    }
    +
         try {
    -      sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, executionId)
           body
         } finally {
    -      sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
    +      for ((key, value) <- originalLocalProps) {
    --- End diff --
    
    before we set the original one, should we reset the new key with null values?
    
    ```Scala
      def setLocalProperty(key: String, value: String) {
        if (value == null) {
          localProperties.get.remove(key)
        } else {
          localProperties.get.setProperty(key, value)
        }
      }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21376
  
    **[Test build #90856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90856/testReport)** for PR 21376 at commit [`a1519d4`](https://github.com/apache/spark/commit/a1519d4aa692adceef1f3878a2ccd1715bf6175a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org