
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/04 02:00:18 UTC

[GitHub] [spark] yaooqinn commented on a change in pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf regression

yaooqinn commented on a change in pull request #31460:
URL: https://github.com/apache/spark/pull/31460#discussion_r569889222



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
##########
@@ -220,31 +219,16 @@ object SharedState extends Logging {
   }
 
   /**
-   * Load hive-site.xml into hadoopConf and determine the warehouse path we want to use, based on
-   * the config from both hive and Spark SQL. Finally set the warehouse config value to sparkConf.
+   * Determine the warehouse path by spark conf, hadoop configuration and the initial options from
+   * the very first created SparkSession instance.
    */
-  def loadHiveConfFile(
+  def determineWarehouse(
       sparkConf: SparkConf,
       hadoopConf: Configuration,
       initialConfigs: scala.collection.Map[String, String] = Map.empty)
     : scala.collection.Map[String, String] = {
 
-    def containsInSparkConf(key: String): Boolean = {
-      sparkConf.contains(key) || sparkConf.contains("spark.hadoop." + key) ||
-        (key.startsWith("hive") && sparkConf.contains("spark." + key))
-    }
-
     val hiveWarehouseKey = "hive.metastore.warehouse.dir"
-    val configFile = Utils.getContextOrSparkClassLoader.getResourceAsStream("hive-site.xml")
-    if (configFile != null) {
-      logInfo(s"loading hive config file: $configFile")
-      val hadoopConfTemp = new Configuration()
-      hadoopConfTemp.clear()
-      hadoopConfTemp.addResource(configFile)
-      for (entry <- hadoopConfTemp.asScala if !containsInSparkConf(entry.getKey)) {
-        hadoopConf.set(entry.getKey, entry.getValue)

Review comment:
       Under the current usage restrictions of Hive in Spark, this change has no side effect of practical significance for documented behaviors. In some undocumented areas there are side effects, though. For example, if a `hive-site.xml` is unreachable at the start of a Spark app and is added later through some API, its configurations will no longer be loaded.
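
As a rough sketch of the merge rule that the removed lines implemented, here is the same precedence check with plain Scala `Map`s standing in for `SparkConf` and the Hadoop `Configuration`. The object name `HiveSiteMergeSketch`, the method `mergeHiveSite`, and all sample keys and values are hypothetical and only for illustration; they are not part of Spark.

```scala
object HiveSiteMergeSketch {
  // Same predicate as the removed containsInSparkConf helper,
  // with an immutable Map standing in for SparkConf (an assumption of this sketch).
  def containsInSparkConf(sparkConf: Map[String, String], key: String): Boolean =
    sparkConf.contains(key) || sparkConf.contains("spark.hadoop." + key) ||
      (key.startsWith("hive") && sparkConf.contains("spark." + key))

  // Copies every hive-site.xml entry into hadoopConf unless the Spark conf
  // already sets that key in one of the three recognized forms.
  def mergeHiveSite(
      sparkConf: Map[String, String],
      hadoopConf: Map[String, String],
      hiveSite: Map[String, String]): Map[String, String] =
    hadoopConf ++ hiveSite.filter { case (key, _) => !containsInSparkConf(sparkConf, key) }

  def main(args: Array[String]): Unit = {
    // Hypothetical values: hadoopConf already carries a buffer size derived from spark.buffer.size.
    val sparkConf = Map("spark.buffer.size" -> "65536")
    val hadoopConf = Map("io.file.buffer.size" -> "65536")
    val hiveSite = Map("io.file.buffer.size" -> "4096")
    // The check looks for "io.file.buffer.size" or "spark.hadoop.io.file.buffer.size",
    // not "spark.buffer.size", so the hive-site.xml value replaces the existing one.
    println(mergeHiveSite(sparkConf, hadoopConf, hiveSite))
  }
}
```

Under these assumptions the value from `hive-site.xml` wins for `io.file.buffer.size`, which seems to be the accidental override described in the PR title. The side effect mentioned above is the other direction: once this merge no longer runs in `SharedState`, a `hive-site.xml` that only becomes reachable later is not merged at this point.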
   



