Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/01/24 05:08:00 UTC

[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20377

    [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' option when creating client

    ## What changes were proposed in this pull request?
    This PR removes the now-unnecessary `warehouseDir` parameter, whose value is already contained in `hadoopConf`.
    
    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark followUp17088

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20377
    
----
commit 0574ec71d830d8c4b5acdbfc910ddc0f5622b0ed
Author: gatorsmile <ga...@...>
Date:   2018-01-24T05:05:47Z

    remove warehousePath

----


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    The original test covers the original scenario. I think the case pointed out by @vanzin is a separate issue. However, I do not have time to try it.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163920410
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    Let me ask the question: what exactly is the problem with the argument I added? It solves the issue without having to write all this code.
    
    If you really dislike the argument for some odd reason, you can get the config by iterating over the `Iterable` in `HiveClientImpl`, turning an operation that is currently O(1) into O(n).
    
    But I really don't understand why you guys care about this argument so much. There are *2* call sites to that constructor in the whole code base, both in the same method in `IsolatedClientLoader.scala`.
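
    A minimal sketch of that O(n) lookup (illustrative only; `lookup` is a made-up helper name, not existing Spark code):

    ```scala
    import java.lang.{Iterable => JIterable}
    import java.util.{Map => JMap}
    import scala.collection.JavaConverters._

    // Linear scan over the config entries instead of Configuration.get(key),
    // so HiveClientImpl never has to reference the Configuration class directly.
    def lookup(hadoopConf: JIterable[JMap.Entry[String, String]], key: String): Option[String] =
      hadoopConf.iterator().asScala.collectFirst {
        case e if e.getKey == key => e.getValue
      }
    ```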


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163936367
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    This is not a hot path, so performance is not the concern; passing an extra parameter for this looks unnecessary. We need to keep the interface clean for better code maintainability.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r164032685
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    I think many people will get confused when they see this `JIterable[JMap.Entry[String, String]]` and the extra `warehouse` parameter. I don't agree that we can sacrifice code quality just because it's only called twice.
    
    So the question is: is the wrapper solution cleaner than the current one? If it is, and it can fix the issue, let's go for it.
    
    I think refactoring should be encouraged; it happens a lot in the Spark code base: people see hacky code and fix it.


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    I'm a little confused: I thought [this test](https://github.com/apache/spark/pull/20169/files#diff-0456ca985f0d885d5b72654e10be77ccR204) should detect a wrong fix, but this PR passed all tests. Does that indicate the test actually can't expose the original bug?


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163774842
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    So the major concern is hiding the `Configuration` class throughout this code path. How about we create a wrapper?
    ```
    trait HadoopConfWrapper {
      def get(key: String): String
      def toIterator: Iterator[(String, String)]
    }
    ```
    
    and here
    ```
    val wrapper = new HadoopConfWrapper {
      def get(key: String) = hadoopConf.get(key)
    
      def toIterator = hadoopConf.iterator().asScala.map(e => e.getKey -> e.getValue)
    }
    ```
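
    For illustration, with such a wrapper the warehouse-dir handling in `HiveClientImpl` could stay O(1) without ever mentioning `Configuration` (a sketch against the hypothetical trait above, mirroring the existing `setVar` block):

    ```scala
    // hadoopConf here would be the HadoopConfWrapper passed into HiveClientImpl.
    Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname)).foreach { dir =>
      ret.getConf.setVar(ConfVars.METASTOREWAREHOUSE, dir)
    }
    ```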


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86562/
    Test PASSed.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r164551253
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    Adding a comment about exactly why any of the proposed changes are needed is really the only thing that can make this code more understandable. I had that in the bug and in my commit message, but in hindsight, it really should be in the code.
    
    Without the comment, all versions are equally complex, because the complexity has nothing to do with how many arguments you have or their type.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163631446
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -132,7 +131,8 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        val conf = hadoopConf.asInstanceOf[Configuration]
    --- End diff --
    
    Actually, I'm not sure that just doing what I suggested will work. You need to reproduce the condition that triggers the `if` above:
    
    ```
      val ret = SessionState.get
      if (ret != null) {
    ```
    
    And I'm not sure how to do that. I'm just sure that if you do, your code will not work.
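
    For context, the non-null case comes from the spark-sql CLI, which starts a `CliSessionState` before the Hive client is created; roughly (illustrative, not a tested reproduction):

    ```scala
    import org.apache.hadoop.hive.cli.CliSessionState
    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.ql.session.SessionState

    // Once a CliSessionState has been started, SessionState.get returns it,
    // so the `if (ret != null)` branch above is taken when the client is built.
    SessionState.start(new CliSessionState(new HiveConf(classOf[SessionState])))
    ```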


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    **[Test build #86562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86562/testReport)** for PR 20377 at commit [`0574ec7`](https://github.com/apache/spark/commit/0574ec71d830d8c4b5acdbfc910ddc0f5622b0ed).


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163938641
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    This is just a general rule we follow when developing software, especially when it does not affect performance.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r164043733
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    Or simply iterate over it in O(n), as suggested in https://github.com/apache/spark/pull/20377/files#r163920410.


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    **[Test build #86562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86562/testReport)** for PR 20377 at commit [`0574ec7`](https://github.com/apache/spark/commit/0574ec71d830d8c4b5acdbfc910ddc0f5622b0ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    @gatorsmile this fix is wrong. If you want to test for yourself, you need to set `METASTOREWAREHOUSE` before instantiating the client in `VersionsSuite`.
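
    A rough sketch of that setup (illustrative only; the surrounding `VersionsSuite` scaffolding is omitted and the client-building step is a placeholder):

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.hive.conf.HiveConf.ConfVars

    val hadoopConf = new Configuration()
    // Set the warehouse dir *before* the client is instantiated, so the test
    // exercises the path that copies it into the Hive SessionState.
    hadoopConf.set(ConfVars.METASTOREWAREHOUSE.varname, "/tmp/custom-warehouse")
    // ...then build the client with this hadoopConf (e.g. via the suite's
    // client-building helper) and assert it picked up the custom location.
    ```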


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163627927
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -132,7 +131,8 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        val conf = hadoopConf.asInstanceOf[Configuration]
    --- End diff --
    
    How to reproduce it?


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163630721
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -132,7 +131,8 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        val conf = hadoopConf.asInstanceOf[Configuration]
    --- End diff --
    
    Thanks! Will try to reproduce it.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r164043569
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    How about something like this, then, if it really matters?
    
    ```diff
    --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
    +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
    @@ -83,9 +83,8 @@ import org.apache.spark.util.{CircularBuffer, Utils}
      */
     private[hive] class HiveClientImpl(
         override val version: HiveVersion,
    -    warehouseDir: Option[String],
         sparkConf: SparkConf,
    -    hadoopConf: JIterable[JMap.Entry[String, String]],
    +    hadoopConf: HadoopConfiguration,
         extraConfig: Map[String, String],
         initClassLoader: ClassLoader,
         val clientLoader: IsolatedClientLoader)
    @@ -106,6 +105,8 @@ private[hive] class HiveClientImpl(
         case hive.v2_1 => new Shim_v2_1()
       }
    
    +  private val hadoopConfMap = hadoopConf.iterator().asScala.map(e => e.getKey -> e.getValue).toMap
    +
       // Create an internal session state for this HiveClientImpl.
       val state: SessionState = {
         val original = Thread.currentThread().getContextClassLoader
    @@ -132,7 +133,7 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        hadoopConfMap.get(ConfVars.METASTOREWAREHOUSE.varname).foreach { dir =>
               ret.getConf.setVar(ConfVars.METASTOREWAREHOUSE, dir)
             }
             ret
    @@ -166,8 +167,7 @@ private[hive] class HiveClientImpl(
         // has hive-site.xml. So, HiveConf will use that to override its default values.
         // 2: we set all spark confs to this hiveConf.
         // 3: we set all entries in config to this hiveConf.
    -    (hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue)
    -      ++ sparkConf.getAll.toMap ++ extraConfig).foreach { case (k, v) =>
    +    (hadoopConfMap ++ sparkConf.getAll.toMap ++ extraConfig).foreach { case (k, v) =>
           logDebug(
             s"""
                |Applying Hadoop/Hive/Spark and extra properties to Hive Conf:
    @@ -847,6 +847,11 @@ private[hive] class HiveClientImpl(
     }
    
     private[hive] object HiveClientImpl {
    +
    +  // Wider signature for the Hadoop conf so that HiveClientImpl does not reference the
    +  // Configuration class, which may come from a different classloader. See SPARK-17088.
    +  private type HadoopConfiguration = JIterable[JMap.Entry[String, String]]
    +
       /** Converts the native StructField to Hive's FieldSchema. */
       def toHiveColumn(c: StructField): FieldSchema = {
         val typeString = if (c.metadata.contains(HIVE_TYPE_STRING)) {
    ```
    
    I think this could roughly address all concerns listed here.



---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163628671
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -132,7 +131,8 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        val conf = hadoopConf.asInstanceOf[Configuration]
    --- End diff --
    
    See other comment.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163655221
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -132,7 +131,8 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        val conf = hadoopConf.asInstanceOf[Configuration]
    --- End diff --
    
    Over heeerre ... you could say this is a good example of why getting reviews is important!


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163936822
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    That's such a subjective argument. The extra argument does not make the code more complicated, especially compared to anything else going on here.


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163774967
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    cc @vanzin 


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r163627455
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -132,7 +131,8 @@ private[hive] class HiveClientImpl(
           if (ret != null) {
             // hive.metastore.warehouse.dir is determined in SharedState after the CliSessionState
             // instance constructed, we need to follow that change here.
    -        warehouseDir.foreach { dir =>
    +        val conf = hadoopConf.asInstanceOf[Configuration]
    --- End diff --
    
    You're reintroducing the original bug here. You cannot cast this to `Configuration`, because in the `sharesHadoopClasses = false` case there are two different copies of that class, loaded by different classloaders, and the cast will fail with a `ClassCastException`.
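
    For anyone unfamiliar with why the cast blows up, here is a standalone sketch of the underlying JVM behaviour (hypothetical demo code, unrelated to Spark's actual loaders; it assumes the classes were loaded from a local directory or jar so `getCodeSource` returns a usable URL):

    ```scala
    import java.net.URLClassLoader

    class Payload  // stand-in for org.apache.hadoop.conf.Configuration

    object ClassIdentityDemo {
      def main(args: Array[String]): Unit = {
        // Where this program's classes live on disk.
        val location = classOf[Payload].getProtectionDomain.getCodeSource.getLocation

        // A loader with no parent delegation loads its *own* copy of Payload.
        val isolated = new URLClassLoader(Array(location), null)
        val instance = isolated.loadClass(classOf[Payload].getName)
          .getDeclaredConstructor().newInstance()

        println(instance.getClass.getName)             // same fully-qualified name
        println(classOf[Payload].isInstance(instance)) // false: different defining loader
        instance.asInstanceOf[Payload]                 // throws ClassCastException
      }
    }
    ```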


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r164035915
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    I'm sure that when people think about non-trivial code in Spark, this is exactly the line in the whole code base they think of... :-/


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    cc @vanzin @cloud-fan @felixcheung @HyukjinKwon 


---



[GitHub] spark pull request #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasse...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20377#discussion_r164050130
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
    @@ -251,10 +251,8 @@ private[hive] class IsolatedClientLoader(
     
       /** The isolated client interface to Hive. */
       private[hive] def createClient(): HiveClient = synchronized {
    -    val warehouseDir = Option(hadoopConf.get(ConfVars.METASTOREWAREHOUSE.varname))
         if (!isolationOn) {
    -      return new HiveClientImpl(version, warehouseDir, sparkConf, hadoopConf, config,
    -        baseClassLoader, this)
    +      return new HiveClientImpl(version, sparkConf, hadoopConf, config, baseClassLoader, this)
    --- End diff --
    
    @HyukjinKwon's proposal looks good with the type alias and comments, so people can tell what is happening here. But as a matter of personal taste, I prefer the wrapper solution: it looks cleaner to me and doesn't need to build a map or do an O(n) lookup :-/


---



[GitHub] spark issue #20377: [SPARK-17088] [FOLLOW-UP] Fix 'sharesHadoopClasses' opti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20377
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/169/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org