You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by yhuai <gi...@git.apache.org> on 2016/05/20 22:58:06 UTC

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/13236

    [SPARK-15455] For IsolatedClientLoader, we need to provide a conf to disable sharing Hadoop classes

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark notsharehadoop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13236
    
----
commit 06168a65816e7a0b01f031c58b0cffbf7306114d
Author: Yin Huai <yh...@databricks.com>
Date:   2016-05-20T22:12:06Z

    Add a conf to control if we want to share Hadoop classes for IsolatedClientLoader.

commit 4639c8583f8ff7b45c0b4ceb48df2aefa0288691
Author: Yin Huai <yh...@databricks.com>
Date:   2016-05-20T22:54:13Z

    Do not pass hadoop Configuration to HiveClientImpl

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220745379
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13236#discussion_r64123606
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
    @@ -129,6 +129,14 @@ private[spark] object HiveUtils extends Logging {
         .toSequence
         .createWithDefault(Nil)
     
    +  val HIVE_METASTORE_SHARE_HADOOPCLASSES =
    +    SQLConfigBuilder("spark.sql.hive.metastore.shareHadoopClasses")
    +      .doc("When set to true, the classloader used to load Hive metastore client will " +
    +        "not explicitly load Hadoop classes. Instead, Hadoop classes loaded by the classloader " +
    +        "that loads Spark will be used.")
    +      .booleanConf
    +      .createWithDefault(false)
    --- End diff --
    
    @marmbrus I am setting this conf to false by default, which changes the behavior (our master always share hadoop clsses).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai closed the pull request at:

    https://github.com/apache/spark/pull/13236


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-221074125
  
    Why is Hive's ClassLoader loading Hadoop classes itself rather than delegating to the ClassLoader that is responsible for Hadoop? Hive should be using shims to interact with whatever Hadoop classes are available. Is there a reason not to share Hadoop classes?
    
    I think rebuilding a Configuration and passing it is fine, but then you have the problem of making an exact replica of a Configuration object. I'm not sure what needs to happen (other than at least the final properties). And that still doesn't keep the behavior or addDefaultResource. Like I said, I think the ideal solution is to use the incoming Configuration. Otherwise you can probably get away with copying properties and preserving which properties are final.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220753234
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-221083829
  
    @yhuai, I know that the IsolatedClientLoader is used to load multiple versions of Hive, but Hive should be able to use the version of Hadoop that Spark has already loaded without problems. I think there should only be one version of Hadoop, the one that Spark provides.
    
    I don't see the value in loading a version of Hadoop only for Hive, when Hive is built to be able to work with any version of Hadoop.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220862905
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220747250
  
    **[Test build #59049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59049/consoleFull)** for PR 13236 at commit [`8adc940`](https://github.com/apache/spark/commit/8adc9406aa24bc4d1af85e672befb0629cc71df1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220750855
  
    **[Test build #59050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59050/consoleFull)** for PR 13236 at commit [`8adc940`](https://github.com/apache/spark/commit/8adc9406aa24bc4d1af85e672befb0629cc71df1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220753236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59050/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220859959
  
    **[Test build #59115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59115/consoleFull)** for PR 13236 at commit [`084ae00`](https://github.com/apache/spark/commit/084ae00b981c44d4bf2650b3a41745af065a7d02).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220738248
  
    **[Test build #59043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59043/consoleFull)** for PR 13236 at commit [`c0e3ae0`](https://github.com/apache/spark/commit/c0e3ae027e72e3baab687cb7a4e97ff39ca09592).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220750504
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59049/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the pull request:

https://github.com/apache/spark/pull/13236#issuecomment-221063988

@yhuai, Hive uses shims to be compatible with Hadoop 1 and Hadoop 2. I think it would be better to use the existing mechanism in Hive to deal with this.

I know that that this didn't use the copy constructor before, but that was a bug. That had a few flaws, including the fact that properties set in the hadoopConf didn't show up in the HiveConf. While this version accounts for missing properties, the behavior of those properties will not be correct according to the normal way Hadoop/Hive work: final properties can be accidentally overwritten by this implementation. The addDefaultResource problem is just another way this causes Spark's behavior to diverge from what users expect and there may be others that I didn't think of.

I think the ideal situation is to continue using the copy constructor to avoid unknown behavior differences, but at a minimum I think this needs to correctly copy final properties to the new HiveConf. The class loader situation sounds fine as you describe it.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13236#discussion_r64157490
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -73,7 +74,7 @@ import org.apache.spark.util.{CircularBuffer, Utils}
     private[hive] class HiveClientImpl(
         override val version: HiveVersion,
         sparkConf: SparkConf,
    -    hadoopConf: Configuration,
    +    hadoopConf: Map[String, String],
    --- End diff --
    
    Also, the reason that I am adding a flag to not Hadoop classes sharing is that the versions used by Spark and the HiveClientImpl can be different.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the pull request:

https://github.com/apache/spark/pull/13236#issuecomment-221033268

I don't think a map preserves behavior. Hadoop `Configuration` instances have a set of final properties that can't be changed. This loses that information and then applies properties from the SparkConf and extraConfig, which could potentially overwrite final properties.

There are also two more issues that come to mind:
# The `Configuration` has a `ClassLoader` that will be used by [`getClass`](https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html#getClass(java.lang.String, java.lang.Class)) methods.
# [`Configuration.addDefaultResource`](https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/conf/Configuration.html#addDefaultResource(java.lang.String)) can be called at any time and adds properties to all configurations.

The main reason I added `hadoopConf` was to ensure properties that I set in my Spark defaults were applied to this HiveConf, and this PR preserves that behavior. But, best practice in Hadoop is to use the copy constructor to preserve final props and class loading. I think that those two issues should be fixed.

I don't see a good way around the `addDefaultResource` problem other than to have Hive to use the same version of Hadoop that Spark uses (that seems reasonable to me, but we can discuss that separately).

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220749880
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59046/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220745292
  
    **[Test build #59043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59043/consoleFull)** for PR 13236 at commit [`c0e3ae0`](https://github.com/apache/spark/commit/c0e3ae027e72e3baab687cb7a4e97ff39ca09592).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220750462
  
    **[Test build #59049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59049/consoleFull)** for PR 13236 at commit [`8adc940`](https://github.com/apache/spark/commit/8adc9406aa24bc4d1af85e672befb0629cc71df1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220753220
  
    **[Test build #59050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59050/consoleFull)** for PR 13236 at commit [`8adc940`](https://github.com/apache/spark/commit/8adc9406aa24bc4d1af85e672befb0629cc71df1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13236#discussion_r64157471
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -73,7 +74,7 @@ import org.apache.spark.util.{CircularBuffer, Utils}
     private[hive] class HiveClientImpl(
         override val version: HiveVersion,
         sparkConf: SparkConf,
    -    hadoopConf: Configuration,
    +    hadoopConf: Map[String, String],
    --- End diff --
    
    @rdblue FYI. I am changing the type from `Configuration` to `Map[String, String]` because it is possible that Hadoop classes are not shared between Spark side and the `IsolatedClientLoader` side. When Hadoop classes are not shared, if we pass a `Configuration` object to `HiveClientImpl`, the class of this `Configuration` object (this class is loaded by Spark's classloader) is different from the `Configuration` class loaded by `IsolatedClientLoader`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220745477
  
    **[Test build #59046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59046/consoleFull)** for PR 13236 at commit [`c696264`](https://github.com/apache/spark/commit/c69626487593c2ec179e92d0cf4baa4a3d4e6e92).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220745381
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59043/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220749828
  
    **[Test build #59046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59046/consoleFull)** for PR 13236 at commit [`c696264`](https://github.com/apache/spark/commit/c69626487593c2ec179e92d0cf4baa4a3d4e6e92).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220750748
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220749879
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-221133646
  
    @rdblue Thanks. That is a good point :) I am closing this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220862857
  
    **[Test build #59115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59115/consoleFull)** for PR 13236 at commit [`084ae00`](https://github.com/apache/spark/commit/084ae00b981c44d4bf2650b3a41745af065a7d02).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-221069266
  
    I can create a Hadoop Configuration inside HiveClientImpl and use it to create the HiveConf. The main issue is that we cannot pass a Hadoop Configuration in a HiveClientImpl because there may be two versions of `Configuration` loaded by two classloaders. What do you think?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220750500
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-220862906
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59115/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/13236#issuecomment-221080895
  
    We support different versions of Hive metastore (for example, Spark depends on Hive 1.2.1 but we can talk to a metastore using Hive 0.12). Also, the Hadoop used by this client can be different from the Hadoop used by Spark. We have a IsolatedClientLoader for the class isolation purpose. So, it is possible that we have to reload Hadoop classes (those classes are from a different version of Hadoop than the Hadoop used by Spark). This is the reason that we need to have a option to allow us load Hadoop classes instead of always sharing Hadoop classes. Because of this, we cannot pass Configuration to HiveClientImpl. 
    
    At here, Hadoop Configuration is only the container of all users' Hadoop settings. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/13236#issuecomment-221050299

@rdblue Thank you for looking at this!

The reason that I added the flag to disable sharing Hadoop classes is that hadoops used by Spark and metastore client may not be binary compatible (e.g. hadoop 2 used by spark and hadoop 1 used by the metastore client).

For HiveClientImpl, it is a wrapper of a Hive's metastore client. As long as we can propagate user set configurations to its internal HiveConf (to make it talk the metastore correctly), it is fine.

Regarding the classloader, we actually do not use the classloader originally associated with the Hadoop Configuration and we always explicitly set the classloader associated with the HiveConf created inside HiveClientImpl.

Regarding Configuration.addDefaultResource, before 2.0, we did not pass Hadoop Configuration to HiveClientImpl (it was called ClientWrapper in Spark 1.6). Since we were not relying on Configuration.addDefaultResource, using a Map should not change anything.

btw, I am fine to change the default value of the flag to true (sharing hadoop classes) if you think it represents common use cases.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org