Posted to reviews@spark.apache.org by GavinGavinNo1 <gi...@git.apache.org> on 2015/09/11 08:37:43 UTC

[GitHub] spark pull request: Update IsolatedClientLoader.scala

GitHub user GavinGavinNo1 opened a pull request:

    https://github.com/apache/spark/pull/8713

    Update IsolatedClientLoader.scala

    To resolve the problem described in SPARK-10529, I added an attribute of type ThreadLocal<URLClassLoader> to the IsolatedClientLoader object. As a result, no matter how many HiveContext objects are created in the same JVM, they all share the same URLClassLoader object, so no additional classes are loaded and no additional JDBC connections are created.
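
    For illustration, a minimal Scala sketch of the idea (the field and
    method names here are hypothetical, not the actual patch):

        import java.net.{URL, URLClassLoader}

        object IsolatedClientLoader {
          // Hypothetical sketch: cache one URLClassLoader per thread so that
          // repeated HiveContext creations reuse it instead of loading the
          // Hive classes all over again.
          private val cachedLoader = new ThreadLocal[URLClassLoader]()

          def getOrCreateLoader(urls: Array[URL]): URLClassLoader = {
            Option(cachedLoader.get()).getOrElse {
              val loader = new URLClassLoader(urls, getClass.getClassLoader)
              cachedLoader.set(loader)
              loader
            }
          }
        }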

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GavinGavinNo1/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8713
    
----
commit 33f46d965e36b147b7c52911c9d998ba40e8ff42
Author: GavinGavinNo1 <10...@qq.com>
Date:   2015-09-11T06:35:09Z

    Update IsolatedClientLoader.scala
    
    To resolve the problem described in SPARK-10529, I added an attribute of type ThreadLocal<URLClassLoader> to the IsolatedClientLoader object. As a result, no matter how many HiveContext objects are created in the same JVM, they all share the same URLClassLoader object, so no additional classes are loaded and no additional JDBC connections are created.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-141166255
  
    You can have multiple contexts; you just have to increase the size of your permgen (or run Java 8). The problem with this change is that it makes things less flexible, since you would no longer be able to connect to multiple different metastores from the same JVM. Given that, do you mind closing this issue?

    I'll also add that Spark 1.5 was released last week, and we'll be releasing Spark 1.5.1 shortly.
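
    For reference, one common way to raise the permgen limit when submitting
    an application (the class name, jar name, and size below are placeholders,
    not recommendations):

        $ spark-submit \
            --class com.example.App \
            --driver-java-options "-XX:MaxPermSize=512m" \
            --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m" \
            app.jar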




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-139670064
  
    Another reason for the isolation is the ability to connect to multiple metastores. Since Hive uses global static state, new classloaders are likely the only way to accomplish this. Why are you trying to create more than one HiveContext in a JVM?
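
    To see why separate classloaders isolate static state, here is a small
    standalone sketch (the jar path and class name are placeholders): two
    URLClassLoader instances each define their own copy of a class, so its
    static fields never collide.

        import java.net.{URL, URLClassLoader}

        object IsolationDemo {
          def main(args: Array[String]): Unit = {
            // Placeholder jar containing a class with mutable static state,
            // analogous to Hive's global configuration.
            val urls = Array(new URL("file:///path/to/lib.jar"))

            // parent = null: classes are not delegated to the application
            // classloader, so each loader defines its own copy.
            val loaderA = new URLClassLoader(urls, null)
            val loaderB = new URLClassLoader(urls, null)

            val a = loaderA.loadClass("example.GlobalState")
            val b = loaderB.loadClass("example.GlobalState")

            // Same class name, two distinct Class objects with
            // independent static fields.
            println(a == b) // prints: false
          }
        }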




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-140835592
  
    I would suggest increasing the size of your permgen, and/or restructuring your app to avoid creating multiple HiveContexts. Spark 1.5 adds the ability to do dynamic allocation in standalone mode.
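
    For reference, dynamic allocation in standalone mode is turned on with
    settings along these lines in spark-defaults.conf (the executor counts
    are example values), together with the external shuffle service on the
    workers:

        spark.dynamicAllocation.enabled       true
        spark.shuffle.service.enabled         true
        spark.dynamicAllocation.minExecutors  1
        spark.dynamicAllocation.maxExecutors  10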




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by GavinGavinNo1 <gi...@git.apache.org>.
Github user GavinGavinNo1 commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-141073796
  
    @marmbrus Thanks a lot. I'm sorry I didn't make myself clear: I'm not familiar with submitting an issue or contributing to Spark. I have in fact considered what you suggest, but I can neither push forward restructuring our app nor wait for a stable Spark 1.5, and Spark won't be adapted to our app. Still, I wonder whether there will ever be a feature supporting multiple HiveContexts in one JVM, which I think would be more flexible.




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by GavinGavinNo1 <gi...@git.apache.org>.
Github user GavinGavinNo1 commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-141414230
  
    @marmbrus Well, we both know that we can have multiple contexts. The difference is that Spark can't support continuously creating contexts: no matter how large my permgen is, it will eventually leak memory and hit "too many JDBC connections" errors. As for connecting to different metastores, I think a given environment normally runs a single metastore version.
    I'm sure you have good reasons beyond the ones you have expressed. Otherwise, adding a parameter to control this behavior could address both problems, as sketched below.
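
    A minimal sketch of such a switch (the configuration key and surrounding
    names are invented here for illustration, not an actual Spark setting):

        import java.net.{URL, URLClassLoader}
        import org.apache.spark.SparkConf

        // Hypothetical sketch: a config switch between shared and isolated
        // loaders. "spark.sql.hive.shareClientLoader" is an invented key.
        def chooseLoader(conf: SparkConf,
                         jarUrls: Array[URL],
                         baseLoader: ClassLoader): ClassLoader = {
          if (conf.getBoolean("spark.sql.hive.shareClientLoader",
                              defaultValue = false))
            IsolatedClientLoader.getOrCreateLoader(jarUrls) // shared, as above
          else
            new URLClassLoader(jarUrls, baseLoader)         // isolated per context
        }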




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-139465728
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by GavinGavinNo1 <gi...@git.apache.org>.
Github user GavinGavinNo1 commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-140621464
  
    @marmbrus Sorry to bother you again. Could you please give me a reply? This is my first contribution, and I could use some advice.




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8713




[GitHub] spark pull request: [SPARK-10529][SQL]When creating multiple HiveC...

Posted by GavinGavinNo1 <gi...@git.apache.org>.
Github user GavinGavinNo1 commented on the pull request:

    https://github.com/apache/spark/pull/8713#issuecomment-139702061
  
    Thank you very much for your comment. I don't think I've understood what you mean by the ability to connect to multiple metastores. One HiveContext can only connect to one metastore, right? Or do you mean creating multiple HiveContexts to connect to multiple metastores from one SparkContext in one JVM? If so, that would in theory lead to the same JVM OOM problem.
    We previously used Spark 1.3.1, which, as you know, doesn't support dynamic allocation in standalone mode. We have several apps, each of which launches scheduled tasks using a HiveContext. Due to limited hardware resources, we must stop the SparkContext to release CPU and memory when a task is done. Spark 1.4.1 brings many new features and we want to switch to it, but the problems described in my issue cause us a lot of trouble. The pattern looks roughly like the sketch below.
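
    A condensed Scala sketch of that lifecycle (the app name and query are
    placeholders):

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.hive.HiveContext

        // Each scheduled task builds a fresh context pair and stops it
        // afterwards to free CPU and memory for the next app. Under the
        // isolated loader, every new HiveContext loads the Hive classes
        // through a brand-new classloader, so permgen grows each iteration.
        def runScheduledTask(): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("scheduled-task"))
          val hiveContext = new HiveContext(sc)
          hiveContext.sql("SELECT 1").collect()
          sc.stop() // releases executors, but not classes already in permgen
        }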

