Posted to issues@spark.apache.org by "Fi (JIRA)" <ji...@apache.org> on 2015/06/04 07:51:42 UTC

[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error

    [ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572205#comment-14572205 ] 

Fi commented on SPARK-7819:
---------------------------

Hello, sorry for not responding sooner; it's been quite hectic at work.

We have a smoke test that I run whenever I'm testing a new Spark custom build.

Basically it's a Python script that tests various parts of the Spark API.
Over the course of a run, several SparkContexts are created, as are HiveContext and SQLContext wrappers.
The test is rather light, but it does a decent job of giving me a heads-up when an API changes underneath me, so I can give our developers fair warning. :)
It does things like reading/writing parquet files, reading/writing files to MapRFS, word count jobs, hive queries, DataFrame API calls, etc.
It also serves as a light benchmark suite, so I can keep an eye on performance regressions introduced by the Spark distribution itself, or by regular operational shenanigans on our Mesos cluster.
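
For a sense of what it covers, here's a minimal sketch of the kind of checks it runs (the MapRFS paths and hive table name are made up for this example):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="smoke-test")
    sqlContext = HiveContext(sc)

    # word count against a text file on the remote MapRFS cluster
    counts = (sc.textFile("maprfs:///data/sample.txt")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    assert counts.count() > 0

    # round-trip a hive query result through parquet (Spark 1.4 reader/writer API)
    df = sqlContext.sql("SELECT * FROM some_table LIMIT 100")
    df.write.mode("overwrite").parquet("maprfs:///tmp/smoke_parquet")
    assert sqlContext.read.parquet("maprfs:///tmp/smoke_parquet").count() == df.count()

    sc.stop()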

On a simple 4-node dev/integration cluster, the test takes about 200 seconds to run, moving around 100 GB of data from a non-local MapRFS cluster via raw textFile reads and HiveContext/SQLContext queries.

Anyway, per my last comment, we ran out of PermGen in this script.

I created an even newer Spark 1.4 build, git 84da653192a2d9edb82d0dbe50f577c4dc6a0c78 and deployed it to our test cluster.
I then updated spark-defaults.conf per your suggestions and also increased the JVM PermGen settings:

    spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni

    spark.driver.extraJavaOptions -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:MaxPermSize=512M

I'm not sure if CMSClassUnloadingEnabled and CMSPermGenSweepingEnabled are needed. I came across these settings on StackOverflow, and it sounded like they wouldn't hurt, considering what the Isolated Hive Client Loader might be trying to do.
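
For completeness, the equivalent can also be passed per job on the spark-submit command line (smoke_test.py here is just a placeholder for whatever script is being run; note the driver JVM options have to be in place before the driver starts, so setting them programmatically inside the script would be too late):

    spark-submit \
      --conf spark.sql.hive.metastore.sharedPrefixes=com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni \
      --driver-java-options "-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:MaxPermSize=512M" \
      smoke_test.py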

Incidentally, I typically run this smoke test script as an IPython Notebook, which also lets me smoke test non-Spark-related APIs (such as matplotlib).
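
For anyone else testing this way, PySpark can be launched under an IPython notebook with the env vars the 1.x docs describe (spark-defaults.conf is picked up automatically):

    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ./bin/pyspark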

With the above settings, I was able to get through the smoke test without errors.
Just for kicks, I ran it a second time (WITHIN the same running kernel), hoping (or not) to see an OOM.
It worked! So a third time, and it still worked.
I kicked it off a fourth time (still within the same IPython kernel) and was about to declare this a success, when the script failed with an InvalidClassException (attached).

Very strange! Not sure what could cause it.

Anyway, I tried a fifth time (still within the same kernel), and it passed just fine.

Considering the smoke tests worked fine 4 out of 5 times, I'm satisfied enough, and will chalk this up to some flakiness in the JVM and all the funky class loading. Also, did I mention that this IPython Notebook is also running in a Docker container on a Xen hypervisor VM? Maybe that had something to do with it. :)

So it would appear that increasing the PermGen space should be highly recommended (and maybe made a default stock setting) in order to avoid the PermGen OOM error.

> Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7819
>                 URL: https://issues.apache.org/jira/browse/SPARK-7819
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Fi
>            Priority: Critical
>         Attachments: invalidClassException.log, stacktrace.txt, test.py
>
>
> In reference to the pull request: https://github.com/apache/spark/pull/5876
> I have been running the Spark 1.3 branch for some time with no major hiccups, and recently switched to the Spark 1.4 branch.
> I build my spark distribution with the following build command:
> {noformat}
> make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver
> {noformat}
> When running a python script containing a series of smoke tests I use to validate the build, I encountered an error under the following conditions:
> * start a spark context
> * start a hive context
> * run any hive query
> * stop the spark context
> * start a second spark context
> * run any hive query
> ** ERROR
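> A rough sketch of those steps in PySpark (the query is a placeholder; the attached test.py is the actual reproducer):
> {noformat}
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
>
> # first context: hive query works
> sc = SparkContext(appName="repro")
> HiveContext(sc).sql("SHOW TABLES").collect()
> sc.stop()
>
> # second context: the hive query triggers the native library error
> sc = SparkContext(appName="repro")
> HiveContext(sc).sql("SHOW TABLES").collect()
> {noformat}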
> From what I can tell, the Isolated Class Loader is hitting a MapR class that loads its native library (presumably in a static initializer).
> Unfortunately, the JVM prohibits loading the same native library from more than one classloader, so this fails the second time around.
> I would have thought that shutting down the SparkContext would clear out any vestiges from the JVM, so I'm surprised that this would even be a problem.
> Note: all the other smoke tests we run pass fine.
> I will attach the stacktrace and a python script reproducing the issue (at least for my environment and build).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
