Posted to user@spark.apache.org by gen tang <ge...@gmail.com> on 2015/07/15 08:15:14 UTC

Strange behavior of pyspark with --jars option

Hi,

I ran into an interesting problem with the --jars option.
I use a third-party dependency, elasticsearch-spark, and I pass its jar
with the following command:
./bin/spark-submit --jars path-to-dependencies ...
This works well.
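
For reference, the full command looks roughly like this (the jar path, version
and script name are placeholders, not the exact ones I use):

./bin/spark-submit \
  --jars /path/to/elasticsearch-spark_2.10-2.1.0.jar \
  my_script.py
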
However, as soon as I use HiveContext.sql, Spark loses the dependencies that I
passed. It looks as if instantiating the HiveContext overrides the
configuration (although if I check sparkContext._conf, the configuration is
unchanged).
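
To make the setup concrete, here is a minimal sketch of the kind of usage that
triggers it (the index and table names are placeholders, not my exact code):

from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext

sc = SparkContext()

# Reading through the connector works fine with a plain SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.format("org.elasticsearch.spark.sql").load("my_index/my_type")

# But once a HiveContext is used, the connector classes are no longer found
hiveContext = HiveContext(sc)
hiveContext.sql("SELECT * FROM my_table")  # placeholder query
df2 = hiveContext.read.format("org.elasticsearch.spark.sql").load("my_index/my_type")
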

But if I pass the dependencies with --driver-class-path
and spark.executor.extraClassPath instead, the problem disappears.
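
Concretely, something along these lines works (paths are placeholders; note
that with spark.executor.extraClassPath the jar has to already exist at that
path on the worker nodes, since it is not shipped automatically):

./bin/spark-submit \
  --driver-class-path /path/to/elasticsearch-spark_2.10-2.1.0.jar \
  --conf spark.executor.extraClassPath=/path/to/elasticsearch-spark_2.10-2.1.0.jar \
  my_script.py
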

Does anyone know why this happens?

Thanks a lot for your help in advance.

Cheers
Gen

Re: Strange behavior of pyspark with --jars option

Posted by Burak Yavuz <br...@gmail.com>.
Hi,
I believe the HiveContext uses a different class loader. It then falls back
to the system class loader if it can't find the classes in the context
class loader. The system class loader contains the classpath passed
through --driver-class-path
and spark.executor.extraClassPath. The JVM is already running by the time the
jars passed with --jars are resolved, so they can't be added to the system
class loader. Instead they live in a separate context class loader, which the
HiveContext doesn't use, hence the lost dependencies.
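
If you want to convince yourself that these are two different loaders, you can
peek at them from pyspark through py4j. A minimal sketch, assuming sc is your
SparkContext (for illustration only, run on the driver):

# The current thread's context class loader and the JVM system class loader
context_cl = sc._jvm.java.lang.Thread.currentThread().getContextClassLoader()
system_cl = sc._jvm.java.lang.ClassLoader.getSystemClassLoader()
# The system class loader is the one that sees --driver-class-path entries
print(context_cl.toString())
print(system_cl.toString())
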

I know what I wrote may be a little complicated, so please let me know if you
have any questions. HTH.

Best,
Burak

On Tue, Jul 14, 2015 at 11:15 PM, gen tang <ge...@gmail.com> wrote:

> Hi,
>
> I ran into an interesting problem with the --jars option.
> I use a third-party dependency, elasticsearch-spark, and I pass its jar
> with the following command:
> ./bin/spark-submit --jars path-to-dependencies ...
> This works well.
> However, as soon as I use HiveContext.sql, Spark loses the dependencies that I
> passed. It looks as if instantiating the HiveContext overrides the
> configuration (although if I check sparkContext._conf, the configuration is
> unchanged).
>
> But if I pass the dependencies with --driver-class-path
> and spark.executor.extraClassPath instead, the problem disappears.
>
> Does anyone know why this happens?
>
> Thanks a lot for your help in advance.
>
> Cheers
> Gen
>