Posted to user@spark.apache.org by jb44 <jb...@gmail.com> on 2018/04/13 01:32:40 UTC

Spark LOCAL mode and external jar (extraClassPath)

I'm running spark in LOCAL mode and trying to get it to talk to alluxio. I'm
getting the error: java.lang.ClassNotFoundException: Class
alluxio.hadoop.FileSystem not found
The cause of this error is apparently that Spark cannot find the alluxio
client jar in its classpath.

I have looked at the page here:
https://www.alluxio.org/docs/master/en/Debugging-Guide.html#q-why-do-i-see-exceptions-like-javalangruntimeexception-javalangclassnotfoundexception-class-alluxiohadoopfilesystem-not-found

Which details the steps to take in this situation, but I'm not finding
success.

According to the Spark documentation, I can instantiate a local SparkSession like so:

SparkSession.builder
  .appName("App")
  .getOrCreate

Then I can add the alluxio client library like so:
sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

I have verified that the proper jar file exists in the right location on my
local machine with:
logger.error(sparkSession.conf.get("spark.driver.extraClassPath"))
logger.error(sparkSession.conf.get("spark.executor.extraClassPath"))
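
A more direct check would be along these lines (just a sketch, reusing the
ALLUXIO_SPARK_CLIENT value and the logger from above): confirm the file
actually exists, and whether the class is visible to this JVM's classloader.
If the Class.forName call throws, the jar is not on the application's
classpath, whatever the Spark properties say.

logger.error(new java.io.File(ALLUXIO_SPARK_CLIENT).exists.toString)
// throws ClassNotFoundException if the jar is not on this JVM's classpath
Class.forName("alluxio.hadoop.FileSystem")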

But I still get the error. Is there anything else I can do to figure out why
Spark is not picking the library up?

Please note I am not using spark-submit - I am aware of the methods for
adding the client jar to a spark-submit job. My Spark instance is being
created as local within my application and this is the use case I want to
solve.

As an FYI there is another application in the cluster which is connecting to
my alluxio using the fs client and that all works fine. In that case,
though, the fs client is being packaged as part of the application through
standard sbt dependencies.





--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

If you start spark or pyspark from the command line with the --jars option and
things work fine, then you will have to either add the jar to the jars
directory under SPARK_HOME or modify the spark-env file to include the path to
the location where the jar file is stored. This location has to be accessible
by all the worker nodes.


Regards,
Gourav Sengupta

On Sat, Apr 14, 2018 at 6:02 PM, Jason Boorn <jb...@gmail.com> wrote:

> Ok great I’ll give that a shot -
>
> Thanks for all the help
>
>
> On Apr 14, 2018, at 12:08 PM, Gene Pang <ge...@gmail.com> wrote:
>
> Yes, I think that is the case. I haven't tried that before, but it should
> work.
>
> Thanks,
> Gene
>
> On Fri, Apr 13, 2018 at 11:32 AM, Jason Boorn <jb...@gmail.com> wrote:
>
>> Hi Gene -
>>
>> Are you saying that I just need to figure out how to get the Alluxio jar
>> into the classpath of my parent application?  If it shows up in the
>> classpath then Spark will automatically know that it needs to use it when
>> communicating with Alluxio?
>>
>> Apologies for going back-and-forth on this - I feel like my particular
>> use case is clouding what is already a tricky issue.
>>
>> On Apr 13, 2018, at 2:26 PM, Gene Pang <ge...@gmail.com> wrote:
>>
>> Hi Jason,
>>
>> Alluxio does work with Spark in master=local mode. This is because both
>> spark-submit and spark-shell have command-line options to set the classpath
>> for the JVM that is being started.
>>
>> If you are not using spark-submit or spark-shell, you will have to figure
>> out how to configure that JVM instance with the proper properties.
>>
>> Thanks,
>> Gene
>>
>> On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <jb...@gmail.com> wrote:
>>
>>> Ok thanks - I was basing my design on this:
>>>
>>> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
>>>
>>> Wherein it says:
>>> Once the SparkSession is instantiated, you can configure Spark’s runtime
>>> config properties.
>>> Apparently the suite of runtime configs you can change does not include
>>> classpath.
>>>
>>> So the answer to my original question is basically this:
>>>
>>> When using local (pseudo-cluster) mode, there is no way to add external
>>> jars to the spark instance.  This means that Alluxio will not work with
>>> Spark when Spark is run in master=local mode.
>>>
>>> Thanks again - often getting a definitive “no” is almost as good as a
>>> yes.  Almost ;)
>>>
>>> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>>
>>> There are two things you're doing wrong here:
>>>
>>> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jb...@gmail.com> wrote:
>>>
>>> Then I can add the alluxio client library like so:
>>> sparkSession.conf.set("spark.driver.extraClassPath",
>>> ALLUXIO_SPARK_CLIENT)
>>>
>>>
>>> First one, you can't modify JVM configuration after it has already
>>> started. So this line does nothing since it can't re-launch your
>>> application with a new JVM.
>>>
>>> sparkSession.conf.set("spark.executor.extraClassPath",
>>> ALLUXIO_SPARK_CLIENT)
>>>
>>>
>>> There is a lot of configuration that you cannot set after the
>>> application has already started. For example, after the session is
>>> created, most probably this option will be ignored, since executors
>>> will already have started.
>>>
>>> I'm not so sure about what happens when you use dynamic allocation,
>>> but these post-hoc config changes in general are not expected to take
>>> effect.
>>>
>>> The documentation could be clearer about this (especially stuff that
>>> only applies to spark-submit), but that's the gist of it.
>>>
>>>
>>> --
>>> Marcelo
>>>
>>>
>>>
>>
>>
>
>

Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Jason Boorn <jb...@gmail.com>.
Ok great I’ll give that a shot -

Thanks for all the help

> On Apr 14, 2018, at 12:08 PM, Gene Pang <ge...@gmail.com> wrote:
> 
> Yes, I think that is the case. I haven't tried that before, but it should work.
> 
> Thanks,
> Gene
> 
> On Fri, Apr 13, 2018 at 11:32 AM, Jason Boorn <jboorn@gmail.com <ma...@gmail.com>> wrote:
> Hi Gene - 
> 
> Are you saying that I just need to figure out how to get the Alluxio jar into the classpath of my parent application?  If it shows up in the classpath then Spark will automatically know that it needs to use it when communicating with Alluxio?
> 
> Apologies for going back-and-forth on this - I feel like my particular use case is clouding what is already a tricky issue.
> 
>> On Apr 13, 2018, at 2:26 PM, Gene Pang <gene.pang@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi Jason,
>> 
>> Alluxio does work with Spark in master=local mode. This is because both spark-submit and spark-shell have command-line options to set the classpath for the JVM that is being started.
>> 
>> If you are not using spark-submit or spark-shell, you will have to figure out how to configure that JVM instance with the proper properties.
>> 
>> Thanks,
>> Gene
>> 
>> On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <jboorn@gmail.com <ma...@gmail.com>> wrote:
>> Ok thanks - I was basing my design on this:
>> 
>> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html <https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html>
>> 
>> Wherein it says:
>> Once the SparkSession is instantiated, you can configure Spark’s runtime config properties. 
>> Apparently the suite of runtime configs you can change does not include classpath.  
>> 
>> So the answer to my original question is basically this:
>> 
>> When using local (pseudo-cluster) mode, there is no way to add external jars to the spark instance.  This means that Alluxio will not work with Spark when Spark is run in master=local mode.
>> 
>> Thanks again - often getting a definitive “no” is almost as good as a yes.  Almost ;)
>> 
>>> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <vanzin@cloudera.com <ma...@cloudera.com>> wrote:
>>> 
>>> There are two things you're doing wrong here:
>>> 
>>> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jboorn@gmail.com <ma...@gmail.com>> wrote:
>>>> Then I can add the alluxio client library like so:
>>>> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
>>> 
>>> First one, you can't modify JVM configuration after it has already
>>> started. So this line does nothing since it can't re-launch your
>>> application with a new JVM.
>>> 
>>>> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)
>>> 
>>> There is a lot of configuration that you cannot set after the
>>> application has already started. For example, after the session is
>>> created, most probably this option will be ignored, since executors
>>> will already have started.
>>> 
>>> I'm not so sure about what happens when you use dynamic allocation,
>>> but these post-hoc config changes in general are not expected to take
>>> effect.
>>> 
>>> The documentation could be clearer about this (especially stuff that
>>> only applies to spark-submit), but that's the gist of it.
>>> 
>>> 
>>> -- 
>>> Marcelo
>> 
>> 
> 
> 


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Gene Pang <ge...@gmail.com>.
Yes, I think that is the case. I haven't tried that before, but it should
work.

Thanks,
Gene

On Fri, Apr 13, 2018 at 11:32 AM, Jason Boorn <jb...@gmail.com> wrote:

> Hi Gene -
>
> Are you saying that I just need to figure out how to get the Alluxio jar
> into the classpath of my parent application?  If it shows up in the
> classpath then Spark will automatically know that it needs to use it when
> communicating with Alluxio?
>
> Apologies for going back-and-forth on this - I feel like my particular use
> case is clouding what is already a tricky issue.
>
> On Apr 13, 2018, at 2:26 PM, Gene Pang <ge...@gmail.com> wrote:
>
> Hi Jason,
>
> Alluxio does work with Spark in master=local mode. This is because both
> spark-submit and spark-shell have command-line options to set the classpath
> for the JVM that is being started.
>
> If you are not using spark-submit or spark-shell, you will have to figure
> out how to configure that JVM instance with the proper properties.
>
> Thanks,
> Gene
>
> On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <jb...@gmail.com> wrote:
>
>> Ok thanks - I was basing my design on this:
>>
>> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
>>
>> Wherein it says:
>> Once the SparkSession is instantiated, you can configure Spark’s runtime
>> config properties.
>> Apparently the suite of runtime configs you can change does not include
>> classpath.
>>
>> So the answer to my original question is basically this:
>>
>> When using local (pseudo-cluster) mode, there is no way to add external
>> jars to the spark instance.  This means that Alluxio will not work with
>> Spark when Spark is run in master=local mode.
>>
>> Thanks again - often getting a definitive “no” is almost as good as a
>> yes.  Almost ;)
>>
>> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>
>> There are two things you're doing wrong here:
>>
>> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jb...@gmail.com> wrote:
>>
>> Then I can add the alluxio client library like so:
>> sparkSession.conf.set("spark.driver.extraClassPath",
>> ALLUXIO_SPARK_CLIENT)
>>
>>
>> First one, you can't modify JVM configuration after it has already
>> started. So this line does nothing since it can't re-launch your
>> application with a new JVM.
>>
>> sparkSession.conf.set("spark.executor.extraClassPath",
>> ALLUXIO_SPARK_CLIENT)
>>
>>
>> There is a lot of configuration that you cannot set after the
>> application has already started. For example, after the session is
>> created, most probably this option will be ignored, since executors
>> will already have started.
>>
>> I'm not so sure about what happens when you use dynamic allocation,
>> but these post-hoc config changes in general are not expected to take
>> effect.
>>
>> The documentation could be clearer about this (especially stuff that
>> only applies to spark-submit), but that's the gist of it.
>>
>>
>> --
>> Marcelo
>>
>>
>>
>
>

Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Jason Boorn <jb...@gmail.com>.
Hi Gene - 

Are you saying that I just need to figure out how to get the Alluxio jar into the classpath of my parent application?  If it shows up in the classpath then Spark will automatically know that it needs to use it when communicating with Alluxio?

Apologies for going back-and-forth on this - I feel like my particular use case is clouding what is already a tricky issue.

> On Apr 13, 2018, at 2:26 PM, Gene Pang <ge...@gmail.com> wrote:
> 
> Hi Jason,
> 
> Alluxio does work with Spark in master=local mode. This is because both spark-submit and spark-shell have command-line options to set the classpath for the JVM that is being started.
> 
> If you are not using spark-submit or spark-shell, you will have to figure out how to configure that JVM instance with the proper properties.
> 
> Thanks,
> Gene
> 
> On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <jboorn@gmail.com <ma...@gmail.com>> wrote:
> Ok thanks - I was basing my design on this:
> 
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html <https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html>
> 
> Wherein it says:
> Once the SparkSession is instantiated, you can configure Spark’s runtime config properties. 
> Apparently the suite of runtime configs you can change does not include classpath.  
> 
> So the answer to my original question is basically this:
> 
> When using local (pseudo-cluster) mode, there is no way to add external jars to the spark instance.  This means that Alluxio will not work with Spark when Spark is run in master=local mode.
> 
> Thanks again - often getting a definitive “no” is almost as good as a yes.  Almost ;)
> 
>> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <vanzin@cloudera.com <ma...@cloudera.com>> wrote:
>> 
>> There are two things you're doing wrong here:
>> 
>> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jboorn@gmail.com <ma...@gmail.com>> wrote:
>>> Then I can add the alluxio client library like so:
>>> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
>> 
>> First one, you can't modify JVM configuration after it has already
>> started. So this line does nothing since it can't re-launch your
>> application with a new JVM.
>> 
>>> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)
>> 
>> There is a lot of configuration that you cannot set after the
>> application has already started. For example, after the session is
>> created, most probably this option will be ignored, since executors
>> will already have started.
>> 
>> I'm not so sure about what happens when you use dynamic allocation,
>> but these post-hoc config changes in general are not expected to take
>> effect.
>> 
>> The documentation could be clearer about this (especially stuff that
>> only applies to spark-submit), but that's the gist of it.
>> 
>> 
>> -- 
>> Marcelo
> 
> 


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Gene Pang <ge...@gmail.com>.
Hi Jason,

Alluxio does work with Spark in master=local mode. This is because both
spark-submit and spark-shell have command-line options to set the classpath
for the JVM that is being started.

If you are not using spark-submit or spark-shell, you will have to figure
out how to configure that JVM instance with the proper properties.
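
For example, for a self-contained driver program the most reliable way is to
put the Alluxio client on the application's own classpath at build time, the
same way your other application already does through sbt. A rough sketch only;
the artifact name and version below are placeholders and depend on the Alluxio
release you are running:

// build.sbt (coordinates are placeholders; check the Alluxio docs for the
// client artifact that matches your Alluxio server version)
libraryDependencies += "org.alluxio" % "alluxio-core-client-hdfs" % "1.7.1"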

Thanks,
Gene

On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <jb...@gmail.com> wrote:

> Ok thanks - I was basing my design on this:
>
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
>
> Wherein it says:
> Once the SparkSession is instantiated, you can configure Spark’s runtime
> config properties.
> Apparently the suite of runtime configs you can change does not include
> classpath.
>
> So the answer to my original question is basically this:
>
> When using local (pseudo-cluster) mode, there is no way to add external
> jars to the spark instance.  This means that Alluxio will not work with
> Spark when Spark is run in master=local mode.
>
> Thanks again - often getting a definitive “no” is almost as good as a
> yes.  Almost ;)
>
> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>
> There are two things you're doing wrong here:
>
> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jb...@gmail.com> wrote:
>
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
>
>
> First one, you can't modify JVM configuration after it has already
> started. So this line does nothing since it can't re-launch your
> application with a new JVM.
>
> sparkSession.conf.set("spark.executor.extraClassPath",
> ALLUXIO_SPARK_CLIENT)
>
>
> There is a lot of configuration that you cannot set after the
> application has already started. For example, after the session is
> created, most probably this option will be ignored, since executors
> will already have started.
>
> I'm not so sure about what happens when you use dynamic allocation,
> but these post-hoc config changes in general are not expected to take
> effect.
>
> The documentation could be clearer about this (especially stuff that
> only applies to spark-submit), but that's the gist of it.
>
>
> --
> Marcelo
>
>
>

Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Jason Boorn <jb...@gmail.com>.
Ok thanks - I was basing my design on this:

https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html

Wherein it says:
Once the SparkSession is instantiated, you can configure Spark’s runtime config properties. 
Apparently the suite of runtime configs you can change does not include classpath.  

So the answer to my original question is basically this:

When using local (pseudo-cluster) mode, there is no way to add external jars to the spark instance.  This means that Alluxio will not work with Spark when Spark is run in master=local mode.

Thanks again - often getting a definitive “no” is almost as good as a yes.  Almost ;)

> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> 
> There are two things you're doing wrong here:
> 
> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jb...@gmail.com> wrote:
>> Then I can add the alluxio client library like so:
>> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
> 
> First one, you can't modify JVM configuration after it has already
> started. So this line does nothing since it can't re-launch your
> application with a new JVM.
> 
>> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)
> 
> There is a lot of configuration that you cannot set after the
> application has already started. For example, after the session is
> created, most probably this option will be ignored, since executors
> will already have started.
> 
> I'm not so sure about what happens when you use dynamic allocation,
> but these post-hoc config changes in general are not expected to take
> effect.
> 
> The documentation could be clearer about this (especially stuff that
> only applies to spark-submit), but that's the gist of it.
> 
> 
> -- 
> Marcelo


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Marcelo Vanzin <va...@cloudera.com>.
There are two things you're doing wrong here:

On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jb...@gmail.com> wrote:
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)

First, you can't modify the JVM configuration after it has already
started, so this line does nothing: Spark cannot re-launch your
application with a new JVM.

> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

There is a lot of configuration that you cannot set after the
application has already started. For example, after the session is
created, most probably this option will be ignored, since executors
will already have started.

I'm not so sure about what happens when you use dynamic allocation,
but these post-hoc config changes in general are not expected to take
effect.

The documentation could be clearer about this (especially stuff that
only applies to spark-submit), but that's the gist of it.
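
To make the distinction concrete, here is a rough sketch (the jar path is a
placeholder): settings that matter when a JVM or executor is launched have to
be supplied before the session exists, for example on the builder, and even
then the driver's own classpath in local mode is fixed once the application's
JVM has started.

import org.apache.spark.sql.SparkSession

// Builder settings are applied when the session is created. On a real
// cluster, spark.executor.extraClassPath set here can still reach executors
// because they have not been launched yet.
val spark = SparkSession.builder
  .appName("App")
  .master("local[*]")
  .config("spark.executor.extraClassPath", "/path/to/alluxio-client.jar") // placeholder path
  .getOrCreate()

// This runs after the JVM and the session already exist, so the value is
// recorded in the conf but never takes effect:
spark.conf.set("spark.driver.extraClassPath", "/path/to/alluxio-client.jar")

// In master=local mode everything runs inside the already-started driver JVM,
// so the only reliable fix is to have the alluxio client jar on the
// application's classpath when that JVM is launched.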


-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Jason Boorn <jb...@gmail.com>.
Thanks - I’ve seen this SO post; it covers spark-submit, which I am not using.

Regarding the ALLUXIO_SPARK_CLIENT variable: the jar it points to is located on the machine that runs the job which spawns the master=local Spark.  According to the Spark documentation, this should be possible, but it appears it is not.

Once again - I’m trying to solve the use case for master=local, NOT for a cluster and NOT with spark-submit.  

> On Apr 13, 2018, at 12:47 PM, yohann jardin <yo...@hotmail.com> wrote:
> 
> Hey Jason,
> Might be related to what is behind your variable ALLUXIO_SPARK_CLIENT and where is located the lib (is it on HDFS, on the node that submits the job, or locally to all spark workers?)
> There is a great post on SO about it: https://stackoverflow.com/a/37348234 <https://stackoverflow.com/a/37348234>
> We might as well check that you provide correctly the jar based on its location. I have found it tricky in some cases.
> As a debug try, if the jar is not on HDFS, you can copy it there and then specify the full path in the extraclasspath property. 
> Regards,
> Yohann Jardin
> 
> Le 4/13/2018 à 5:38 PM, Jason Boorn a écrit :
>> I do, and this is what I will fall back to if nobody has a better idea :)
>> 
>> I was just hoping to get this working as it is much more convenient for my testing pipeline.
>> 
>> Thanks again for the help
>> 
>>> On Apr 13, 2018, at 11:33 AM, Geoff Von Allmen <geoff@ibleducation.com <ma...@ibleducation.com>> wrote:
>>> 
>>> Ok - `LOCAL` makes sense now.
>>> 
>>> Do you have the option to still use `spark-submit` in this scenario, but using the following options:
>>> 
>>> ```bash
>>> --master "local[*]" \
>>> --deploy-mode "client" \
>>> ...
>>> ```
>>> 
>>> I know in the past, I have setup some options using `.config("Option", "value")` when creating the spark session, and then other runtime options as you describe above with `spark.conf.set`. At this point though I've just moved everything out into a `spark-submit` script.
>>> 
>>> On Fri, Apr 13, 2018 at 8:18 AM, Jason Boorn <jboorn@gmail.com <ma...@gmail.com>> wrote:
>>> Hi Geoff -
>>> 
>>> Appreciate the help here - I do understand what you’re saying below.  And I am able to get this working when I submit a job to a local cluster.
>>> 
>>> I think part of the issue here is that there’s ambiguity in the terminology.  When I say “LOCAL” spark, I mean an instance of spark that is created by my driver program, and is not a cluster itself.  It means that my master node is “local”, and this mode is primarily used for testing.
>>> 
>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html>
>>> 
>>> While I am able to get alluxio working with spark-submit, I am unable to get it working when using local mode.  The mechanisms for setting class paths during spark-submit are not available in local mode.  My understanding is that all one is able to use is:
>>> 
>>> spark.conf.set(“”)
>>> 
>>> To set any runtime properties of the local instance.  Note that it is possible (and I am more convinced of this as time goes on) that alluxio simply does not work in spark local mode as described above.
>>> 
>>> 
>>>> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen <geoff@ibleducation.com <ma...@ibleducation.com>> wrote:
>>>> 
>>>> I fought with a 
>>>> ClassNotFoundException for quite some time, but it was for kafka.
>>>> 
>>>> The final configuration that got everything working was running 
>>>> spark-submit with the following options:
>>>> 
>>>> --jars "/path/to/.ivy2/jars/package.jar" \
>>>> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
>>>> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
>>>> --packages org.some.package:package_name:version
>>>> While this was needed for me to run in 
>>>> cluster mode, it works equally well for 
>>>> client mode as well.
>>>> 
>>>> One other note when needing to supplied multiple items to these args - 
>>>> --jars and 
>>>> --packages should be comma separated, 
>>>> --driver-class-path and 
>>>> extraClassPath should be 
>>>> : separated
>>>> 
>>>> HTH
>>>> 
>>>> 
>>>> On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jboorn@gmail.com <ma...@gmail.com>> wrote:
>>>> Haoyuan -
>>>> 
>>>> As I mentioned below, I've been through the documentation already.  It has
>>>> not helped me to resolve the issue.
>>>> 
>>>> Here is what I have tried so far:
>>>> 
>>>> - setting extraClassPath as explained below
>>>> - adding fs.alluxio.impl through sparkconf
>>>> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
>>>> this matters in my case)
>>>> - compiling the client from source 
>>>> 
>>>> Do you have any other suggestions on how to get this working?  
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ <http://apache-spark-user-list.1001560.n3.nabble.com/>
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org <ma...@spark.apache.org>
>>>> 
>>>> 
>>> 
>>> 
>> 
> 


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by yohann jardin <yo...@hotmail.com>.
Hey Jason,

This might be related to what is behind your variable ALLUXIO_SPARK_CLIENT and where the lib is located (is it on HDFS, on the node that submits the job, or local to all the Spark workers?)
There is a great post on SO about it: https://stackoverflow.com/a/37348234

We might as well check that you are providing the jar correctly based on its location. I have found it tricky in some cases.
As a debugging step, if the jar is not on HDFS, you can copy it there and then specify the full path in the extraClassPath property.

Regards,

Yohann Jardin

Le 4/13/2018 à 5:38 PM, Jason Boorn a écrit :
I do, and this is what I will fall back to if nobody has a better idea :)

I was just hoping to get this working as it is much more convenient for my testing pipeline.

Thanks again for the help

On Apr 13, 2018, at 11:33 AM, Geoff Von Allmen <ge...@ibleducation.com>> wrote:

Ok - `LOCAL` makes sense now.

Do you have the option to still use `spark-submit` in this scenario, but using the following options:

```bash
--master "local[*]" \
--deploy-mode "client" \
...
```

I know in the past, I have setup some options using `.config("Option", "value")` when creating the spark session, and then other runtime options as you describe above with `spark.conf.set`. At this point though I've just moved everything out into a `spark-submit` script.

On Fri, Apr 13, 2018 at 8:18 AM, Jason Boorn <jb...@gmail.com>> wrote:
Hi Geoff -

Appreciate the help here - I do understand what you’re saying below.  And I am able to get this working when I submit a job to a local cluster.

I think part of the issue here is that there’s ambiguity in the terminology.  When I say “LOCAL” spark, I mean an instance of spark that is created by my driver program, and is not a cluster itself.  It means that my master node is “local”, and this mode is primarily used for testing.

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html

While I am able to get alluxio working with spark-submit, I am unable to get it working when using local mode.  The mechanisms for setting class paths during spark-submit are not available in local mode.  My understanding is that all one is able to use is:

spark.conf.set(“”)

To set any runtime properties of the local instance.  Note that it is possible (and I am more convinced of this as time goes on) that alluxio simply does not work in spark local mode as described above.


On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen <ge...@ibleducation.com>> wrote:


I fought with a ClassNotFoundException for quite some time, but it was for kafka.

The final configuration that got everything working was running spark-submit with the following options:

--jars "/path/to/.ivy2/jars/package.jar" \
--driver-class-path "/path/to/.ivy2/jars/package.jar" \
--conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
--packages org.some.package:package_name:version


While this was needed for me to run in cluster mode, it works equally well for client mode as well.

One other note when needing to supplied multiple items to these args - --jars and --packages should be comma separated, --driver-class-path and extraClassPath should be : separated

HTH

​

On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jb...@gmail.com>> wrote:
Haoyuan -

As I mentioned below, I've been through the documentation already.  It has
not helped me to resolve the issue.

Here is what I have tried so far:

- setting extraClassPath as explained below
- adding fs.alluxio.impl through sparkconf
- adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
this matters in my case)
- compiling the client from source

Do you have any other suggestions on how to get this working?

Thanks



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org<ma...@spark.apache.org>







Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Jason Boorn <jb...@gmail.com>.
I do, and this is what I will fall back to if nobody has a better idea :)

I was just hoping to get this working as it is much more convenient for my testing pipeline.

Thanks again for the help

> On Apr 13, 2018, at 11:33 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:
> 
> Ok - `LOCAL` makes sense now.
> 
> Do you have the option to still use `spark-submit` in this scenario, but using the following options:
> 
> ```bash
> --master "local[*]" \
> --deploy-mode "client" \
> ...
> ```
> 
> I know in the past, I have setup some options using `.config("Option", "value")` when creating the spark session, and then other runtime options as you describe above with `spark.conf.set`. At this point though I've just moved everything out into a `spark-submit` script.
> 
> On Fri, Apr 13, 2018 at 8:18 AM, Jason Boorn <jboorn@gmail.com <ma...@gmail.com>> wrote:
> Hi Geoff -
> 
> Appreciate the help here - I do understand what you’re saying below.  And I am able to get this working when I submit a job to a local cluster.
> 
> I think part of the issue here is that there’s ambiguity in the terminology.  When I say “LOCAL” spark, I mean an instance of spark that is created by my driver program, and is not a cluster itself.  It means that my master node is “local”, and this mode is primarily used for testing.
> 
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html>
> 
> While I am able to get alluxio working with spark-submit, I am unable to get it working when using local mode.  The mechanisms for setting class paths during spark-submit are not available in local mode.  My understanding is that all one is able to use is:
> 
> spark.conf.set(“”)
> 
> To set any runtime properties of the local instance.  Note that it is possible (and I am more convinced of this as time goes on) that alluxio simply does not work in spark local mode as described above.
> 
> 
>> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen <geoff@ibleducation.com <ma...@ibleducation.com>> wrote:
>> 
>> I fought with a ClassNotFoundException for quite some time, but it was for kafka.
>> 
>> The final configuration that got everything working was running spark-submit with the following options:
>> 
>> --jars "/path/to/.ivy2/jars/package.jar" \
>> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
>> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
>> --packages org.some.package:package_name:version
>> While this was needed for me to run in cluster mode, it works equally well for client mode as well.
>> 
>> One other note when needing to supplied multiple items to these args - --jars and --packages should be comma separated, --driver-class-path and extraClassPath should be : separated
>> 
>> HTH
>> 
>> 
>> On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jboorn@gmail.com <ma...@gmail.com>> wrote:
>> Haoyuan -
>> 
>> As I mentioned below, I've been through the documentation already.  It has
>> not helped me to resolve the issue.
>> 
>> Here is what I have tried so far:
>> 
>> - setting extraClassPath as explained below
>> - adding fs.alluxio.impl through sparkconf
>> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
>> this matters in my case)
>> - compiling the client from source 
>> 
>> Do you have any other suggestions on how to get this working?  
>> 
>> Thanks
>> 
>> 
>> 
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ <http://apache-spark-user-list.1001560.n3.nabble.com/>
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org <ma...@spark.apache.org>
>> 
>> 
> 
> 


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Geoff Von Allmen <ge...@ibleducation.com>.
Ok - `LOCAL` makes sense now.

Do you have the option to still use `spark-submit` in this scenario, but with
the following options:

```bash
--master "local[*]" \
--deploy-mode "client" \
...
```

I know in the past I have set up some options using `.config("Option",
"value")` when creating the spark session, and then other runtime options
as you describe above with `spark.conf.set`. At this point, though, I've just
moved everything out into a `spark-submit` script.

On Fri, Apr 13, 2018 at 8:18 AM, Jason Boorn <jb...@gmail.com> wrote:

> Hi Geoff -
>
> Appreciate the help here - I do understand what you’re saying below.  And
> I am able to get this working when I submit a job to a local cluster.
>
> I think part of the issue here is that there’s ambiguity in the
> terminology.  When I say “LOCAL” spark, I mean an instance of spark that is
> created by my driver program, and is not a cluster itself.  It means that
> my master node is “local”, and this mode is primarily used for testing.
>
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html
>
> While I am able to get alluxio working with spark-submit, I am unable to
> get it working when using local mode.  The mechanisms for setting class
> paths during spark-submit are not available in local mode.  My
> understanding is that all one is able to use is:
>
> spark.conf.set(“”)
>
> To set any runtime properties of the local instance.  Note that it is
> possible (and I am more convinced of this as time goes on) that alluxio
> simply does not work in spark local mode as described above.
>
>
> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen <ge...@ibleducation.com>
> wrote:
>
> I fought with a ClassNotFoundException for quite some time, but it was
> for kafka.
>
> The final configuration that got everything working was running
> spark-submit with the following options:
>
> --jars "/path/to/.ivy2/jars/package.jar" \
> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
> --packages org.some.package:package_name:version
>
> While this was needed for me to run in cluster mode, it works equally
> well for client mode as well.
>
> One other note when needing to supplied multiple items to these args -
> --jars and --packages should be comma separated, --driver-class-path and
> extraClassPath should be : separated
>
> HTH
> ​
>
> On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jb...@gmail.com> wrote:
>
>> Haoyuan -
>>
>> As I mentioned below, I've been through the documentation already.  It has
>> not helped me to resolve the issue.
>>
>> Here is what I have tried so far:
>>
>> - setting extraClassPath as explained below
>> - adding fs.alluxio.impl through sparkconf
>> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
>> this matters in my case)
>> - compiling the client from source
>>
>> Do you have any other suggestions on how to get this working?
>>
>> Thanks
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>
>
>

Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Jason Boorn <jb...@gmail.com>.
Hi Geoff -

Appreciate the help here - I do understand what you’re saying below.  And I am able to get this working when I submit a job to a local cluster.

I think part of the issue here is that there’s ambiguity in the terminology.  When I say “LOCAL” spark, I mean an instance of spark that is created by my driver program, and is not a cluster itself.  It means that my master node is “local”, and this mode is primarily used for testing.

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html

While I am able to get alluxio working with spark-submit, I am unable to get it working when using local mode.  The mechanisms for setting class paths during spark-submit are not available in local mode.  My understanding is that all one is able to use is:

spark.conf.set(“”)

To set any runtime properties of the local instance.  Note that it is possible (and I am more convinced of this as time goes on) that alluxio simply does not work in spark local mode as described above.


> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:
> 
> I fought with a ClassNotFoundException for quite some time, but it was for kafka.
> 
> The final configuration that got everything working was running spark-submit with the following options:
> 
> --jars "/path/to/.ivy2/jars/package.jar" \
> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
> --packages org.some.package:package_name:version
> While this was needed for me to run in cluster mode, it works equally well for client mode as well.
> 
> One other note when needing to supplied multiple items to these args - --jars and --packages should be comma separated, --driver-class-path and extraClassPath should be : separated
> 
> HTH
> 
> 
> On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jboorn@gmail.com <ma...@gmail.com>> wrote:
> Haoyuan -
> 
> As I mentioned below, I've been through the documentation already.  It has
> not helped me to resolve the issue.
> 
> Here is what I have tried so far:
> 
> - setting extraClassPath as explained below
> - adding fs.alluxio.impl through sparkconf
> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
> this matters in my case)
> - compiling the client from source 
> 
> Do you have any other suggestions on how to get this working?  
> 
> Thanks
> 
> 
> 
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ <http://apache-spark-user-list.1001560.n3.nabble.com/>
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org <ma...@spark.apache.org>
> 
> 


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Geoff Von Allmen <ge...@ibleducation.com>.
I fought with a ClassNotFoundException for quite some time, but it was for
kafka.

The final configuration that got everything working was running spark-submit
with the following options:

--jars "/path/to/.ivy2/jars/package.jar" \
--driver-class-path "/path/to/.ivy2/jars/package.jar" \
--conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
--packages org.some.package:package_name:version

While this was needed for me to run in cluster mode, it works equally well
for client mode as well.

One other note when needing to supply multiple items to these args:
--jars and --packages should be comma-separated, while --driver-class-path and
extraClassPath should be colon-separated.

HTH
​

On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jb...@gmail.com> wrote:

> Haoyuan -
>
> As I mentioned below, I've been through the documentation already.  It has
> not helped me to resolve the issue.
>
> Here is what I have tried so far:
>
> - setting extraClassPath as explained below
> - adding fs.alluxio.impl through sparkconf
> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
> this matters in my case)
> - compiling the client from source
>
> Do you have any other suggestions on how to get this working?
>
> Thanks
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by jb44 <jb...@gmail.com>.
Haoyuan -

As I mentioned below, I've been through the documentation already.  It has
not helped me to resolve the issue.

Here is what I have tried so far:

- setting extraClassPath as explained below
- adding fs.alluxio.impl through sparkconf (see the sketch after this list)
- adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
this matters in my case)
- compiling the client from source 
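
The sparkconf item above was roughly the following (a sketch only; it assumes
the client jar is already on the application's classpath, and without that the
scheme mapping alone still ends in the same ClassNotFoundException):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("App")
  .master("local[*]")
  // map the alluxio:// scheme to its Hadoop FileSystem implementation
  .config("spark.hadoop.fs.alluxio.impl", "alluxio.hadoop.FileSystem")
  .getOrCreate()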

Do you have any other suggestions on how to get this working?  

Thanks



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark LOCAL mode and external jar (extraClassPath)

Posted by Haoyuan Li <ha...@gmail.com>.
This link should be helpful:
https://alluxio.org/docs/1.7/en/Running-Spark-on-Alluxio.html

Best regards,

Haoyuan (HY)

alluxio.com <http://bit.ly/2EmpC7u> | alluxio.org <http://bit.ly/2G7XIIO> | powered by Alluxio <http://bit.ly/2JD5Cwk>


On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jb...@gmail.com> wrote:

> I'm running spark in LOCAL mode and trying to get it to talk to alluxio.
> I'm
> getting the error: java.lang.ClassNotFoundException: Class
> alluxio.hadoop.FileSystem not found
> The cause of this error is apparently that Spark cannot find the alluxio
> client jar in its classpath.
>
> I have looked at the page here:
> https://www.alluxio.org/docs/master/en/Debugging-Guide.html#q-why-do-i-see-exceptions-like-javalangruntimeexception-javalangclassnotfoundexception-class-alluxiohadoopfilesystem-not-found
>
> Which details the steps to take in this situation, but I'm not finding
> success.
>
> According to Spark documentation, I can instance a local Spark like so:
>
> SparkSession.builder
>   .appName("App")
>   .getOrCreate
>
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
> sparkSession.conf.set("spark.executor.extraClassPath",
> ALLUXIO_SPARK_CLIENT)
>
> I have verified that the proper jar file exists in the right location on my
> local machine with:
> logger.error(sparkSession.conf.get("spark.driver.extraClassPath"))
> logger.error(sparkSession.conf.get("spark.executor.extraClassPath"))
>
> But I still get the error. Is there anything else I can do to figure out
> why
> Spark is not picking the library up?
>
> Please note I am not using spark-submit - I am aware of the methods for
> adding the client jar to a spark-submit job. My Spark instance is being
> created as local within my application and this is the use case I want to
> solve.
>
> As an FYI there is another application in the cluster which is connecting
> to
> my alluxio using the fs client and that all works fine. In that case,
> though, the fs client is being packaged as part of the application through
> standard sbt dependencies.
>
>
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>