Posted to user@spark.apache.org by Julien Carme <ju...@gmail.com> on 2014/03/25 18:05:55 UTC

Using an external jar in the driver, in yarn-standalone mode.

Hello,

I have been struggling for ages to use an external jar in my Spark driver
program in yarn-standalone mode. I just want to use, in my main program and
outside of any calls to Spark functions, objects that are defined in
another jar.

I tried setting SPARK_CLASSPATH and ADD_JAR, and I tried passing --addJar
in the spark-class arguments, but I always end up with a
ClassNotFoundException when I try to use classes defined in my jar.

Any ideas?

Thanks a lot,

Re: Using an external jar in the driver, in yarn-standalone mode.

Posted by Sandy Ryza <sa...@cloudera.com>.
Andrew,
Spark automatically deploys the jar to the DFS cache if it's included with
the addJars option.  It then still needs to be SparkContext.addJar'd to get
it to the executors.
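
For example, roughly (a sketch; the --addJars flag on yarn.Client is from
memory of that era's usage, so treat the exact name as an assumption, and
the paths are illustrative):

    bin/spark-class org.apache.spark.deploy.yarn.Client --jar myjar.jar --class myclass --addJars /local/path/myotherjar.jar

and then, in the driver, sc.addJar("myotherjar.jar") to get it out to the
executors as well.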

-Sandy

Re: Using an external jar in the driver, in yarn-standalone mode.

Posted by Julien Carme <ju...@gmail.com>.
Hello Andrew,

Thanks for the tip. I accessed the Classpath Entries through the YARN
monitoring UI (with YARN it is not localhost:4040 but
yarn_master:8088/proxy/[application_id]/environment). I saw that my jar
was indeed on the CLASSPATH and available to my application.

I realized that I could not access my .jar because there was something
wrong with it: it had only been partially transferred to my cluster and was
therefore not usable, which explains the confusion.
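
For what it's worth, a partially transferred jar can be caught early with
standard archive tooling, e.g.:

    jar tf myotherjar.jar     # lists the entries; errors out on a truncated archive
    unzip -t myotherjar.jar   # tests the zip integrity

Either command fails loudly on a corrupt jar.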

Sorry, and thanks for your help.

RE: Using an external jar in the driver, in yarn-standalone mode.

Posted by Andrew Lee <al...@hotmail.com>.
Hi Julien,
ADD_JAR doesn't work from the command line. I checked spark-class, and I couldn't find any Bash code that brings the ADD_JAR variable into the CLASSPATH.
Were you able to print out the properties and environment variables from the Web GUI?
localhost:4040
This should give you an idea of what is included in the current Spark shell. bin/spark-shell invokes bin/spark-class, and I don't see ADD_JAR in bin/spark-class either.
Hi Sandy,
Does Spark automatically deploy the JAR to the DFS cache for you if Spark is running in cluster mode? I haven't gotten that far yet with deploying my own one-time JAR for testing; I have just set up a local cluster for practice.


Re: Using an external jar in the driver, in yarn-standalone mode.

Posted by Julien Carme <ju...@gmail.com>.
Thanks for your answer.

I am using
bin/spark-class org.apache.spark.deploy.yarn.Client --jar myjar.jar
--class myclass ...

myclass in myjar.jar contains a main that initializes a SparkContext in
yarn-standalone mode.

Then I use some code from myotherjar.jar, but I do not execute it through
the Spark context or an RDD, so my understanding is that it is executed not
on the YARN slaves but only on the YARN master.
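
To make this concrete, here is a simplified sketch of the driver
(MyOtherClass and its label method are placeholders for whatever
myotherjar.jar provides, and the SparkContext constructor style varies
across early Spark versions):

    import org.apache.spark.{SparkConf, SparkContext}

    object myclass {
      def main(args: Array[String]): Unit = {
        // "yarn-standalone" was the master URL for this mode in early Spark
        val sc = new SparkContext(
          new SparkConf().setMaster("yarn-standalone").setAppName("myclass"))

        // Driver-side: this runs on the YARN master, so MyOtherClass must be
        // on the driver's classpath.
        val helper = new MyOtherClass()

        // Executor-side: the closure below runs on the slaves, so the class
        // must also reach them (e.g. via sc.addJar) and be Serializable to
        // be captured in the closure.
        println(sc.parallelize(1 to 10).map(x => helper.label(x)).collect().mkString(","))

        sc.stop()
      }
    }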

I have found no way to make my code able to find myotherjar.jar. The
CLASSPATH is set by Spark (or YARN?) before my code runs on the YARN
master; it is not set by me. The idea seems to be that you set
SPARK_CLASSPATH and/or ADD_JAR and those jars then become automatically
available on the YARN master, but that did not work for me.

I also tried sc.addJar; it did not work either, but in any case it seems
clear that it is meant for dependencies in the code executed on the
slaves, not on the master. Tell me if I am wrong.

Re: Using an external jar in the driver, in yarn-standalone mode.

Posted by Nathan Kronenfeld <nk...@oculusinfo.com>.
By 'use ... in my main program' I presume you mean you have a main
function in a class file that you want to use as your entry point.

SPARK_CLASSPATH, ADD_JAR, etc. add your jars on the master and the
workers... but they don't affect the client.
There you're just using ordinary, everyday Java/Scala, so the jar simply
has to be on the normal Java classpath.
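
For instance, one way to do that when launching by hand (illustrative
only; this assumes your Spark version ships bin/compute-classpath.sh,
which spark-class used around this era to build its classpath):

    java -cp myjar.jar:/path/to/myotherjar.jar:$(bin/compute-classpath.sh) myclass ...

Note, though, that in yarn-standalone mode the main program runs in the
YARN application master rather than on the client, so the jar has to reach
the cluster as well.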

Could that be your issue?

          -Nathan

-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenfeld@oculusinfo.com

Re: Using an external jar in the driver, in yarn-standalone mode.

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Julien,

Have you called SparkContext#addJars?

-Sandy