Posted to user@spark.apache.org by Christophe Préaud <ch...@kelkoo.com> on 2014/04/16 18:27:46 UTC
SPARK_YARN_APP_JAR, SPARK_CLASSPATH and ADD_JARS in a spark-shell on YARN
Hi,
I am running Spark 0.9.1 on a YARN cluster, and I am wondering what the
correct way is to add external jars when running a spark-shell on YARN.
Packaging all these dependencies in an assembly whose path is then set in
SPARK_YARN_APP_JAR (as written in the doc:
http://spark.apache.org/docs/latest/running-on-yarn.html) does not work in my
case: it pushes the jar to HDFS in .sparkStaging/application_XXX, but the
spark-shell is still unable to find it (unless ADD_JARS and/or SPARK_CLASSPATH
is defined).
Defining all the dependencies (either in an assembly, or separately) in ADD_JARS
or SPARK_CLASSPATH works (even if SPARK_YARN_APP_JAR is set to /dev/null), but
defining some dependencies in ADD_JARS and the rest in SPARK_CLASSPATH does not!
Hence I'm still wondering what the differences are between ADD_JARS and
SPARK_CLASSPATH, and what the purpose of SPARK_YARN_APP_JAR is.
Thanks for any insights!
Christophe.
Kelkoo SAS
Société par Actions Simplifiée
Share capital: € 4.168.964,30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris
This message and its attachments are confidential and intended exclusively for their addressees. If you are not the intended recipient of this message, please destroy it and notify the sender.
Re: SPARK_YARN_APP_JAR, SPARK_CLASSPATH and ADD_JARS in a spark-shell on YARN
Posted by Christophe Préaud <ch...@kelkoo.com>.
Good to know, thanks for pointing this out to me!
On 23/04/2014 19:55, Sandy Ryza wrote:
Ah, you're right about SPARK_CLASSPATH and ADD_JARS. My bad.
SPARK_YARN_APP_JAR is going away entirely - https://issues.apache.org/jira/browse/SPARK-1053
Re: SPARK_YARN_APP_JAR, SPARK_CLASSPATH and ADD_JARS in a spark-shell on YARN
Posted by Sandy Ryza <sa...@cloudera.com>.
Ah, you're right about SPARK_CLASSPATH and ADD_JARS. My bad.
SPARK_YARN_APP_JAR is going away entirely -
https://issues.apache.org/jira/browse/SPARK-1053
On Wed, Apr 23, 2014 at 8:07 AM, Christophe Préaud <
christophe.preaud@kelkoo.com> wrote:
> Hi Sandy,
>
> Thanks for your reply!
>
> I thought adding the jars to both SPARK_CLASSPATH and ADD_JARS was only
> required as a temporary workaround in Spark 0.9.0 (see
> https://issues.apache.org/jira/browse/SPARK-1089), and that it was no
> longer necessary in 0.9.1.
>
> As for SPARK_YARN_APP_JAR, is it really useful, or is it planned to be
> removed in future versions of Spark? I personally always set it to
> /dev/null when launching a spark-shell in yarn-client mode.
>
> Thanks again for your time!
> Christophe.
Re: SPARK_YARN_APP_JAR, SPARK_CLASSPATH and ADD_JARS in a spark-shell on YARN
Posted by Christophe Préaud <ch...@kelkoo.com>.
Hi Sandy,
Thanks for your reply!
I thought adding the jars to both SPARK_CLASSPATH and ADD_JARS was only required as a temporary workaround in Spark 0.9.0 (see https://issues.apache.org/jira/browse/SPARK-1089), and that it was no longer necessary in 0.9.1.
As for SPARK_YARN_APP_JAR, is it really useful, or is it planned to be removed in future versions of Spark? I personally always set it to /dev/null when launching a spark-shell in yarn-client mode.
Thanks again for your time!
Christophe.
On 21/04/2014 19:16, Sandy Ryza wrote:
Hi Christophe,
Adding the jars to both SPARK_CLASSPATH and ADD_JARS is required. The former makes them available to the spark-shell driver process, and the latter tells Spark to make them available to the executor processes running on the cluster.
-Sandy
Re: SPARK_YARN_APP_JAR, SPARK_CLASSPATH and ADD_JARS in a spark-shell on YARN
Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Christophe,
Adding the jars to both SPARK_CLASSPATH and ADD_JARS is required. The
former makes them available to the spark-shell driver process, and the
latter tells Spark to make them available to the executor processes running
on the cluster.
-Sandy
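To make the two variables concrete, here is a minimal shell sketch that builds both from a single list of jars. It assumes that ADD_JARS takes a comma-separated list and that SPARK_CLASSPATH is an ordinary colon-separated classpath, as I understand the Spark 0.9 documentation. The jar paths and the join_by helper are hypothetical, and the launch line is left commented out because it depends on the local install. Whether both variables are really needed on 0.9.1 is the open question in this thread, so treat this as the conservative variant that puts the same jars in both.

```shell
#!/bin/sh
# Hypothetical dependency jars (space-separated; paths must not contain spaces).
DEP_JARS="/path/to/dep1.jar /path/to/dep2.jar"

# join_by SEP ITEM... prints the ITEMs joined by SEP (hypothetical helper).
join_by() {
  sep="$1"
  shift
  out=""
  for item in "$@"; do
    if [ -z "$out" ]; then
      out="$item"
    else
      out="$out$sep$item"
    fi
  done
  printf '%s\n' "$out"
}

# Assumption: ADD_JARS is comma-separated, SPARK_CLASSPATH is colon-separated.
# $DEP_JARS is left unquoted on purpose so that it splits into one word per jar.
ADD_JARS="$(join_by , $DEP_JARS)"
SPARK_CLASSPATH="$(join_by : $DEP_JARS)"
export ADD_JARS SPARK_CLASSPATH

echo "ADD_JARS=$ADD_JARS"
echo "SPARK_CLASSPATH=$SPARK_CLASSPATH"

# Assumed launch line for Spark 0.9.x in yarn-client mode (not run here):
# SPARK_YARN_APP_JAR=/dev/null MASTER=yarn-client ./bin/spark-shell
```

Keeping one list and deriving both variables from it avoids the situation reported in the original question, where some dependencies end up only in ADD_JARS and the rest only in SPARK_CLASSPATH.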