Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2017/07/17 15:41:50 UTC

running spark job with fat jar file

hi guys,


an uber/fat jar file has been created to run with Spark on CDH in YARN client
mode.

As usual, the job is submitted from the edge node.

Does the jar file have to be placed in the same directory where Spark is
running in the cluster to make it work?

Also, what will happen if, say, out of 9 nodes running Spark, 3 have not got
the jar file? Will the job fail, or will it carry on on the remaining 6
nodes that have the jar file?

thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: running spark job with fat jar file

Posted by ayan guha <gu...@gmail.com>.
Hi Mich - YARN uses a specific folder convention built from the application
ID, container ID, attempt number and so on. Once you run spark-submit
on YARN, you can see your application in the YARN ResourceManager (RM) UI.
Once the app finishes, you can see all its logs using

yarn logs -applicationId <app_id>

In these logs you can see the details of the transient folders, what goes
where and so on.

These local folders are created on the local OS filesystem, not on HDFS.
They are transient, so once your job finishes, YARN cleans them up.
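
As a rough sketch of the log lookup described above: the helper below only
checks that an ID looks like a YARN application ID and prints the yarn logs
command instead of running it. The function name yarn_logs_cmd and the example
ID are made up for illustration; only yarn logs -applicationId comes from the
message itself. Fetching logs this way typically relies on YARN log
aggregation being enabled on the cluster.

```shell
# Hypothetical helper: print the log-retrieval command for a YARN application.
# It does not contact the cluster; drop the 'echo' to actually run the command.
yarn_logs_cmd() {
  app_id=$1
  case "$app_id" in
    # Loose shape check only: application_<digits...>_<digits...>
    application_[0-9]*_[0-9]*)
      echo "yarn logs -applicationId $app_id"
      ;;
    *)
      echo "not a YARN application id: $app_id" >&2
      return 1
      ;;
  esac
}

# Example with a made-up application id
yarn_logs_cmd application_1500300000000_0042
```

The real application ID is shown in the RM UI and in the spark-submit console
output once the job has been accepted.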

On Tue, Jul 18, 2017 at 5:46 AM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> great Ayan.
>
> Is that local folder on HDFS? Will that be a hidden folder specific to the
> user executing the spark job?
>

-- 
Best Regards,
Ayan Guha

Re: running spark job with fat jar file

Posted by Mich Talebzadeh <mi...@gmail.com>.
great Ayan.

Is that local folder on HDFS? Will that be a hidden folder specific to the
user executing the spark job?

Dr Mich Talebzadeh




Re: running spark job with fat jar file

Posted by ayan guha <gu...@gmail.com>.
Hi

Here is my understanding:

1. For each container, a local folder is created and the application jar
is copied there.
2. Jars listed with the --jars switch are copied to the container and added
to the class path of the application.

So, to your question: the jars passed via --jars do not need to be copied to
all nodes at submission time. YARN will take care of it.

Best
Ayan
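
To make the --jars point concrete, here is a minimal sketch. It assumes the
extra jars sit only on the submitting (edge) node and builds the
comma-separated value that --jars takes. The helper name join_jars and the
paths are hypothetical and not from the thread.

```shell
# Hypothetical helper: join jar paths into the comma-separated form --jars expects.
join_jars() {
  joined=""
  for jar in "$@"; do
    if [ -z "$joined" ]; then
      joined=$jar
    else
      joined="$joined,$jar"
    fi
  done
  echo "$joined"
}

# Example with made-up paths that exist only on the edge node
JARS=$(join_jars /opt/libs/a.jar /opt/libs/b.jar)
echo "--jars $JARS"
```

The resulting ${JARS} value can then be passed to spark-submit as in the
command quoted elsewhere in this thread; the worker nodes do not need their
own copies.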


-- 
Best Regards,
Ayan Guha

Re: running spark job with fat jar file

Posted by Marcelo Vanzin <va...@cloudera.com>.
Yes.

On Mon, Jul 17, 2017 at 10:47 AM, Mich Talebzadeh
<mi...@gmail.com> wrote:
> thanks Marcelo.
>
> are these files distributed through hdfs?
>

-- 
Marcelo



Re: running spark job with fat jar file

Posted by Mich Talebzadeh <mi...@gmail.com>.
thanks Marcelo.

are these files distributed through hdfs?

Dr Mich Talebzadeh




On 17 July 2017 at 18:46, Marcelo Vanzin <va...@cloudera.com> wrote:

> The YARN backend distributes all files and jars you submit with your
> application.
>
> On Mon, Jul 17, 2017 at 10:45 AM, Mich Talebzadeh
> <mi...@gmail.com> wrote:
> > thanks guys.
> >
> > just to clarify let us assume i am doing spark-submit as below:
> >
> > ${SPARK_HOME}/bin/spark-submit \
> >                 --packages ${PACKAGES} \
> >                 --driver-memory 2G \
> >                 --num-executors 2 \
> >                 --executor-memory 2G \
> >                 --executor-cores 2 \
> >                 --master yarn \
> >                 --deploy-mode client \
> >                 --conf "${SCHEDULER}" \
> >                 --conf "${EXTRAJAVAOPTIONS}" \
> >                 --jars ${JARS} \
> >                 --class "${FILE_NAME}" \
> >                 --conf "${SPARKUIPORT}" \
> >                 --conf "${SPARKDRIVERPORT}" \
> >                 --conf "${SPARKFILESERVERPORT}" \
> >                 --conf "${SPARKBLOCKMANAGERPORT}" \
> >                 --conf "${SPARKKRYOSERIALIZERBUFFERMAX}" \
> >                 ${JAR_FILE}
> >
> > The ${JAR_FILE} is the one. As I understand Spark should distribute that
> > ${JAR_FILE} to each container?
> >
> > Also --jars ${JARS} are the list of normal jar files that need to exist in
> > the same directory on each executor node?
> >
> > cheers,
> >
> >
> >
> >
> > On 17 July 2017 at 18:18, ayan guha <gu...@gmail.com> wrote:
> >>
> >> Hi Mitch
> >>
> >> your jar file can be anywhere in the file system, including hdfs.
> >>
> >> If using yarn, preferably use cluster mode in terms of deployment.
> >>
> >> Yarn will distribute the jar to each container.
> >>
> >> Best
> >> Ayan
> >>
> >> On Tue, 18 Jul 2017 at 2:17 am, Marcelo Vanzin <va...@cloudera.com>
> >> wrote:
> >>>
> >>> Spark distributes your application jar for you.
> >>>

Re: running spark job with fat jar file

Posted by Marcelo Vanzin <va...@cloudera.com>.
The YARN backend distributes all files and jars you submit with your
application.
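
As a rough illustration of the point above (a sketch, not the exact command from this thread): in yarn-client mode the application jar and anything passed via --jars only need to exist on the machine where spark-submit runs, and they are shipped to the containers from there. The paths, the class name and the DRY_RUN switch below are hypothetical placeholders, not values taken from the original post.

```shell
#!/bin/sh
# Sketch only: all paths, the class name and DRY_RUN are hypothetical placeholders.
SPARK_SUBMIT="${SPARK_HOME:-/opt/spark}/bin/spark-submit"
APP_JAR="/home/hduser/jars/myapp-assembly.jar"                # fat jar, present on the edge node only
EXTRA_JARS="/home/hduser/jars/a.jar,/home/hduser/jars/b.jar"  # --jars takes a comma-separated list

# Build the command in the positional parameters so every argument stays quoted.
set -- "$SPARK_SUBMIT" \
    --master yarn \
    --deploy-mode client \
    --class com.example.MyApp \
    --jars "$EXTRA_JARS" \
    "$APP_JAR"

# Print the command by default; set DRY_RUN=0 to actually submit.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
else
    "$@"
fi
```

Nothing in this sketch copies jars to the worker nodes by hand; that step is left to the submission itself.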




-- 
Marcelo



Re: running spark job with fat jar file

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks, guys.

Just to clarify, let us assume I am running spark-submit as below:

${SPARK_HOME}/bin/spark-submit \
                --packages ${PACKAGES} \
                --driver-memory 2G \
                --num-executors 2 \
                --executor-memory 2G \
                --executor-cores 2 \
                --master yarn \
                --deploy-mode client \
                --conf "${SCHEDULER}" \
                --conf "${EXTRAJAVAOPTIONS}" \
                --jars ${JARS} \
                --class "${FILE_NAME}" \
                --conf "${SPARKUIPORT}" \
                --conf "${SPARKDRIVERPORT}" \
                --conf "${SPARKFILESERVERPORT}" \
                --conf "${SPARKBLOCKMANAGERPORT}" \
                --conf "${SPARKKRYOSERIALIZERBUFFERMAX}" \
                ${JAR_FILE}

${JAR_FILE} is the fat jar in question. As I understand it, Spark should
distribute ${JAR_FILE} to each container?

Also, is --jars ${JARS} the list of ordinary jar files that need to exist in
the same directory on each executor node?

cheers,



Dr Mich Talebzadeh







Re: running spark job with fat jar file

Posted by ayan guha <gu...@gmail.com>.
Hi Mich,

Your jar file can be anywhere in the file system, including HDFS.

If you are using YARN, prefer cluster mode for deployment.

YARN will distribute the jar to each container.
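
To make the cluster-mode suggestion concrete, here is a hedged sketch: the fat jar is copied to HDFS once and spark-submit is pointed at its hdfs:// URI. The local path, HDFS directory, class name and the DRY_RUN switch are invented for illustration and are not from the original post.

```shell
#!/bin/sh
# Sketch only: the HDFS directory, local path and class name are hypothetical.
LOCAL_JAR="/home/hduser/jars/myapp-assembly.jar"
HDFS_DIR="hdfs:///user/hduser/jars"
HDFS_JAR="$HDFS_DIR/$(basename "$LOCAL_JAR")"

# Print each command by default; set DRY_RUN=0 to run it for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 1. Copy the fat jar to HDFS once (-f overwrites an existing copy).
run hdfs dfs -mkdir -p "$HDFS_DIR"
run hdfs dfs -put -f "$LOCAL_JAR" "$HDFS_DIR/"

# 2. Submit in cluster mode; the driver itself runs inside a YARN container.
run "${SPARK_HOME:-/opt/spark}/bin/spark-submit" \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.MyApp \
    "$HDFS_JAR"
```

With the default DRY_RUN=1 the script only prints the three commands, which is a cheap way to check the paths before touching the cluster.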

Best
Ayan

Best Regards,
Ayan Guha

Re: running spark job with fat jar file

Posted by Marcelo Vanzin <va...@cloudera.com>.
Spark distributes your application jar for you.




-- 
Marcelo
