Posted to user@spark.apache.org by chopinxb <ch...@gmail.com> on 2017/12/20 09:46:31 UTC

Can Spark shuffle leverage Alluxio to obtain higher stability?

In my use case, I run Spark in yarn-client mode with dynamic allocation
enabled. When a node shuts down abnormally, my Spark application fails
because tasks fail to fetch shuffle blocks from that node 4 times. Why
doesn't Spark leverage Alluxio (a distributed in-memory filesystem) to
write shuffle blocks with replicas? In that situation, when a node shuts
down, tasks could fetch shuffle blocks from other replicas, and we would
obtain higher stability.
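
For concreteness, a minimal sketch of this setup as it might look in code
(the application name is hypothetical; these settings are usually passed
via spark-submit instead):

    import org.apache.spark.sql.SparkSession

    // yarn-client mode with dynamic allocation; on YARN, dynamic
    // allocation requires the external shuffle service to be enabled.
    val spark = SparkSession.builder()
      .appName("alluxio-shuffle-question")  // hypothetical name
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")
      .getOrCreate()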





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Posted by vincent gromakowski <vi...@gmail.com>.
If it's not resilient at the Spark level, can't you just relaunch your
job with your orchestration tool?


Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Posted by Georg Heiler <ge...@gmail.com>.
Did you try using the YARN shuffle service?

Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Posted by chopinxb <ch...@gmail.com>.
In my experience with Spark applications (mostly Spark SQL), when there is
a complete node failure in my cluster, jobs that have shuffle blocks on
that node fail completely after 4 task retries. It seems that data lineage
did not help. What's more, our applications run multiple SQL statements
for data analysis; having the entire application fail after a lengthy
calculation because of one job failure is unacceptable. So to some extent
we value stability over speed.
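
For what it's worth, the "4 task retries" corresponds to Spark's
spark.task.maxFailures setting, which defaults to 4; recent Spark versions
also abort a stage after spark.stage.maxConsecutiveAttempts (default 4)
consecutive failed attempts. A sketch of raising both, with illustrative
values; this only buys more recomputation attempts and cannot restore
shuffle files lost with the machine:

    import org.apache.spark.sql.SparkSession

    // Illustrative values: more retries give lineage-based recomputation
    // a chance to route around a dead node, at the cost of slower failure.
    val spark = SparkSession.builder()
      .config("spark.task.maxFailures", "8")              // default 4
      .config("spark.stage.maxConsecutiveAttempts", "8")  // default 4
      .getOrCreate()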





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Posted by vincent gromakowski <vi...@gmail.com>.
The probability of a complete node failure is low. I would rely on data
lineage and accept the reprocessing overhead. Another option would be to
write to a distributed FS, but that would drastically reduce the speed of
all your jobs.
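
One sketch of the "write to a distributed FS" idea applied at the
application level rather than to the shuffle itself: persist intermediate
results of a long SQL pipeline to HDFS so that a node loss costs one step
instead of the whole run. The "events" view, step names, and paths below
are hypothetical:

    // Assumes an existing SparkSession `spark` and a registered "events"
    // view; each write is a durability point for the pipeline.
    val step1 = spark.sql(
      "SELECT user_id, count(*) AS cnt FROM events GROUP BY user_id")
    step1.write.mode("overwrite").parquet("hdfs:///tmp/pipeline/step1")

    // Re-read so later statements depend on durable files rather than on
    // lineage reaching back through the lost node's shuffle outputs.
    spark.read.parquet("hdfs:///tmp/pipeline/step1")
      .createOrReplaceTempView("step1")
    val step2 = spark.sql(
      "SELECT cnt, count(*) AS users FROM step1 GROUP BY cnt")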


Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Posted by chopinxb <ch...@gmail.com>.
Yes, the shuffle service was already started in each NodeManager. What I
mean by a node failure is that the machine is down: every service on that
machine, including the NodeManager process, is down. So in this case the
shuffle service is no longer helpful.





Re: Can Spark shuffle leverage Alluxio to obtain higher stability?

Posted by vincent gromakowski <vi...@gmail.com>.
In your case you need to externalize the shuffle files to a component
outside of your Spark cluster so that they persist after a Spark worker's
death.
https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service
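
For reference, the NodeManager side of that page comes down to roughly the
following yarn-site.xml entries (double-check property names against the
docs for your version), plus putting the spark-<version>-yarn-shuffle.jar
on the NodeManager classpath and restarting the NodeManagers:

    <!-- yarn-site.xml on every NodeManager -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

The application then sets spark.shuffle.service.enabled=true. As noted
elsewhere in this thread, though, this keeps shuffle files available when
executors die, not when the whole machine goes down.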

