You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Joe <jo...@net2020.org> on 2018/11/12 15:33:35 UTC

question about barrier execution mode in Spark 2.4.0

Hello,
I was reading Spark 2.4.0 release docs and I'd like to find out more 
about barrier execution mode.
In particular I'd like to know what happens when number of partitions 
exceeds number of nodes (which I think is allowed, Spark tuning doc 
mentions that)?
Does Spark guarantee that all tasks process all partitions 
simultaneously? If not then how does barrier mode handle partitions that 
are waiting to be processed?
If there are partitions waiting to be processed then I don't think it's 
possible to send all data from given stage to a DL process, even when 
using barrier mode?
Thanks a lot,

Joe


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: question about barrier execution mode in Spark 2.4.0

Posted by Xiangrui Meng <me...@gmail.com>.
On Mon, Nov 12, 2018 at 7:33 AM Joe <jo...@net2020.org> wrote:

> Hello,
> I was reading Spark 2.4.0 release docs and I'd like to find out more
> about barrier execution mode.
> In particular I'd like to know what happens when number of partitions
> exceeds number of nodes (which I think is allowed, Spark tuning doc
> mentions that)?
>

The barrier execution mode is different. It needs to run tasks for all
partitions together. So when the number of partitions is greater than
number of nodes, it will wait until more nodes are available and print
warning messages.


> Does Spark guarantee that all tasks process all partitions
> simultaneously?


They will start all together. We provide a barrier()
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.BarrierTaskContext@getTaskInfos():Array[org.apache.spark.BarrierTaskInfo]>
method in the task scope to help simple coordination among tasks.


> If not then how does barrier mode handle partitions that
> are waiting to be processed?
> If there are partitions waiting to be processed then I don't think it's
> possible to send all data from given stage to a DL process, even when
> using barrier mode?
> Thanks a lot,
>
> Joe
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>