Posted to user@spark.apache.org by Sebastian Piu <se...@gmail.com> on 2016/01/26 18:57:57 UTC

FAIR scheduler in Spark Streaming

Hi,

I'm trying to get FAIR scheduling to work in a Spark Streaming app
(1.6.0).

I've found a previous mailing list thread that suggests doing the following:

dstream.foreachRDD { rdd =>
  rdd.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1") // set the pool
  rdd.count() // or whatever job
}

This seems to work, in the sense that if I have 5 foreachRDD calls in my
code, each one is sent to a different pool, but the jobs are still executed
one after the other rather than concurrently.
Am I missing something?
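
For illustration, a minimal sketch of that pattern with two of the pools
from the config below (dstreamA and dstreamB are hypothetical DStreams):

dstreamA.foreachRDD { rdd =>
  rdd.sparkContext.setLocalProperty("spark.scheduler.pool", "A") // pool for this output op
  rdd.count() // or whatever job
}

dstreamB.foreachRDD { rdd =>
  rdd.sparkContext.setLocalProperty("spark.scheduler.pool", "B")
  rdd.count()
}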

The scheduler config and scheduler mode are being picked up correctly, as I
can see them in the Spark UI.

// Context config

spark.scheduler.mode=FAIR
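
For completeness, a minimal sketch (app name, file path, and batch interval
are made up) of how that mode and the allocation file are wired into the
context:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical setup: enable FAIR scheduling and point Spark at the pool definitions
val conf = new SparkConf()
  .setAppName("streaming-fair-demo")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val ssc = new StreamingContext(conf, Seconds(10))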

Here is my scheduler config:


<?xml version="1.0"?>
<allocations>
  <pool name="A">
    <schedulingMode>FAIR</schedulingMode> <weight>2</weight> <minShare>1</minShare>
  </pool>
  <pool name="B">
    <schedulingMode>FAIR</schedulingMode> <weight>1</weight> <minShare>0</minShare>
  </pool>
  <pool name="C">
    <schedulingMode>FAIR</schedulingMode> <weight>1</weight> <minShare>0</minShare>
  </pool>
  <pool name="D">
    <schedulingMode>FAIR</schedulingMode> <weight>1</weight> <minShare>0</minShare>
  </pool>
  <pool name="E">
    <schedulingMode>FAIR</schedulingMode> <weight>2</weight> <minShare>1</minShare>
  </pool>
</allocations>


Any idea on what could be wrong?

Re: FAIR scheduler in Spark Streaming

Posted by Sebastian Piu <se...@gmail.com>.
Thanks Shixiong, I'll give it a try and report back

Cheers

Re: FAIR scheduler in Spark Streaming

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
The number of concurrent Streaming jobs is controlled by
"spark.streaming.concurrentJobs". It's 1 by default. However, keep in mind
that setting it to a bigger number will allow jobs from several batches to
run at the same time. It's hard to predict the behavior, and it can
sometimes surprise you.
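
For illustration, a minimal sketch (the value of 5 and the batch interval
are arbitrary) of combining that setting with FAIR mode:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical setup: allow up to 5 streaming jobs to be scheduled at once, so the
// output operations assigned to different FAIR pools can actually overlap. Note that
// values > 1 also let jobs from different batches run concurrently, as warned above.
val conf = new SparkConf()
  .setAppName("streaming-concurrent-jobs")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.streaming.concurrentJobs", "5")
val ssc = new StreamingContext(conf, Seconds(10))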
