Posted to user@spark.apache.org by asadrao <as...@microsoft.com> on 2015/03/31 22:19:44 UTC

Using 'fair' scheduler mode

Hi, I am using the Spark ‘fair’ scheduler mode. I have noticed that if the
first query is very expensive (e.g. ‘select *’ on a really big data set),
then any subsequent query seems to get blocked. I would have expected the
second query to run in parallel, since I am using the ‘fair’ scheduler
mode, not ‘fifo’. I am submitting the queries through the Thrift server.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-fair-scheduler-mode-tp22328.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Using 'fair' scheduler mode

Posted by Mark Hamstra <ma...@clearstorydata.com>.

> I am using the Spark ‘fair’ scheduler mode.

What do you mean by this?  Fair scheduling is not one thing in Spark; it
allows for multiple configurations and usages.  Presumably, at a minimum
you are using SparkConf to set spark.scheduler.mode to "FAIR", but then how
are you setting up scheduling pools, how are you allocating jobs to pools,
and what scheduling mode are you using within pools?

Setting spark.scheduler.mode is a necessary but probably not sufficient
condition to effect your desired scheduling policy.
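
To make the pieces named above concrete, here is a minimal sketch in plain
Python (standard library only) that builds a fair scheduler allocation file
and collects the related settings. The pool names, weights, minShare values
and the file path are hypothetical, chosen only for illustration. The
property names spark.scheduler.mode and spark.scheduler.allocation.file, and
the allocations/pool/schedulingMode/weight/minShare elements, follow the
Spark job scheduling documentation.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools):
    """Return the XML text of a fair scheduler allocation file.

    `pools` maps a pool name to (schedulingMode, weight, minShare).
    """
    root = ET.Element("allocations")
    for name, (mode, weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = mode
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


# Hypothetical pools: one for short interactive queries, one for batch work.
POOLS = {
    "interactive": ("FAIR", 2, 2),
    "batch": ("FIFO", 1, 0),
}
ALLOCATIONS_XML = build_allocations(POOLS)

# Settings to hand to SparkConf; the path is a placeholder, not a real file.
SCHEDULER_CONF = {
    "spark.scheduler.mode": "FAIR",
    "spark.scheduler.allocation.file": "/path/to/fairscheduler.xml",
}

print(ALLOCATIONS_XML)
```

Jobs still have to be assigned to a pool. In application code that is done
per thread with sc.setLocalProperty("spark.scheduler.pool", "interactive").
For a Thrift server session, Spark versions that support it let the client
run SET spark.sql.thriftserver.scheduler.pool=interactive; before submitting
queries. Jobs that are never assigned to a pool all land in the default
pool.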

Re: Using 'fair' scheduler mode

Posted by Sean Owen <so...@cloudera.com>.

Does the expensive query take all executor slots? Then there is nothing for
any other job to use regardless of scheduling policy.
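
The point about executor slots can be illustrated with a toy simulation.
This is only a sketch under simplifying assumptions, not Spark's actual
scheduler: tasks are never preempted, every task of job A takes the same
time, and "FAIR" simply means "give the free slot to the job with the
fewest running tasks". The function name and its parameters are made up for
this example.

```python
def first_start_of_b(policy, slots, a_tasks, a_len, b_tasks, b_submit):
    """Return the time step at which job B's first task starts.

    Job A is submitted at t=0 with `a_tasks` tasks of length `a_len`.
    Job B is submitted at t=`b_submit` with `b_tasks` tasks.
    """
    pending = {"A": a_tasks, "B": 0}
    running = []  # list of (job, finish_time)
    t = 0
    while True:
        if t == b_submit:
            pending["B"] = b_tasks
        # Slots free up only when a task finishes; nothing is preempted.
        running = [(job, end) for job, end in running if end > t]
        while len(running) < slots and any(pending.values()):
            waiting = [j for j in ("A", "B") if pending[j] > 0]
            if policy == "FIFO":
                job = waiting[0]  # earliest submitted job goes first
            else:  # toy FAIR: the job with the fewest running tasks
                job = min(
                    waiting,
                    key=lambda j: sum(1 for r, _ in running if r == j),
                )
            if job == "B":
                return t
            pending[job] -= 1
            running.append((job, t + a_len))
        t += 1


# A fills all 4 slots with 4 very long tasks: the policy makes no difference.
print(first_start_of_b("FIFO", 4, 4, 100, 1, 1))
print(first_start_of_b("FAIR", 4, 4, 100, 1, 1))

# A has many shorter tasks: FAIR lets B in as soon as a slot frees up.
print(first_start_of_b("FIFO", 4, 40, 10, 1, 1))
print(first_start_of_b("FAIR", 4, 40, 10, 1, 1))
```

In this toy model, when job A occupies every slot with a few long tasks, job
B waits until those tasks finish under either policy. The policies only
diverge when A is split into more tasks than there are slots, because a
slot then frees up while A still has work pending.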

Re: Using 'fair' scheduler mode

Posted by Raghavendra Pandey <ra...@gmail.com>.

I am facing the same issue. FAIR and FIFO behave the same way.
