
Posted to issues@spark.apache.org by "Vishal Gupta (JIRA)" <ji...@apache.org> on 2016/01/29 11:40:39 UTC

[jira] [Updated] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

     [ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal Gupta updated SPARK-13083:
---------------------------------
    Summary: Small spark sql queries get blocked if there is a long running query over a lot a partitions  (was: Small spark sql queries get bloced if there is a long running query over a lot a partitions)

> Small spark sql queries get blocked if there is a long running query over a lot a partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13083
>                 URL: https://issues.apache.org/jira/browse/SPARK-13083
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Vishal Gupta
>
> Steps to reproduce :
> a) Run a first query doing count(*) over a large number of partitions (~4500 partitions) in S3.
> b) The spark-job for the first query starts running.
> c) Run a second query, "show tables", against the same Spark application (I did it using Zeppelin).
> d) As soon as the second query "show tables" is submitted, it starts showing up in the "Spark Application UI" > "SQL".
> e) At this point there is only one active job running in the application which corresponds to the first query.
> f) Only after the job for the first query is near completion, the job for "show tables" starts appearing in "Spark Application UI" > "Jobs". 
> g) As soon as the job for "show tables" starts, it completes very fast and gives the results.
> Sometimes step (c) has to be performed 1-2 minutes into the execution of the long-running query. After that point, no jobs are started for any of the smaller queries submitted to the Spark application until the long-running query is near completion.
> They seem to be blocked on the long-running query. Ideally, they should have started running right away, since all settings are for the fair scheduler.
> I am running spark-1.5.1. In addition, I have the following configs:
> {code}
> spark.scheduler.mode FAIR
> spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml
> {code}
> /usr/lib/spark/conf/fairscheduler.xml has the following contents:
> {code}
> <?xml version="1.0"?>
> <allocations>
>   <pool name="default">
>     <schedulingMode>FAIR</schedulingMode>
>   </pool>
> </allocations>
> {code}
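
For anyone checking a similar setup, here is a minimal sketch (plain Python, standard library only, not part of the original report) that parses the allocation file given above and lists the pools it defines. The helper name read_pools and the POOL_DEFAULTS table are illustrative assumptions; the default values (FIFO, weight 1, minShare 0) are the ones Spark's job scheduling documentation gives for settings omitted from a pool.

```python
import xml.etree.ElementTree as ET

# Contents of fairscheduler.xml as given in the report.
FAIRSCHEDULER_XML = """<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
  </pool>
</allocations>
"""

# Assumed defaults for settings a pool leaves out, per Spark's
# job scheduling documentation.
POOL_DEFAULTS = {"schedulingMode": "FIFO", "weight": "1", "minShare": "0"}


def read_pools(xml_text):
    """Return {pool name: settings}, filling in defaults for omitted settings."""
    root = ET.fromstring(xml_text)
    pools = {}
    for pool in root.findall("pool"):
        settings = dict(POOL_DEFAULTS)
        for child in pool:
            # Only the three known settings are picked up; anything else is ignored.
            if child.tag in settings and child.text:
                settings[child.tag] = child.text.strip()
        pools[pool.get("name")] = settings
    return pools


pools = read_pools(FAIRSCHEDULER_XML)
for name, settings in pools.items():
    print(name, settings)
```

Under these assumptions the file defines a single pool, "default", in FAIR mode with weight 1 and minShare 0. Jobs submitted without the spark.scheduler.pool local property set go to that default pool, so both queries in the steps above would be expected to land in the same pool.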


