Posted to issues@spark.apache.org by "Vishal Gupta (JIRA)" <ji...@apache.org> on 2016/01/29 11:25:39 UTC

[jira] [Updated] (SPARK-13083) Small Spark SQL queries get blocked if there is a long-running query over a lot of partitions

     [ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal Gupta updated SPARK-13083:
---------------------------------
    Description: 
Steps to reproduce:
a) Run a first query doing count(*) over a lot of partitions (~4500 partitions) in S3.
b) The Spark job for the first query starts running.
c) Run a second query, "show tables", against the same Spark application. (I did it using Zeppelin.)
d) As soon as the second query "show tables" is submitted, it starts showing up in "Spark Application UI" > "SQL".
e) At this point there is only one active job running in the application, which corresponds to the first query.
f) Only after the job for the first query is near completion does the job for "show tables" start appearing in "Spark Application UI" > "Jobs".
g) As soon as the job for "show tables" starts, it completes very quickly and returns the results.

Sometimes step (c) has to be performed after 1-2 minutes of execution of the long-running query. After this point, jobs do not get started for any number of smaller queries submitted to the Spark application until the long-running query is near completion.

They seem to be blocked on the long-running query. Ideally they should have started running, since all settings are for the fair scheduler.
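For reference, a minimal sketch of the scenario as a standalone driver (the table name "events" and the app name are illustrative; I actually submitted the queries through Zeppelin, but the same two-queries-one-application setup can be driven from two threads):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("fair-scheduler-blocking-repro")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

// First query: long-running count over a heavily partitioned S3 table
// ("events" stands in for a table with ~4500 partitions).
val longRunning = new Thread(new Runnable {
  def run(): Unit = {
    sqlContext.sql("SELECT count(*) FROM events").show()
  }
})
longRunning.start()

// Step (c): submit the small query 1-2 minutes into the long one.
Thread.sleep(2 * 60 * 1000)

// Shows up in the "SQL" tab immediately, but no job appears under
// "Jobs" until the first query is near completion.
sqlContext.sql("SHOW TABLES").show()
{code}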

I am running Spark 1.5.1. In addition, I have the following configs:
{code}
spark.scheduler.mode FAIR
spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml
{code}
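(The same two settings can equivalently be set on the SparkConf when the application is created; a sketch:)
{code}
import org.apache.spark.SparkConf

// Programmatic form of the spark-defaults entries above.
val conf = new SparkConf()
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/usr/lib/spark/conf/fairscheduler.xml")
{code}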

/usr/lib/spark/conf/fairscheduler.xml has the following contents:
{code}
<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
  </pool>
</allocations>
{code}
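With this file, jobs land in the "default" pool unless a pool is picked per thread; for completeness, a sketch of explicit pool assignment (reusing sc and sqlContext from the sketch above; the pool name comes from the XML):
{code}
// Pool selection is a per-thread local property; jobs submitted from
// this thread are scheduled in the "default" pool from fairscheduler.xml.
sc.setLocalProperty("spark.scheduler.pool", "default")
sqlContext.sql("SHOW TABLES").show()
{code}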

> Small Spark SQL queries get blocked if there is a long-running query over a lot of partitions
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13083
>                 URL: https://issues.apache.org/jira/browse/SPARK-13083
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Vishal Gupta
>

