Posted to user@spark.apache.org by Prabhu Joseph <pr...@gmail.com> on 2016/02/19 06:51:35 UTC

Concurrency does not improve for Spark Jobs with Same Spark Context

Hi All,

   When running concurrent Spark jobs on YARN (Spark 1.5.2) that share a
single SparkContext, the jobs take more time to complete than when they
run with separate SparkContexts.
The Spark jobs are submitted from different threads.

Test Case:

    A.  3 Spark jobs submitted serially
    B.  3 Spark jobs submitted concurrently, each with its own SparkContext
    C.  3 Spark jobs submitted concurrently, sharing the same SparkContext
    D.  3 Spark jobs submitted concurrently, sharing the same SparkContext,
with triple the resources.

A and B take roughly equal time, but C and D take 2-3 times longer than A,
which suggests concurrency does not improve with a shared SparkContext. [Spark
Job Server]
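
For reference, a minimal sketch of the shared-context scenario (case C):
three jobs launched from separate threads against one SparkContext. The app
name and job body are illustrative only, not the actual workload submitted
through Spark Job Server.

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedContextConcurrency {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("shared-context-test"))

        // Case C: three jobs launched from separate threads, all on the same SparkContext.
        val threads = (1 to 3).map { i =>
          new Thread(new Runnable {
            override def run(): Unit = {
              // Illustrative job body; the real workload is whatever Spark Job Server submits.
              val sum = sc.parallelize(1 to 1000000).map(_ * 2L).reduce(_ + _)
              println(s"job $i finished, result = $sum")
            }
          })
        }
        threads.foreach(_.start())
        threads.foreach(_.join())
        sc.stop()
      }
    }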

Thanks,
Prabhu Joseph

Re: Concurrency does not improve for Spark Jobs with Same Spark Context

Posted by Prabhu Joseph <pr...@gmail.com>.
Fair Scheduler; the YARN queue has the entire cluster's resources as
maxResources, preemption does not come into the picture during the test
cases, and all the Spark jobs got the resources they requested.

The concurrent jobs with different SparkContexts run fine, so resource
contention does not seem to be the right explanation.

The performance degrades only for concurrent jobs on a shared SparkContext.
Does SparkContext have any critical section that needs locking, with jobs
waiting on it? I know Spark and Scala do not use the old thread model but
the Actor Model, where locking does not happen, yet I still want to verify
whether old-style Java threading is used somewhere.
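
One thing that may be worth verifying, though it is only a hypothesis at
this point: within a single SparkContext, Spark's own job scheduler defaults
to FIFO, so jobs submitted from separate threads can end up waiting on each
other even when YARN has granted all the requested resources. A minimal
sketch of switching the in-context scheduler to FAIR (spark.scheduler.mode
and spark.scheduler.pool are documented Spark properties; the pool name and
job body below are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: enable Spark's in-context FAIR scheduling (the default is FIFO),
    // so jobs submitted from different threads share executors instead of queuing.
    val conf = new SparkConf()
      .setAppName("fair-scheduling-test")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // In each submitting thread, assign that thread's jobs to a scheduler pool
    // (the pool name "pool1" is illustrative).
    sc.setLocalProperty("spark.scheduler.pool", "pool1")
    sc.parallelize(1 to 1000000).count()

Pools can also be given weights and a minShare through a fairscheduler.xml
allocation file if finer control is needed.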



On Friday, February 19, 2016, Jörn Franke <jo...@gmail.com> wrote:

> How did you configure YARN queues? What scheduler? Preemption?
>
> > On 19 Feb 2016, at 06:51, Prabhu Joseph <prabhujose.gates@gmail.com> wrote:
> >
> > Hi All,
> >
> >    When running concurrent Spark jobs on YARN (Spark 1.5.2) that share
> a single SparkContext, the jobs take more time to complete than when they
> run with separate SparkContexts.
> > The Spark jobs are submitted from different threads.
> >
> > Test Case:
> >
> >     A.  3 Spark jobs submitted serially
> >     B.  3 Spark jobs submitted concurrently, each with its own
> SparkContext
> >     C.  3 Spark jobs submitted concurrently, sharing the same SparkContext
> >     D.  3 Spark jobs submitted concurrently, sharing the same SparkContext,
> with triple the resources.
> >
> > A and B take roughly equal time, but C and D take 2-3 times longer than
> A, which suggests concurrency does not improve with a shared SparkContext.
> [Spark Job Server]
> >
> > Thanks,
> > Prabhu Joseph
>

Re: Concurrency does not improve for Spark Jobs with Same Spark Context

Posted by Jörn Franke <jo...@gmail.com>.
How did you configure YARN queues? What scheduler? Preemption?

> On 19 Feb 2016, at 06:51, Prabhu Joseph <pr...@gmail.com> wrote:
>
> Hi All,
>
>    When running concurrent Spark jobs on YARN (Spark 1.5.2) that share a single SparkContext, the jobs take more time to complete than when they run with separate SparkContexts.
> The Spark jobs are submitted from different threads.
>
> Test Case:
>
>     A.  3 Spark jobs submitted serially
>     B.  3 Spark jobs submitted concurrently, each with its own SparkContext
>     C.  3 Spark jobs submitted concurrently, sharing the same SparkContext
>     D.  3 Spark jobs submitted concurrently, sharing the same SparkContext, with triple the resources.
>
> A and B take roughly equal time, but C and D take 2-3 times longer than A, which suggests concurrency does not improve with a shared SparkContext. [Spark Job Server]
>
> Thanks,
> Prabhu Joseph

Re: Concurrency does not improve for Spark Jobs with Same Spark Context

Posted by Ted Yu <yu...@gmail.com>.
Is it possible to perform the tests using Spark 1.6.0?

Thanks

On Thu, Feb 18, 2016 at 9:51 PM, Prabhu Joseph <pr...@gmail.com>
wrote:

> Hi All,
>
>    When running concurrent Spark jobs on YARN (Spark 1.5.2) that share a
> single SparkContext, the jobs take more time to complete than when they
> run with separate SparkContexts.
> The Spark jobs are submitted from different threads.
>
> Test Case:
>
>     A.  3 Spark jobs submitted serially
>     B.  3 Spark jobs submitted concurrently, each with its own SparkContext
>     C.  3 Spark jobs submitted concurrently, sharing the same SparkContext
>     D.  3 Spark jobs submitted concurrently, sharing the same SparkContext,
> with triple the resources.
>
> A and B take roughly equal time, but C and D take 2-3 times longer than A,
> which suggests concurrency does not improve with a shared SparkContext. [Spark
> Job Server]
>
> Thanks,
> Prabhu Joseph
>
