Posted to user@spark.apache.org by Sean Owen <so...@cloudera.com> on 2014/09/02 19:05:30 UTC

Re: Possible to make one executor be able to work on multiple tasks simultaneously?

+user@

An executor is specific to an application, but an application can be
executing many jobs at once. So, as I understand it, tasks from many
jobs can be executing at once on a single executor.

You may not use your full 80-way parallelism if, for example, your
data set doesn't have 80 partitions. I also believe Spark will not
necessarily spread the load over executors, instead preferring to
respect data and rack locality if possible. Those are two reasons you
might see only 4 executors active. If you mean only 4 executors exist
at all, is it possible the other 4 can't provide the memory you're
asking for?
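
A quick way to sanity-check the first point from spark-shell; this is only a
sketch, and the input path and the 80 below are placeholders for your own
data set and total core count:

    // Each partition becomes one task, so the partition count is the
    // ceiling on parallelism for that stage.
    val rdd = sc.textFile("hdfs:///some/input")   // placeholder path
    println("partitions = " + rdd.partitions.length)

    // If it is below the cluster's total cores, repartition so every core
    // can get a task (this does add a shuffle).
    val widened = rdd.repartition(80)
    widened.count()

Even with enough partitions, locality preferences can still concentrate
tasks on a few executors, which is the second point above.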


On Tue, Sep 2, 2014 at 5:56 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:
> Actually one more question, since in preliminary runs I wasn't sure if I
> understood what's going on. Are the cores allocated to an executor able to
> execute tasks for different jobs simultaneously, or just for one job at a
> time? I have 10 workers with 8 cores each, and it appeared that one job got
> four executors at once, then four more later on. The system wasn't anywhere
> near saturation of 80 cores so I would've expected all 8 cores to be running
> simultaneously.
>
> If there's value to these questions, please reply back to the list.
>
>
> On Tue, Sep 2, 2014 at 6:58 AM, Victor Tso-Guillen <vt...@paxata.com> wrote:
>>
>> Thank you for the help, guys. So as I expected, I didn't fully understand
>> the options. I had SPARK_WORKER_CORES set to 1 because I did not realize
>> that setting it to > 1 would mean an executor could operate on multiple
>> tasks simultaneously. I just thought it was a hint to Spark about how many
>> threads the executor could be expected to use; I had not understood that it
>> affected the scheduler that way. Thanks!
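
For completeness, a minimal sketch of the application-side knobs that
interact with SPARK_WORKER_CORES in standalone mode; the master URL, app
name, and values here are placeholders rather than anything from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")   // placeholder master URL
      .setAppName("parallelism-check")
      // Cap on the total cores the application may claim across the
      // cluster; each executor then runs as many concurrent tasks as the
      // cores it was granted by its worker.
      .set("spark.cores.max", "80")
      // Must fit within what each worker offers (SPARK_WORKER_MEMORY).
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

None of this raises per-executor concurrency on its own; that still comes
from the cores the worker makes available to the executor.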
>>
>>
>> On Sun, Aug 31, 2014 at 9:28 PM, Matei Zaharia <ma...@gmail.com>
>> wrote:
>>>
>>>
>>> Hey Victor,
>>>
>>> As Sean said, executors actually execute multiple tasks at a time. The
>>> only reasons they wouldn't are either (1) if you launched an executor with
>>> just 1 core (you can configure how many cores the executors will use when
>>> you set up your Worker, or it will look at your system by default) or (2) if
>>> your tasks are acquiring some kind of global lock, so only one can run at a
>>> time.
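
A contrived sketch of case (2), not taken from the thread: any JVM-wide
lock shared by the tasks serializes them within an executor, however many
cores that executor has.

    // Every task synchronizes on the same object (a Class instance is
    // unique per JVM), so tasks on a given executor run strictly one at a
    // time even when plenty of cores are free.
    sc.parallelize(1 to 100, 100).map { i =>
      classOf[String].synchronized {
        Thread.sleep(1000)   // pretend to do work while holding the lock
        i
      }
    }.count()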
>>>
>>> To test this, do the following:
>>> - Launch your standalone cluster (you can do it on just one machine by
>>> adding just "localhost" in the slaves file)
>>> - Go to http://<master>:4040 and look at the worker list. Do you see workers with
>>> more than 1 core? If not, you need to launch the workers by hand or set
>>> SPARK_WORKER_CORES in conf/spark-env.sh.
>>> - Run your application. Make sure it has enough pending tasks for your
>>> cores in the driver web UI (http://<driver>:4040), and if so, jstack one of the
>>> CoarseGrainedExecutor processes on a worker to see what the threads are
>>> doing. (Look for threads that contain TaskRunner.run in them)
>>>
>>> You can also try a simple CPU-bound job that launches lots of tasks like
>>> this to see that all cores are being used:
>>>
>>> sc.parallelize(1 to 1000, 1000).map(_ => (1 to 2000000000).product).count()
>>>
>>> Each task here takes 1-2 seconds to execute and there are 1000 of them so
>>> it should fill up your cluster.
>>>
>>> Matei
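
Along the same lines, a small sketch (not Matei's exact suggestion) that
tags each task with the host and executor thread that ran it, so you can
see multiple task-launch threads per executor without reaching for jstack:

    val runners = sc.parallelize(1 to 1000, 1000)
      .map { _ =>
        // Record which host and which executor thread ran this task.
        java.net.InetAddress.getLocalHost.getHostName + " / " +
          Thread.currentThread.getName
      }
      .countByValue()   // Map[runner -> number of tasks it ran]

    runners.toSeq.sorted.foreach { case (runner, n) =>
      println(runner + " ran " + n + " tasks")
    }

If tasks really do run concurrently, you should see several distinct
thread names per host.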
>>>
>>>
>>>
>>> On August 31, 2014 at 9:18:02 PM, Victor Tso-Guillen
>>> (vtso@paxata.com) wrote:
>>>
>>> > I'm pretty sure my terminology matches that doc except the doc makes no
>>> > explicit mention of machines. In standalone mode, you can spawn multiple
>>> > workers on a single machine and each will babysit one executor (per
>>> > application). In my observation as well each executor can be assigned many
>>> > tasks but operates on one at a time. If there's a way to have it execute
>>> > multiple tasks simultaneously in a single VM, can you please show me how?
>>> > Maybe I'm missing the requisite configuration options, no matter how common
>>> > or trivial...
>>> >
>>> > On Sunday, August 31, 2014, Sean Owen wrote:
>>> > > The confusion may be your use of 'worker', which isn't matching what
>>> > > 'worker' means in Spark. Have a look at
>>> > > https://spark.apache.org/docs/latest/cluster-overview.html Of course
>>> > > one VM can run many tasks at once; that's already how Spark works.
>>> > >
>>> > > On Sun, Aug 31, 2014 at 4:52 AM, Victor Tso-Guillen wrote:
>>> > > > I might not be making myself clear, so sorry about that. I
>>> > > > understand that a
>>> > > > machine can have as many spark workers as you'd like, for example
>>> > > > one per
>>> > > > core. A worker may be assigned to a pool for one or more
>>> > > > applications, but
>>> > > > for a single application let's just say a single worker will have
>>> > > > at most a
>>> > > > single executor. An executor can be assigned multiple tasks in its
>>> > > > queue,
>>> > > > but will work on one task at a time only.
>>> > > >
>>> > > > In local mode, you can specify the number of executors you want and
>>> > > > they
>>> > > > will all reside in the same vm. Those executors will each be able
>>> > > > to operate
>>> > > > on a single task at a time, though they may also have an arbitrary
>>> > > > number of
>>> > > > tasks in their queue. From the standpoint of a vm, however, a vm
>>> > > > can
>>> > > > therefore operate on multiple tasks simultaneously in local mode.
>>> > > >
>>> > > > What I want is something similar in standalone mode (or mesos or
>>> > > > YARN if
>>> > > > that's the only way to do it) whereby I can have a single executor
>>> > > > vm handle
>>> > > > many tasks concurrently. Is this possible? Is my problem statement
>>> > > > clear? If
>>> > > > there's a misconception on my part on the deployment of a spark
>>> > > > cluster I'd
>>> > > > like to know it, but as it stands, what we have deployed matches my
>>> > > > first paragraph.
>>> > > >
>>> > > >
>>> > > > On Sat, Aug 30, 2014 at 1:58 AM, Sean Owen wrote:
>>> > > >>
>>> > > >> A machine should have one worker, and many executors per worker
>>> > > >> (one per
>>> > > >> app). An executor runs many tasks. This is how it works for me in
>>> > > >> standalone
>>> > > >> mode at least!
>>> > > >>
>>> > > >> On Aug 30, 2014 3:08 AM, "Victor Tso-Guillen" wrote:
>>> > > >>>
>>> > > >>> A machine has many workers and a worker has an executor. I want
>>> > > >>> the
>>> > > >>> executor to handle many tasks at once, like in local mode.
>>> > > >>>
>>> > > >>>
>>> > > >>> On Fri, Aug 29, 2014 at 5:51 PM, Sean Owen wrote:
>>> > > >>>>
>>> > > >>>> Hm, do you mean worker? Spark certainly works on many tasks per
>>> > > >>>> machine
>>> > > >>>> at once.
>>> > > >>>>
>>> > > >>>> On Aug 29, 2014 8:11 PM, "Victor Tso-Guillen" wrote:
>>> > > >>>>>
>>> > > >>>>> Standalone. I'd love to tell it that my one executor can
>>> > > >>>>> simultaneously
>>> > > >>>>> serve, say, 16 tasks at once for an arbitrary number of
>>> > > >>>>> distinct jobs.
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>> On Fri, Aug 29, 2014 at 11:29 AM, Matei Zaharia
>>> > > >>>>> wrote:
>>> > > >>>>>>
>>> > > >>>>>> Yes, executors run one task per core of your machine by
>>> > > >>>>>> default. You
>>> > > >>>>>> can also manually launch them with more worker threads than
>>> > > >>>>>> you have cores.
>>> > > >>>>>> What cluster manager are you on?
>>> > > >>>>>>
>>> > > >>>>>> Matei
>>> > > >>>>>>
>>> > > >>>>>> On August 29, 2014 at 11:24:33 AM, Victor Tso-Guillen
>>> > > >>>>>> (vtso@paxata.com) wrote:
>>> > > >>>>>>
>>> > > >>>>>> I'm thinking of local mode where multiple virtual executors
>>> > > >>>>>> occupy the
>>> > > >>>>>> same vm. Can we have the same configuration in spark
>>> > > >>>>>> standalone cluster
>>> > > >>>>>> mode?
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>
>>> > > >
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Possible to make one executor be able to work on multiple tasks simultaneously?

Posted by Victor Tso-Guillen <vt...@paxata.com>.
I'm pretty sure the issue was an interaction with another subsystem. Thanks
for your patience with me!

