Posted to user@spark.apache.org by ar7 <as...@gmail.com> on 2016/07/04 07:15:47 UTC

Limiting Pyspark.daemons

Hi,

I am currently using PySpark 1.6.1 in my cluster. When a pyspark application
is run, the load on the workers seems to exceed what was allocated. When I
ran top, I noticed that there were far too many pyspark.daemon processes
running. There was another mail thread regarding the same issue:

https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E

I followed what was mentioned there, i.e. I reduced the number of executor
cores and the number of executors on one node to 1. But the number of
pyspark.daemon processes is still not coming down. It looks like there is
initially one pyspark.daemon process, and this in turn spawns as many
pyspark.daemon processes as there are cores in the machine.

Any help is appreciated :)

Thanks,
Ashwin Raaghav.
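
For reference, a minimal PySpark sketch of how to check what a running
application believes its core settings are; the app name is a placeholder
and nothing here is taken from the cluster described above:

from pyspark import SparkContext

sc = SparkContext(appName="inspect-core-settings")  # hypothetical app name
conf = sc.getConf()
# Keys discussed later in this thread; "not set" means the default applies.
print("spark.executor.cores = %s" % conf.get("spark.executor.cores", "not set"))
print("spark.cores.max      = %s" % conf.get("spark.cores.max", "not set"))
print("defaultParallelism   = %s" % sc.defaultParallelism)
sc.stop()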



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Limiting Pyspark.daemons

Posted by Ashwin Raaghav <as...@gmail.com>.
Thanks. I'll try that. Hopefully it works.

On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin <ma...@closetwork.org>
wrote:

> I started with a download of 1.6.0. These days, we use a self compiled
> 1.6.2.
>
> On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav <as...@gmail.com>
> wrote:
>
>> I am thinking of any possibilities as to why this could be happening. If
>> the cores are multi-threaded, should that affect the daemons? Your spark
>> was built from source code or downloaded as a binary, though that should
>> not technically change anything?
>>
>> On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <ma...@closetwork.org>
>> wrote:
>>
>>> 1.6.1.
>>>
>>> I have no idea. SPARK_WORKER_CORES should do the same.
>>>
>>> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <as...@gmail.com>
>>> wrote:
>>>
>>>> Which version of Spark are you using? 1.6.1?
>>>>
>>>> Any ideas as to why it is not working in ours?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <mathieu@closetwork.org
>>>> > wrote:
>>>>
>>>>> 16.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried what you suggested and started the slave using the following
>>>>>> command:
>>>>>>
>>>>>> start-slave.sh --cores 1 <master>
>>>>>>
>>>>>> But it still seems to start as many pyspark daemons as the number of
>>>>>> cores in the node (1 parent and 3 workers). Limiting it via spark-env.sh
>>>>>> file by giving SPARK_WORKER_CORES=1 also didn't help.
>>>>>>
>>>>>> When you said it helped you and limited it to 2 processes in your
>>>>>> cluster, how many cores did each machine have?
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <
>>>>>> mathieu@closetwork.org> wrote:
>>>>>>
>>>>>>> It depends on what you want to do:
>>>>>>>
>>>>>>> If, on any given server, you don't want Spark to use more than one
>>>>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>>>>> --cores=1
>>>>>>>
>>>>>>> If you have a bunch of servers dedicated to Spark, but you don't
>>>>>>> want a driver to use more than one core per server, then: spark.executor.cores=1
>>>>>>> tells it not to use more than 1 core per server. However, it seems it will
>>>>>>> start as many pyspark as there are cores, but maybe not use them.
>>>>>>>
>>>>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Mathieu,
>>>>>>>>
>>>>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>>>>> can I specify "--cores=1" from the application?
>>>>>>>>
>>>>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>
>>>>>>>>> When running the executor, put --cores=1. We use this and I only
>>>>>>>>> see 2 pyspark process, one seem to be the parent of the other and is idle.
>>>>>>>>>
>>>>>>>>> In your case, are all pyspark process working?
>>>>>>>>>
>>>>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>>>>> application
>>>>>>>>>> is run, the load on the workers seems to go more than what was
>>>>>>>>>> given. When I
>>>>>>>>>> ran top, I noticed that there were too many Pyspark.daemons
>>>>>>>>>> processes
>>>>>>>>>> running. There was another mail thread regarding the same:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>>>>>>
>>>>>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>>>>>> executor
>>>>>>>>>> cores and number of executors in one node to 1. But the number of
>>>>>>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>>>>>>> initially
>>>>>>>>>> there is one Pyspark.daemons process and this in turn spawns as
>>>>>>>>>> many
>>>>>>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>>>>>>
>>>>>>>>>> Any help is appreciated :)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ashwin Raaghav.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> View this message in context:
>>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>>>> Nabble.com.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Ashwin Raaghav
>>>>>>>>
>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Ashwin Raaghav
>>>>>>
>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ashwin Raaghav
>>>>
>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav

Re: Limiting Pyspark.daemons

Posted by Mathieu Longtin <ma...@closetwork.org>.
Try to figure out what the environment variables and arguments of the worker
JVM and Python processes are. Maybe you'll get a clue.
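
A rough sketch of one way to do that, assuming the psutil package is
available on a worker node; it prints each pyspark.daemon process with its
command line and a few Spark-related environment variables (the variable
names checked are just examples):

import psutil

for proc in psutil.process_iter():
    try:
        cmdline = " ".join(proc.cmdline())
        if "pyspark.daemon" not in cmdline:
            continue
        print("%d: %s" % (proc.pid, cmdline))
        env = proc.environ()  # needs a reasonably recent psutil
        for key in ("SPARK_WORKER_CORES", "SPARK_EXECUTOR_CORES", "PYSPARK_PYTHON"):
            if key in env:
                print("    %s=%s" % (key, env[key]))
    except (psutil.NoSuchProcess, psutil.AccessDenied, AttributeError):
        continue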

On Mon, Jul 4, 2016 at 11:42 AM Mathieu Longtin <ma...@closetwork.org>
wrote:

> I started with a download of 1.6.0. These days, we use a self compiled
> 1.6.2.
>
> On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav <as...@gmail.com>
> wrote:
>
>> I am thinking of any possibilities as to why this could be happening. If
>> the cores are multi-threaded, should that affect the daemons? Your spark
>> was built from source code or downloaded as a binary, though that should
>> not technically change anything?
>>
>> On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <ma...@closetwork.org>
>> wrote:
>>
>>> 1.6.1.
>>>
>>> I have no idea. SPARK_WORKER_CORES should do the same.
>>>
>>> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <as...@gmail.com>
>>> wrote:
>>>
>>>> Which version of Spark are you using? 1.6.1?
>>>>
>>>> Any ideas as to why it is not working in ours?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <mathieu@closetwork.org
>>>> > wrote:
>>>>
>>>>> 16.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried what you suggested and started the slave using the following
>>>>>> command:
>>>>>>
>>>>>> start-slave.sh --cores 1 <master>
>>>>>>
>>>>>> But it still seems to start as many pyspark daemons as the number of
>>>>>> cores in the node (1 parent and 3 workers). Limiting it via spark-env.sh
>>>>>> file by giving SPARK_WORKER_CORES=1 also didn't help.
>>>>>>
>>>>>> When you said it helped you and limited it to 2 processes in your
>>>>>> cluster, how many cores did each machine have?
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <
>>>>>> mathieu@closetwork.org> wrote:
>>>>>>
>>>>>>> It depends on what you want to do:
>>>>>>>
>>>>>>> If, on any given server, you don't want Spark to use more than one
>>>>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>>>>> --cores=1
>>>>>>>
>>>>>>> If you have a bunch of servers dedicated to Spark, but you don't
>>>>>>> want a driver to use more than one core per server, then: spark.executor.cores=1
>>>>>>> tells it not to use more than 1 core per server. However, it seems it will
>>>>>>> start as many pyspark as there are cores, but maybe not use them.
>>>>>>>
>>>>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Mathieu,
>>>>>>>>
>>>>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>>>>> can I specify "--cores=1" from the application?
>>>>>>>>
>>>>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>>
>>>>>>>>> When running the executor, put --cores=1. We use this and I only
>>>>>>>>> see 2 pyspark process, one seem to be the parent of the other and is idle.
>>>>>>>>>
>>>>>>>>> In your case, are all pyspark process working?
>>>>>>>>>
>>>>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>>>>> application
>>>>>>>>>> is run, the load on the workers seems to go more than what was
>>>>>>>>>> given. When I
>>>>>>>>>> ran top, I noticed that there were too many Pyspark.daemons
>>>>>>>>>> processes
>>>>>>>>>> running. There was another mail thread regarding the same:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>>>>>>
>>>>>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>>>>>> executor
>>>>>>>>>> cores and number of executors in one node to 1. But the number of
>>>>>>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>>>>>>> initially
>>>>>>>>>> there is one Pyspark.daemons process and this in turn spawns as
>>>>>>>>>> many
>>>>>>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>>>>>>
>>>>>>>>>> Any help is appreciated :)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ashwin Raaghav.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> View this message in context:
>>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>>>> Nabble.com.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Ashwin Raaghav
>>>>>>>>
>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Ashwin Raaghav
>>>>>>
>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ashwin Raaghav
>>>>
>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>
-- 
Mathieu Longtin
1-514-803-8977

Re: Limiting Pyspark.daemons

Posted by Mathieu Longtin <ma...@closetwork.org>.
I started with a download of 1.6.0. These days, we use a self-compiled
1.6.2.

On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav <as...@gmail.com> wrote:

> I am thinking of any possibilities as to why this could be happening. If
> the cores are multi-threaded, should that affect the daemons? Your spark
> was built from source code or downloaded as a binary, though that should
> not technically change anything?
>
> On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <ma...@closetwork.org>
> wrote:
>
>> 1.6.1.
>>
>> I have no idea. SPARK_WORKER_CORES should do the same.
>>
>> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <as...@gmail.com>
>> wrote:
>>
>>> Which version of Spark are you using? 1.6.1?
>>>
>>> Any ideas as to why it is not working in ours?
>>>
>>> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <ma...@closetwork.org>
>>> wrote:
>>>
>>>> 16.
>>>>
>>>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried what you suggested and started the slave using the following
>>>>> command:
>>>>>
>>>>> start-slave.sh --cores 1 <master>
>>>>>
>>>>> But it still seems to start as many pyspark daemons as the number of
>>>>> cores in the node (1 parent and 3 workers). Limiting it via spark-env.sh
>>>>> file by giving SPARK_WORKER_CORES=1 also didn't help.
>>>>>
>>>>> When you said it helped you and limited it to 2 processes in your
>>>>> cluster, how many cores did each machine have?
>>>>>
>>>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <
>>>>> mathieu@closetwork.org> wrote:
>>>>>
>>>>>> It depends on what you want to do:
>>>>>>
>>>>>> If, on any given server, you don't want Spark to use more than one
>>>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>>>> --cores=1
>>>>>>
>>>>>> If you have a bunch of servers dedicated to Spark, but you don't want
>>>>>> a driver to use more than one core per server, then: spark.executor.cores=1
>>>>>> tells it not to use more than 1 core per server. However, it seems it will
>>>>>> start as many pyspark as there are cores, but maybe not use them.
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Mathieu,
>>>>>>>
>>>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>>>> can I specify "--cores=1" from the application?
>>>>>>>
>>>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>>>> mathieu@closetwork.org> wrote:
>>>>>>>
>>>>>>>> When running the executor, put --cores=1. We use this and I only
>>>>>>>> see 2 pyspark process, one seem to be the parent of the other and is idle.
>>>>>>>>
>>>>>>>> In your case, are all pyspark process working?
>>>>>>>>
>>>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>>>> application
>>>>>>>>> is run, the load on the workers seems to go more than what was
>>>>>>>>> given. When I
>>>>>>>>> ran top, I noticed that there were too many Pyspark.daemons
>>>>>>>>> processes
>>>>>>>>> running. There was another mail thread regarding the same:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>>>>>
>>>>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>>>>> executor
>>>>>>>>> cores and number of executors in one node to 1. But the number of
>>>>>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>>>>>> initially
>>>>>>>>> there is one Pyspark.daemons process and this in turn spawns as
>>>>>>>>> many
>>>>>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>>>>>
>>>>>>>>> Any help is appreciated :)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ashwin Raaghav.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>>> Nabble.com.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Mathieu Longtin
>>>>>>>> 1-514-803-8977
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ashwin Raaghav
>>>>>>>
>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Ashwin Raaghav
>>>>>
>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Ashwin Raaghav
>>>
>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
>
>
> --
> Regards,
>
> Ashwin Raaghav
>
-- 
Mathieu Longtin
1-514-803-8977

Re: Limiting Pyspark.daemons

Posted by Ashwin Raaghav <as...@gmail.com>.
I am trying to think of possible reasons why this could be happening. If the
cores are multi-threaded, could that affect the daemons? Was your Spark built
from source or downloaded as a binary? Though that should not technically
change anything.

On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <ma...@closetwork.org>
wrote:

> 1.6.1.
>
> I have no idea. SPARK_WORKER_CORES should do the same.
>
> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <as...@gmail.com>
> wrote:
>
>> Which version of Spark are you using? 1.6.1?
>>
>> Any ideas as to why it is not working in ours?
>>
>> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <ma...@closetwork.org>
>> wrote:
>>
>>> 16.
>>>
>>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried what you suggested and started the slave using the following
>>>> command:
>>>>
>>>> start-slave.sh --cores 1 <master>
>>>>
>>>> But it still seems to start as many pyspark daemons as the number of
>>>> cores in the node (1 parent and 3 workers). Limiting it via spark-env.sh
>>>> file by giving SPARK_WORKER_CORES=1 also didn't help.
>>>>
>>>> When you said it helped you and limited it to 2 processes in your
>>>> cluster, how many cores did each machine have?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <mathieu@closetwork.org
>>>> > wrote:
>>>>
>>>>> It depends on what you want to do:
>>>>>
>>>>> If, on any given server, you don't want Spark to use more than one
>>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>>> --cores=1
>>>>>
>>>>> If you have a bunch of servers dedicated to Spark, but you don't want
>>>>> a driver to use more than one core per server, then: spark.executor.cores=1
>>>>> tells it not to use more than 1 core per server. However, it seems it will
>>>>> start as many pyspark as there are cores, but maybe not use them.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Mathieu,
>>>>>>
>>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>>> can I specify "--cores=1" from the application?
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>>> mathieu@closetwork.org> wrote:
>>>>>>
>>>>>>> When running the executor, put --cores=1. We use this and I only see
>>>>>>> 2 pyspark process, one seem to be the parent of the other and is idle.
>>>>>>>
>>>>>>> In your case, are all pyspark process working?
>>>>>>>
>>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>>> application
>>>>>>>> is run, the load on the workers seems to go more than what was
>>>>>>>> given. When I
>>>>>>>> ran top, I noticed that there were too many Pyspark.daemons
>>>>>>>> processes
>>>>>>>> running. There was another mail thread regarding the same:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>>>>
>>>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>>>> executor
>>>>>>>> cores and number of executors in one node to 1. But the number of
>>>>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>>>>> initially
>>>>>>>> there is one Pyspark.daemons process and this in turn spawns as many
>>>>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>>>>
>>>>>>>> Any help is appreciated :)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ashwin Raaghav.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>> Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Ashwin Raaghav
>>>>>>
>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ashwin Raaghav
>>>>
>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav

Re: Limiting Pyspark.daemons

Posted by Mathieu Longtin <ma...@closetwork.org>.
1.6.1.

I have no idea. SPARK_WORKER_CORES should do the same.

On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <as...@gmail.com> wrote:

> Which version of Spark are you using? 1.6.1?
>
> Any ideas as to why it is not working in ours?
>
> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <ma...@closetwork.org>
> wrote:
>
>> 16.
>>
>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I tried what you suggested and started the slave using the following
>>> command:
>>>
>>> start-slave.sh --cores 1 <master>
>>>
>>> But it still seems to start as many pyspark daemons as the number of
>>> cores in the node (1 parent and 3 workers). Limiting it via spark-env.sh
>>> file by giving SPARK_WORKER_CORES=1 also didn't help.
>>>
>>> When you said it helped you and limited it to 2 processes in your
>>> cluster, how many cores did each machine have?
>>>
>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <ma...@closetwork.org>
>>> wrote:
>>>
>>>> It depends on what you want to do:
>>>>
>>>> If, on any given server, you don't want Spark to use more than one
>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>> --cores=1
>>>>
>>>> If you have a bunch of servers dedicated to Spark, but you don't want a
>>>> driver to use more than one core per server, then: spark.executor.cores=1
>>>> tells it not to use more than 1 core per server. However, it seems it will
>>>> start as many pyspark as there are cores, but maybe not use them.
>>>>
>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Mathieu,
>>>>>
>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>> can I specify "--cores=1" from the application?
>>>>>
>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>> mathieu@closetwork.org> wrote:
>>>>>
>>>>>> When running the executor, put --cores=1. We use this and I only see
>>>>>> 2 pyspark process, one seem to be the parent of the other and is idle.
>>>>>>
>>>>>> In your case, are all pyspark process working?
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>> application
>>>>>>> is run, the load on the workers seems to go more than what was
>>>>>>> given. When I
>>>>>>> ran top, I noticed that there were too many Pyspark.daemons processes
>>>>>>> running. There was another mail thread regarding the same:
>>>>>>>
>>>>>>>
>>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>>>
>>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>>> executor
>>>>>>> cores and number of executors in one node to 1. But the number of
>>>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>>>> initially
>>>>>>> there is one Pyspark.daemons process and this in turn spawns as many
>>>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>>>
>>>>>>> Any help is appreciated :)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ashwin Raaghav.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>> Nabble.com.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>>>
>>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Ashwin Raaghav
>>>>>
>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Ashwin Raaghav
>>>
>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
>
>
> --
> Regards,
>
> Ashwin Raaghav
>
-- 
Mathieu Longtin
1-514-803-8977

Re: Limiting Pyspark.daemons

Posted by Ashwin Raaghav <as...@gmail.com>.
Which version of Spark are you using? 1.6.1?

Any ideas as to why it is not working on our cluster?

On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <ma...@closetwork.org>
wrote:

> 16.
>
> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I tried what you suggested and started the slave using the following
>> command:
>>
>> start-slave.sh --cores 1 <master>
>>
>> But it still seems to start as many pyspark daemons as the number of
>> cores in the node (1 parent and 3 workers). Limiting it via spark-env.sh
>> file by giving SPARK_WORKER_CORES=1 also didn't help.
>>
>> When you said it helped you and limited it to 2 processes in your
>> cluster, how many cores did each machine have?
>>
>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <ma...@closetwork.org>
>> wrote:
>>
>>> It depends on what you want to do:
>>>
>>> If, on any given server, you don't want Spark to use more than one core,
>>> use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1
>>>
>>> If you have a bunch of servers dedicated to Spark, but you don't want a
>>> driver to use more than one core per server, then: spark.executor.cores=1
>>> tells it not to use more than 1 core per server. However, it seems it will
>>> start as many pyspark as there are cores, but maybe not use them.
>>>
>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>>> wrote:
>>>
>>>> Hi Mathieu,
>>>>
>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how can
>>>> I specify "--cores=1" from the application?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <mathieu@closetwork.org
>>>> > wrote:
>>>>
>>>>> When running the executor, put --cores=1. We use this and I only see 2
>>>>> pyspark process, one seem to be the parent of the other and is idle.
>>>>>
>>>>> In your case, are all pyspark process working?
>>>>>
>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>> application
>>>>>> is run, the load on the workers seems to go more than what was given.
>>>>>> When I
>>>>>> ran top, I noticed that there were too many Pyspark.daemons processes
>>>>>> running. There was another mail thread regarding the same:
>>>>>>
>>>>>>
>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>>
>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>> executor
>>>>>> cores and number of executors in one node to 1. But the number of
>>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>>> initially
>>>>>> there is one Pyspark.daemons process and this in turn spawns as many
>>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>>
>>>>>> Any help is appreciated :)
>>>>>>
>>>>>> Thanks,
>>>>>> Ashwin Raaghav.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ashwin Raaghav
>>>>
>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav

Re: Limiting Pyspark.daemons

Posted by Mathieu Longtin <ma...@closetwork.org>.
16.

On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <as...@gmail.com> wrote:

> Hi,
>
> I tried what you suggested and started the slave using the following
> command:
>
> start-slave.sh --cores 1 <master>
>
> But it still seems to start as many pyspark daemons as the number of cores
> in the node (1 parent and 3 workers). Limiting it via spark-env.sh file by
> giving SPARK_WORKER_CORES=1 also didn't help.
>
> When you said it helped you and limited it to 2 processes in your cluster,
> how many cores did each machine have?
>
> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <ma...@closetwork.org>
> wrote:
>
>> It depends on what you want to do:
>>
>> If, on any given server, you don't want Spark to use more than one core,
>> use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1
>>
>> If you have a bunch of servers dedicated to Spark, but you don't want a
>> driver to use more than one core per server, then: spark.executor.cores=1
>> tells it not to use more than 1 core per server. However, it seems it will
>> start as many pyspark as there are cores, but maybe not use them.
>>
>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
>> wrote:
>>
>>> Hi Mathieu,
>>>
>>> Isn't that the same as setting "spark.executor.cores" to 1? And how can
>>> I specify "--cores=1" from the application?
>>>
>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <ma...@closetwork.org>
>>> wrote:
>>>
>>>> When running the executor, put --cores=1. We use this and I only see 2
>>>> pyspark process, one seem to be the parent of the other and is idle.
>>>>
>>>> In your case, are all pyspark process working?
>>>>
>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>> application
>>>>> is run, the load on the workers seems to go more than what was given.
>>>>> When I
>>>>> ran top, I noticed that there were too many Pyspark.daemons processes
>>>>> running. There was another mail thread regarding the same:
>>>>>
>>>>>
>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>>
>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>> executor
>>>>> cores and number of executors in one node to 1. But the number of
>>>>> pyspark.daemons process is still not coming down. It looks like
>>>>> initially
>>>>> there is one Pyspark.daemons process and this in turn spawns as many
>>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>>
>>>>> Any help is appreciated :)
>>>>>
>>>>> Thanks,
>>>>> Ashwin Raaghav.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>
>>>>> --
>>>> Mathieu Longtin
>>>> 1-514-803-8977
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Ashwin Raaghav
>>>
>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
>
>
> --
> Regards,
>
> Ashwin Raaghav
>
-- 
Mathieu Longtin
1-514-803-8977

Re: Limiting Pyspark.daemons

Posted by Ashwin Raaghav <as...@gmail.com>.
Hi,

I tried what you suggested and started the slave using the following
command:

start-slave.sh --cores 1 <master>

But it still seems to start as many pyspark daemons as there are cores in
the node (1 parent and 3 workers). Limiting it via the spark-env.sh file by
setting SPARK_WORKER_CORES=1 also didn't help.

When you said it limited things to 2 processes in your cluster, how many
cores did each machine have?
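
A short sketch, again assuming psutil is available on the node, that
separates the parent pyspark.daemon from the workers it forked, which makes
it easier to compare the counts against the core settings being tested:

import psutil

daemons = []
for p in psutil.process_iter():
    try:
        if "pyspark.daemon" in " ".join(p.cmdline()):
            daemons.append(p)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

pids = set(p.pid for p in daemons)
parents = [p for p in daemons if p.ppid() not in pids]
workers = [p for p in daemons if p.ppid() in pids]
print("parent daemons: %d, forked workers: %d" % (len(parents), len(workers)))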

On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <ma...@closetwork.org>
wrote:

> It depends on what you want to do:
>
> If, on any given server, you don't want Spark to use more than one core,
> use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1
>
> If you have a bunch of servers dedicated to Spark, but you don't want a
> driver to use more than one core per server, then: spark.executor.cores=1
> tells it not to use more than 1 core per server. However, it seems it will
> start as many pyspark as there are cores, but maybe not use them.
>
> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com>
> wrote:
>
>> Hi Mathieu,
>>
>> Isn't that the same as setting "spark.executor.cores" to 1? And how can I
>> specify "--cores=1" from the application?
>>
>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <ma...@closetwork.org>
>> wrote:
>>
>>> When running the executor, put --cores=1. We use this and I only see 2
>>> pyspark process, one seem to be the parent of the other and is idle.
>>>
>>> In your case, are all pyspark process working?
>>>
>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>> application
>>>> is run, the load on the workers seems to go more than what was given.
>>>> When I
>>>> ran top, I noticed that there were too many Pyspark.daemons processes
>>>> running. There was another mail thread regarding the same:
>>>>
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>>
>>>> I followed what was mentioned there, i.e. reduced the number of executor
>>>> cores and number of executors in one node to 1. But the number of
>>>> pyspark.daemons process is still not coming down. It looks like
>>>> initially
>>>> there is one Pyspark.daemons process and this in turn spawns as many
>>>> pyspark.daemons processes as the number of cores in the machine.
>>>>
>>>> Any help is appreciated :)
>>>>
>>>> Thanks,
>>>> Ashwin Raaghav.
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav

Re: Limiting Pyspark.daemons

Posted by Mathieu Longtin <ma...@closetwork.org>.
It depends on what you want to do:

If, on any given server, you don't want Spark to use more than one core,
use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1

If you have a bunch of servers dedicated to Spark, but you don't want a
single driver to use more than one core per server, then setting
spark.executor.cores=1 tells it not to use more than 1 core per server.
However, it seems it will still start as many pyspark.daemon processes as
there are cores, though it may not use them.
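
A minimal application-side sketch of the second option; the app name and
master URL are placeholders, not values from this thread:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("limit-cores-example")    # hypothetical app name
        .setMaster("spark://master:7077")     # placeholder master URL
        .set("spark.executor.cores", "1")     # at most 1 core per executor
        .set("spark.cores.max", "1"))         # cap total cores for this app
sc = SparkContext(conf=conf)
print(sc.parallelize(range(1000), 2).count())  # trivial job to exercise it
sc.stop()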

On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <as...@gmail.com> wrote:

> Hi Mathieu,
>
> Isn't that the same as setting "spark.executor.cores" to 1? And how can I
> specify "--cores=1" from the application?
>
> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <ma...@closetwork.org>
> wrote:
>
>> When running the executor, put --cores=1. We use this and I only see 2
>> pyspark process, one seem to be the parent of the other and is idle.
>>
>> In your case, are all pyspark process working?
>>
>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>> application
>>> is run, the load on the workers seems to go more than what was given.
>>> When I
>>> ran top, I noticed that there were too many Pyspark.daemons processes
>>> running. There was another mail thread regarding the same:
>>>
>>>
>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>>
>>> I followed what was mentioned there, i.e. reduced the number of executor
>>> cores and number of executors in one node to 1. But the number of
>>> pyspark.daemons process is still not coming down. It looks like initially
>>> there is one Pyspark.daemons process and this in turn spawns as many
>>> pyspark.daemons processes as the number of cores in the machine.
>>>
>>> Any help is appreciated :)
>>>
>>> Thanks,
>>> Ashwin Raaghav.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>> --
>> Mathieu Longtin
>> 1-514-803-8977
>>
>
>
>
> --
> Regards,
>
> Ashwin Raaghav
>
-- 
Mathieu Longtin
1-514-803-8977

Re: Limiting Pyspark.daemons

Posted by Ashwin Raaghav <as...@gmail.com>.
Hi Mathieu,

Isn't that the same as setting "spark.executor.cores" to 1? And how can I
specify "--cores=1" from the application?

On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <ma...@closetwork.org>
wrote:

> When running the executor, put --cores=1. We use this and I only see 2
> pyspark process, one seem to be the parent of the other and is idle.
>
> In your case, are all pyspark process working?
>
> On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:
>
>> Hi,
>>
>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>> application
>> is run, the load on the workers seems to go more than what was given.
>> When I
>> ran top, I noticed that there were too many Pyspark.daemons processes
>> running. There was another mail thread regarding the same:
>>
>>
>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>>
>> I followed what was mentioned there, i.e. reduced the number of executor
>> cores and number of executors in one node to 1. But the number of
>> pyspark.daemons process is still not coming down. It looks like initially
>> there is one Pyspark.daemons process and this in turn spawns as many
>> pyspark.daemons processes as the number of cores in the machine.
>>
>> Any help is appreciated :)
>>
>> Thanks,
>> Ashwin Raaghav.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav

Re: Limiting Pyspark.daemons

Posted by Mathieu Longtin <ma...@closetwork.org>.
When running the executor, put --cores=1. We use this and I only see 2
pyspark processes; one seems to be the parent of the other and sits idle.

In your case, are all the pyspark processes working?
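
One rough way to answer that, assuming psutil is installed on a worker node,
is to sample the CPU usage of each pyspark.daemon process and see which ones
are actually busy; this is only a sketch:

import time
import psutil

daemons = []
for p in psutil.process_iter():
    try:
        if "pyspark.daemon" in " ".join(p.cmdline()):
            p.cpu_percent(None)  # prime the counter
            daemons.append(p)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

time.sleep(2)
for p in daemons:
    try:
        print("pid %d: %.1f%% CPU" % (p.pid, p.cpu_percent(None)))
    except psutil.NoSuchProcess:
        pass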

On Mon, Jul 4, 2016 at 3:15 AM ar7 <as...@gmail.com> wrote:

> Hi,
>
> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
> application
> is run, the load on the workers seems to go more than what was given. When
> I
> ran top, I noticed that there were too many Pyspark.daemons processes
> running. There was another mail thread regarding the same:
>
>
> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3CCAO429hvi3dRc-ojEMue3X4q1VDzt61hTByeAcagtRE9yrHsFgA@mail.gmail.com%3E
>
> I followed what was mentioned there, i.e. reduced the number of executor
> cores and number of executors in one node to 1. But the number of
> pyspark.daemons process is still not coming down. It looks like initially
> there is one Pyspark.daemons process and this in turn spawns as many
> pyspark.daemons processes as the number of cores in the machine.
>
> Any help is appreciated :)
>
> Thanks,
> Ashwin Raaghav.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
> --
Mathieu Longtin
1-514-803-8977