Posted to user@spark.apache.org by Kartik Mathur <ka...@bluedata.com> on 2016/07/11 22:09:40 UTC

Spark cluster tuning recommendation

I am trying to run terasort in Spark on a 7-node cluster with only 10 GB of
data, and the executors get lost with a GC overhead limit exceeded error.

This is what my cluster looks like -


   - *Alive Workers:* 7
   - *Cores in use:* 28 Total, 2 Used
   - *Memory in use:* 56.0 GB Total, 1024.0 MB Used
   - *Applications:* 1 Running, 6 Completed
   - *Drivers:* 0 Running, 0 Completed
   - *Status:* ALIVE

Each worker has 8 cores and 4GB memory.

My question is, how do people running in production decide these properties
(see the sketch after this list) -

1) --num-executors
2) --executor-cores
3) --executor-memory
4) num of partitions
5) spark.default.parallelism
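
For context, here is roughly where each of these lands on a spark-submit
invocation (values made up; --num-executors applies on YARN, while standalone
mode uses --total-executor-cores instead, as far as I understand):

spark-submit \
  --num-executors 7 \
  --executor-cores 2 \
  --executor-memory 2G \
  --conf spark.default.parallelism=56 \
  --class com.example.TeraSort terasort.jar   # class and jar are placeholders

The number of partitions is set in code, e.g. rdd.repartition(200), or comes
from the input splits when reading from HDFS.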

Thanks,
Kartik

Re: Spark cluster tuning recommendation

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

Have you seen the slides from Spark Summit 2016?
https://spark-summit.org/2016/events/top-5-mistakes-when-writing-spark-applications/
This is a good deck for your capacity planning.

// maropu



-- 
---
Takeshi Yamamuro

Re: Spark cluster tuning recommendation

Posted by Yash Sharma <ya...@gmail.com>.
I would suggest using dynamic allocation rather than a fixed number of
executors. Provide whatever executor memory you would like. Deciding the
values requires a couple of test runs to check what works best for you.

You could try something like -

--driver-memory 1G \
--executor-memory 2G \
--executor-cores 2 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.initialExecutors=8
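
One caveat, as far as I know: dynamic allocation also needs the external
shuffle service running on the workers, and it helps to bound the executor
count. A sketch of the extra flags (the min/max values are placeholders):

--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=14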




Re: Spark cluster tuning recommendation

Posted by Anuj Kumar <an...@gmail.com>.
That configuration looks bad, with only two cores in use and 1 GB used by
the app. A few points-

1. Oversubscribe those CPUs to at least twice the number of cores you have
to start with, and then tune down if it freezes (see the sketch after this
list)
2. Allocate all of the CPU cores and memory to your running app (I assume
this is your test environment)
3. Assuming each machine is quad-core, if you define 8 cores per worker you
will get 56 cores (CPU threads) across the 7 workers
4. It also depends on the source you are reading the data from. If you are
reading from HDFS, what are your block size and part count?
5. You may also have to tune the timeouts and frame size based on the
dataset and the errors you are facing (a couple of candidate flags are
sketched below)
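
For point 1, a sketch of what that oversubscription looks like in
conf/spark-env.sh on each standalone worker (16 assumes twice the 8 cores
per worker mentioned above; adjust to your machines):

# conf/spark-env.sh on each worker
SPARK_WORKER_CORES=16    # advertise twice the physical core count
SPARK_WORKER_MEMORY=4g   # give the worker all the memory you can spare

For point 5, a starting point (property names vary by Spark version, e.g.
older releases use spark.akka.frameSize instead of spark.rpc.message.maxSize):

--conf spark.network.timeout=300s \
--conf spark.rpc.message.maxSize=256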

We have run terasort with a couple of high-end worker machines reading and
writing from HDFS, with 5-10 mount points allocated for HDFS and Spark local
directories. We have used multiple configurations, like
10 workers x 10 CPUs x 10 GB and 25 workers x 6 CPUs x 6 GB running on each
of the two machines, with 512 MB HDFS blocks and 1000-2000 parts. With all
of them chatting at 10GbE, it worked well. (A sketch of pinning the part
count on read follows.)
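
If you want to pin the part count when reading from HDFS, a minimal Scala
sketch (the path and counts here are placeholders):

// minPartitions is only a hint; the HDFS block size still drives the
// actual split count, so tune both together
val input = sc.textFile("hdfs:///terasort/input", 1000)
// or reshuffle explicitly after the read
val reparted = input.repartition(2000)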
