Posted to issues@spark.apache.org by "Gerard Maas (JIRA)" <ji...@apache.org> on 2015/01/07 23:52:34 UTC

[jira] [Comment Edited] (SPARK-4940) Support more evenly distributing cores for Mesos mode

    [ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268440#comment-14268440 ] 

Gerard Maas edited comment on SPARK-4940 at 1/7/15 10:52 PM:
-------------------------------------------------------------

Hi Tim,

We are indeed using coarse-grained mode. I'm not sure fine-grained mode makes much sense for Spark Streaming.

Here are a few examples of resource allocation, taken from several runs of the same job with identical configuration:
Job config:
spark.cores.max = 18
spark.mesos.coarse = true
spark.executor.memory = 4g
    
The job logic will start 6 Kafka receivers.
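
For reference, here's a minimal sketch of what this kind of setup looks like (hostnames, topic, group id, batch interval and the processing logic are illustrative placeholders, not taken from our actual job): the coarse-grained Mesos config above, plus 6 parallel Kafka receivers unioned into a single DStream.

{code:scala}
// Sketch only -- hostnames, topic, group id and batch interval are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("streaming-job")            // placeholder name
  .set("spark.cores.max", "18")
  .set("spark.mesos.coarse", "true")
  .set("spark.executor.memory", "4g")

val ssc = new StreamingContext(conf, Seconds(10))   // batch interval is a placeholder

// Each createStream call starts one long-running receiver that occupies
// one core on whichever executor it gets scheduled on.
val kafkaStreams = (1 to 6).map { _ =>
  KafkaUtils.createStream(ssc, "zk-host:2181", "consumer-group", Map("some-topic" -> 1))
}
val unified = ssc.union(kafkaStreams)

unified.count().print()   // stand-in for the real processing logic

ssc.start()
ssc.awaitTermination()
{code}

Since each receiver pins one core on whichever executor it lands on, the receiver placement shown in the tables below matters a lot for how the remaining cores can be used.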

#1
--
|| Node || Mesos CPU || Mesos Mem || Spark tasks || Streaming receivers ||
| 1 | 4 |  4GB | 3  | 2  |
| 2 | 6 |  4GB | 2  | 1  | 
| 3 | 7 | 4GB  | 3  | 2  |
| 4 | 1 | 4GB | 1 | 1 |

Total mem: 16 GB
Total CPUs: 18

Observations: 
Node #4, with only 1 CPU and 1 Kafka receiver, does not have the capacity to process the data it receives, so everything it receives has to be shipped to other nodes for non-local processing (not sure whether replication helps here or not; the blocks of data end up being processed on other nodes either way). Also, the nodes with 2 streaming receivers have a higher load than the node with 1 receiver.

#2
--
|| Node || Mesos CPU || Mesos Mem || Spark tasks || Streaming receivers ||
| 1 | 7 |  4GB | 7  | 4  |
| 2 | 2 |  4GB | 2  | 2  | 

Total mem: 8 GB
Total CPUs: 9

Observations: 
This is the worst configuration of the day. Totally unbalanced (4 vs 2 receivers) and, for some reason, the job didn't get all the resources specified in the configuration. The job's processing time is also slower, since there are fewer cores to handle the data and less overall memory.

#3
--
|| Node || Mesos CPU || Mesos Mem || Spark tasks || Streaming receivers ||
| 1 | 3 |  4GB | 3  | 2  |
| 2 | 8 |  4GB | 2  | 2  | 
| 3 | 7 | 4GB  | 3  | 2  |

Total mem: 12 GB
Total CPUs: 18

Observations: 
This is a fairly good configuration, with more evenly distributed receivers and CPUs, although one node is considerably smaller in terms of CPU assignment.
 
We can observe that the current resource assignment policy results in less-than-ideal and, in particular, effectively random assignments that have a strong impact on job execution and performance. Given that resources are allocated per executor (and not per job), the total memory available to the job becomes variable, since it can be assigned anywhere from 2 to 4 executors (8 GB to 16 GB at spark.executor.memory = 4g). It's also odd and unexpected to see fewer cores allocated than spark.cores.max.
Here's a performance chart of the same job jumping from one configuration to another (*), one with 3 nodes (left) and one with 2 (right):
!https://lh3.googleusercontent.com/Z1C71OKoQzGA13uNJ8Yvf_xz_glRUqU_IGGvLsfkPvUPK2lahrEatweiWl-PDDfysjXtbs1Sl_k=w1682-h689!
(chart line: processing time in ms, load is fairly constant)

(*) For a reason we haven't tracked down yet, Mesos often kills the job. When Marathon relaunches it, the job ends up with a different resource assignment.


> Support more evenly distributing cores for Mesos mode
> -----------------------------------------------------
>
>                 Key: SPARK-4940
>                 URL: https://issues.apache.org/jira/browse/SPARK-4940
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>            Reporter: Timothy Chen
>
> Currently in coarse-grained mode the Spark scheduler simply takes all the resources it can on each node, which can cause uneven distribution depending on the resources available on each slave.


