Posted to user@spark.apache.org by Tzahi File <tz...@ironsrc.com> on 2019/12/29 22:02:52 UTC

Splitting resource in Spark cluster

Hi All,

I'm using one Spark cluster that contains 50 nodes of type i3.4xl
(16 vCores).
I'm trying to run 4 Spark SQL queries simultaneously.

The data is split into 10 even partitions, and the 4 queries run on the same
data but on different partitions. I have tried to configure the cluster so
that each job gets the same resources and doesn't interfere with the other
jobs' resources.
When running 1 or 2 queries simultaneously I got much better performance
than with 4 queries, although I expected to get the same performance.

I'm looking for your advice on how to improve the performance by tuning the
configurations.

I have a total of 15*50 nodes
5 executors per instance
max-executors 37
shuffle partitions 750
...
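
As a rough sketch of how two of the settings above might be passed to each job: the helper below builds `--conf` flags for spark-submit. The mapping is an assumption: it treats "max-executors" as `spark.dynamicAllocation.maxExecutors` (with dynamic allocation enabled) and "shuffle partitions" as `spark.sql.shuffle.partitions`. The function name `spark_submit_conf` is hypothetical.

```python
from typing import List


def spark_submit_conf(max_executors: int, shuffle_partitions: int) -> List[str]:
    """Build spark-submit --conf flags for one job (hypothetical helper)."""
    settings = {
        # Assumption: the executor cap is applied through dynamic allocation.
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Number of partitions Spark SQL uses when shuffling for joins/aggregations.
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Values taken from the list above.
print(" ".join(spark_submit_conf(37, 750)))
```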

From what I understand, with max executors set to 37, each job should get
the same number of executors whether 1, 2, 3, or 4 jobs run in parallel,
and thus the same running time.
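
The arithmetic behind that expectation, as a minimal sketch. It assumes "5 executors per instance" means 5 executor slots on each of the 50 nodes and that the cap of 37 applies per job; it only counts executors and says nothing about disk, network, or memory contention between jobs.

```python
# Figures taken from the settings listed above.
NODES = 50
EXECUTORS_PER_NODE = 5        # assumption: executor slots available on every node
MAX_EXECUTORS_PER_JOB = 37    # assumption: the cap applies to each job separately

cluster_executor_slots = NODES * EXECUTORS_PER_NODE


def executors_requested(parallel_jobs: int) -> int:
    """Executors requested in total if every job reaches its cap."""
    return parallel_jobs * MAX_EXECUTORS_PER_JOB


for jobs in (1, 2, 3, 4):
    requested = executors_requested(jobs)
    fits = requested <= cluster_executor_slots
    print(f"{jobs} job(s): {requested} of {cluster_executor_slots} slots, fits={fits}")
```

By this count even 4 jobs stay below the total number of executor slots, which is why I expected the running time not to change.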


Thanks,
Tzahi