Posted to dev@spark.apache.org by Steven Stetzler <st...@gmail.com> on 2020/10/20 22:21:42 UTC

Manual allocation of Spark executors during Spark application runtime

Hi all,

I am wondering if there is a method to manually tune the number of Spark
executors when dynamic allocation is enabled on a Spark application. Say
for example, I have a PySpark shell application running on Kubernetes:
```
$ python
>>> from pyspark.sql import SparkSession
>>> spark = (
...     SparkSession
...     .builder
...     .config("spark.master", "k8s://<k8s-endpoint>")
...     .config("spark.dynamicAllocation.enabled", "true")
...     .config("spark.dynamicAllocation.minExecutors", "1")
...     .config("spark.dynamicAllocation.maxExecutors", "4")
...     .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
...     .enableHiveSupport()
...     .getOrCreate()
... )
```

I am wondering if there is a way to do something like
```
spark.scaleExecutors(3)
```
to set the number of executors to 3, if not through the Python API then
through the SparkSession/SparkContext object in the JVM. (This API call is
made up; I am asking whether equivalent functionality exists.)

It seems to me that for dynamic allocation to work, there must be some
internal API that sets the number of executors, scaling it up and down as
required. I am not sure where this internal API lives or whether it is
callable by applications (I've had some trouble going through the source
code; I am not too familiar with Scala), but I am wondering if there is a
way to expose it to the user.
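
The closest candidate I have stumbled across is
SparkContext.requestTotalExecutors, which looks like a DeveloperApi on the
Scala side, though I am not sure it is the intended hook or how it
interacts with the allocation manager. For what it's worth, this is
roughly how I imagined calling it from PySpark (unverified, and it leans
on py4j internals like sc._jsc.sc() and the PythonUtils.toScalaMap helper,
which may not be stable across versions; the scale_executors name is just
mine):
```
# Unverified sketch: ask the cluster manager for an exact executor count by
# calling the JVM SparkContext's DeveloperApi method
#   requestTotalExecutors(numExecutors, localityAwareTasks, hostToLocalTaskCount)
# through py4j. sc._jsc.sc() and PythonUtils.toScalaMap are Spark internals.
def scale_executors(spark, num_executors):
    sc = spark.sparkContext
    jvm_sc = sc._jsc.sc()  # underlying Scala SparkContext
    empty_host_map = sc._jvm.PythonUtils.toScalaMap({})  # no locality hints
    return jvm_sc.requestTotalExecutors(num_executors, 0, empty_host_map)

# e.g. scale_executors(spark, 3)
```

Even if a call like this works, I would guess the allocation manager could
later override a manual request, which is part of why I am asking whether
there is (or could be) a supported way to do this.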

The Kubernetes cluster I am using is an elastic resource, so my motivation
is to utilize this elasticity during application runtime. If a query in my
Spark application is running slowly, I'd like to add more executors to it
without restarting my query or recreating my SparkSession/SparkContext.
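
As a crude way to see whether new executors actually arrive while a query
is running, I have been counting block managers through the JVM (internals
again; as far as I can tell, getExecutorMemoryStatus includes the driver,
so I subtract one):
```
# Unverified sketch: approximate the current executor count by counting
# block managers registered with the driver. getExecutorMemoryStatus()
# appears to include the driver itself, hence the "- 1".
def executor_count(spark):
    jvm_sc = spark.sparkContext._jsc.sc()
    return jvm_sc.getExecutorMemoryStatus().size() - 1
```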

Thanks,
Steven