Posted to issues@spark.apache.org by "Rodrigo Boavida (Jira)" <ji...@apache.org> on 2021/12/03 10:37:00 UTC

[jira] [Created] (SPARK-37536) Allow for API user to disable Shuffle Operations while running locally

Rodrigo Boavida created SPARK-37536:
---------------------------------------

             Summary: Allow for API user to disable Shuffle Operations while running locally
                 Key: SPARK-37536
                 URL: https://issues.apache.org/jira/browse/SPARK-37536
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.0
         Environment: Spark running in local mode
            Reporter: Rodrigo Boavida


We have been using Spark in local mode as a small embedded, in-memory SQL database for our microservice.

Spark's powerful SQL features and flexibility enable developers to build efficient data-querying solutions. Because our solution deals with small datasets that must be queried through SQL at very low latencies, we found the embedded approach a very good fit.

We found through experimentation that Spark in local mode gains significant performance (20-30% on average) when shuffling on aggregation operations is disabled. The shuffling comes from EnsureRequirements expanding the query execution plan with ShuffleExchangeExec or BroadcastExchangeExec nodes.
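
For illustration, here is a minimal way to observe the exchange in question; the session setup and sample data are made up for the example:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("local-shuffle-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

// In local mode today, even a simple aggregation plans a shuffle: the
// printed physical plan contains an "Exchange hashpartitioning(key, ...)"
// node (a ShuffleExchangeExec) between the partial and final HashAggregate steps.
df.groupBy("key").count().explain()
{code}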

I will be raising a PR to propose introducing a new configuration variable:

*spark.shuffle.local.enabled*

This variable will default to true. It will be checked when QueryExecution applies the EnsureRequirements rule: if Spark is running in local mode and the value is false, the execution plan will be left unchanged (no exchange nodes inserted).
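
To make the intent concrete, a hypothetical usage sketch; the flag does not exist in Spark yet, and its name and behaviour are exactly the proposal above:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical: "spark.shuffle.local.enabled" is the flag proposed in this
// issue, not an existing Spark configuration.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.shuffle.local.enabled", "false")
  .getOrCreate()
import spark.implicits._

Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")
  .createOrReplaceTempView("events")

// Under the proposal, EnsureRequirements would see local mode plus the
// disabled flag and leave this plan without the Exchange node.
spark.sql("SELECT key, count(*) AS cnt FROM events GROUP BY key").explain()
{code}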

Looking forward to any comments and feedback.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org