Posted to issues@spark.apache.org by "jiafu zhang (JIRA)" <ji...@apache.org> on 2019/01/16 06:21:00 UTC

[jira] [Created] (SPARK-26632) Separate Thread Configurations of Driver and Executor

jiafu zhang created SPARK-26632:
-----------------------------------

             Summary: Separate Thread Configurations of Driver and Executor
                 Key: SPARK-26632
                 URL: https://issues.apache.org/jira/browse/SPARK-26632
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: jiafu zhang


While benchmarking Spark 2.4.0 on HPC (High Performance Computing) clusters, we identified an area that can be optimized to improve RPC performance on large numbers of HPC nodes with Omni-Path NICs: Spark currently uses the same thread configurations for both the driver and the executors. Our tests show that the driver and executors should be configured with different thread counts, because the driver handles far more RPC messages than any single executor.

The proposed role-specific configuration keys are:
||Config Key||for Driver||for Executor||
|spark.rpc.io.serverThreads|spark.driver.rpc.io.serverThreads|spark.executor.rpc.io.serverThreads|
|spark.rpc.io.clientThreads|spark.driver.rpc.io.clientThreads|spark.executor.rpc.io.clientThreads|
|spark.rpc.netty.dispatcher.numThreads|spark.driver.rpc.netty.dispatcher.numThreads|spark.executor.rpc.netty.dispatcher.numThreads|
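As a hedged illustration, the proposed keys could be passed at submit time like any other Spark configuration (the class name and jar below are hypothetical; the thread counts are taken from the benchmark table further down):

```shell
# Sketch of how the proposed role-specific keys would be set.
# Any key a role does not override falls back to the shared spark.rpc.* value.
spark-submit \
  --conf spark.driver.rpc.io.serverThreads=12 \
  --conf spark.driver.rpc.io.clientThreads=15 \
  --conf spark.driver.rpc.netty.dispatcher.numThreads=10 \
  --conf spark.executor.rpc.netty.dispatcher.numThreads=30 \
  --class org.example.SimpleMapTask \
  app.jar
```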

When Spark reads a thread configuration, it first tries the driver-specific or executor-specific key (depending on which role is reading it), then falls back to the common spark.rpc.* key.
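The lookup order above can be sketched as follows. This is a minimal standalone sketch, not Spark's actual code; the helper name and the plain-dict configuration are assumptions for illustration:

```python
def get_num_threads(conf, role, suffix, default):
    """Resolve a thread count for the given role ("driver" or "executor").

    Hypothetical sketch of the proposed lookup order: try the
    role-specific key first, then fall back to the shared
    spark.rpc.* key, then to a hard default.
    """
    for key in (f"spark.{role}.rpc.{suffix}", f"spark.rpc.{suffix}"):
        if key in conf:
            return int(conf[key])
    return default


conf = {
    "spark.driver.rpc.io.serverThreads": "12",
    "spark.rpc.io.serverThreads": "8",
}

# The driver-specific key wins; the executor, which has no
# role-specific key set, falls back to the common spark.rpc.* key.
print(get_num_threads(conf, "driver", "io.serverThreads", 4))    # 12
print(get_num_threads(conf, "executor", "io.serverThreads", 4))  # 8
```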

After the separation, performance improves substantially at 256 and 512 nodes. See the SimpleMapTask test results below.
|| ||spark.driver.rpc.io.serverThreads||spark.driver.rpc.io.clientThreads||spark.driver.rpc.netty.dispatcher.numThreads||spark.executor.rpc.netty.dispatcher.numThreads||Overall Time (s)||Overall Time without Separation (s)||Improvement||
|128 nodes|15|15|10|30|107|108|0.9%|
|256 nodes|12|15|10|30|159|196|18.8%|
|512 nodes|12|15|10|30|283|377|24.9%|

 

The implementation is almost done. We are working on the code merge.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org