You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by neoremind <gi...@git.apache.org> on 2017/08/16 14:34:43 UTC

[GitHub] spark pull request #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO...

GitHub user neoremind opened a pull request:

    https://github.com/apache/spark/pull/18964

    [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF` and ` SO_SNDBUF`  in SparkConf.

    ## What changes were proposed in this pull request?
    
    TCP parameters like SO_RCVBUF and SO_SNDBUF can be set in SparkConf, and `org.apache.spark.network.server.TransportServe`r can use those parameters to build server by leveraging netty. But for TransportClientFactory, there is no such way to set those parameters from SparkConf. This could be inconsistent in server and client side when people set parameters in SparkConf. So this PR make RPC client to be enable to use those TCP parameters as well.
    
    ## How was this patch tested?
    
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/neoremind/spark add_client_param

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18964
    
----
commit 14ba13b1fec4fbe20b130283ba0d6b2d5c58bb87
Author: xu.zhang <xu...@hulu.com>
Date:   2017-08-16T14:30:25Z

    Enable RPC client to use ` SO_RCVBUF` and ` SO_SNDBUF`  in SparkConf.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO...

Posted by neoremind <gi...@git.apache.org>.
Github user neoremind commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18964#discussion_r135081475
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java ---
    @@ -210,6 +210,14 @@ private TransportClient createClient(InetSocketAddress address)
           .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
           .option(ChannelOption.ALLOCATOR, pooledAllocator);
     
    +    if (conf.receiveBuf() > 0) {
    +      bootstrap.option(ChannelOption.SO_RCVBUF, conf.receiveBuf());
    --- End diff --
    
    For server side, netty leverages reactor pattern and `ServerBootstrap` is used here, method `option` works when socket binding and connecting occurs, while `childOption` works for the later process when connection is established and there will be one channel per client.
    For client side, spark rpc uses `Bootstrap`, there is no method to set `childOption` because client works not like server side, it is pretty simple for client, only one thread-pool will be used by netty to do network I/O.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by neoremind <gi...@git.apache.org>.
Github user neoremind commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    @zsxwing Thanks for reviewing.  The project I mentioned above is for studying purpose and hope it will help others who are interested. I totally agree that spark rpc mainly for internal use, but as I tested, its performance is good though in general cases, which is good news :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18964


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    > I also notice that Spark RPC by default uses java native serialization, even a verifying endpoint exist or not request would cost 1K of payload size, not to mention some other real logic endpoint, so in the real world it might be useful to profile this
    
    @neoremind did you see any performance issue caused by Spark RPC? Spark doesn't send a lot of RPC messages. I don't see it's a bottleneck even when we tried to optimize the latency in Structured Streaming.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    Thanks. Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81097/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by neoremind <gi...@git.apache.org>.
Github user neoremind commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    @cloud-fan would you take a look of the PR, the update is very simple. Thanks very much!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by neoremind <gi...@git.apache.org>.
Github user neoremind commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    @zsxwing I did try to create a performance test against spark rpc, the test result can be found [here](https://github.com/neoremind/kraps-rpc#4-performance-test), note that I created the project for studying purpose and the code is based on 2.1.0. But as you said, the performance would not be dropped as client not using `SO_RCVBUF` and `SO_SNDBUF` set in `SparkConf`.  
    
    For example, I use the scenario of concurrent calls 10, total calls 100k, keep all things as default, the QPS would be around 11k. When I set `SO_RCVBUF` and ` SO_SNDBUF`  to extremely small number like 100 the performance is affected tremendously. If they are set to a large number like 128k, the results won't be boosted by whether clients set the corresponding `SO_RCVBUF` and `SO_SNDBUF` value or not. 
    
    I admit that the update is trivial but from users' perspective, if `spark.{module}.io.sendBuffer` and `spark.{module}.io.sendBuffer` are exposed outside and could be set, and they only works on server side, I think it is a little bit not consistent, so I raise the PR to try to make it work on both server and client side, just to make them consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    This change looks reasonable, cc @zsxwing @cloud-fan for another look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18964#discussion_r135042032
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java ---
    @@ -210,6 +210,14 @@ private TransportClient createClient(InetSocketAddress address)
           .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
           .option(ChannelOption.ALLOCATOR, pooledAllocator);
     
    +    if (conf.receiveBuf() > 0) {
    +      bootstrap.option(ChannelOption.SO_RCVBUF, conf.receiveBuf());
    --- End diff --
    
    what's the difference between `option` and `childOption`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by neoremind <gi...@git.apache.org>.
Github user neoremind commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    @jerryshao please review my separated PR. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    The change looks OK to me. Did you meet the issue in which you have to change the buffer size in the client side?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    **[Test build #81097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81097/testReport)** for PR 18964 at commit [`14ba13b`](https://github.com/apache/spark/commit/14ba13b1fec4fbe20b130283ba0d6b2d5c58bb87).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    **[Test build #81097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81097/testReport)** for PR 18964 at commit [`14ba13b`](https://github.com/apache/spark/commit/14ba13b1fec4fbe20b130283ba0d6b2d5c58bb87).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    @neoremind that's an interesting project. However, Spark RPC is not designed for high-performance and general RPC. In general, Spark just needs a good enough RPC system. That's why it's using Java serialization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18964: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF...

Posted by neoremind <gi...@git.apache.org>.
Github user neoremind commented on the issue:

    https://github.com/apache/spark/pull/18964
  
    Not yet since it is OK to keep buffer size as default system value, but to keep it consistent as user would like to specify, this makes sense. 
    I also notice that Spark RPC by default uses java native serialization, even a verifying endpoint exist or not request would cost 1K of payload size, not to mention some other real logic endpoint, so in the real world it might be useful to profile this, I suggest maybe providing more RPC monitoring log to or hook would be beneficial, anyway this should be discussed in another thread. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org