Posted to reviews@spark.apache.org by aarondav <gi...@git.apache.org> on 2014/12/16 21:07:19 UTC

[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

GitHub user aarondav opened a pull request:

    https://github.com/apache/spark/pull/3713

    [SPARK-4864] Add documentation to Netty-based configs

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aarondav/spark netty-configs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3713
    
----
commit 3b1f84e8f2f6aa736f3c625e42926cb1e0c25381
Author: Aaron Davidson <aa...@databricks.com>
Date:   2014-12-16T19:53:05Z

    [SPARK-4864] Add documentation to Netty-based configs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67409650
  
      [Test build #24559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24559/consoleFull) for   PR 3713 at commit [`8a8b373`](https://github.com/apache/spark/commit/8a8b3739d1b357fc2f993c43d6cbdba9d306d27c).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3713#discussion_r21990285
  
    --- Diff: docs/configuration.md ---
    @@ -852,6 +852,59 @@ Apart from these, the following properties are also available, and may be useful
         between nodes leading to flooding the network with those.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.shuffle.io.preferDirectBufs</code></td>
    +  <td>true</td>
    +  <td>
    +    (Netty only) Off-heap buffers are used to reduce garbage collection during shuffle and cache 
    +    block transfer. For environments where off-heap memory is tightly limited, users may wish to 
    +    turn this off to force all allocations from Netty to be on-heap.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.io.numConnectionsPerPeer</code></td>
    +  <td>1</td>
    +  <td>
    +    (Netty only) Connections between hosts are reused in order to reduce connection buildup for 
    +    large clusters. For small clusters with many hard disks, this may result in insufficient
    --- End diff --
    
    should this say many hard disks and a small number of hosts?
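
    For readers who want to try the two properties in the diff above, here is a minimal sketch of passing them to `spark-submit` through its `--conf` flag. The helper `conf_args`, the application name `app.py`, and the chosen values are assumptions made for illustration; the thread only states the defaults (`true` and `1`) and does not recommend specific settings.

```python
# Hypothetical helper (not part of Spark): turn a dict of Spark properties
# into a list of spark-submit "--conf key=value" arguments.
def conf_args(props):
    args = []
    for key, value in sorted(props.items()):
        # Spark properties are strings; render booleans as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--conf", f"{key}={value}"]
    return args


# Illustrative values only, not recommendations from the thread.
netty_props = {
    # Turn off the off-heap preference so Netty allocations are on-heap.
    "spark.shuffle.io.preferDirectBufs": False,
    # Assumed value; the documented default is 1.
    "spark.shuffle.io.numConnectionsPerPeer": 2,
}

# "app.py" is a placeholder application name.
print(" ".join(["spark-submit"] + conf_args(netty_props) + ["app.py"]))
```

    The same key/value pairs could instead go in `spark-defaults.conf`; the command-line form is used here only because it keeps the example self-contained.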




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67409260
  
    Addressed your comments -- removed server/clientThreads




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3713




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67420193
  
      [Test build #24559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24559/consoleFull) for   PR 3713 at commit [`8a8b373`](https://github.com/apache/spark/commit/8a8b3739d1b357fc2f993c43d6cbdba9d306d27c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67236534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24505/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3713#discussion_r21990371
  
    --- Diff: docs/configuration.md ---
    @@ -852,6 +852,59 @@ Apart from these, the following properties are also available, and may be useful
         between nodes leading to flooding the network with those.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.shuffle.io.preferDirectBufs</code></td>
    +  <td>true</td>
    +  <td>
    +    (Netty only) Off-heap buffers are used to reduce garbage collection during shuffle and cache 
    +    block transfer. For environments where off-heap memory is tightly limited, users may wish to 
    +    turn this off to force all allocations from Netty to be on-heap.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.io.numConnectionsPerPeer</code></td>
    +  <td>1</td>
    +  <td>
    +    (Netty only) Connections between hosts are reused in order to reduce connection buildup for 
    +    large clusters. For small clusters with many hard disks, this may result in insufficient
    +    concurrency to saturate all disks, and so users may consider increasing this value.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.io.serverThreads</code></td>
    --- End diff --
    
    Or at least, could you give some indication of exactly why they might increase it (i.e. if you find you can't saturate the network)?




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67223679
  
      [Test build #24505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24505/consoleFull) for   PR 3713 at commit [`3b1f84e`](https://github.com/apache/spark/commit/3b1f84e8f2f6aa736f3c625e42926cb1e0c25381).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67236521
  
      [Test build #24505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24505/consoleFull) for   PR 3713 at commit [`3b1f84e`](https://github.com/apache/spark/commit/3b1f84e8f2f6aa736f3c625e42926cb1e0c25381).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3713#discussion_r21990331
  
    --- Diff: docs/configuration.md ---
    @@ -852,6 +852,59 @@ Apart from these, the following properties are also available, and may be useful
         between nodes leading to flooding the network with those.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.shuffle.io.preferDirectBufs</code></td>
    +  <td>true</td>
    +  <td>
    +    (Netty only) Off-heap buffers are used to reduce garbage collection during shuffle and cache 
    +    block transfer. For environments where off-heap memory is tightly limited, users may wish to 
    +    turn this off to force all allocations from Netty to be on-heap.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.io.numConnectionsPerPeer</code></td>
    +  <td>1</td>
    +  <td>
    +    (Netty only) Connections between hosts are reused in order to reduce connection buildup for 
    +    large clusters. For small clusters with many hard disks, this may result in insufficient
    +    concurrency to saturate all disks, and so users may consider increasing this value.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.io.serverThreads</code></td>
    --- End diff --
    
    Should this be undocumented? One way I tend to think of it is whether someone could reasonably understand why they should tune this value. It seems a little vague here when someone should adjust this.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67363838
  
    Hey Aaron,
    
    This looks good overall but I wonder whether we should leave out serverThreads and clientThreads for now. At least on the surface of it I'm not sure users would really understand when they need to adjust it, and the doc itself says they likely wouldn't need to adjust it ever.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67890065
  
    Thanks - I merged this.




[GitHub] spark pull request: [SPARK-4864] Add documentation to Netty-based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3713#issuecomment-67420200
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24559/
    Test PASSed.

