You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by redsanket <gi...@git.apache.org> on 2016/01/19 22:50:31 UTC

[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

GitHub user redsanket opened a pull request:

    https://github.com/apache/spark/pull/10838

    [SPARK-6166] Limit number of concurrent outbound connections

    This JIRA is related to
    https://github.com/apache/spark/pull/5852
    Had to do some minor rework and test to make sure it 
    works with current version of spark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/redsanket/spark limit-outbound-connections

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10838
    
----
commit 32063006c28e4e94c6005e559e03465a1ce41e81
Author: Sanket <sc...@untilservice-lm>
Date:   2016-01-19T21:38:51Z

    Limit number of concurrent outbound connections

commit 4b2bbd83f4fe02375f7ccfd73e091e000b3aae7b
Author: Sanket <sc...@untilservice-lm>
Date:   2016-01-19T21:46:12Z

    merge resolution from upstream:master

commit 9761809f5129fd4a5f593a4904f9b086f46c9f76
Author: Sanket <sc...@untilservice-lm>
Date:   2016-01-19T21:48:41Z

    Changed info level to debug level

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183071483
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183027600
  
    @redsanket most of my comments are minor. Otherwise LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50269536
  
    --- Diff: docs/configuration.md ---
    @@ -392,6 +392,16 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td>Int.MaxValue</td>
    +  <td>
    +    "spark.reducer.maxMbInFlight" puts a bound on the in flight data in terms of size.
    +    But this is not always sufficient when the number of hosts in the cluster increase,
    --- End diff --
    
    maxMBInFlight should be maxSizeInFlight
    
    change "to fetches" to "to fetch"
    
    Wrap the actual config variables with 
    
    I would prefer to see the description of spark.reducer.maxReqsInFlight first. Perhaps after that put something close to what you have about the maxMbInFlight. something like (feel free to change wording).
    
    maxReqsInFlight limits number ..... This is sometimes needed when you have a large number of hosts and which can lead to a very large number of in-bound connections...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182940338
  
    Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52541897
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    But for ArrayBuffer prepends and deletes are linear


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173032887
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49716/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183065989
  
    LGTM. will merge this one once the tests pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52811989
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
       private def fetchUpToMaxBytes(): Unit = {
         // Send fetch requests up to maxBytesInFlight
         while (fetchRequests.nonEmpty &&
    -      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    +      (bytesInFlight == 0 ||
    +        (reqsInFlight + 1 <= maxReqsInFlight &&
    --- End diff --
    
    Right. The request would stay in fetchRequests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173032886
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52542474
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    Isn't that amortized constant time vs linear?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50186855
  
    --- Diff: docs/configuration.md ---
    @@ -392,6 +392,17 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td><code>spark.reducer.maxReqsInFlight</code></td>
    +  <td>20</td>
    --- End diff --
    
    The default is unlimited not 20.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r53812741
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
       private def fetchUpToMaxBytes(): Unit = {
         // Send fetch requests up to maxBytesInFlight
         while (fetchRequests.nonEmpty &&
    -      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    +      (bytesInFlight == 0 ||
    +        (reqsInFlight + 1 <= maxReqsInFlight &&
    --- End diff --
    
    @zsxwing My account name is sanket991


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182993003
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51118/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52811613
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
       private def fetchUpToMaxBytes(): Unit = {
         // Send fetch requests up to maxBytesInFlight
         while (fetchRequests.nonEmpty &&
    -      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    +      (bytesInFlight == 0 ||
    +        (reqsInFlight + 1 <= maxReqsInFlight &&
    --- End diff --
    
    No request will be dropped. It just limits the number of concurrent requests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52542439
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    ok thanks np that was an obvious question mostly EOD tired query. Sorry!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182911828
  
    @zsxwing rebased and changed ArrayBuffer to HashSet
    @tgravescs might want to take a look at it one more time


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52346381
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    Moreover, `fetchBlocks` will be called in other threads, `remainingBlocks` should be a thread-safe collection.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182936370
  
    Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52345827
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    nit: use HashSet instead of `ArrayBuffer`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10838


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182939705
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-178141964
  
    @zsxwing Other than the PR title - I think everything else says inflight in code/documentation. Did you see anything to the contrary ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50186875
  
    --- Diff: docs/configuration.md ---
    @@ -392,6 +392,17 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td><code>spark.reducer.maxReqsInFlight</code></td>
    +  <td>20</td>
    +  <td>
    +    spark.reducer.maxMbInFlight puts a bound on the in flight data in terms of size.
    --- End diff --
    
    wrong description?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183072553
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182939709
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51114/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50613356
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -258,6 +268,9 @@ final class ShuffleBlockFetcherIterator(
         val remoteRequests = splitLocalRemoteBlocks()
         // Add the remote requests into our queue in a random order
         fetchRequests ++= Utils.randomize(remoteRequests)
    +    assert ((0 == reqsInFlight) == (0 == bytesInFlight),
    +      "expected reqsInFlight = 0 but found reqsInFlight = " + reqsInFlight +
    +      ", expected bytesInFlight = 0 but found bytesInFlight = " + bytesInFlight)
    --- End diff --
    
    cool, that makes it a bit clearer to read :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52653425
  
    --- Diff: docs/configuration.md ---
    @@ -392,6 +392,16 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td><code>spark.reducer.maxReqsInFlight</code></td>
    +  <td>Int.MaxValue</td>
    +  <td>
    +    This configuration limits the number of remote blocks to fetch at any given point.
    --- End diff --
    
    nit: `the number of remote blocks to fetch` -> `the number of remote requests to fetch blocks`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52653442
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -169,7 +176,10 @@ final class ShuffleBlockFetcherIterator(
                   // Increment the ref count because we need to pass this to a different thread.
                   // This needs to be released after use.
                   buf.retain()
    -              results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))
    +              remainingBlocks -= blockId
    +              results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf,
    +                remainingBlocks.isEmpty))
    +              logDebug("remainingBlocks" + remainingBlocks)
    --- End diff --
    
    nit: "remainingBlocks" -> "remainingBlocks: "


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52519913
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    Removing from `HashSet` is usually faster than `ArrayBuffer`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173682258
  
    @wzhfy from looking at the linked original PR this is based on the large number of requests happens with a very large # of nodes and smaller amounts of data per node. Perhaps @JoshRosen would like to take a look at this since he asked for the previous PR to be updated to 2.0 :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-174773912
  
    Actually cc @zsxwing who looked into this.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183200242
  
    Merging to master. Thanks, @redsanket


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183122765
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51130/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-181906980
  
    @zswing any other comments, otherwise LGTM and I'll commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52518306
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    @zsxwing Just curious both ArrayBuffer and HashSet seem to be thread safe I presume from looking at the scala API and blockId seems to be unique. Just curious why we need to change it else I don't mind changing it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52811398
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
       private def fetchUpToMaxBytes(): Unit = {
         // Send fetch requests up to maxBytesInFlight
         while (fetchRequests.nonEmpty &&
    -      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    +      (bytesInFlight == 0 ||
    +        (reqsInFlight + 1 <= maxReqsInFlight &&
    --- End diff --
    
    Should we have a metric for the number of dropped requests due to the above check ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-178129985
  
    @tgravescs in general, this patch doesn't limit number of concurrent outbound connections. Instead, it just limits number of in flight blocks. Could you update the title and comments/variable names/configuration in this PR to make it accurate? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173008913
  
    Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52653512
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -47,6 +47,7 @@ import org.apache.spark.util.Utils
      *                        For each block we also require the size (in bytes as a long field) in
      *                        order to throttle the memory usage.
      * @param maxBytesInFlight max size (in bytes) of remote blocks to fetch at any given point.
    + * @param maxReqsInFlight max number of remote blocks to fetch at any given point.
    --- End diff --
    
    nit: `max number of remote blocks to fetch` -> `number of remote requests to fetch blocks`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173437752
  
    hi, @redsanket , in what situation will the number of requests become very large?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50186670
  
    --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
    @@ -81,7 +81,7 @@ import org.apache.spark.util.Utils
      *  - HTTP for broadcast and file server (via HttpServer) ->  Spark currently uses Jetty
      *            for the HttpServer. Jetty supports multiple authentication mechanisms -
      *            Basic, Digest, Form, Spengo, etc. It also supports multiple different login
    - *            services - Hash, JAAS, Spnego, JDBC, etc.  Spark currently uses the HashLoginService
    + *            services - Hash, JAAS, Spengo, JDBC, etc.  Spark currently uses the HashLoginService
    --- End diff --
    
    change this back please, its supposed to be SPNEGO.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r53821837
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
       private def fetchUpToMaxBytes(): Unit = {
         // Send fetch requests up to maxBytesInFlight
         while (fetchRequests.nonEmpty &&
    -      (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    +      (bytesInFlight == 0 ||
    +        (reqsInFlight + 1 <= maxReqsInFlight &&
    --- End diff --
    
    > @zsxwing My account name is sanket991
    
    Thanks! I found your account name finally :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50269518
  
    --- Diff: docs/configuration.md ---
    @@ -392,6 +392,16 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td>Int.MaxValue</td>
    --- End diff --
    
    formatting is wrong, you lost the first column of maxReqsInFlight
     Add a line note



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183065836
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182991688
  
    **[Test build #51118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51118/consoleFull)** for PR 10838 at commit [`f0778af`](https://github.com/apache/spark/commit/f0778af55271266c78451c0447f9a3c50bb49aa6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173010832
  
    **[Test build #49716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49716/consoleFull)** for PR 10838 at commit [`9761809`](https://github.com/apache/spark/commit/9761809f5129fd4a5f593a4904f9b086f46c9f76).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183122764
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182943209
  
    **[Test build #51118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51118/consoleFull)** for PR 10838 at commit [`f0778af`](https://github.com/apache/spark/commit/f0778af55271266c78451c0447f9a3c50bb49aa6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173009800
  
    LGTM - though would be better for someone else to also review this since I might be too close to the code !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50208175
  
    --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
    @@ -81,7 +81,7 @@ import org.apache.spark.util.Utils
      *  - HTTP for broadcast and file server (via HttpServer) ->  Spark currently uses Jetty
      *            for the HttpServer. Jetty supports multiple authentication mechanisms -
      *            Basic, Digest, Form, Spengo, etc. It also supports multiple different login
    - *            services - Hash, JAAS, Spnego, JDBC, etc.  Spark currently uses the HashLoginService
    + *            services - Hash, JAAS, Spengo, JDBC, etc.  Spark currently uses the HashLoginService
    --- End diff --
    
    The above line had Spengo, so I happened to see it and change it, I might have to change the other way round


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50449878
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -258,6 +268,8 @@ final class ShuffleBlockFetcherIterator(
         val remoteRequests = splitLocalRemoteBlocks()
         // Add the remote requests into our queue in a random order
         fetchRequests ++= Utils.randomize(remoteRequests)
    +    assert ((0 == reqsInFlight) == (0 == bytesInFlight),
    --- End diff --
    
    Maybe improve the assertion text a bit more? If this assertion fails it doesn't really say why its failing. (Just a maybe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183063914
  
    @zsxwing addressed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182992993
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52654346
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -153,9 +158,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new HashSet[String]() ++ sizeMap.keys
    --- End diff --
    
    nit: Use `++=` like `val remainingBlocks = new HashSet[String]() ++= sizeMap.keys` to avoid creating a temp HashSet


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50208257
  
    --- Diff: docs/configuration.md ---
    @@ -392,6 +392,17 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    +  <td><code>spark.reducer.maxReqsInFlight</code></td>
    +  <td>20</td>
    +  <td>
    +    spark.reducer.maxMbInFlight puts a bound on the in flight data in terms of size.
    --- End diff --
    
    The full description describes about the configuration if you can take a look at it. The first line just explains why the maxMbInFlight is insufficient


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173242407
  
    
    Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173242328
  
    cc @rxin @JoshRosen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183079473
  
    **[Test build #51130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51130/consoleFull)** for PR 10838 at commit [`c70e76f`](https://github.com/apache/spark/commit/c70e76fc0c0d3cdb91ac86d8408d5c02efe30dcc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-182148417
  
    @redsanket could you rebase this PR? I found a bug when reviewing this PR and fixed it in #11138 which made the conflicts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r50576595
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -258,6 +268,9 @@ final class ShuffleBlockFetcherIterator(
         val remoteRequests = splitLocalRemoteBlocks()
         // Add the remote requests into our queue in a random order
         fetchRequests ++= Utils.randomize(remoteRequests)
    +    assert ((0 == reqsInFlight) == (0 == bytesInFlight),
    +      "expected reqsInFlight = 0 but found reqsInFlight = " + reqsInFlight +
    +      ", expected bytesInFlight = 0 but found bytesInFlight = " + bytesInFlight)
    --- End diff --
    
    @holdenk Improved the assert information, can take look at it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-178209754
  
    @zsxwing Updated


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-178153656
  
    Hm, I think `block` is also not `accurate`. Never mind, just update the title should be enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10838#discussion_r52564828
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
    @@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
         logDebug("Sending request for %d blocks (%s) from %s".format(
           req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
         bytesInFlight += req.size
    +    reqsInFlight += 1
     
         // so we can look up the size of each blockID
         val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    +    val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
    --- End diff --
    
    yup


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-173032736
  
    **[Test build #49716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49716/consoleFull)** for PR 10838 at commit [`9761809`](https://github.com/apache/spark/commit/9761809f5129fd4a5f593a4904f9b086f46c9f76).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183122543
  
    **[Test build #51130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51130/consoleFull)** for PR 10838 at commit [`c70e76f`](https://github.com/apache/spark/commit/c70e76fc0c0d3cdb91ac86d8408d5c02efe30dcc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183071484
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51129/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-178150124
  
    > @zsxwing Other than the PR title - I think everything else says inflight in code/documentation. Did you see anything to the contrary ?
    
    I mean changing `request/connection` to `block`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-183200705
  
    @redsanket what's your JIRA account name? I want to assign it to you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-172999704
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-181991403
  
    Just a small nit. Otherwise, LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/10838#issuecomment-174534386
  
    ping @rxin @JoshRosen once more. I'll leave this open for a couple more days.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org