You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by redsanket <gi...@git.apache.org> on 2016/01/19 22:50:31 UTC
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
GitHub user redsanket opened a pull request:
https://github.com/apache/spark/pull/10838
[SPARK-6166] Limit number of concurrent outbound connections
This JIRA is related to
https://github.com/apache/spark/pull/5852
Had to do some minor rework and test to make sure it
works with current version of spark.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/redsanket/spark limit-outbound-connections
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10838.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10838
----
commit 32063006c28e4e94c6005e559e03465a1ce41e81
Author: Sanket <sc...@untilservice-lm>
Date: 2016-01-19T21:38:51Z
Limit number of concurrent outbound connections
commit 4b2bbd83f4fe02375f7ccfd73e091e000b3aae7b
Author: Sanket <sc...@untilservice-lm>
Date: 2016-01-19T21:46:12Z
merge resolution from upstream:master
commit 9761809f5129fd4a5f593a4904f9b086f46c9f76
Author: Sanket <sc...@untilservice-lm>
Date: 2016-01-19T21:48:41Z
Changed info level to debug level
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183071483
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183027600
@redsanket most of my comments are minor. Otherwise LGTM.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50269536
--- Diff: docs/configuration.md ---
@@ -392,6 +392,16 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td>Int.MaxValue</td>
+ <td>
+ "spark.reducer.maxMbInFlight" puts a bound on the in flight data in terms of size.
+ But this is not always sufficient when the number of hosts in the cluster increase,
--- End diff --
maxMBInFlight should be maxSizeInFlight
change "to fetches" to "to fetch"
Wrap the actual config variables with
I would prefer to see the description of spark.reducer.maxReqsInFlight first. Perhaps after that put something close to what you have about the maxMbInFlight. something like (feel free to change wording).
maxReqsInFlight limits number ..... This is sometimes needed when you have a large number of hosts and which can lead to a very large number of in-bound connections...
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182940338
Jenkins, test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52541897
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
But for ArrayBuffer prepends and deletes are linear
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173032887
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49716/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183065989
LGTM. will merge this one once the tests pass.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52811989
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
private def fetchUpToMaxBytes(): Unit = {
// Send fetch requests up to maxBytesInFlight
while (fetchRequests.nonEmpty &&
- (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
+ (bytesInFlight == 0 ||
+ (reqsInFlight + 1 <= maxReqsInFlight &&
--- End diff --
Right. The request would stay in fetchRequests
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173032886
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52542474
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
Isn't that amortized constant time vs linear?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50186855
--- Diff: docs/configuration.md ---
@@ -392,6 +392,17 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td><code>spark.reducer.maxReqsInFlight</code></td>
+ <td>20</td>
--- End diff --
The default is unlimited not 20.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r53812741
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
private def fetchUpToMaxBytes(): Unit = {
// Send fetch requests up to maxBytesInFlight
while (fetchRequests.nonEmpty &&
- (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
+ (bytesInFlight == 0 ||
+ (reqsInFlight + 1 <= maxReqsInFlight &&
--- End diff --
@zsxwing My account name is sanket991
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182993003
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51118/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52811613
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
private def fetchUpToMaxBytes(): Unit = {
// Send fetch requests up to maxBytesInFlight
while (fetchRequests.nonEmpty &&
- (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
+ (bytesInFlight == 0 ||
+ (reqsInFlight + 1 <= maxReqsInFlight &&
--- End diff --
No request will be dropped. It just limits the number of concurrent requests.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52542439
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
ok thanks np that was an obvious question mostly EOD tired query. Sorry!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182911828
@zsxwing rebased and changed ArrayBuffer to HashSet
@tgravescs might want to take a look at it one more time
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52346381
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
Moreover, `fetchBlocks` will be called in other threads, `remainingBlocks` should be a thread-safe collection.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182936370
Jenkins, test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52345827
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
nit: use HashSet instead of `ArrayBuffer`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/10838
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182939705
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-178141964
@zsxwing Other than the PR title - I think everything else says inflight in code/documentation. Did you see anything to the contrary ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50186875
--- Diff: docs/configuration.md ---
@@ -392,6 +392,17 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td><code>spark.reducer.maxReqsInFlight</code></td>
+ <td>20</td>
+ <td>
+ spark.reducer.maxMbInFlight puts a bound on the in flight data in terms of size.
--- End diff --
wrong description?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183072553
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182939709
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51114/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50613356
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -258,6 +268,9 @@ final class ShuffleBlockFetcherIterator(
val remoteRequests = splitLocalRemoteBlocks()
// Add the remote requests into our queue in a random order
fetchRequests ++= Utils.randomize(remoteRequests)
+ assert ((0 == reqsInFlight) == (0 == bytesInFlight),
+ "expected reqsInFlight = 0 but found reqsInFlight = " + reqsInFlight +
+ ", expected bytesInFlight = 0 but found bytesInFlight = " + bytesInFlight)
--- End diff --
cool, that makes it a bit clearer to read :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52653425
--- Diff: docs/configuration.md ---
@@ -392,6 +392,16 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td><code>spark.reducer.maxReqsInFlight</code></td>
+ <td>Int.MaxValue</td>
+ <td>
+ This configuration limits the number of remote blocks to fetch at any given point.
--- End diff --
nit: `the number of remote blocks to fetch` -> `the number of remote requests to fetch blocks`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52653442
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -169,7 +176,10 @@ final class ShuffleBlockFetcherIterator(
// Increment the ref count because we need to pass this to a different thread.
// This needs to be released after use.
buf.retain()
- results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))
+ remainingBlocks -= blockId
+ results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf,
+ remainingBlocks.isEmpty))
+ logDebug("remainingBlocks" + remainingBlocks)
--- End diff --
nit: "remainingBlocks" -> "remainingBlocks: "
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52519913
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
Removing from `HashSet` is usually faster than `ArrayBuffer`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173682258
@wzhfy from looking at the linked original PR this is based on the large number of requests happens with a very large # of nodes and smaller amounts of data per node. Perhaps @JoshRosen would like to take a look at this since he asked for the previous PR to be updated to 2.0 :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-174773912
Actually cc @zsxwing who looked into this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183200242
Merging to master. Thanks, @redsanket
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183122765
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51130/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-181906980
@zswing any other comments, otherwise LGTM and I'll commit.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52518306
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
@zsxwing Just curious both ArrayBuffer and HashSet seem to be thread safe I presume from looking at the scala API and blockId seems to be unique. Just curious why we need to change it else I don't mind changing it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52811398
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
private def fetchUpToMaxBytes(): Unit = {
// Send fetch requests up to maxBytesInFlight
while (fetchRequests.nonEmpty &&
- (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
+ (bytesInFlight == 0 ||
+ (reqsInFlight + 1 <= maxReqsInFlight &&
--- End diff --
Should we have a metric for the number of dropped requests due to the above check ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-178129985
@tgravescs in general, this patch doesn't limit number of concurrent outbound connections. Instead, it just limits number of in flight blocks. Could you update the title and comments/variable names/configuration in this PR to make it accurate?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173008913
Jenkins, test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52653512
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -47,6 +47,7 @@ import org.apache.spark.util.Utils
* For each block we also require the size (in bytes as a long field) in
* order to throttle the memory usage.
* @param maxBytesInFlight max size (in bytes) of remote blocks to fetch at any given point.
+ * @param maxReqsInFlight max number of remote blocks to fetch at any given point.
--- End diff --
nit: `max number of remote blocks to fetch` -> `number of remote requests to fetch blocks`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173437752
hi, @redsanket , in what situation will the number of requests become very large?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50186670
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -81,7 +81,7 @@ import org.apache.spark.util.Utils
* - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
* for the HttpServer. Jetty supports multiple authentication mechanisms -
* Basic, Digest, Form, Spengo, etc. It also supports multiple different login
- * services - Hash, JAAS, Spnego, JDBC, etc. Spark currently uses the HashLoginService
+ * services - Hash, JAAS, Spengo, JDBC, etc. Spark currently uses the HashLoginService
--- End diff --
change this back please, its supposed to be SPNEGO.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r53821837
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -328,7 +345,9 @@ final class ShuffleBlockFetcherIterator(
private def fetchUpToMaxBytes(): Unit = {
// Send fetch requests up to maxBytesInFlight
while (fetchRequests.nonEmpty &&
- (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
+ (bytesInFlight == 0 ||
+ (reqsInFlight + 1 <= maxReqsInFlight &&
--- End diff --
> @zsxwing My account name is sanket991
Thanks! I found your account name finally :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50269518
--- Diff: docs/configuration.md ---
@@ -392,6 +392,16 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td>Int.MaxValue</td>
--- End diff --
formatting is wrong, you lost the first column of maxReqsInFlight
Add a line note
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183065836
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182991688
**[Test build #51118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51118/consoleFull)** for PR 10838 at commit [`f0778af`](https://github.com/apache/spark/commit/f0778af55271266c78451c0447f9a3c50bb49aa6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173010832
**[Test build #49716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49716/consoleFull)** for PR 10838 at commit [`9761809`](https://github.com/apache/spark/commit/9761809f5129fd4a5f593a4904f9b086f46c9f76).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183122764
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182943209
**[Test build #51118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51118/consoleFull)** for PR 10838 at commit [`f0778af`](https://github.com/apache/spark/commit/f0778af55271266c78451c0447f9a3c50bb49aa6).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173009800
LGTM - though would be better for someone else to also review this since I might be too close to the code !
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50208175
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -81,7 +81,7 @@ import org.apache.spark.util.Utils
* - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
* for the HttpServer. Jetty supports multiple authentication mechanisms -
* Basic, Digest, Form, Spengo, etc. It also supports multiple different login
- * services - Hash, JAAS, Spnego, JDBC, etc. Spark currently uses the HashLoginService
+ * services - Hash, JAAS, Spengo, JDBC, etc. Spark currently uses the HashLoginService
--- End diff --
The above line had Spengo, so I happened to see it and change it, I might have to change the other way round
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50449878
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -258,6 +268,8 @@ final class ShuffleBlockFetcherIterator(
val remoteRequests = splitLocalRemoteBlocks()
// Add the remote requests into our queue in a random order
fetchRequests ++= Utils.randomize(remoteRequests)
+ assert ((0 == reqsInFlight) == (0 == bytesInFlight),
--- End diff --
Maybe improve the assertion text a bit more? If this assertion fails it doesn't really say why its failing. (Just a maybe).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183063914
@zsxwing addressed
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182992993
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52654346
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -153,9 +158,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new HashSet[String]() ++ sizeMap.keys
--- End diff --
nit: Use `++=` like `val remainingBlocks = new HashSet[String]() ++= sizeMap.keys` to avoid creating a temp HashSet
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50208257
--- Diff: docs/configuration.md ---
@@ -392,6 +392,17 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td><code>spark.reducer.maxReqsInFlight</code></td>
+ <td>20</td>
+ <td>
+ spark.reducer.maxMbInFlight puts a bound on the in flight data in terms of size.
--- End diff --
The full description describes about the configuration if you can take a look at it. The first line just explains why the maxMbInFlight is insufficient
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173242407
Jenkins, test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173242328
cc @rxin @JoshRosen
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183079473
**[Test build #51130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51130/consoleFull)** for PR 10838 at commit [`c70e76f`](https://github.com/apache/spark/commit/c70e76fc0c0d3cdb91ac86d8408d5c02efe30dcc).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-182148417
@redsanket could you rebase this PR? I found a bug when reviewing this PR and fixed it in #11138 which made the conflicts.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r50576595
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -258,6 +268,9 @@ final class ShuffleBlockFetcherIterator(
val remoteRequests = splitLocalRemoteBlocks()
// Add the remote requests into our queue in a random order
fetchRequests ++= Utils.randomize(remoteRequests)
+ assert ((0 == reqsInFlight) == (0 == bytesInFlight),
+ "expected reqsInFlight = 0 but found reqsInFlight = " + reqsInFlight +
+ ", expected bytesInFlight = 0 but found bytesInFlight = " + bytesInFlight)
--- End diff --
@holdenk Improved the assert information, can take look at it
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-178209754
@zsxwing Updated
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-178153656
Hm, I think `block` is also not `accurate`. Never mind, just update the title should be enough.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/10838#discussion_r52564828
--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -143,9 +148,11 @@ final class ShuffleBlockFetcherIterator(
logDebug("Sending request for %d blocks (%s) from %s".format(
req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
bytesInFlight += req.size
+ reqsInFlight += 1
// so we can look up the size of each blockID
val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
+ val remainingBlocks = new ArrayBuffer[String]() ++ sizeMap.keys
--- End diff --
yup
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-173032736
**[Test build #49716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49716/consoleFull)** for PR 10838 at commit [`9761809`](https://github.com/apache/spark/commit/9761809f5129fd4a5f593a4904f9b086f46c9f76).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183122543
**[Test build #51130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51130/consoleFull)** for PR 10838 at commit [`c70e76f`](https://github.com/apache/spark/commit/c70e76fc0c0d3cdb91ac86d8408d5c02efe30dcc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183071484
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51129/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-178150124
> @zsxwing Other than the PR title - I think everything else says inflight in code/documentation. Did you see anything to the contrary ?
I mean changing `request/connection` to `block`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-183200705
@redsanket what's your JIRA account name? I want to assign it to you.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-172999704
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of in flight outboun...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-181991403
Just a small nit. Otherwise, LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...
Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/10838#issuecomment-174534386
ping @rxin @JoshRosen once more. I'll leave this open for a couple more days.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org