Posted to reviews@spark.apache.org by sharkdtu <gi...@git.apache.org> on 2018/06/28 08:07:26 UTC
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
GitHub user sharkdtu opened a pull request:
https://github.com/apache/spark/pull/21658
[SPARK-24678][Spark-Streaming] Give priority in use of 'PROCESS_LOCAL' for spark-streaming
## What changes were proposed in this pull request?
Currently, `BlockRDD.getPreferredLocations` only returns the host names of blocks, so the resulting task locality level can be no better than 'NODE_LOCAL'. With a small change that also includes the executor ID in each preferred location, the locality level can be improved to 'PROCESS_LOCAL'.
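To illustrate the difference, here is a minimal standalone sketch. It is not Spark's scheduler code: `LocalitySketch`, `Executor`, `localityOf`, and the sample host and executor ID are hypothetical names chosen for illustration. Only the `executor_<host>_<executorId>` hint format is taken from the diff in this pull request.

```scala
// Hypothetical model of how a preferred-location hint could map to a locality level.
// Only the "executor_<host>_<executorId>" format comes from this pull request.
object LocalitySketch {
  final case class Executor(host: String, executorId: String)

  // Classify a hint string against the executor that is offering to run the task.
  def localityOf(hint: String, offer: Executor): String = {
    val prefix = "executor_"
    if (hint.startsWith(prefix)) {
      // Everything after the first '_' of the remainder is treated as the executor ID.
      hint.stripPrefix(prefix).split("_", 2) match {
        case Array(host, execId) if host == offer.host && execId == offer.executorId =>
          "PROCESS_LOCAL"
        case Array(host, _) if host == offer.host =>
          "NODE_LOCAL"
        case _ =>
          "ANY"
      }
    } else if (hint == offer.host) {
      // A bare host name cannot identify a single executor process.
      "NODE_LOCAL"
    } else {
      "ANY"
    }
  }

  def main(args: Array[String]): Unit = {
    val offer = Executor("host1", "3")
    println(localityOf("host1", offer))            // prints NODE_LOCAL
    println(localityOf("executor_host1_3", offer)) // prints PROCESS_LOCAL
  }
}
```

In this sketch a host-only hint tops out at 'NODE_LOCAL', while a hint that also names the executor can match at 'PROCESS_LOCAL'.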
## How was this patch tested?
manual test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sharkdtu/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21658.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21658
----
commit 666fb4c5d343a1ea439ecc284d047810d6189c23
Author: sharkdtu <sh...@...>
Date: 2018-06-28T07:35:52Z
give priority in use of 'PROCESS_LOCAL' for spark-streaming
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21658#discussion_r200310184
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1569,7 +1569,7 @@ private[spark] object BlockManager {
val blockManagers = new HashMap[BlockId, Seq[String]]
for (i <- 0 until blockIds.length) {
- blockManagers(blockIds(i)) = blockLocations(i).map(_.host)
+ blockManagers(blockIds(i)) = blockLocations(i).map(b => s"executor_${b.host}_${b.executorId}")
--- End diff --
How about `blockIdsToLocations`?
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92793/
Test FAILed.
---
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21658#discussion_r200226750
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1569,7 +1569,7 @@ private[spark] object BlockManager {
val blockManagers = new HashMap[BlockId, Seq[String]]
for (i <- 0 until blockIds.length) {
- blockManagers(blockIds(i)) = blockLocations(i).map(_.host)
+ blockManagers(blockIds(i)) = blockLocations(i).map(b => s"executor_${b.host}_${b.executorId}")
--- End diff --
The name of this method should be updated; `blockIdsToHosts` no longer reflects your change.
---
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21658#discussion_r200536325
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1569,7 +1569,7 @@ private[spark] object BlockManager {
val blockManagers = new HashMap[BlockId, Seq[String]]
for (i <- 0 until blockIds.length) {
- blockManagers(blockIds(i)) = blockLocations(i).map(_.host)
+ blockManagers(blockIds(i)) = blockLocations(i).map(b => s"executor_${b.host}_${b.executorId}")
--- End diff --
Also, you should use `ExecutorCacheTaskLocation#toString` here instead of writing the location hint manually; that would be more robust.
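As a rough sketch of why a single `toString` is more robust than hand-built strings: the format then lives in one place, and a parser can be kept in step with it. The names below (`LocationHintSketch`, `ExecutorLocation`, `parse`) are hypothetical stand-ins, not Spark's actual classes, and the `executor_` tag is assumed from the diff above.

```scala
// Hypothetical stand-in for a location class that owns its own string format.
object LocationHintSketch {
  // Assumed tag, matching the prefix used in the diff under review.
  private val ExecutorTag = "executor_"

  final case class ExecutorLocation(host: String, executorId: String) {
    // The only place where the hint format is written down.
    override def toString: String = s"${ExecutorTag}${host}_${executorId}"
  }

  // Parse a hint back into a location; None for anything that is not an executor hint.
  def parse(hint: String): Option[ExecutorLocation] =
    if (hint.startsWith(ExecutorTag)) {
      hint.stripPrefix(ExecutorTag).split("_", 2) match {
        case Array(host, execId) => Some(ExecutorLocation(host, execId))
        case _ => None
      }
    } else {
      None
    }
}
```

Callers would then write `ExecutorLocation(host, executorId).toString` rather than repeating the interpolated string, so a change to the tag or separator would only need to happen once.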
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92634/testReport)** for PR 21658 at commit [`666fb4c`](https://github.com/apache/spark/commit/666fb4c5d343a1ea439ecc284d047810d6189c23).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92634/testReport)** for PR 21658 at commit [`666fb4c`](https://github.com/apache/spark/commit/666fb4c5d343a1ea439ecc284d047810d6189c23).
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21658
Please add the UTs as I mentioned before.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21658
Jenkins, retest this please.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21658
LGTM, merging to master branch.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92679/testReport)** for PR 21658 at commit [`adf39a5`](https://github.com/apache/spark/commit/adf39a53b24687154513028b4104239233c5c760).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21658
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92798/testReport)** for PR 21658 at commit [`380f242`](https://github.com/apache/spark/commit/380f242cea5ef1d43c0ec42f57598cf21b24c3e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/21658
a late LGTM
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Can one of the admins verify this patch?
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92798/
Test PASSed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21658
Would you please add a UT for it?
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92634/
Test FAILed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu commented on the issue:
https://github.com/apache/spark/pull/21658
@jerryshao Yeah, I have verified it in our cluster, and the locality is 'PROCESS_LOCAL'.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92798/testReport)** for PR 21658 at commit [`380f242`](https://github.com/apache/spark/commit/380f242cea5ef1d43c0ec42f57598cf21b24c3e2).
---
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21658#discussion_r200535022
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1569,7 +1569,7 @@ private[spark] object BlockManager {
val blockManagers = new HashMap[BlockId, Seq[String]]
for (i <- 0 until blockIds.length) {
- blockManagers(blockIds(i)) = blockLocations(i).map(_.host)
+ blockManagers(blockIds(i)) = blockLocations(i).map(b => s"executor_${b.host}_${b.executorId}")
--- End diff --
Yeah, it's OK.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92727/testReport)** for PR 21658 at commit [`4750260`](https://github.com/apache/spark/commit/47502603d0e2116fb3b789335bf6ebf7836c61de).
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21658#discussion_r200586976
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1569,7 +1570,8 @@ private[spark] object BlockManager {
val blockManagers = new HashMap[BlockId, Seq[String]]
for (i <- 0 until blockIds.length) {
- blockManagers(blockIds(i)) = blockLocations(i).map(_.host)
+ blockManagers(blockIds(i)) = blockLocations(i).map(
--- End diff --
nit:
```
blockLocations(i).map { loc =>
xxx
}
```
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92679/
Test FAILed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Can one of the admins verify this patch?
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92679/testReport)** for PR 21658 at commit [`adf39a5`](https://github.com/apache/spark/commit/adf39a53b24687154513028b4104239233c5c760).
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92727/testReport)** for PR 21658 at commit [`4750260`](https://github.com/apache/spark/commit/47502603d0e2116fb3b789335bf6ebf7836c61de).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21658
ok to test.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92793/testReport)** for PR 21658 at commit [`380f242`](https://github.com/apache/spark/commit/380f242cea5ef1d43c0ec42f57598cf21b24c3e2).
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21658
**[Test build #92793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92793/testReport)** for PR 21658 at commit [`380f242`](https://github.com/apache/spark/commit/380f242cea5ef1d43c0ec42f57598cf21b24c3e2).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/21658
Hi @sharkdtu , did you also verify this in your cluster, to see if the locality is correct or not?
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21658
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92727/
Test FAILed.
---