Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2015/10/12 10:50:33 UTC

[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/9075

    [SPARK-11063][Streaming]Change preferredLocations of Receiver's RDD to hosts rather than hostports

    The format of RDD's preferredLocations must be hostname but the format of Streaming Receiver's scheduling executors is hostport. So it doesn't work.
    
    This PR converts `schedulerExecutors` to `hosts` before creating Receiver's RDD.
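
As a rough illustration of the conversion described above, here is a minimal sketch. It is not the code from this PR: the object and method names are hypothetical, and it assumes each scheduled location is a string of the form `host:port` whose hostname contains no colon.

```scala
// Hypothetical helper, not Spark's implementation: strips an optional ":port"
// suffix so that a location string carries only the hostname.
object HostPortSketch {
  def toHost(location: String): String = {
    val idx = location.lastIndexOf(':')
    // No colon means the string is already a bare hostname.
    if (idx < 0) location else location.substring(0, idx)
  }

  // Converts every scheduled "host:port" entry to its host, keeping order.
  def toHosts(scheduledExecutors: Seq[String]): Seq[String] =
    scheduledExecutors.map(toHost)
}
```

For example, `HostPortSketch.toHosts(Seq("node1:50010", "node2:50010"))` yields host-only strings that could then be passed as preferred locations.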

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-11063

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9075
    
----
commit 4706ec0ba2ee9a82b279c46abf7005894e593b3c
Author: zsxwing <zs...@gmail.com>
Date:   2015-10-12T08:47:36Z

    Change preferredLocations of Receiver's RDD to hosts rather than hostports

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42259823
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    sure, will do tomorrow morning 




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148896953
  
    LGTM. Just a clarifying question. 




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42305278
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    Hmm, it's a typo, but it would not have any effect.
    
    See the discussion here: https://github.com/apache/spark/pull/3816 (looking only at the comments after Dec 30, 2014 is enough).




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147702807
  
    The fix looks good to me.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147332022
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148883508
  
      [Test build #43877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43877/consoleFull) for   PR 9075 at commit [`49b7792`](https://github.com/apache/spark/commit/49b779279a1b3bf7a061982d3511f37e3dfdb213).




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42307531
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/scheduler/ReceiverTrackerSuite.scala ---
    @@ -80,6 +82,28 @@ class ReceiverTrackerSuite extends TestSuiteBase {
           }
         }
       }
    +
    +  test("SPARK-11063: TaskSetManager should use Receiver RDD's preferredLocations") {
    +    // Use ManualClock to prevent from starting batches so that we can make sure the only task is
    +    // for starting the Receiver
    +    val _conf = conf.clone.set("spark.streaming.clock", "org.apache.spark.util.ManualClock")
    +    withStreamingContext(new StreamingContext(_conf, Milliseconds(100))) { ssc =>
    +      @volatile var receiverTaskLocality: TaskLocality = null
    +      ssc.sparkContext.addSparkListener(new SparkListener {
    +        override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +          receiverTaskLocality = taskStart.taskInfo.taskLocality
    +        }
    +      })
    +      val input = ssc.receiverStream(new TestReceiver)
    +      val output = new TestOutputStream(input)
    +      output.register()
    +      ssc.start()
    +      eventually(timeout(10 seconds), interval(10 millis)) {
    +        // If preferredLocations is set correctly, receiverTaskLocality should be NODE_LOCAL
    +        assert(receiverTaskLocality === TaskLocality.NODE_LOCAL)
    --- End diff --
    
    What will it be if it is not set correctly? That is, without the fix above, what would the locality be?




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42289720
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    This looks like a significant bug in the core. If so, it should probably go into a separate patch.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148757447
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42310297
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/scheduler/ReceiverTrackerSuite.scala ---
    @@ -80,6 +82,28 @@ class ReceiverTrackerSuite extends TestSuiteBase {
           }
         }
       }
    +
    +  test("SPARK-11063: TaskSetManager should use Receiver RDD's preferredLocations") {
    +    // Use ManualClock to prevent from starting batches so that we can make sure the only task is
    +    // for starting the Receiver
    +    val _conf = conf.clone.set("spark.streaming.clock", "org.apache.spark.util.ManualClock")
    +    withStreamingContext(new StreamingContext(_conf, Milliseconds(100))) { ssc =>
    +      @volatile var receiverTaskLocality: TaskLocality = null
    +      ssc.sparkContext.addSparkListener(new SparkListener {
    +        override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +          receiverTaskLocality = taskStart.taskInfo.taskLocality
    +        }
    +      })
    +      val input = ssc.receiverStream(new TestReceiver)
    +      val output = new TestOutputStream(input)
    +      output.register()
    +      ssc.start()
    +      eventually(timeout(10 seconds), interval(10 millis)) {
    +        // If preferredLocations is set correctly, receiverTaskLocality should be NODE_LOCAL
    +        assert(receiverTaskLocality === TaskLocality.NODE_LOCAL)
    --- End diff --
    
    It's `TaskLocality.ANY` without this patch.
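
One way to read this answer is through a toy model. The sketch below is not Spark's scheduler logic; the types and the function are hypothetical. It only assumes, as the PR description states, that preferred locations are matched against plain hostnames, so a `host:port` string never matches a live host and the task falls back to the least specific locality.

```scala
// Toy model only: real scheduling in Spark's TaskSetManager is far more involved.
object LocalitySketch {
  sealed trait Locality
  case object NodeLocal extends Locality
  case object AnyLocality extends Locality

  // A task counts as node-local only when its preferred location string
  // exactly equals the hostname of a live host.
  def localityFor(preferred: String, aliveHosts: Set[String]): Locality =
    if (aliveHosts.contains(preferred)) NodeLocal else AnyLocality
}
```

In this model, `localityFor("node1:50010", Set("node1"))` gives `AnyLocality`, while `localityFor("node1", Set("node1"))` gives `NodeLocal`.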




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148885689
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43877/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147354342
  
      [Test build #43564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43564/console) for   PR 9075 at commit [`4706ec0`](https://github.com/apache/spark/commit/4706ec0ba2ee9a82b279c46abf7005894e593b3c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148657044
  
    What is the implication of this bug for scheduling? We are trying to distribute receivers evenly across executors. But if we set the preferredLocations only at the granularity of hosts, then the following could happen. Two executors can be on the same host (through YARN). We set the preferred locations of two receivers to that same host even though we want them to be on two different executors. It may so happen that both receivers get scheduled on the same executor. Isn't it?
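
The concern can be made concrete with a small sketch. The data shape (receiver id mapped to the `host:port` of its chosen executor) and all names here are assumptions for illustration, not taken from the PR. The function reports which receivers end up with the same host-level preference once the port is dropped.

```scala
object HostCollisionSketch {
  // assignments: receiver id -> "host:port" of the executor chosen for it
  // (hypothetical data shape). Returns only hosts preferred by 2+ receivers.
  def receiversSharingHost(assignments: Map[Int, String]): Map[String, Seq[Int]] =
    assignments.toSeq
      // Drop the ":port" part, keeping (host, receiverId) pairs.
      .map { case (receiverId, hostPort) => (hostPort.takeWhile(_ != ':'), receiverId) }
      .groupBy(_._1)
      .map { case (host, pairs) => host -> pairs.map(_._2).sorted }
      .filter { case (_, ids) => ids.size > 1 }
}
```

With two executors on `node1`, receivers 0 and 1 assigned to `node1:1001` and `node1:1002` both reduce to the preference `node1`, so nothing at the host level stops them from landing on the same executor.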




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42305276
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    This is not related. I will move it into a separate PR.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42259554
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    @CodingCat could you take a look at this change?




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148885688
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9075




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148662675
  
    I think we could add another `TaskLocation`, like `ExecutorTaskLocation`, and pass the `preferredLocations` with a special tag prefix, like the HDFS in-memory tag, to let TaskManager know that we want to schedule the task as process-local.
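
A sketch of what such a tagged location string might look like follows. Everything in it is hypothetical: the `executor_` tag, the `executor_<host>_<executorId>` layout, and the type names are invented for illustration and are not an existing Spark API in this thread.

```scala
object TaggedLocationSketch {
  sealed trait Location { def host: String }
  case class HostLocation(host: String) extends Location
  case class ExecutorLocation(host: String, executorId: String) extends Location

  // Hypothetical tag, chosen for this sketch only.
  val ExecutorTag = "executor_"

  // "executor_<host>_<executorId>" parses to ExecutorLocation;
  // any other string is treated as a plain host.
  def parse(s: String): Location =
    if (s.startsWith(ExecutorTag)) {
      val rest = s.stripPrefix(ExecutorTag)
      val sep = rest.lastIndexOf('_')
      require(sep > 0, s"expected ${ExecutorTag}<host>_<executorId>, got: $s")
      ExecutorLocation(rest.substring(0, sep), rest.substring(sep + 1))
    } else HostLocation(s)
}
```

A scheduler could then pattern-match on the parsed value and treat `ExecutorLocation` as a process-level preference, while plain hosts keep their current host-level meaning.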




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147335370
  
      [Test build #43564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43564/consoleFull) for   PR 9075 at commit [`4706ec0`](https://github.com/apache/spark/commit/4706ec0ba2ee9a82b279c46abf7005894e593b3c).




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148778827
  
    Tested this patch on a 5-node cluster. Each node has one executor with 1 core, and there are 5 receivers.
    
    Before this patch, there were several Receiver restarting logs. After applying this patch, all restarting logs disappeared.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-149159428
  
    Merging this to master and branch 1.5




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148757466
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42289756
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    And how is this related to the main bug we are considering in this PR?




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147354464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43564/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148659530
  
    Yeah, we have to fix it for branch 1.5 with a small fix like this. 
    Is it possible to test this with unit tests? Could you check whether there are unit tests of the DAGScheduler / TaskScheduler that we can use, so that something like this never happens again?




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148883338
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147332103
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148883343
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148658439
  
    > What is the implication of this bug for scheduling? We are trying to evenly distribute receivers to executors. But if we set the preferredLocations only at the granularity of hosts, then the following could happen. Two executors can be on the same host (through YARN). We set the preferred locations for two receivers to that same host even though we want them to be on two different executors. It may so happen that both receivers get scheduled on the same executor. Isn't it?
    
    Right. However, `preferredLocations` only supports hosts. So at least for branch 1.5, this is the best we can do. We can enhance `preferredLocations` to support host and port in the master branch.
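
A minimal sketch of the conversion discussed here, assuming executor locations arrive as plain `host:port` strings. The object `HostPortSketch` and the names `hostOf`, `toPreferredHosts`, and `scheduledExecutors` are hypothetical and are not taken from the patch or from Spark's API. It only illustrates why two executors on one host collapse into a single preferred location once the port is dropped.

```scala
// Hypothetical helper for illustration only; not Spark's implementation.
object HostPortSketch {
  // Strip the trailing ":port" from a "host:port" string.
  // A string without a colon is returned unchanged.
  def hostOf(hostPort: String): String = {
    val idx = hostPort.lastIndexOf(':')
    if (idx < 0) hostPort else hostPort.substring(0, idx)
  }

  // Convert scheduled executor locations (assumed "host:port") into
  // host-only preferred locations, dropping duplicate hosts.
  def toPreferredHosts(scheduledExecutors: Seq[String]): Seq[String] =
    scheduledExecutors.map(hostOf).distinct
}
```

For example, `HostPortSketch.toPreferredHosts(Seq("host1:4001", "host1:4002", "host2:4001"))` returns `Seq("host1", "host2")`: the two executors on `host1` can no longer be told apart, which is the limitation raised in the quoted question.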



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-147354463
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148759401
  
      [Test build #43840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43840/consoleFull) for   PR 9075 at commit [`2d7c030`](https://github.com/apache/spark/commit/2d7c030065d206418c08d278f0808c3ef3e7cde4).



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148796367
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148658550
  
    From my understanding, Spark's scheduler does not support scheduling with port (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala)



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148795961
  
      [Test build #43840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43840/console) for   PR 9075 at commit [`2d7c030`](https://github.com/apache/spark/commit/2d7c030065d206418c08d278f0808c3ef3e7cde4).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9075#discussion_r42305506
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -388,7 +388,7 @@ private[spark] class TaskSetManager(
         if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) {
           // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
           for (index <- dequeueTaskFromList(execId, pendingTasksWithNoPrefs)) {
    -        return Some((index, TaskLocality.PROCESS_LOCAL, false))
    +        return Some((index, TaskLocality.NO_PREF, false))
    --- End diff --
    
    Didn't notice this PR. Thanks.



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148796371
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43840/



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148756762
  
    Added a unit test and fixed a minor issue in `TaskSetManager`.



[GitHub] spark pull request: [SPARK-11063][Streaming]Change preferredLocati...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9075#issuecomment-148885661
  
      [Test build #43877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43877/console) for   PR 9075 at commit [`49b7792`](https://github.com/apache/spark/commit/49b779279a1b3bf7a061982d3511f37e3dfdb213).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

