You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jaceklaskowski <gi...@git.apache.org> on 2015/10/05 07:14:27 UTC

[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

GitHub user jaceklaskowski opened a pull request:

    https://github.com/apache/spark/pull/8976

    [SPARK-10921] [YARN] Completely remove the use of SparkContext.prefer…

    …redNodeLocationData

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaceklaskowski/spark SPARK-10921

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8976
    
----
commit 1353686d6f963f6316a279f3425effd17973f61a
Author: Jacek Laskowski <ja...@deepsense.io>
Date:   2015-10-05T04:12:05Z

    [SPARK-10921] [YARN] Completely remove the use of SparkContext.preferredNodeLocationData

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-149151638
  
    Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-148512484
  
    @srowen Mind looking at the PR once again? I'd appreciate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145492004
  
      [Test build #1841 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1841/consoleFull) for   PR 8976 at commit [`1353686`](https://github.com/apache/spark/commit/1353686d6f963f6316a279f3425effd17973f61a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8976#discussion_r41289941
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -147,23 +139,41 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * @param jars Collection of JARs to send to the cluster. These can be paths on the local file
        *             system or HDFS, HTTP, HTTPS, or FTP URLs.
        * @param environment Environment variables to set on worker nodes.
    -   * @param preferredNodeLocationData used in YARN mode to select nodes to launch containers on.
    -   * Can be generated using [[org.apache.spark.scheduler.InputFormatInfo.computePreferredLocations]]
    -   * from a list of input files or InputFormats for the application.
    +   * @param preferredNodeLocationData not used. Left for backward compatibility.
        */
    +  @deprecated("Passing in preferred locations has no effect at all, see SPARK-10921", "1.5.1")
    --- End diff --
    
    (PS I think this will be 1.6.0, not 1.5.x)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146477140
  
    Yeah, but this is also changing the public-facing API. We'd be deprecating and then undeprecating interfaces, and then potentially deprecating other interfaces. If this code really has a future purpose, and it's not that much, not causing any harm, I'm not as clear there's a compelling reason to remove it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-148538673
  
      [Test build #1907 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1907/consoleFull) for   PR 8976 at commit [`6ca6cc7`](https://github.com/apache/spark/commit/6ca6cc78d793d7b88081aaf922c28a47ec892300).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146574566
  
    SPARK-2089 is marked as resolved so any discussion there is done (or am I mistaken?)
    
    Also, I'm saying that the comments say it's used in YARN while it's not. My change *merely* removes "the noise".
    
    I fully agree with your other points.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-147602759
  
      [Test build #1886 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1886/consoleFull) for   PR 8976 at commit [`7ffc52a`](https://github.com/apache/spark/commit/7ffc52a636a02359da3b67253a6ce36119ddcf01).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8976#discussion_r41827562
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -147,10 +139,9 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * @param jars Collection of JARs to send to the cluster. These can be paths on the local file
        *             system or HDFS, HTTP, HTTPS, or FTP URLs.
        * @param environment Environment variables to set on worker nodes.
    -   * @param preferredNodeLocationData used in YARN mode to select nodes to launch containers on.
    -   * Can be generated using [[org.apache.spark.scheduler.InputFormatInfo.computePreferredLocations]]
    -   * from a list of input files or InputFormats for the application.
    +   * @param preferredNodeLocationData not used. Left for backward compatibility.
        */
    +  @deprecated("Passing in preferred locations has no effect at all, see SPARK-10921", "1.6.0")
    --- End diff --
    
    I was going to say, hm, it may be funny to deprecate and then un-deprecate this if `preferredNodeLocationData` comes back. But then again, another constructor with this parameter was already deprecated. But then that one was a `@DeveloperApi`. All in all I think this change is OK though, on grounds of consistency. Un-deprecating isn't a big deal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146506978
  
    Sure, but how do you know **now** how the public API is going to look like? It is **currently** causing troubles in understanding the code and we only **wish** / **hope** to have uses in the future.
    
    I'm all for this and very well understand the reasoning. Unless there are short-term plans to work on using `preferredNodeLocationData` (there's a task and plan for 1.6), I'll be advocating for removing it as it simply causes confusion and misunderstanding how Spark really works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145977387
  
    `./dev/mima` will run MiMa checks for you. Jenkins will automatically retest if you push anything else to the branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8976


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8976#discussion_r41130010
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala ---
    @@ -49,10 +49,11 @@ private[spark] class YarnRMClient(args: ApplicationMasterArguments) extends Logg
        *
        * @param conf The Yarn configuration.
        * @param sparkConf The Spark configuration.
    -   * @param preferredNodeLocations Map with hints about where to allocate containers.
    +   * @param preferredNodeLocations not used. Left for backward compatibility.
        * @param uiAddress Address of the SparkUI.
        * @param uiHistoryAddress Address of the application on the History Server.
        */
    +  @deprecated("Passing in preferred locations has no effect at all, see SPARK-10921", "1.5.1")
    --- End diff --
    
    Since this is private to spark, I think the method can just change signature.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146595930
  
    @jaceklaskowski I think you're going to have to back out your API changes anyway since they aren't compatible. The rest, yeah, no real problem with updating docs, or changing the internals. That's much easier to change again later if desired.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145910412
  
    @srowen @SparkQA says "This patch fails MiMa tests.", but I can't seem to decipher what exactly the pull request has broken. I remember you mentioned the change should be fine with MiMa, and given the message I'm a bit concerned. Would you help me with it if there's anything I should do to help the pull request get merged cleanly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146545328
  
    Yes, as @srowen said, this API is seldom used, and currently we already marked it as deprecated, also even calling this API will not introduce any problem, so keeping it is not a big problem. Besides because of Mima problem, actually you cannot entirely remove this API, so your current solution is still no better than the existed one.
    
    Also as I said, compared to remove this, reactivating it might be a better choice, since now it is not difficult to support, also does no harm to the current code path (I can submit a PR if needed).
    
    Besides here has some discussions about whether to keep this parameter [SPARK-2089](https://issues.apache.org/jira/browse/SPARK-2089).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145931588
  
    The second one is because the default value for the `preferredNodeLocationData` parameter is being removed. You can't do that, it breaks the ABI.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-148561702
  
      [Test build #1907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1907/console) for   PR 8976 at commit [`6ca6cc7`](https://github.com/apache/spark/commit/6ca6cc78d793d7b88081aaf922c28a47ec892300).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class HyperLogLogPlusPlus(`
      * `  case class QualifiedTableName(database: String, name: String)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-147645261
  
    Yeah you'll still have to filter this spurious binary compatibility check warning:
    
    ```
    [error]  * method preferredNodeLocationData_=(scala.collection.Map)Unit in class org.apache.spark.SparkContext does not have a correspondent in new version
    [error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.preferredNodeLocationData_=")
    [info] Done updating.
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145976915
  
    Taking these out seems like the right thing to do to me


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-149024871
  
    I think this much is OK -- removing the unused var / args in internal code and uniformly deprecating its use in the public API. I'm still eyeing SPARK-4352 but figure it's simple enough to re-add this later. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145432564
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146459679
  
    A-ha so you mean there is a pretty real intent to begin to use this data structure? enough that we shouldn't remove it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146462053
  
    At some point in the future, very much likely, but it's not happening now and I'd remove it now (to make the code clearer) and once it's needed it gets added in the feature-related changes in the future. I'm working on the fix re MiMa.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145958130
  
    Thanks @vanzin @srowen! That helped a lot. I'm going to fix it. How can I run the compatibility check locally? Is it `./build/sbt core/mima-report-binary-issues` or something else? How can I ask @SparkQA to execute the check after I `rebase` and `push` new version of the pull request?
    
    Thanks a lot for helping me out with the pull request and the process.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145925233
  
    ```
    [error]  * method preferredNodeLocationData_=(scala.collection.Map)Unit in class org.apache.spark.SparkContext does not have a correspondent in new version
    [error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.preferredNodeLocationData_=")
    [error]  * synthetic method <init>$default$6()scala.collection.Map in object org.apache.spark.SparkContext does not have a correspondent in new version
    [error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.<init>$default$6")
    ```
    
    The first one looks like a false positive. It's noting the removal of the synthetic setter for the private var field that was removed, but that's OK since it was `private[spark]`. You can just add the suggested exclude to `project/MimaExcludes.scala` -- just follow the pattern there, under "1.6".
    
    The second, I'm not so sure. I wonder if it is an obscure complaint about the change to the constructors. For example, given the change, I'm not sure I can do this anymore? `new SparkContext("master", "appName", preferredNodeLocations = Map(...))`? That may be a good point if I have that right.
    
    You changed it that way since leaving both with the optional args would not compile? Hm. It may be that we have to leave the existing constructor and param and just note it does nothing, unless you can see another way to deprecate but support any existing invocation, while providing a non-deprecated constructor with all the same functionality except `preferredNodeLocations`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146384050
  
    With [SPARK-4352](https://issues.apache.org/jira/browse/SPARK-4352) getting merged, this `SparkContext.preferredNodeLocationData` can easily be achieved by computing the locality information at the start of AM and issue `ContainerRequest` with locality information.
    
    My concern is that for some workloads like "terasort", "wordcount",  normally we will not enable dynamic allocation to do the benchmark, so there's no way to specify the preferred locality and the test result may not be so good.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145491891
  
    @jerryshao @sryza do you have a thought on this one? looks OK to remove as it's unused, but was there any hint of a plan to use it again at some point? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146510001
  
    I don't think it causes trouble. Most people don't ever use this constructor, and using it doesn't cause any problems (well, other than doing nothing as advertised). If there is in fact a use for this coming up for 1.6, which would happen in the next 3 weeks then, then I don't see a value in removing this. Of course, depends entirely on how serious that plan is. If it's "1.6, likely to slip to 1.7 and then 1.8" then sure remove it. @jerryshao what's your sense on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8976#discussion_r41303229
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -147,23 +139,41 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        * @param jars Collection of JARs to send to the cluster. These can be paths on the local file
        *             system or HDFS, HTTP, HTTPS, or FTP URLs.
        * @param environment Environment variables to set on worker nodes.
    -   * @param preferredNodeLocationData used in YARN mode to select nodes to launch containers on.
    -   * Can be generated using [[org.apache.spark.scheduler.InputFormatInfo.computePreferredLocations]]
    -   * from a list of input files or InputFormats for the application.
    +   * @param preferredNodeLocationData not used. Left for backward compatibility.
        */
    +  @deprecated("Passing in preferred locations has no effect at all, see SPARK-10921", "1.5.1")
    --- End diff --
    
    Been thinking about it, but couldn't decide (and didn't ask) and pick the version. I'm going to fix it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-147606659
  
      [Test build #1886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1886/console) for   PR 8976 at commit [`7ffc52a`](https://github.com/apache/spark/commit/7ffc52a636a02359da3b67253a6ce36119ddcf01).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ChildProcAppHandle implements SparkAppHandle `
      * `abstract class LauncherConnection implements Closeable, Runnable `
      * `final class LauncherProtocol `
      * `  static class Message implements Serializable `
      * `  static class Hello extends Message `
      * `  static class SetAppId extends Message `
      * `  static class SetState extends Message `
      * `  static class Stop extends Message `
      * `class LauncherServer implements Closeable `
      * `class NamedThreadFactory implements ThreadFactory `
      * `class OutputRedirector `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146474747
  
    Fully agree. I'm however not skilled to work on the feature to enable locality preference and haven't heard about anyone working on it, and the code as it is now is *not* used and is misleading. That said, I'd remove it and bring back when it's needed (the comments do more harm than offer help).
    
    BTW, It's all versioned in the repo and one can easily revert to any version in the past when needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-146465163
  
    From my understanding for some batch workloads like ETL, traditional process pattern is to read data from HDFS, process and output to HDFS, in such process pattern data locality is quite important, instead dynamic allocation is not quite useful, so normally user will not enable dynamic allocation, in such scenario node locality will be a problem because there's no way to specify the container locality based on the input data.
    
    With SPARK-4352, to reactivate this feature is quite simple, besides if dynamic allocation is enable the logic is the same, so instead of removing it, reactivating this parameter might be another choice.
    
    Just my opinion. Please point out if there's different concern.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-145494283
  
      [Test build #1841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1841/console) for   PR 8976 at commit [`1353686`](https://github.com/apache/spark/commit/1353686d6f963f6316a279f3425effd17973f61a).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `final class ChiSqSelector(override val uid: String)`
      * `final class QuantileDiscretizer(override val uid: String)`
      * `case class LogicalRelation(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10921] [YARN] Completely remove the use...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the pull request:

    https://github.com/apache/spark/pull/8976#issuecomment-147602196
  
    Hey @srowen @jerryshao How does the change look like now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org