You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2017/11/15 22:29:32 UTC

[GitHub] spark pull request #19760: [SPARK-22533][core] Handle deprecated names in Co...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/19760

    [SPARK-22533][core] Handle deprecated names in ConfigEntry.

    This change undoes the ConfigEntry changes introduced in SPARK-20887,
    while keeping the config deprecation that was added of part of that
    change.
    
    Instead, it hooks up the config reader to `SparkConf.getDeprecatedConfig`,
    which is better since declaring name changes in `SparkConf` enables
    deprecation messages when old config names are used. The `withAlternatives`
    approach has limitations in that area since it would not be possible to
    handle config constants declared outside of core in that way.
    
    Added a few unit tests to verify the desired behavior.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-22533

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19760
    
----
commit fcf3fbfc47fbf4ea7290f74e65f324eee68f7aba
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2017-11-15T21:35:05Z

    [SPARK-22533][core] Handle deprecated names in ConfigEntry.
    
    This change undoes the ConfigEntry changes introduced in SPARK-20887,
    while keeping the config deprecation that was added of part of that
    change.
    
    Instead, it hooks up the config reader to `SparkConf.getDeprecatedConfig`,
    which is better since declaring name changes in `SparkConf` enables
    deprecation messages when old config names are used. The `withAlternatives`
    approach has limitations in that area since it would not be possible to
    handle config constants declared outside of core in that way.
    
    Added a few unit tests to verify the desired behavior.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    **[Test build #83977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83977/testReport)** for PR 19760 at commit [`8afc56a`](https://github.com/apache/spark/commit/8afc56a825fdb4bce8aac683348bbd4223a29acc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    @cloud-fan since you made the change I'm (mostly) undoing.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19760: [SPARK-22533][core] Handle deprecated names in Co...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19760#discussion_r151524853
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -663,8 +663,10 @@ private[spark] object SparkConf extends Logging {
           AlternateConfig("spark.yarn.jar", "2.0")),
         "spark.yarn.access.hadoopFileSystems" -> Seq(
           AlternateConfig("spark.yarn.access.namenodes", "2.2")),
    -    "spark.maxRemoteBlockSizeFetchToMem" -> Seq(
    -      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3"))
    +    MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM.key -> Seq(
    +      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3")),
    +    LISTENER_BUS_EVENT_QUEUE_CAPACITY.key -> Seq(
    +      AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", "2.3"))
    --- End diff --
    
    but you need to hardcode the config key here, if the config entry is defined in SQL module.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19760: [SPARK-22533][core] Handle deprecated names in Co...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19760#discussion_r151374800
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -663,8 +663,10 @@ private[spark] object SparkConf extends Logging {
           AlternateConfig("spark.yarn.jar", "2.0")),
         "spark.yarn.access.hadoopFileSystems" -> Seq(
           AlternateConfig("spark.yarn.access.namenodes", "2.2")),
    -    "spark.maxRemoteBlockSizeFetchToMem" -> Seq(
    -      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3"))
    +    MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM.key -> Seq(
    +      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3")),
    +    LISTENER_BUS_EVENT_QUEUE_CAPACITY.key -> Seq(
    +      AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", "2.3"))
    --- End diff --
    
    Now it only works with Spark Core confs right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19760: [SPARK-22533][core] Handle deprecated names in Co...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19760#discussion_r151492713
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -663,8 +663,10 @@ private[spark] object SparkConf extends Logging {
           AlternateConfig("spark.yarn.jar", "2.0")),
         "spark.yarn.access.hadoopFileSystems" -> Seq(
           AlternateConfig("spark.yarn.access.namenodes", "2.2")),
    -    "spark.maxRemoteBlockSizeFetchToMem" -> Seq(
    -      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3"))
    +    MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM.key -> Seq(
    +      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3")),
    +    LISTENER_BUS_EVENT_QUEUE_CAPACITY.key -> Seq(
    +      AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", "2.3"))
    --- End diff --
    
    not really, it works with any configs you add to this list. if you look at it, there are some yarn conf there.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    **[Test build #83916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83916/testReport)** for PR 19760 at commit [`fcf3fbf`](https://github.com/apache/spark/commit/fcf3fbfc47fbf4ea7290f74e65f324eee68f7aba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83977/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19760: [SPARK-22533][core] Handle deprecated names in Co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19760


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83916/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    Generally having the deprecation message is good, but I hope that can be done within the config entry, instead of a central place in `SparkConf`. Is this possible?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    I don't exactly follow the purpose here, the name `withAlternatives` says it's an alternative, not deprecated, so it should not print deprecation message. Besides, I think `withAlternatives` is easier to use and maintain, as it's defined with the config entry.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    **[Test build #83977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83977/testReport)** for PR 19760 at commit [`8afc56a`](https://github.com/apache/spark/commit/8afc56a825fdb4bce8aac683348bbd4223a29acc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    > my future plan is to move config related stuff to a new maven module
    
    Sounds good. I'll try to update this before leaving on vacation, but may not be able to...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19760: [SPARK-22533][core] Handle deprecated names in Co...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19760#discussion_r151538808
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -663,8 +663,10 @@ private[spark] object SparkConf extends Logging {
           AlternateConfig("spark.yarn.jar", "2.0")),
         "spark.yarn.access.hadoopFileSystems" -> Seq(
           AlternateConfig("spark.yarn.access.namenodes", "2.2")),
    -    "spark.maxRemoteBlockSizeFetchToMem" -> Seq(
    -      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3"))
    +    MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM.key -> Seq(
    +      AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3")),
    +    LISTENER_BUS_EVENT_QUEUE_CAPACITY.key -> Seq(
    +      AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", "2.3"))
    --- End diff --
    
    Yes. But there's no way for the SQL module to deprecate configs any other way. Deprecation warnings are handled internally by SparkConf and the metadata needs to be available at the time the `SparkConf.set` call is made, which cannot happen if the config constant is declared in some other module (since the class holding the constant may not have been initialized).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    There's quite a lot of applications that use just `SparkContext`. And `SparkConf` is generally created before anything else (e.g. in yarn-cluster mode, the Spark code will instantiate `SparkConf` before even calling any user code).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    Since now `SparkSession` is the main entry, we just need `spark.conf.get(xxx)` to print deprecation message(still keep the old deprecated configs in `SparkConf`). I think then it's possible to deprecate SQL configs.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    I'm ok to move the deprecated config keys of `MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM` and `LISTENER_BUS_EVENT_QUEUE_CAPACITY` to `SparkConf` if the deprecation message really matters. But I'd like to keep `withAlternatives`. Generally it's a better interface and my future plan is to move config related stuff to a new maven module, so it can be used in modules that don't depend on the core module(e.g. the network module). It will be annoying if everytime we wanna deprecate a conf we need to change the config module.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    > the name withAlternatives says it's an alternative,
    
    That's the other thing. Having alternative names is just confusing; there should be a single name for a config, with others deprecated.
    
    The two configs that had alternatives were basically deprecating the old names and replacing them with the new ones.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    It would be possible for configs declared in core; because then we can force SparkConf to initialize those classes. (e.g., referencing any config constant causes the `o.a.s.internal.config` package object to be loaded and all config constants to be initialized).
    
    The only way to do that for other modules would be to move their configs to core, so that SparkConf can initialize them the same way. (Well, you could do it with reflection I guess, but ugh.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    (And even `SparkSubmit` now creates `SparkConf`.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19760: [SPARK-22533][core] Handle deprecated names in ConfigEnt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19760
  
    **[Test build #83916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83916/testReport)** for PR 19760 at commit [`fcf3fbf`](https://github.com/apache/spark/commit/fcf3fbfc47fbf4ea7290f74e65f324eee68f7aba).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org