You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/02/17 08:55:07 UTC

[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20633

    [SPARK-23455][ML] Default Params in ML should be saved separately in metadata

    ## What changes were proposed in this pull request?
    
    We save ML's user-supplied params and default params as one entity in metadata. During loading the saved models, we set all the loaded params into created ML model instances as user-supplied params.
    
    It causes some problems, e.g., if we strictly disallow some params to be set at the same time, a default param can fail the param check because it is treated as user-supplied param after loading.
    
    The loaded default params should not be set as user-supplied params. We should save ML default params separately in metadata.
    
    For backward compatibility, when loading metadata, if it is a metadata file from previous Spark, we shouldn't raise error if we can't find the default param field.
    
    ## How was this patch tested?
    
    Pass existing tests and added tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 save-ml-default-params

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20633
    
----
commit 69648d67546b292037d26ef3a282bf26afd4863e
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-02-17T02:34:11Z

    Save default params separately in JSON.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #88880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88880/testReport)** for PR 20633 at commit [`de93e65`](https://github.com/apache/spark/commit/de93e65882d71cba6246b53b81f1601bc59029ae).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r172420910
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
          * is available.
    +     *
    +     * @param isDefaultParam Whether the given param name is a default param. Default is false.
          */
    -    def getParamValue(paramName: String): JValue = {
    +    def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
    --- End diff --
    
    Sounds good. I will change this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r178955105
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
        *               this method gets called.
        * @param value  the default value
        */
    -  protected final def setDefault[T](param: Param[T], value: T): this.type = {
    +  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
    --- End diff --
    
    Makes sense; I like your solution.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    @jkbradley Thanks for your comments! I've addressed them. Please review it again. Thank you.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r175965924
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
    @@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
         val myParams = new MyParams("my_params")
         testDefaultReadWrite(myParams)
       }
    +
    +  test("default param shouldn't become user-supplied param after persistence") {
    +    val myParams = new MyParams("my_params")
    +    myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
    +    myParams.checkExclusiveParams()
    +    val loadedMyParams = testDefaultReadWrite(myParams)
    +    loadedMyParams.checkExclusiveParams()
    +    assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
    +      myParams.getDefault(myParams.intParamWithDefault))
    +
    +    loadedMyParams.set(myParams.intParamWithDefault, 1)
    +    intercept[SparkException] {
    +      loadedMyParams.checkExclusiveParams()
    +    }
    +  }
    +
    +  test("User-suppiled value for default param should be kept after persistence") {
    --- End diff --
    
    suppiled -> supplied


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    ping @jkbradley 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    ping @jkbradley 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88510/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #89421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89421/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20633


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #88510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88510/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87521/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #87998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87998/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1950/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    cc @jkbradley This is to save default params separately in metadata file. Please help review after 2.3. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r178955058
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
       }
     }
     
    +object Params {
    --- End diff --
    
    Shall we make this object package private (for cleaner docs)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Can you address the conflicts? Gonna start to review it soon. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r175963464
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
    +
    +    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
    +      params match {
    +        case JObject(pairs) => pairs
    +        case _ =>
    +          throw new IllegalArgumentException(
    +            s"Cannot recognize JSON metadata: $metadataJson.")
    +      }
    +    }
    +
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
    -     * is available.
    +     * is available. This will look up `params` first, if not existing then looking up
    +     * `defaultParams`.
          */
         def getParamValue(paramName: String): JValue = {
           implicit val format = DefaultFormats
    -      params match {
    +
    +      // Looking up for `params` first.
    +      var pairs = getValueFromParams(params)
    +      var foundPairs = pairs.filter { case (pName, jsonValue) =>
    +        pName == paramName
    +      }
    +      if (foundPairs.length == 0) {
    +        // Looking up for `defaultParams` then.
    +        pairs = getValueFromParams(defaultParams)
    +        foundPairs = pairs.filter { case (pName, jsonValue) =>
    +          pName == paramName
    +        }
    +      }
    +      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
    +        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
    +
    +      foundPairs.map(_._2).head
    +    }
    +
    +    /**
    +     * Extract Params from metadata, and set them in the instance.
    +     * This works if all Params (except params included by `skipParams` list) implement
    +     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
    +     *
    +     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
    +     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
    +     *                   and need special handling.
    +     */
    +    def getAndSetParams(
    +        instance: Params,
    +        skipParams: Option[List[String]] = None): Unit = {
    +      setParams(instance, false, skipParams)
    --- End diff --
    
    style nit: It's nice to pass boolean args by name


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #4156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4156/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r175966231
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
    @@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
         val myParams = new MyParams("my_params")
         testDefaultReadWrite(myParams)
       }
    +
    +  test("default param shouldn't become user-supplied param after persistence") {
    +    val myParams = new MyParams("my_params")
    +    myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
    +    myParams.checkExclusiveParams()
    +    val loadedMyParams = testDefaultReadWrite(myParams)
    +    loadedMyParams.checkExclusiveParams()
    +    assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
    +      myParams.getDefault(myParams.intParamWithDefault))
    +
    +    loadedMyParams.set(myParams.intParamWithDefault, 1)
    +    intercept[SparkException] {
    +      loadedMyParams.checkExclusiveParams()
    +    }
    +  }
    +
    +  test("User-suppiled value for default param should be kept after persistence") {
    +    val myParams = new MyParams("my_params")
    +    myParams.set(myParams.intParamWithDefault, 100)
    +    val loadedMyParams = testDefaultReadWrite(myParams)
    +    assert(loadedMyParams.get(myParams.intParamWithDefault).get == 100)
    +  }
    +
    +  test("Read metadata without default field prior to 2.4") {
    --- End diff --
    
    Nice!  I like this setup.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #87521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87521/testReport)** for PR 20633 at commit [`69648d6`](https://github.com/apache/spark/commit/69648d67546b292037d26ef3a282bf26afd4863e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #87996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87996/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2363/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r175961795
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
        *               this method gets called.
        * @param value  the default value
        */
    -  protected final def setDefault[T](param: Param[T], value: T): this.type = {
    +  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
    --- End diff --
    
    We should leave this as protected.  It's important that 3rd-party libraries be able to extend MLlib APIs, and this is one they need.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1701/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1308/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #87996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87996/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    cc @dbtsai if you have time to look at this too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r178956696
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
    +
    +    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
    +      params match {
    +        case JObject(pairs) => pairs
    +        case _ =>
    +          throw new IllegalArgumentException(
    +            s"Cannot recognize JSON metadata: $metadataJson.")
    +      }
    +    }
    +
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
    -     * is available.
    +     * is available. This will look up `params` first, if not existing then looking up
    +     * `defaultParams`.
          */
         def getParamValue(paramName: String): JValue = {
           implicit val format = DefaultFormats
    -      params match {
    +
    +      // Looking up for `params` first.
    +      var pairs = getValueFromParams(params)
    +      var foundPairs = pairs.filter { case (pName, jsonValue) =>
    +        pName == paramName
    +      }
    +      if (foundPairs.length == 0) {
    +        // Looking up for `defaultParams` then.
    +        pairs = getValueFromParams(defaultParams)
    +        foundPairs = pairs.filter { case (pName, jsonValue) =>
    +          pName == paramName
    +        }
    +      }
    +      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
    +        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
    +
    +      foundPairs.map(_._2).head
    +    }
    +
    +    /**
    +     * Extract Params from metadata, and set them in the instance.
    +     * This works if all Params (except params included by `skipParams` list) implement
    +     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
    +     *
    +     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
    +     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
    +     *                   and need special handling.
    +     */
    +    def getAndSetParams(
    +        instance: Params,
    +        skipParams: Option[List[String]] = None): Unit = {
    +      setParams(instance, skipParams, isDefault = false)
    +
    +      // For metadata file prior to Spark 2.4, there is no default section.
    +      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
    +      if (major >= 2 && minor >= 4) {
    --- End diff --
    
    This should be:
    ```if (major > 2 || (major == 2 && minor >= 4)) {```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r178992553
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
    +
    +    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
    +      params match {
    +        case JObject(pairs) => pairs
    +        case _ =>
    +          throw new IllegalArgumentException(
    +            s"Cannot recognize JSON metadata: $metadataJson.")
    +      }
    +    }
    +
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
    -     * is available.
    +     * is available. This will look up `params` first, if not existing then looking up
    +     * `defaultParams`.
          */
         def getParamValue(paramName: String): JValue = {
           implicit val format = DefaultFormats
    -      params match {
    +
    +      // Looking up for `params` first.
    +      var pairs = getValueFromParams(params)
    +      var foundPairs = pairs.filter { case (pName, jsonValue) =>
    +        pName == paramName
    +      }
    +      if (foundPairs.length == 0) {
    +        // Looking up for `defaultParams` then.
    +        pairs = getValueFromParams(defaultParams)
    +        foundPairs = pairs.filter { case (pName, jsonValue) =>
    +          pName == paramName
    +        }
    +      }
    +      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
    +        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
    +
    +      foundPairs.map(_._2).head
    +    }
    +
    +    /**
    +     * Extract Params from metadata, and set them in the instance.
    +     * This works if all Params (except params included by `skipParams` list) implement
    +     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
    +     *
    +     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
    +     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
    +     *                   and need special handling.
    +     */
    +    def getAndSetParams(
    +        instance: Params,
    +        skipParams: Option[List[String]] = None): Unit = {
    +      setParams(instance, skipParams, isDefault = false)
    +
    +      // For metadata file prior to Spark 2.4, there is no default section.
    +      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
    +      if (major >= 2 && minor >= 4) {
    --- End diff --
    
    Ah, you're correct. I'll fix it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #88880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88880/testReport)** for PR 20633 at commit [`de93e65`](https://github.com/apache/spark/commit/de93e65882d71cba6246b53b81f1601bc59029ae).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    also ping @MLnick @WeichenXu123  for review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r176310898
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
        *               this method gets called.
        * @param value  the default value
        */
    -  protected final def setDefault[T](param: Param[T], value: T): this.type = {
    +  private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
    --- End diff --
    
    Because we need to call `setDefault` in `Metadata.setParams`, I can't leave it as `protected`. But I think it is important to keep the extensibility. And I think we can't make it as `public` too. I add object `Params` and use it to help us update default param of a `Params`.
    
    
    
     


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    @WeichenXu123 I've added unit test in `DefaultReadWriteSuite/DefaultReadWriteTest` to test if this can read old metadata back.
    
    Sounds like the backward compatibility test you suggested should be checked manually. I will test it. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    @jkbradley Thanks! I'll work on SPARK-24058.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1707/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #88502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88502/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r179073750
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
       }
     }
     
    +object Params {
    --- End diff --
    
    Yes. Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88880/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #4156 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4156/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87998/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88502/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    @WeichenXu123 Thanks for comment! I've run the backward compatibility test locally against `QuantileDiscretizer`/`Bucketizer`. They work fine. With this PR, the saved models can be read back and all params are restored correctly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    LGTM. Thanks! @jkbradley 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #88502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88502/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r175965272
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
    +
    +    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
    +      params match {
    +        case JObject(pairs) => pairs
    +        case _ =>
    +          throw new IllegalArgumentException(
    +            s"Cannot recognize JSON metadata: $metadataJson.")
    +      }
    +    }
    +
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
    -     * is available.
    +     * is available. This will look up `params` first, if not existing then looking up
    +     * `defaultParams`.
          */
         def getParamValue(paramName: String): JValue = {
           implicit val format = DefaultFormats
    -      params match {
    +
    +      // Looking up for `params` first.
    +      var pairs = getValueFromParams(params)
    +      var foundPairs = pairs.filter { case (pName, jsonValue) =>
    +        pName == paramName
    +      }
    +      if (foundPairs.length == 0) {
    +        // Looking up for `defaultParams` then.
    +        pairs = getValueFromParams(defaultParams)
    +        foundPairs = pairs.filter { case (pName, jsonValue) =>
    +          pName == paramName
    +        }
    +      }
    +      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
    +        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
    +
    +      foundPairs.map(_._2).head
    +    }
    +
    +    /**
    +     * Extract Params from metadata, and set them in the instance.
    +     * This works if all Params (except params included by `skipParams` list) implement
    +     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
    +     *
    +     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
    +     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
    +     *                   and need special handling.
    +     */
    +    def getAndSetParams(
    +        instance: Params,
    +        skipParams: Option[List[String]] = None): Unit = {
    +      setParams(instance, false, skipParams)
    +      setParams(instance, true, skipParams)
    +    }
    +
    +    private def setParams(
    +        instance: Params,
    +        isDefault: Boolean,
    +        skipParams: Option[List[String]]): Unit = {
    +      implicit val format = DefaultFormats
    +      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
    +      val paramsToSet = if (isDefault) defaultParams else params
    +      paramsToSet match {
             case JObject(pairs) =>
    -          val values = pairs.filter { case (pName, jsonValue) =>
    -            pName == paramName
    -          }.map(_._2)
    -          assert(values.length == 1, s"Expected one instance of Param '$paramName' but found" +
    -            s" ${values.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
    -          values.head
    +          pairs.foreach { case (paramName, jsonValue) =>
    +            if (skipParams == None || !skipParams.get.contains(paramName)) {
    +              val param = instance.getParam(paramName)
    +              val value = param.jsonDecode(compact(render(jsonValue)))
    +              if (isDefault) {
    +                instance.setDefault(param, value)
    +              } else {
    +                instance.set(param, value)
    +              }
    +            }
    +          }
    +        // For metadata file prior to Spark 2.4, there is no default section.
    +        case JNothing if isDefault && (major == 2 && minor < 4 || major < 2) =>
    --- End diff --
    
    This logic would be simpler if this check were put in the getAndSetParams method, which could just skip calling setParams(instance, true, skipParams) for Spark 2.3-.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87996/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    @dbtsai Thanks! I've solved the conflicts.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1311/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r175962188
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -296,14 +297,19 @@ private[ml] object DefaultParamsWriter {
           paramMap: Option[JValue] = None): String = {
         val uid = instance.uid
         val cls = instance.getClass.getName
    -    val params = instance.extractParamMap().toSeq.asInstanceOf[Seq[ParamPair[Any]]]
    +    val params = instance.paramMap.toSeq
    +    val defaultParams = instance.defaultParamMap.toSeq
         val jsonParams = paramMap.getOrElse(render(params.map { case ParamPair(p, v) =>
           p.name -> parse(p.jsonEncode(v))
         }.toList))
    +    val jsonDefaultParams = render(defaultParams.map { case ParamPair(p, v) =>
    +      p.name -> parse(p.jsonEncode(v))
    +    }.toList)
         val basicMetadata = ("class" -> cls) ~
           ("timestamp" -> System.currentTimeMillis()) ~
           ("sparkVersion" -> sc.version) ~
           ("uid" -> uid) ~
    +      ("defaultParamMap" -> jsonDefaultParams) ~
    --- End diff --
    
    nit: How about putting this below paramMap since that's nicer for viewing the JSON?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #87521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87521/testReport)** for PR 20633 at commit [`69648d6`](https://github.com/apache/spark/commit/69648d67546b292037d26ef3a282bf26afd4863e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #89421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89421/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r172139633
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
          * is available.
    +     *
    +     * @param isDefaultParam Whether the given param name is a default param. Default is false.
          */
    -    def getParamValue(paramName: String): JValue = {
    +    def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
    --- End diff --
    
    Should the logic be "lookup params first, if not exist then lookup defaultParams" ?
    If so, we don't need the `isDefaultParam` param.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20633#discussion_r179073826
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
    @@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
           timestamp: Long,
           sparkVersion: String,
           params: JValue,
    +      defaultParams: JValue,
           metadata: JValue,
           metadataJson: String) {
     
    +
    +    private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
    +      params match {
    +        case JObject(pairs) => pairs
    +        case _ =>
    +          throw new IllegalArgumentException(
    +            s"Cannot recognize JSON metadata: $metadataJson.")
    +      }
    +    }
    +
         /**
          * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
          * This can be useful for getting a Param value before an instance of `Params`
    -     * is available.
    +     * is available. This will look up `params` first, if not existing then looking up
    +     * `defaultParams`.
          */
         def getParamValue(paramName: String): JValue = {
           implicit val format = DefaultFormats
    -      params match {
    +
    +      // Looking up for `params` first.
    +      var pairs = getValueFromParams(params)
    +      var foundPairs = pairs.filter { case (pName, jsonValue) =>
    +        pName == paramName
    +      }
    +      if (foundPairs.length == 0) {
    +        // Looking up for `defaultParams` then.
    +        pairs = getValueFromParams(defaultParams)
    +        foundPairs = pairs.filter { case (pName, jsonValue) =>
    +          pName == paramName
    +        }
    +      }
    +      assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
    +        s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
    +
    +      foundPairs.map(_._2).head
    +    }
    +
    +    /**
    +     * Extract Params from metadata, and set them in the instance.
    +     * This works if all Params (except params included by `skipParams` list) implement
    +     * [[org.apache.spark.ml.param.Param.jsonDecode()]].
    +     *
    +     * @param skipParams The params included in `skipParams` won't be set. This is useful if some
    +     *                   params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
    +     *                   and need special handling.
    +     */
    +    def getAndSetParams(
    +        instance: Params,
    +        skipParams: Option[List[String]] = None): Unit = {
    +      setParams(instance, skipParams, isDefault = false)
    +
    +      // For metadata file prior to Spark 2.4, there is no default section.
    +      val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
    +      if (major >= 2 && minor >= 4) {
    --- End diff --
    
    Good catch!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #88510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88510/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Sorry for the pause in review.
    LGTM
    Merging with master
    
    @dbtsai  I'm going to merge this since I'm worried it will collect more conflicts, but let's discuss more if needed.
    
    @viirya We'll need to update Python's DefaultParamsReader as well for Spark 2.4 in order to keep it in sync with Scala/Java.  R thankfully should not require anything since it only has wrappers.  I'll make & link a JIRA.  Will you have time to work on that?
    
    Thanks @viirya !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    @viirya Have you run backward compatibility test ? e.g., saving estimator/models via master version spark, and load estimator/models via this PR version spark. To check whether it works fine and all params get saved and restored correctly.
    Especially this should be done against `QuantileDiscretizer`/`Bucketizer` which you remove temporary fix code.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89421/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    **[Test build #87998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87998/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/945/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org