You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/02/17 08:55:07 UTC
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/20633
[SPARK-23455][ML] Default Params in ML should be saved separately in metadata
## What changes were proposed in this pull request?
We save ML's user-supplied params and default params as one entity in metadata. During loading the saved models, we set all the loaded params into created ML model instances as user-supplied params.
It causes some problems, e.g., if we strictly disallow some params to be set at the same time, a default param can fail the param check because it is treated as user-supplied param after loading.
The loaded default params should not be set as user-supplied params. We should save ML default params separately in metadata.
For backward compatibility, when loading metadata, if it is a metadata file from previous Spark, we shouldn't raise error if we can't find the default param field.
## How was this patch tested?
Pass existing tests and added tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 save-ml-default-params
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20633.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20633
----
commit 69648d67546b292037d26ef3a282bf26afd4863e
Author: Liang-Chi Hsieh <vi...@...>
Date: 2018-02-17T02:34:11Z
Save default params separately in JSON.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #88880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88880/testReport)** for PR 20633 at commit [`de93e65`](https://github.com/apache/spark/commit/de93e65882d71cba6246b53b81f1601bc59029ae).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r172420910
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
* is available.
+ *
+ * @param isDefaultParam Whether the given param name is a default param. Default is false.
*/
- def getParamValue(paramName: String): JValue = {
+ def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
--- End diff --
Sounds good. I will change this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r178955105
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
* this method gets called.
* @param value the default value
*/
- protected final def setDefault[T](param: Param[T], value: T): this.type = {
+ private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
--- End diff --
Makes sense; I like your solution.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
@jkbradley Thanks for your comments! I've addressed them. Please review it again. Thank you.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r175965924
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
@@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
val myParams = new MyParams("my_params")
testDefaultReadWrite(myParams)
}
+
+ test("default param shouldn't become user-supplied param after persistence") {
+ val myParams = new MyParams("my_params")
+ myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
+ myParams.checkExclusiveParams()
+ val loadedMyParams = testDefaultReadWrite(myParams)
+ loadedMyParams.checkExclusiveParams()
+ assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
+ myParams.getDefault(myParams.intParamWithDefault))
+
+ loadedMyParams.set(myParams.intParamWithDefault, 1)
+ intercept[SparkException] {
+ loadedMyParams.checkExclusiveParams()
+ }
+ }
+
+ test("User-suppiled value for default param should be kept after persistence") {
--- End diff --
suppiled -> supplied
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
ping @jkbradley
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
ping @jkbradley
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88510/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #89421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89421/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20633
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #88510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88510/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87521/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #87998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87998/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1950/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
cc @jkbradley This is to save default params separately in metadata file. Please help review after 2.3. Thanks!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r178955058
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
}
}
+object Params {
--- End diff --
Shall we make this object package private (for cleaner docs)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/20633
Can you address the conflicts? Gonna start to review it soon. Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r175963464
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
+
+ private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+ params match {
+ case JObject(pairs) => pairs
+ case _ =>
+ throw new IllegalArgumentException(
+ s"Cannot recognize JSON metadata: $metadataJson.")
+ }
+ }
+
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
- * is available.
+ * is available. This will look up `params` first, if not existing then looking up
+ * `defaultParams`.
*/
def getParamValue(paramName: String): JValue = {
implicit val format = DefaultFormats
- params match {
+
+ // Looking up for `params` first.
+ var pairs = getValueFromParams(params)
+ var foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ if (foundPairs.length == 0) {
+ // Looking up for `defaultParams` then.
+ pairs = getValueFromParams(defaultParams)
+ foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ }
+ assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+ s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+ foundPairs.map(_._2).head
+ }
+
+ /**
+ * Extract Params from metadata, and set them in the instance.
+ * This works if all Params (except params included by `skipParams` list) implement
+ * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+ *
+ * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+ * params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+ * and need special handling.
+ */
+ def getAndSetParams(
+ instance: Params,
+ skipParams: Option[List[String]] = None): Unit = {
+ setParams(instance, false, skipParams)
--- End diff --
style nit: It's nice to pass boolean args by name
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #4156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4156/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r175966231
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala ---
@@ -169,4 +179,54 @@ class DefaultReadWriteSuite extends SparkFunSuite with MLlibTestSparkContext
val myParams = new MyParams("my_params")
testDefaultReadWrite(myParams)
}
+
+ test("default param shouldn't become user-supplied param after persistence") {
+ val myParams = new MyParams("my_params")
+ myParams.set(myParams.shouldNotSetIfSetintParamWithDefault, 1)
+ myParams.checkExclusiveParams()
+ val loadedMyParams = testDefaultReadWrite(myParams)
+ loadedMyParams.checkExclusiveParams()
+ assert(loadedMyParams.getDefault(loadedMyParams.intParamWithDefault) ==
+ myParams.getDefault(myParams.intParamWithDefault))
+
+ loadedMyParams.set(myParams.intParamWithDefault, 1)
+ intercept[SparkException] {
+ loadedMyParams.checkExclusiveParams()
+ }
+ }
+
+ test("User-suppiled value for default param should be kept after persistence") {
+ val myParams = new MyParams("my_params")
+ myParams.set(myParams.intParamWithDefault, 100)
+ val loadedMyParams = testDefaultReadWrite(myParams)
+ assert(loadedMyParams.get(myParams.intParamWithDefault).get == 100)
+ }
+
+ test("Read metadata without default field prior to 2.4") {
--- End diff --
Nice! I like this setup.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #87521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87521/testReport)** for PR 20633 at commit [`69648d6`](https://github.com/apache/spark/commit/69648d67546b292037d26ef3a282bf26afd4863e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #87996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87996/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2363/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r175961795
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
* this method gets called.
* @param value the default value
*/
- protected final def setDefault[T](param: Param[T], value: T): this.type = {
+ private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
--- End diff --
We should leave this as protected. It's important that 3rd-party libraries be able to extend MLlib APIs, and this is one they need.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1701/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1308/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #87996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87996/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
cc @dbtsai if you have time to look at this too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r178956696
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
+
+ private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+ params match {
+ case JObject(pairs) => pairs
+ case _ =>
+ throw new IllegalArgumentException(
+ s"Cannot recognize JSON metadata: $metadataJson.")
+ }
+ }
+
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
- * is available.
+ * is available. This will look up `params` first, if not existing then looking up
+ * `defaultParams`.
*/
def getParamValue(paramName: String): JValue = {
implicit val format = DefaultFormats
- params match {
+
+ // Looking up for `params` first.
+ var pairs = getValueFromParams(params)
+ var foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ if (foundPairs.length == 0) {
+ // Looking up for `defaultParams` then.
+ pairs = getValueFromParams(defaultParams)
+ foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ }
+ assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+ s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+ foundPairs.map(_._2).head
+ }
+
+ /**
+ * Extract Params from metadata, and set them in the instance.
+ * This works if all Params (except params included by `skipParams` list) implement
+ * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+ *
+ * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+ * params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+ * and need special handling.
+ */
+ def getAndSetParams(
+ instance: Params,
+ skipParams: Option[List[String]] = None): Unit = {
+ setParams(instance, skipParams, isDefault = false)
+
+ // For metadata file prior to Spark 2.4, there is no default section.
+ val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+ if (major >= 2 && minor >= 4) {
--- End diff --
This should be:
```if (major > 2 || (major == 2 && minor >= 4)) {```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r178992553
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
+
+ private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+ params match {
+ case JObject(pairs) => pairs
+ case _ =>
+ throw new IllegalArgumentException(
+ s"Cannot recognize JSON metadata: $metadataJson.")
+ }
+ }
+
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
- * is available.
+ * is available. This will look up `params` first, if not existing then looking up
+ * `defaultParams`.
*/
def getParamValue(paramName: String): JValue = {
implicit val format = DefaultFormats
- params match {
+
+ // Looking up for `params` first.
+ var pairs = getValueFromParams(params)
+ var foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ if (foundPairs.length == 0) {
+ // Looking up for `defaultParams` then.
+ pairs = getValueFromParams(defaultParams)
+ foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ }
+ assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+ s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+ foundPairs.map(_._2).head
+ }
+
+ /**
+ * Extract Params from metadata, and set them in the instance.
+ * This works if all Params (except params included by `skipParams` list) implement
+ * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+ *
+ * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+ * params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+ * and need special handling.
+ */
+ def getAndSetParams(
+ instance: Params,
+ skipParams: Option[List[String]] = None): Unit = {
+ setParams(instance, skipParams, isDefault = false)
+
+ // For metadata file prior to Spark 2.4, there is no default section.
+ val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+ if (major >= 2 && minor >= 4) {
--- End diff --
Ah, you're correct. I'll fix it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #88880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88880/testReport)** for PR 20633 at commit [`de93e65`](https://github.com/apache/spark/commit/de93e65882d71cba6246b53b81f1601bc59029ae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
also ping @MLnick @WeichenXu123 for review.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r176310898
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -791,7 +791,7 @@ trait Params extends Identifiable with Serializable {
* this method gets called.
* @param value the default value
*/
- protected final def setDefault[T](param: Param[T], value: T): this.type = {
+ private[ml] final def setDefault[T](param: Param[T], value: T): this.type = {
--- End diff --
Because we need to call `setDefault` in `Metadata.setParams`, I can't leave it as `protected`. But I think it is important to keep the extensibility. And I think we can't make it as `public` too. I add object `Params` and use it to help us update default param of a `Params`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
@WeichenXu123 I've added unit test in `DefaultReadWriteSuite/DefaultReadWriteTest` to test if this can read old metadata back.
Sounds like the backward compatibility test you suggested should be checked manually. I will test it. Thanks!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
@jkbradley Thanks! I'll work on SPARK-24058.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1707/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #88502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88502/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r179073750
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -905,6 +905,15 @@ trait Params extends Identifiable with Serializable {
}
}
+object Params {
--- End diff --
Yes. Done.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88880/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #4156 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4156/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87998/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88502/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
@WeichenXu123 Thanks for comment! I've run the backward compatibility test locally against `QuantileDiscretizer`/`Bucketizer`. They work fine. With this PR, the saved models can be read back and all params are restored correctly.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20633
LGTM. Thanks! @jkbradley
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #88502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88502/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r175965272
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,88 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
+
+ private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+ params match {
+ case JObject(pairs) => pairs
+ case _ =>
+ throw new IllegalArgumentException(
+ s"Cannot recognize JSON metadata: $metadataJson.")
+ }
+ }
+
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
- * is available.
+ * is available. This will look up `params` first, if not existing then looking up
+ * `defaultParams`.
*/
def getParamValue(paramName: String): JValue = {
implicit val format = DefaultFormats
- params match {
+
+ // Looking up for `params` first.
+ var pairs = getValueFromParams(params)
+ var foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ if (foundPairs.length == 0) {
+ // Looking up for `defaultParams` then.
+ pairs = getValueFromParams(defaultParams)
+ foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ }
+ assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+ s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+ foundPairs.map(_._2).head
+ }
+
+ /**
+ * Extract Params from metadata, and set them in the instance.
+ * This works if all Params (except params included by `skipParams` list) implement
+ * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+ *
+ * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+ * params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+ * and need special handling.
+ */
+ def getAndSetParams(
+ instance: Params,
+ skipParams: Option[List[String]] = None): Unit = {
+ setParams(instance, false, skipParams)
+ setParams(instance, true, skipParams)
+ }
+
+ private def setParams(
+ instance: Params,
+ isDefault: Boolean,
+ skipParams: Option[List[String]]): Unit = {
+ implicit val format = DefaultFormats
+ val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+ val paramsToSet = if (isDefault) defaultParams else params
+ paramsToSet match {
case JObject(pairs) =>
- val values = pairs.filter { case (pName, jsonValue) =>
- pName == paramName
- }.map(_._2)
- assert(values.length == 1, s"Expected one instance of Param '$paramName' but found" +
- s" ${values.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
- values.head
+ pairs.foreach { case (paramName, jsonValue) =>
+ if (skipParams == None || !skipParams.get.contains(paramName)) {
+ val param = instance.getParam(paramName)
+ val value = param.jsonDecode(compact(render(jsonValue)))
+ if (isDefault) {
+ instance.setDefault(param, value)
+ } else {
+ instance.set(param, value)
+ }
+ }
+ }
+ // For metadata file prior to Spark 2.4, there is no default section.
+ case JNothing if isDefault && (major == 2 && minor < 4 || major < 2) =>
--- End diff --
This logic would be simpler if this check were put in the getAndSetParams method, which could just skip calling setParams(instance, true, skipParams) for Spark 2.3-.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87996/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20633
@dbtsai Thanks! I've solved the conflicts.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1311/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r175962188
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -296,14 +297,19 @@ private[ml] object DefaultParamsWriter {
paramMap: Option[JValue] = None): String = {
val uid = instance.uid
val cls = instance.getClass.getName
- val params = instance.extractParamMap().toSeq.asInstanceOf[Seq[ParamPair[Any]]]
+ val params = instance.paramMap.toSeq
+ val defaultParams = instance.defaultParamMap.toSeq
val jsonParams = paramMap.getOrElse(render(params.map { case ParamPair(p, v) =>
p.name -> parse(p.jsonEncode(v))
}.toList))
+ val jsonDefaultParams = render(defaultParams.map { case ParamPair(p, v) =>
+ p.name -> parse(p.jsonEncode(v))
+ }.toList)
val basicMetadata = ("class" -> cls) ~
("timestamp" -> System.currentTimeMillis()) ~
("sparkVersion" -> sc.version) ~
("uid" -> uid) ~
+ ("defaultParamMap" -> jsonDefaultParams) ~
--- End diff --
nit: How about putting this below paramMap since that's nicer for viewing the JSON?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #87521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87521/testReport)** for PR 20633 at commit [`69648d6`](https://github.com/apache/spark/commit/69648d67546b292037d26ef3a282bf26afd4863e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #89421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89421/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r172139633
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
* is available.
+ *
+ * @param isDefaultParam Whether the given param name is a default param. Default is false.
*/
- def getParamValue(paramName: String): JValue = {
+ def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
--- End diff --
Should the logic be "lookup params first, if not exist then lookup defaultParams" ?
If so, we don't need the `isDefaultParam` param.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20633#discussion_r179073826
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,27 +359,90 @@ private[ml] object DefaultParamsReader {
timestamp: Long,
sparkVersion: String,
params: JValue,
+ defaultParams: JValue,
metadata: JValue,
metadataJson: String) {
+
+ private def getValueFromParams(params: JValue): Seq[(String, JValue)] = {
+ params match {
+ case JObject(pairs) => pairs
+ case _ =>
+ throw new IllegalArgumentException(
+ s"Cannot recognize JSON metadata: $metadataJson.")
+ }
+ }
+
/**
* Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
* This can be useful for getting a Param value before an instance of `Params`
- * is available.
+ * is available. This will look up `params` first, if not existing then looking up
+ * `defaultParams`.
*/
def getParamValue(paramName: String): JValue = {
implicit val format = DefaultFormats
- params match {
+
+ // Looking up for `params` first.
+ var pairs = getValueFromParams(params)
+ var foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ if (foundPairs.length == 0) {
+ // Looking up for `defaultParams` then.
+ pairs = getValueFromParams(defaultParams)
+ foundPairs = pairs.filter { case (pName, jsonValue) =>
+ pName == paramName
+ }
+ }
+ assert(foundPairs.length == 1, s"Expected one instance of Param '$paramName' but found" +
+ s" ${foundPairs.length} in JSON Params: " + pairs.map(_.toString).mkString(", "))
+
+ foundPairs.map(_._2).head
+ }
+
+ /**
+ * Extract Params from metadata, and set them in the instance.
+ * This works if all Params (except params included by `skipParams` list) implement
+ * [[org.apache.spark.ml.param.Param.jsonDecode()]].
+ *
+ * @param skipParams The params included in `skipParams` won't be set. This is useful if some
+ * params don't implement [[org.apache.spark.ml.param.Param.jsonDecode()]]
+ * and need special handling.
+ */
+ def getAndSetParams(
+ instance: Params,
+ skipParams: Option[List[String]] = None): Unit = {
+ setParams(instance, skipParams, isDefault = false)
+
+ // For metadata file prior to Spark 2.4, there is no default section.
+ val (major, minor) = VersionUtils.majorMinorVersion(sparkVersion)
+ if (major >= 2 && minor >= 4) {
--- End diff --
Good catch!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #88510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88510/testReport)** for PR 20633 at commit [`a2b6917`](https://github.com/apache/spark/commit/a2b69175d109a46f5fbb51b0e370a6df48d67e2f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/20633
Sorry for the pause in review.
LGTM
Merging with master
@dbtsai I'm going to merge this since I'm worried it will collect more conflicts, but let's discuss more if needed.
@viirya We'll need to update Python's DefaultParamsReader as well for Spark 2.4 in order to keep it in sync with Scala/Java. R thankfully should not require anything since it only has wrappers. I'll make & link a JIRA. Will you have time to work on that?
Thanks @viirya !
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20633
@viirya Have you run backward compatibility test ? e.g., saving estimator/models via master version spark, and load estimator/models via this PR version spark. To check whether it works fine and all params get saved and restored correctly.
Especially this should be done against `QuantileDiscretizer`/`Bucketizer` which you remove temporary fix code.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89421/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20633
**[Test build #87998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87998/testReport)** for PR 20633 at commit [`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20633
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/945/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org