You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2015/12/16 15:03:47 UTC
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/10327
[SPARK-12349] [ML] Make spark.ml PCAModel load backwards compatible
Only load explainedVariance in PCAModel if it was written with Spark > 1.6.x
@jkbradley is this kind of what you had in mind?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-12349
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10327.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10327
----
commit eec7655029fce21d140e7f564d2bfea74979adc2
Author: Sean Owen <so...@cloudera.com>
Date: 2015-12-16T14:03:10Z
Only load explainedVariance in PCAModel if it was written with Spark > 1.6.x
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-166262381
Merged to master
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165707217
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10327#discussion_r48093699
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
@@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
private val className = classOf[PCAModel].getName
+ /**
+ * Loads a [[PCAModel]] from data the input path. Note that the model includes an
--- End diff --
isnt there a word missing like: "Loads a PCAModel from data **located at** the input path"?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/10327
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165547930
**[Test build #47934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47934/consoleFull)** for PR 10327 at commit [`46698eb`](https://github.com/apache/spark/commit/46698eb62780e5ab446b0f847f6aa93cab30fe1a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-166118821
**[Test build #48075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48075/consoleFull)** for PR 10327 at commit [`06ca002`](https://github.com/apache/spark/commit/06ca0029bfab1a5771512cf579299a446518d49a).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165548334
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165133113
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47814/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165534988
**[Test build #47934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47934/consoleFull)** for PR 10327 at commit [`46698eb`](https://github.com/apache/spark/commit/46698eb62780e5ab446b0f847f6aa93cab30fe1a).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165997370
lgtm except one minor doc comment.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165548337
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47934/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165118231
**[Test build #47814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47814/consoleFull)** for PR 10327 at commit [`eec7655`](https://github.com/apache/spark/commit/eec7655029fce21d140e7f564d2bfea74979adc2).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-166125038
**[Test build #48075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48075/consoleFull)** for PR 10327 at commit [`06ca002`](https://github.com/apache/spark/commit/06ca0029bfab1a5771512cf579299a446518d49a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165133112
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/10327#discussion_r48101296
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
@@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
private val className = classOf[PCAModel].getName
+ /**
+ * Loads a [[PCAModel]] from data the input path. Note that the model includes an
--- End diff --
Oops, will fix
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/10327#discussion_r48502942
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
@@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
private val className = classOf[PCAModel].getName
+ /**
+ * Loads a [[PCAModel]] from data located at the input path. Note that the model includes an
+ * `explainedVariance` member that is not recorded by Spark 1.6 and earlier. A model
+ * can be loaded from such older data but will have an empty vector for
+ * `explainedVariance`.
+ *
+ * @param path path to serialized model data
+ * @return a [[PCAModel]]
+ */
override def load(path: String): PCAModel = {
val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
+
+ // explainedVariance field is not present in Spark <= 1.6
+ val versionRegex = "([0-9]+)\\.([0-9])+.*".r
--- End diff --
Sorry for the late comment (just getting back from traveling), but the regex is a little off: The second "+" should be inside the parentheses (though this will probably never be needed).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/10327#discussion_r48503471
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
@@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
private val className = classOf[PCAModel].getName
+ /**
+ * Loads a [[PCAModel]] from data located at the input path. Note that the model includes an
+ * `explainedVariance` member that is not recorded by Spark 1.6 and earlier. A model
+ * can be loaded from such older data but will have an empty vector for
+ * `explainedVariance`.
+ *
+ * @param path path to serialized model data
+ * @return a [[PCAModel]]
+ */
override def load(path: String): PCAModel = {
val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
+
+ // explainedVariance field is not present in Spark <= 1.6
+ val versionRegex = "([0-9]+)\\.([0-9])+.*".r
--- End diff --
Oh shoot, typo. It won't matter anytime soon but yes that's not quite correct.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165132883
**[Test build #47814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47814/consoleFull)** for PR 10327 at commit [`eec7655`](https://github.com/apache/spark/commit/eec7655029fce21d140e7f564d2bfea74979adc2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-167645124
My only comment is that, in general, I'd like us to add unit tests for backwards compatibility (as was done for spark.mllib NaiveBayesModel). But I'm OK with skipping it this time since save/load is Experimental for 1.6.
Thanks for sending the patch @srowen !
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-165987915
Will merge soon if there are no other comments
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-166125072
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48075/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10327#issuecomment-166125071
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org