You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2015/12/16 15:03:47 UTC

[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/10327

    [SPARK-12349] [ML] Make spark.ml PCAModel load backwards compatible

    Only load explainedVariance in PCAModel if it was written with Spark > 1.6.x
    @jkbradley is this kind of what you had in mind?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-12349

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10327
    
----
commit eec7655029fce21d140e7f564d2bfea74979adc2
Author: Sean Owen <so...@cloudera.com>
Date:   2015-12-16T14:03:10Z

    Only load explainedVariance in PCAModel if it was written with Spark > 1.6.x

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-166262381
  
    Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165707217
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10327#discussion_r48093699
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
    @@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
     
         private val className = classOf[PCAModel].getName
     
    +    /**
    +     * Loads a [[PCAModel]] from data the input path. Note that the model includes an
    --- End diff --
    
    isnt there a word missing like: "Loads a PCAModel from data **located at** the input path"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10327


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165547930
  
    **[Test build #47934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47934/consoleFull)** for PR 10327 at commit [`46698eb`](https://github.com/apache/spark/commit/46698eb62780e5ab446b0f847f6aa93cab30fe1a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-166118821
  
    **[Test build #48075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48075/consoleFull)** for PR 10327 at commit [`06ca002`](https://github.com/apache/spark/commit/06ca0029bfab1a5771512cf579299a446518d49a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165548334
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165133113
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47814/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165534988
  
    **[Test build #47934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47934/consoleFull)** for PR 10327 at commit [`46698eb`](https://github.com/apache/spark/commit/46698eb62780e5ab446b0f847f6aa93cab30fe1a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165997370
  
    lgtm except one minor doc comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165548337
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47934/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165118231
  
    **[Test build #47814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47814/consoleFull)** for PR 10327 at commit [`eec7655`](https://github.com/apache/spark/commit/eec7655029fce21d140e7f564d2bfea74979adc2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-166125038
  
    **[Test build #48075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48075/consoleFull)** for PR 10327 at commit [`06ca002`](https://github.com/apache/spark/commit/06ca0029bfab1a5771512cf579299a446518d49a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165133112
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10327#discussion_r48101296
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
    @@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
     
         private val className = classOf[PCAModel].getName
     
    +    /**
    +     * Loads a [[PCAModel]] from data the input path. Note that the model includes an
    --- End diff --
    
    Oops, will fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10327#discussion_r48502942
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
    @@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
     
         private val className = classOf[PCAModel].getName
     
    +    /**
    +     * Loads a [[PCAModel]] from data located at the input path. Note that the model includes an
    +     * `explainedVariance` member that is not recorded by Spark 1.6 and earlier. A model
    +     * can be loaded from such older data but will have an empty vector for
    +     * `explainedVariance`.
    +     *
    +     * @param path path to serialized model data
    +     * @return a [[PCAModel]]
    +     */
         override def load(path: String): PCAModel = {
           val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
    +
    +      // explainedVariance field is not present in Spark <= 1.6
    +      val versionRegex = "([0-9]+)\\.([0-9])+.*".r
    --- End diff --
    
    Sorry for the late comment (just getting back from traveling), but the regex is a little off: The second "+" should be inside the parentheses (though this will probably never be needed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10327#discussion_r48503471
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala ---
    @@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] {
     
         private val className = classOf[PCAModel].getName
     
    +    /**
    +     * Loads a [[PCAModel]] from data located at the input path. Note that the model includes an
    +     * `explainedVariance` member that is not recorded by Spark 1.6 and earlier. A model
    +     * can be loaded from such older data but will have an empty vector for
    +     * `explainedVariance`.
    +     *
    +     * @param path path to serialized model data
    +     * @return a [[PCAModel]]
    +     */
         override def load(path: String): PCAModel = {
           val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
    +
    +      // explainedVariance field is not present in Spark <= 1.6
    +      val versionRegex = "([0-9]+)\\.([0-9])+.*".r
    --- End diff --
    
    Oh shoot, typo. It won't matter anytime soon but yes that's not quite correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165132883
  
    **[Test build #47814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47814/consoleFull)** for PR 10327 at commit [`eec7655`](https://github.com/apache/spark/commit/eec7655029fce21d140e7f564d2bfea74979adc2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-167645124
  
    My only comment is that, in general, I'd like us to add unit tests for backwards compatibility (as was done for spark.mllib NaiveBayesModel).  But I'm OK with skipping it this time since save/load is Experimental for 1.6.
    
    Thanks for sending the patch @srowen !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-165987915
  
    Will merge soon if there are no other comments


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-166125072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48075/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10327#issuecomment-166125071
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org