You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2017/01/19 14:46:39 UTC

[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/16646

    [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports output log-likelihood.

    ## What changes were proposed in this pull request?
    ```spark.gaussianMixture``` supports output total log-likelihood for the model like R ```mvnormalmixEM```.
    
    ## How was this patch tested?
    R unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-19291

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16646
    
----
commit a2aa6ce0c25d1ca0e8f42e52459a62c7c09c9a46
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-01-19T14:43:27Z

    spark.gaussianMixture supports output log-likelihood.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97029531
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    --- End diff --
    
    Yeah, it depends on which stage you want to get.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    **[Test build #71665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71665/testReport)** for PR 16646 at commit [`a2aa6ce`](https://github.com/apache/spark/commit/a2aa6ce0c25d1ca0e8f42e52459a62c7c09c9a46).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by wangmiao1981 <gi...@git.apache.org>.
Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97018334
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    --- End diff --
    
    I have a question: I saw in some wrappers,, it uses `pipeline.stages.last` and some uses `pipeline.stages(1)`. What is the difference of the two use case? I tried using them interchangably and the tests are still passed. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16646


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    Merged into master. If there are new comments about the model persistence compatibility issue, we can address them in follow-up work. Thanks for all your reviewing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97013459
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    --- End diff --
    
    hmm, I see what you are saying


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97016294
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
           val dim = (rMetadata \ "dim").extract[Int]
    -      new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
    +      val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
    +      new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
    --- End diff --
    
    Yeah, it will break existing persisted model, but I think we don't guarantee mode persistent compatibility between different versions for SparkR. We are planing to make model persistence consistent between SparkR and MLlib, then there is no specific handling for SparkR and will let MLlib to handle all model persistent issue.
    However if we want to make model persistent compatibility for SparkR currently, I can add code to handle different versions here but will lead maintenance more complicated. What's your opinions? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97019304
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    --- End diff --
    
    they are the same when the pipeline has 1 stage.
    I prefer `stages.last` because if we later add a stage to transform the input data it will break `stages(1)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97013742
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
           val dim = (rMetadata \ "dim").extract[Int]
    -      new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
    +      val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
    +      new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
    --- End diff --
    
    would this break with any existing persisted model (that is missing a double here for logLikelihood)?
    is there a way to mitigate that?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r96910742
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    +    val logLikelihood: Double = gmm.summary.logLikelihood
    +
    +    new GaussianMixtureWrapper(pipeline, dim, logLikelihood)
    --- End diff --
    
    consider not saving this into the wrapper (and reader/writer) and instead access the model?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97007694
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    +    val logLikelihood: Double = gmm.summary.logLikelihood
    +
    +    new GaussianMixtureWrapper(pipeline, dim, logLikelihood)
    --- End diff --
    
    We can't, since ```summary``` was not saved in the pipeline model, so we need to save it into the wrapper explicitly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71665/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    **[Test build #71665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71665/testReport)** for PR 16646 at commit [`a2aa6ce`](https://github.com/apache/spark/commit/a2aa6ce0c25d1ca0e8f42e52459a62c7c09c9a46).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97007956
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    --- End diff --
    
    We need here to explicitly get ```logLikelihood``` and make it a member of the wrapper, since ```summary``` was not saved in the pipeline model so we can't get it (after L40) from a persistent R gaussian mixture model.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    cc @felixcheung 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97033168
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
           val dim = (rMetadata \ "dim").extract[Int]
    -      new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
    +      val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
    +      new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
    --- End diff --
    
    Yeah, I'll link this to SPARK-18864 and collect others' thoughts. If we really need make R model persistence compatibility with old versions, I can address it in follow-up work. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97037090
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
           val dim = (rMetadata \ "dim").extract[Int]
    -      new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
    +      val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
    +      new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
    --- End diff --
    
    @jkbradley What do you think of the model persistence incompatibility issue? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97019071
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
           val dim = (rMetadata \ "dim").extract[Int]
    -      new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
    +      val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
    +      new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
    --- End diff --
    
    it may not be a big deal right now, since spark.gmm is relatively new.
    but I think we should come up with a plan on model persistent compability not only with R vs JVM but also across versions of Spark.
    also might be useful to link this JIRA to SPARK-18864


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    looks good. just have some question not specific to this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16646
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97013542
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    +    val logLikelihood: Double = gmm.summary.logLikelihood
    --- End diff --
    
    for this line and above it, why do we need to explicitly give it a type (ie, `Double` or `GaussianMixtureModel`)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r97015657
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    +    val logLikelihood: Double = gmm.summary.logLikelihood
    --- End diff --
    
    Both are ok, to explicitly give a type will make developers clear to understand what it means.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r96910423
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           .setStages(Array(rFormulaModel, gm))
           .fit(data)
     
    -    new GaussianMixtureWrapper(pipeline, dim)
    +    val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
    --- End diff --
    
    gmm is already defined above? L40


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16646#discussion_r99253729
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
    @@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
           val dim = (rMetadata \ "dim").extract[Int]
    -      new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
    +      val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
    +      new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
    --- End diff --
    
    Breaking it is OK with me.  Thanks for adding it to the "changes of behavior" JIRA for tracking!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org