You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by wangmiao1981 <gi...@git.apache.org> on 2017/01/03 23:30:02 UTC

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

GitHub user wangmiao1981 opened a pull request:

    https://github.com/apache/spark/pull/16464

    [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer correctly

    ## What changes were proposed in this pull request?
    
    spark.lda passes the optimizer "em" or "online" as a string to the backend. However, LDAWrapper doesn't set optimizer based on the value from R. Therefore, for optimizer "em", the `isDistributed` field is FALSE, which should be TRUE based on scala code.
    
    In addition, the `summary` method should bring back the results related to `DistributedLDAModel`.
    
    ## How was this patch tested?
    Manual tests by comparing with scala example.
    Modified the current unit test: fix the incorrect unit test and add necessary tests for `summary` method.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark new

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16464
    
----
commit 277763614934aef2625274e2e55e7efe6eb4c2df
Author: wm624@hotmail.com <wm...@hotmail.com>
Date:   2017-01-03T23:26:28Z

    fix the optimizer bug

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Sure. I will revert it to the previous commit once the 16491 is in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    @felixcheung PR #16491 is filed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71303/testReport)** for PR 16464 at commit [`882c70d`](https://github.com/apache/spark/commit/882c70da32756e7603bd293b2ba010a585fdc0c5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94984564
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           model,
           ldaModel.logLikelihood(preprocessedData),
           ldaModel.logPerplexity(preprocessedData),
    +      trainingLogLikelihood,
    +      logPrior,
    --- End diff --
    
    LogPrior is calculated based on the serialized topics etc, which are also used by the trainingLikelyhood. But the trainingLikelyhood is the same for both original and loaded model. Let me debug more. It looks like a bug. The original MLLIB implementation doesn't serialize the two parameters as they can be calculated from other saved values. In addition, there is no unit test for comparing the two values, which could be the reason of not catching this issue. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71383/testReport)** for PR 16464 at commit [`95a6910`](https://github.com/apache/spark/commit/95a69106ca52844bafdf820b50ed8353d6c80a25).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71383/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71418/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94983934
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -123,6 +126,10 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           .setMaxIter(maxIter)
           .setSubsamplingRate(subsamplingRate)
     
    +    if (optimizer == "em") {
    --- End diff --
    
    If it is "online", it is not necessary to set the optimizer. But setting it anyway will make the code clean. I will do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    cc @felixcheung @yanboliang A bug fix. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94721880
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           model,
           ldaModel.logLikelihood(preprocessedData),
           ldaModel.logPerplexity(preprocessedData),
    +      trainingLogLikelihood,
    +      logPrior,
    --- End diff --
    
    The first version, I got them from the model in the `LDAWrapper` class. However, when I read `logPrior`, I found that the loaded `logPrior` is not the same as the value before save. So, I followed the logLikelihood and logPerplexity to save it in the metadata. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94866311
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           model,
           ldaModel.logLikelihood(preprocessedData),
           ldaModel.logPerplexity(preprocessedData),
    +      trainingLogLikelihood,
    +      logPrior,
    --- End diff --
    
    With the same dataset, Scala side tests:
    Original LogPrior:-3.3387459952856338
    LogPrior from saved model: -0.9202435107654922



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    I think it is a bug. I will file a PR to fix it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94905455
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           model,
           ldaModel.logLikelihood(preprocessedData),
           ldaModel.logPerplexity(preprocessedData),
    +      trainingLogLikelihood,
    +      logPrior,
    --- End diff --
    
    that's odd, is that an issue with model persistence?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96049008
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    When returning NA
    `> stats
    $docConcentration
     [1] 0.09116572 0.13013764 0.09116557 0.09116559 0.09116584 0.09116617
     [7] 0.09116576 0.09116573 0.09116569 0.09116564
    
    $topicConcentration
    [1] 0.1
    
    $logLikelihood
    [1] -392.9573
    
    $logPerplexity
    [1] 2.976949
    
    $isDistributed
    [1] FALSE
    
    $vocabSize
    [1] 10
    
    $topics
    SparkDataFrame[topic:int, term:array<string>, termWeights:array<double>]
    
    $vocabulary
     [1] "0" "1" "2" "3" "4" "9" "5" "8" "7" "6"
    
    $trainingLogLikelihood
    [1] NA
    
    $logPrior
    [1] NA
    `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70988/testReport)** for PR 16464 at commit [`5f1c644`](https://github.com/apache/spark/commit/5f1c6443672a50699450c2787ca221b29a7b3291).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70988/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71093/testReport)** for PR 16464 at commit [`456e06d`](https://github.com/apache/spark/commit/456e06da42e65a658d8625ea94d1ea4208b3ac62).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16464


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71301/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71302/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    LGTM, merged into master. Thanks for all you.
    @wangmiao1981 Could you prepare another PR for branch-2.1? The PR can only include to set the correct optimizer part. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71338/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94716683
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           model,
           ldaModel.logLikelihood(preprocessedData),
           ldaModel.logPerplexity(preprocessedData),
    +      trainingLogLikelihood,
    +      logPrior,
    --- End diff --
    
    since model is referenced and persisted, is there a need to handle trainingLogLikelihood and logPrior separately like this, and writing to metadata, instead of just getting from the model when fetching for the summary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70988/testReport)** for PR 16464 at commit [`5f1c644`](https://github.com/apache/spark/commit/5f1c6443672a50699450c2787ca221b29a7b3291).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r95948100
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    I think it's more appropriate to return ```NULL``` rather than ```NaN``` for local LDA model, since the ```logPrior``` is not existing rather than not a number.
    BTW, I think we can return NULL directly according to ```isDistributed```, otherwise, call corresponding Scala methods. This should reduce the complexity of ```LDAWrapper``` and reduce communication between R and Scala.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71301/testReport)** for PR 16464 at commit [`0134a26`](https://github.com/apache/spark/commit/0134a2693f6abfc51d0c11d693b97971072affaa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r95436115
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -45,6 +45,11 @@ private[r] class LDAWrapper private (
       import LDAWrapper._
     
       private val lda: LDAModel = pipeline.stages.last.asInstanceOf[LDAModel]
    +  private val distributedMoel = lda.isDistributed match {
    --- End diff --
    
    Fixed. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r95954100
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    In MLlib, if the model is ```DistributedLDAModel```, it has variables called ```trainingLogLikelihood``` and ```logPrior```. If the model is ```LocalLDAModel```, there is no above variables exist, we can not tell users whether they are numeric. So I'd prefer ```NULL``` to represent not existing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94905196
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -123,6 +126,10 @@ private[r] object LDAWrapper extends MLReadable[LDAWrapper] {
           .setMaxIter(maxIter)
           .setSubsamplingRate(subsamplingRate)
     
    +    if (optimizer == "em") {
    --- End diff --
    
    shouldn't it just set `optimizer` - without having to check if it is `em`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71093/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r95434408
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
    @@ -45,6 +45,11 @@ private[r] class LDAWrapper private (
       import LDAWrapper._
     
       private val lda: LDAModel = pipeline.stages.last.asInstanceOf[LDAModel]
    +  private val distributedMoel = lda.isDistributed match {
    --- End diff --
    
    `distributedModel`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    `graph.vertices.aggregate(0.0)(seqOp, _ + _)` when calculating `logPrior` gets different values for the two models, even if all parameters of the Model are the same. 
    
    I print out each step of `seqOp`, which are different for the two models. I am trying to understand the logic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71303/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71418/testReport)** for PR 16464 at commit [`e133ee6`](https://github.com/apache/spark/commit/e133ee64961beaf10ccccb7885ece76ded021ae5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70889/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96047751
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    When returning NULL,
    `stats
    $docConcentration
     [1] 0.09117984 0.09117993 0.09117964 0.09117969 0.09117991 0.13010149
     [7] 0.09117986 0.09117983 0.09192744 0.09117969
    
    $topicConcentration
    [1] 0.1
    
    $logLikelihood
    [1] -395.4505
    
    $logPerplexity
    [1] 2.995837
    
    $isDistributed
    [1] FALSE
    
    $vocabSize
    [1] 10
    
    $topics
    SparkDataFrame[topic:int, term:array<string>, termWeights:array<double>]
    
    $vocabulary
     [1] "0" "1" "2" "3" "4" "9" "5" "8" "7" "6"
    
    $trainingLogLikelihood
    NULL
    
    $logPrior
    NULL`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71149/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71149/testReport)** for PR 16464 at commit [`aee8da5`](https://github.com/apache/spark/commit/aee8da5c770f0fef2b49f196e38723e92c28d677).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96035584
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    My take is the convention in R for missing values in a structure (list, data.frame) is `NA`.
    And since we are returning a list we could also simply omit it if the value is not applicable or available. (ie. don't add `trainingLogLikelihood` to the list in L421)
    
    In any case, what does it look like when printing the `summary` returned here for these values (NULL, NA) we are proposing?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r94905327
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
    @@ -907,6 +917,8 @@ test_that("spark.lda with text input", {
       expect_equal(logPerplexity, stats2$logPerplexity)
       expect_equal(vocabSize, stats2$vocabSize)
       expect_true(all.equal(vocabulary, stats2$vocabulary))
    +  expect_equal(trainingLogLikelihood, stats2$trainingLogLikelihood)
    +  expect_equal(logPrior, stats2$logPrior)
    --- End diff --
    
    can you have a test for when these are not `nan`? like here https://github.com/apache/spark/blob/6ee28423ad1b2e6089b82af64a31d77d3552bb38/mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala#L239


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96144837
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib_clustering.R ---
    @@ -146,12 +146,16 @@ test_that("spark.lda with libsvm", {
       topics <- stats$topicTopTerms
       weights <- stats$topicTopTermsWeights
       vocabulary <- stats$vocabulary
    +  trainingLogLikelihood <- stats$trainingLogLikelihood
    +  logPrior <- stats$logPrior
     
    -  expect_false(isDistributed)
    +  expect_true(isDistributed)
       expect_true(logLikelihood <= 0 & is.finite(logLikelihood))
       expect_true(logPerplexity >= 0 & is.finite(logPerplexity))
       expect_equal(vocabSize, 11)
       expect_true(is.null(vocabulary))
    +  expect_true(trainingLogLikelihood <= 0 & !is.nan(trainingLogLikelihood))
    +  expect_true(logPrior <= 0 & !is.nan(logPrior))
    --- End diff --
    
    shouldn't these two tests be `!is.na`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70840/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70889/testReport)** for PR 16464 at commit [`26fa14b`](https://github.com/apache/spark/commit/26fa14b55172fc3282d3422a78e3b4860e2eefda).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71383/testReport)** for PR 16464 at commit [`95a6910`](https://github.com/apache/spark/commit/95a69106ca52844bafdf820b50ed8353d6c80a25).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    I print out the values of each step of `seqOp`, `graph.vertices.aggregate(0.0)(seqOp, _ + _)` just returns the `seqOp` of the last vertex.
    VertexID sequence for original model:
    -8 -6 -3 10 -5 4 11 -1 0 1 -2 6 -7 7 8 -10 -9 9 -4 3 -11 5 2
    For this sequence, it returns the value of Vertex2.
    VertexID sequence for the loaded model:
    3 5 2 -11 -5 4 11 -1 0 1 -2 6 -7 -8 7 -10 8 9 -9 -4 10 -6 -3
    For this sequence, it returns the value of Vertex -3.
    
    Actually, the value of Vertex2 is the same in both sequences (same of Vertex -3 too).
    
    It seems that `graph.vertices.aggregate(0.0)(seqOp, _ + _)`  doesn't returns the aggregated value as it is supposed to return (i.e., `sum(SeqOp(Vertex_i))` ). 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71338/testReport)** for PR 16464 at commit [`882c70d`](https://github.com/apache/spark/commit/882c70da32756e7603bd293b2ba010a585fdc0c5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71418/testReport)** for PR 16464 at commit [`e133ee6`](https://github.com/apache/spark/commit/e133ee64961beaf10ccccb7885ece76ded021ae5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96103386
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    another data point is that `NA` fits in as just another element in vector and list in R but not NULL
    ```
    > as.data.frame(list(1, 2, 2))
      X1 X2 X2.1
    1  1  2    2
    > as.data.frame(list(1, NULL, 2))
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
      arguments imply differing number of rows: 1, 0
    > as.data.frame(list(1, NA, 2))
      X1 NA. X2
    1  1  NA  2
    ```
    
    In Spark SQL, we convert NULL from JVM to NA in DataFrame


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    let's get 16491 through so we could simply use the model instead of saving extra copies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70845/testReport)** for PR 16464 at commit [`6c096bb`](https://github.com/apache/spark/commit/6c096bb3128fbf3a12c21eee8d8bcb026f209fbb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70845/testReport)** for PR 16464 at commit [`6c096bb`](https://github.com/apache/spark/commit/6c096bb3128fbf3a12c21eee8d8bcb026f209fbb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    `graph.vertices.aggregate(0.0)(seqOp, _ + _)` the vertex sequence is different in the two models.
    VertexID sequence for original model:
    -8 -6 -3 10 -5 4 11 -1 0 1 -2 6 -7 7 8 -10 -9 9 -4 3 -11 5 2
    VertexID sequence for the loaded model:
    3 5 2 -11 -5 4 11 -1 0 1 -2 6 -7 -8 7 -10 8 9 -9 -4 10 -6 -3


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70845/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70840/testReport)** for PR 16464 at commit [`2777636`](https://github.com/apache/spark/commit/277763614934aef2625274e2e55e7efe6eb4c2df).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r95951488
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    I'd prefer NA rather than NULL.
    If it's numeric I think NaN is appropriate, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96125283
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    @felixcheung I will update it to NA. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    @felixcheung I made modifications and don't save the two metrics of DistributedModels.
    
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71302/testReport)** for PR 16464 at commit [`b72592c`](https://github.com/apache/spark/commit/b72592ce02e9a8af518a103ab81a2dfe8a103d51).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70863/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71149/testReport)** for PR 16464 at commit [`aee8da5`](https://github.com/apache/spark/commit/aee8da5c770f0fef2b49f196e38723e92c28d677).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    This is great, just one minor test comment.
    @yanboliang what do you think?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70863/testReport)** for PR 16464 at commit [`14bafc1`](https://github.com/apache/spark/commit/14bafc1bd8b2c621cfd2f83f543182a2e38f8fd6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70889/testReport)** for PR 16464 at commit [`26fa14b`](https://github.com/apache/spark/commit/26fa14b55172fc3282d3422a78e3b4860e2eefda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r95948289
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -388,6 +388,13 @@ setMethod("spark.lda", signature(data = "SparkDataFrame"),
     #'         \item{\code{topics}}{top 10 terms and their weights of all topics}
     #'         \item{\code{vocabulary}}{whole terms of the training corpus, NULL if libsvm format file
     #'               used as training set}
    +#'         \item{\code{trainingLogLikelihood}}{Log likelihood of the observed tokens in the training set,
    +#'               given the current parameter estimates:
    +#'               log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)
    +#'               It is only for \code{DistributedLDAModel} (i.e., optimizer = "em")}
    --- End diff --
    
    ```\code{DistributedLDAModel}``` should convert to text description, since there is no class called ```DistributedLDAModel``` in SparkR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71093/testReport)** for PR 16464 at commit [`456e06d`](https://github.com/apache/spark/commit/456e06da42e65a658d8625ea94d1ea4208b3ac62).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #70840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70840/testReport)** for PR 16464 at commit [`2777636`](https://github.com/apache/spark/commit/277763614934aef2625274e2e55e7efe6eb4c2df).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16464#discussion_r96049147
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
                 vocabSize <- callJMethod(jobj, "vocabSize")
                 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
                 vocabulary <- callJMethod(jobj, "vocabulary")
    +            trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
    +            logPrior <- callJMethod(jobj, "logPrior")
    --- End diff --
    
    They look very similar. I am fine with any of the two. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16464
  
    **[Test build #71338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71338/testReport)** for PR 16464 at commit [`882c70d`](https://github.com/apache/spark/commit/882c70da32756e7603bd293b2ba010a585fdc0c5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org