You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2015/11/12 22:11:44 UTC

[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/9678

    [SPARK-11712] [ML] Make spark.ml LDAModel be abstract

    Per discussion in the initial Pipelines LDA PR [https://github.com/apache/spark/pull/9513], we should make LDAModel abstract and create a LocalLDAModel. This code simplification should be done before the 1.6 release to ensure API compatibility in future releases.
    
    CC @feynmanliang @mengxr 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark lda-pipelines-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9678
    
----
commit 2545c89d8b48cf41dc1d76c1a35ead51e7b181ee
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2015-11-11T01:32:31Z

    partial implementation, but hit issue with lazy val

commit 3c97e418283d92fb26322e63a568d6f491ad0068
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2015-11-12T21:04:42Z

    refactored spark.ml LDAModel to be abstract

commit 9f39384131b1e0e236ac6d9c65698f93112aac34
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2015-11-12T21:07:11Z

    doc fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156285706
  
    Merging with master, branch-1.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156278790
  
    **[Test build #45785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45785/consoleFull)** for PR 9678 at commit [`b3e9341`](https://github.com/apache/spark/commit/b3e9341498840e137d277ec3347a1cc4fce54179).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156236638
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156278914
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45785/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156256847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45774/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44720208
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -468,7 +452,37 @@ class LDAModel private[ml] (
     /**
      * :: Experimental ::
      *
    - * Distributed model fitted by [[LDA]] using Expectation-Maximization (EM).
    + * Local (non-distributed) model fitted by [[LDA]].
    + *
    + * This model stores the inferred topics only; it does not store info about the training dataset.
    + */
    +@Since("1.6.0")
    +@Experimental
    +class LocalLDAModel private[ml] (
    +    uid: String,
    +    vocabSize: Int,
    --- End diff --
    
    Since annotations


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44731696
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -480,58 +494,38 @@ class DistributedLDAModel private[ml] (
         vocabSize: Int,
         private val oldDistributedModel: OldDistributedLDAModel,
         sqlContext: SQLContext)
    -  extends LDAModel(uid, vocabSize, None, sqlContext) {
    +  extends LDAModel(uid, vocabSize, sqlContext) {
    +
    +  /** Used to implement [[oldLocalModel]] as a lazy val, but with cheap [[copy()]] */
    +  private var oldLocalModelOption: Option[OldLocalLDAModel] = None
    +
    +  override protected def oldLocalModel: OldLocalLDAModel = {
    +    if (oldLocalModelOption.isEmpty) {
    +      oldLocalModelOption = Some(oldDistributedModel.toLocal)
    +    }
    +    oldLocalModelOption.get
    +  }
    +
    +  override protected def getModel: OldLDAModel = oldDistributedModel
     
       /**
        * Convert this distributed model to a local representation.  This discards info about the
        * training dataset.
        */
       @Since("1.6.0")
    -  def toLocal: LDAModel = {
    -    if (oldLocalModel.isEmpty) {
    -      oldLocalModel = Some(oldDistributedModel.toLocal)
    -    }
    -    new LDAModel(uid, vocabSize, oldLocalModel, sqlContext)
    -  }
    -
    -  @Since("1.6.0")
    -  override protected def getModel: OldLDAModel = oldDistributedModel
    +  def toLocal: LocalLDAModel = new LocalLDAModel(uid, vocabSize, oldLocalModel, sqlContext)
     
       @Since("1.6.0")
       override def copy(extra: ParamMap): DistributedLDAModel = {
         val copied = new DistributedLDAModel(uid, vocabSize, oldDistributedModel, sqlContext)
    -    if (oldLocalModel.nonEmpty) copied.oldLocalModel = oldLocalModel
    +    copied.oldLocalModelOption = oldLocalModelOption
    --- End diff --
    
    Sounds reasonable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156270951
  
    @feynmanliang Thanks for reviewing!  I think I addressed everything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156235031
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44720555
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -480,58 +494,38 @@ class DistributedLDAModel private[ml] (
         vocabSize: Int,
    --- End diff --
    
    Do these need Since annotations?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44731686
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -468,7 +452,37 @@ class LDAModel private[ml] (
     /**
      * :: Experimental ::
      *
    - * Distributed model fitted by [[LDA]] using Expectation-Maximization (EM).
    + * Local (non-distributed) model fitted by [[LDA]].
    + *
    + * This model stores the inferred topics only; it does not store info about the training dataset.
    + */
    +@Since("1.6.0")
    +@Experimental
    +class LocalLDAModel private[ml] (
    +    uid: String,
    +    vocabSize: Int,
    +    @Since("1.6.0") override protected val oldLocalModel: OldLocalLDAModel,
    --- End diff --
    
    I'll have to clarify with @mengxr 
    It's technically a public API since the class is not final.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44731693
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -468,7 +452,37 @@ class LDAModel private[ml] (
     /**
      * :: Experimental ::
      *
    - * Distributed model fitted by [[LDA]] using Expectation-Maximization (EM).
    + * Local (non-distributed) model fitted by [[LDA]].
    + *
    + * This model stores the inferred topics only; it does not store info about the training dataset.
    + */
    +@Since("1.6.0")
    +@Experimental
    +class LocalLDAModel private[ml] (
    +    uid: String,
    --- End diff --
    
    Do you mean the Since annotations?  It's because those constructor args are also vals which will appear in the public API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156272119
  
    **[Test build #45785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45785/consoleFull)** for PR 9678 at commit [`b3e9341`](https://github.com/apache/spark/commit/b3e9341498840e137d277ec3347a1cc4fce54179).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156256610
  
    **[Test build #45774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45774/consoleFull)** for PR 9678 at commit [`29d08a8`](https://github.com/apache/spark/commit/29d08a8c2c07ceae88dab1f3abef575ef5b5322c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156240853
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156256846
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156240857
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45771/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156236666
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156234993
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156278912
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44720409
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -314,31 +314,31 @@ private[clustering] trait LDAParams extends Params with HasFeaturesCol with HasM
      * Model fitted by [[LDA]].
      *
      * @param vocabSize  Vocabulary size (number of terms or terms in the vocabulary)
    - * @param oldLocalModel  Underlying spark.mllib model.
    - *                       If this model was produced by Online LDA, then this is the
    - *                       only model representation.
    - *                       If this model was produced by EM, then this local
    - *                       representation may be built lazily.
      * @param sqlContext  Used to construct local DataFrames for returning query results
      */
     @Since("1.6.0")
     @Experimental
    -class LDAModel private[ml] (
    +sealed abstract class LDAModel private[ml] (
         @Since("1.6.0") override val uid: String,
         @Since("1.6.0") val vocabSize: Int,
    -    @Since("1.6.0") protected var oldLocalModel: Option[OldLocalLDAModel],
         @Since("1.6.0") @transient protected val sqlContext: SQLContext)
       extends Model[LDAModel] with LDAParams with Logging {
     
    -  /** Returns underlying spark.mllib model */
    +  // NOTE to developers:
    +  //  This abstraction should contain all important functionality for basic LDA usage.
    +  //  Specializations of this class can contain expert-only functionality.
    +
    +  /**
    +   * Underlying spark.mllib model.
    +   * If this model was produced by Online LDA, then this is the only model representation.
    +   * If this model was produced by EM, then this local representation may be built lazily.
    +   */
       @Since("1.6.0")
    -  protected def getModel: OldLDAModel = oldLocalModel match {
    -    case Some(m) => m
    -    case None =>
    -      // Should never happen.
    -      throw new RuntimeException("LDAModel required local model format," +
    -        " but the underlying model is missing.")
    -  }
    +  protected def oldLocalModel: OldLocalLDAModel
    +
    +  /** Returns underlying spark.mllib model, which may be local or distributed */
    +  @Since("1.6.0")
    +  protected def getModel: OldLDAModel
    --- End diff --
    
    Should we warn that this (any any method that uses this) will trigger a `collect` in `DistributedLDAModel`, or can we expect users of `DistributedLDAModel` understand that already?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9678


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44721007
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -480,58 +494,38 @@ class DistributedLDAModel private[ml] (
         vocabSize: Int,
         private val oldDistributedModel: OldDistributedLDAModel,
         sqlContext: SQLContext)
    -  extends LDAModel(uid, vocabSize, None, sqlContext) {
    +  extends LDAModel(uid, vocabSize, sqlContext) {
    +
    +  /** Used to implement [[oldLocalModel]] as a lazy val, but with cheap [[copy()]] */
    +  private var oldLocalModelOption: Option[OldLocalLDAModel] = None
    +
    +  override protected def oldLocalModel: OldLocalLDAModel = {
    +    if (oldLocalModelOption.isEmpty) {
    +      oldLocalModelOption = Some(oldDistributedModel.toLocal)
    +    }
    +    oldLocalModelOption.get
    +  }
    +
    +  override protected def getModel: OldLDAModel = oldDistributedModel
     
       /**
        * Convert this distributed model to a local representation.  This discards info about the
        * training dataset.
        */
       @Since("1.6.0")
    -  def toLocal: LDAModel = {
    -    if (oldLocalModel.isEmpty) {
    -      oldLocalModel = Some(oldDistributedModel.toLocal)
    -    }
    -    new LDAModel(uid, vocabSize, oldLocalModel, sqlContext)
    -  }
    -
    -  @Since("1.6.0")
    -  override protected def getModel: OldLDAModel = oldDistributedModel
    +  def toLocal: LocalLDAModel = new LocalLDAModel(uid, vocabSize, oldLocalModel, sqlContext)
     
       @Since("1.6.0")
       override def copy(extra: ParamMap): DistributedLDAModel = {
         val copied = new DistributedLDAModel(uid, vocabSize, oldDistributedModel, sqlContext)
    -    if (oldLocalModel.nonEmpty) copied.oldLocalModel = oldLocalModel
    +    copied.oldLocalModelOption = oldLocalModelOption
    --- End diff --
    
    Any harm in making `oldLocalModelOption` part of the constructor and avoiding this private var access?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156237517
  
    **[Test build #45774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45774/consoleFull)** for PR 9678 at commit [`29d08a8`](https://github.com/apache/spark/commit/29d08a8c2c07ceae88dab1f3abef575ef5b5322c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44720740
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -468,7 +452,37 @@ class LDAModel private[ml] (
     /**
      * :: Experimental ::
      *
    - * Distributed model fitted by [[LDA]] using Expectation-Maximization (EM).
    + * Local (non-distributed) model fitted by [[LDA]].
    + *
    + * This model stores the inferred topics only; it does not store info about the training dataset.
    + */
    +@Since("1.6.0")
    +@Experimental
    +class LocalLDAModel private[ml] (
    +    uid: String,
    --- End diff --
    
    We annotated the constructor args to the `private` constructor for `LDAModel` but not here; why?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156249049
  
    LGTM, all comments minor / optional


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156271500
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156251863
  
    **[Test build #2051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2051/consoleFull)** for PR 9678 at commit [`29d08a8`](https://github.com/apache/spark/commit/29d08a8c2c07ceae88dab1f3abef575ef5b5322c).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156246240
  
    **[Test build #2051 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2051/consoleFull)** for PR 9678 at commit [`29d08a8`](https://github.com/apache/spark/commit/29d08a8c2c07ceae88dab1f3abef575ef5b5322c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44731681
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -314,31 +314,31 @@ private[clustering] trait LDAParams extends Params with HasFeaturesCol with HasM
      * Model fitted by [[LDA]].
      *
      * @param vocabSize  Vocabulary size (number of terms or terms in the vocabulary)
    - * @param oldLocalModel  Underlying spark.mllib model.
    - *                       If this model was produced by Online LDA, then this is the
    - *                       only model representation.
    - *                       If this model was produced by EM, then this local
    - *                       representation may be built lazily.
      * @param sqlContext  Used to construct local DataFrames for returning query results
      */
     @Since("1.6.0")
     @Experimental
    -class LDAModel private[ml] (
    +sealed abstract class LDAModel private[ml] (
         @Since("1.6.0") override val uid: String,
         @Since("1.6.0") val vocabSize: Int,
    -    @Since("1.6.0") protected var oldLocalModel: Option[OldLocalLDAModel],
         @Since("1.6.0") @transient protected val sqlContext: SQLContext)
       extends Model[LDAModel] with LDAParams with Logging {
     
    -  /** Returns underlying spark.mllib model */
    +  // NOTE to developers:
    +  //  This abstraction should contain all important functionality for basic LDA usage.
    +  //  Specializations of this class can contain expert-only functionality.
    +
    +  /**
    +   * Underlying spark.mllib model.
    +   * If this model was produced by Online LDA, then this is the only model representation.
    +   * If this model was produced by EM, then this local representation may be built lazily.
    +   */
       @Since("1.6.0")
    -  protected def getModel: OldLDAModel = oldLocalModel match {
    -    case Some(m) => m
    -    case None =>
    -      // Should never happen.
    -      throw new RuntimeException("LDAModel required local model format," +
    -        " but the underlying model is missing.")
    -  }
    +  protected def oldLocalModel: OldLocalLDAModel
    +
    +  /** Returns underlying spark.mllib model, which may be local or distributed */
    +  @Since("1.6.0")
    +  protected def getModel: OldLDAModel
    --- End diff --
    
    This does not trigger a collect.  It's oldLocalModel which can trigger a collect.
    
    I'll add some more warnings in places where it can happen in the public API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by feynmanliang <gi...@git.apache.org>.
Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9678#discussion_r44720656
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
    @@ -468,7 +452,37 @@ class LDAModel private[ml] (
     /**
      * :: Experimental ::
      *
    - * Distributed model fitted by [[LDA]] using Expectation-Maximization (EM).
    + * Local (non-distributed) model fitted by [[LDA]].
    + *
    + * This model stores the inferred topics only; it does not store info about the training dataset.
    + */
    +@Since("1.6.0")
    +@Experimental
    +class LocalLDAModel private[ml] (
    +    uid: String,
    +    vocabSize: Int,
    +    @Since("1.6.0") override protected val oldLocalModel: OldLocalLDAModel,
    --- End diff --
    
    Does a `protected val` in a private constructor need a Since annotation? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11712] [ML] Make spark.ml LDAModel be a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9678#issuecomment-156271527
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org