Posted to reviews@spark.apache.org by hhbyyh <gi...@git.apache.org> on 2015/04/23 13:48:09 UTC

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/5661

    [Spark-7090][MLlib] Introduce LDAOptimizer to LDA to further improve extensibility

    jira: https://issues.apache.org/jira/browse/SPARK-7090 
    
    LDA was implemented with extensibility in mind. With the development of OnlineLDA and Gibbs Sampling, we are collecting more detailed requirements from different algorithms.
    As Joseph Bradley proposed in https://github.com/apache/spark/pull/4807, and after some further discussion, we'd like to adjust the code structure a little to present the common interface and extension point clearly.
    Basically, class LDA would be the common entry point for LDA computation, and each LDA object would refer to an LDAOptimizer for the concrete algorithm implementation. Users can customize an LDAOptimizer with specific parameters and assign it to LDA.
    
    
    Concrete changes:
    
    1. Add a trait `LDAOptimizer`, which defines the common interface for concrete implementations. Each subclass is a wrapper for a specific LDA algorithm.
    
    2. Move EMOptimizer to the file LDAOptimizer, make it inherit from LDAOptimizer, and rename it to EMLDAOptimizer (in case a more generic EMOptimizer comes along in the future).
            - Adjust the constructor of EMOptimizer, since all the parameters should be passed in through the initialState method. This avoids unwanted confusion or overwriting.
            - Move the code from LDA.initialState to initialState of EMLDAOptimizer.
    
    3. Add a property ldaOptimizer to LDA, with its getter and setter; EMLDAOptimizer is the default optimizer.
    
    4. Change the return type of LDA.run from DistributedLDAModel to LDAModel.
    
    Further work:
    add OnlineLDAOptimizer and other possible Optimizers once ready.
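
The structure described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: `Seq[(Long, Array[Double])]` stands in for `RDD[(Long, Vector)]`, `SketchModel` is a hypothetical placeholder for a real model class, and the EM optimizer here only counts iterations instead of running EM.

```scala
// Simplified stand-in for the real LDAModel hierarchy.
trait LDAModel { def k: Int }
final case class SketchModel(k: Int, iterations: Int) extends LDAModel

// Extension point: each implementation wraps one LDA algorithm.
trait LDAOptimizer {
  def initialState(docs: Seq[(Long, Array[Double])], k: Int): LDAOptimizer
  def next(): LDAOptimizer
  def getLDAModel(iterationTimes: Array[Double]): LDAModel
}

// Hypothetical placeholder for the EM implementation; it only counts iterations.
class EMLDAOptimizer extends LDAOptimizer {
  private var k = 0
  private var iterations = 0

  def initialState(docs: Seq[(Long, Array[Double])], k: Int): LDAOptimizer = {
    this.k = k
    this.iterations = 0
    this
  }

  def next(): LDAOptimizer = { iterations += 1; this }

  def getLDAModel(iterationTimes: Array[Double]): LDAModel = SketchModel(k, iterations)
}

// Common entry point: holds the shared parameters and delegates to the optimizer.
class LDA {
  private var k: Int = 10
  private var maxIterations: Int = 20
  private var ldaOptimizer: LDAOptimizer = new EMLDAOptimizer

  def setK(k: Int): this.type = { this.k = k; this }
  def setMaxIterations(n: Int): this.type = { this.maxIterations = n; this }
  def getOptimizer: LDAOptimizer = ldaOptimizer
  def setOptimizer(optimizer: LDAOptimizer): this.type = {
    this.ldaOptimizer = optimizer
    this
  }

  // Returns the general LDAModel type rather than a specific subclass.
  def run(docs: Seq[(Long, Array[Double])]): LDAModel = {
    var state = ldaOptimizer.initialState(docs, k)
    val iterationTimes = Array.fill(maxIterations)(0.0)
    var iter = 0
    while (iter < maxIterations) {
      val start = System.nanoTime()
      state = state.next()
      iterationTimes(iter) = (System.nanoTime() - start) / 1e9
      iter += 1
    }
    state.getLDAModel(iterationTimes)
  }
}
```

A caller would then write something like `new LDA().setK(3).setOptimizer(new EMLDAOptimizer).run(docs)`; swapping in another algorithm only means passing a different `LDAOptimizer`.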

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark ldaRefactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5661
    
----
commit ec2f857645bdcabc8f51c310237d0365e7d2230e
Author: Yuhao Yang <hh...@gmail.com>
Date:   2015-04-22T12:49:37Z

    protoptype for discussion

commit 0bb8400e70011c8f97ece31d395a8c75b15bab4f
Author: Yuhao Yang <hh...@gmail.com>
Date:   2015-04-23T11:15:04Z

    refactor LDA with Optimizer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116124
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala ---
    @@ -220,6 +206,38 @@ class LDA private (
         this
       }
     
    +
    +  /** LDAOptimizer used to perform the actual calculation */
    +  def getOptimizer(): LDAOptimizer = ldaOptimizer
    +
    +  /**
    +   * LDAOptimizer used to perform the actual calculation (default = EMLDAOptimizer)
    +   */
    +  def setOptimizer(optimizer: LDAOptimizer): this.type = {
    +    this.ldaOptimizer = optimizer
    +    this
    +  }
    +
    +  /**
    +   * Set the LDAOptimizer used to perform the actual calculation by algorithm name.
    +   * Currently "EM" is supported.
    --- End diff --
    
    Could this be made case-insensitive?  Other algorithms mostly use lowercase string values.  We can note it in the doc too.
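
One way to make the name lookup case-insensitive is to normalize the string before matching. The sketch below is an assumption about how that could look, not the code that was merged; `OptimizerNames.fromName` is a hypothetical helper, and the two optimizer types are empty stand-ins.

```scala
import java.util.Locale

// Empty stand-ins for the types introduced by the patch.
trait LDAOptimizer
class EMLDAOptimizer extends LDAOptimizer

object OptimizerNames {
  // Hypothetical helper: resolves an algorithm name regardless of case.
  def fromName(name: String): LDAOptimizer =
    name.toLowerCase(Locale.ROOT) match {
      case "em" => new EMLDAOptimizer
      case other =>
        throw new IllegalArgumentException(
          s"Only em is supported but got $other.")
    }
}
```

Lowercasing with `Locale.ROOT` keeps the match independent of the JVM's default locale, so "EM", "Em" and "em" all resolve to the same optimizer.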



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116128
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import java.util.Random
    +
    +import breeze.linalg.{DenseVector => BDV, normalize}
    +import org.apache.spark.annotation.Experimental
    --- End diff --
    
    newline between this and previous line



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95590874
  
      [Test build #30843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30843/consoleFull) for   PR 5661 at commit [`e756ce4`](https://github.com/apache/spark/commit/e756ce4c351a67e92afc0faef42b314c8ab8a31d).



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95622391
  
      [Test build #30843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30843/consoleFull) for   PR 5661 at commit [`e756ce4`](https://github.com/apache/spark/commit/e756ce4c351a67e92afc0faef42b314c8ab8a31d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait LDAOptimizer`
      * `class EMLDAOptimizer extends LDAOptimizer`
    
     * This patch does not change any dependencies.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95561221
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30836/
    Test FAILed.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95558778
  
    Since SPARK-7090 was a duplicate, I closed it. Retag this for SPARK-7089?



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96482693
  
    @hhbyyh  Thanks for reminding me of the discussion in the other PR.  I guess it's hard to say what's better given that I've contradicted myself now about whether to split the Optimizer and LearningState concepts.  I think it's fine if you keep them both under the Optimizer concept.  Thanks!



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by hhbyyh <gi...@git.apache.org>.

Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96846023
  
    @jkbradley Thanks, I think it's fine to merge the current version.
    And if the pending API name change is a concern, I can do a quick update. (That would need to wait for the tests.)



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116132
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import java.util.Random
    +
    +import breeze.linalg.{DenseVector => BDV, normalize}
    +import org.apache.spark.annotation.Experimental
    +import org.apache.spark.graphx._
    +import org.apache.spark.graphx.impl.GraphImpl
    +import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
    +import org.apache.spark.mllib.linalg.Vector
    +import org.apache.spark.rdd.RDD
    +
    +/**
    + * :: Experimental ::
    + *
    + * An LDAOptimizer contains an algorithm for LDA and performs the actual computation, which
    + * stores internal data structure (Graph or Matrix) and any other parameter for the algorithm.
    + * The interface is isolated to improve the extensibility of LDA.
    + */
    +@Experimental
    +trait LDAOptimizer{
    +
    +  /**
    +   * Initializer for the optimizer. LDA passes the common parameters to the optimizer and
    +   * the internal structure can be initialized properly.
    +   */
    +  private[clustering] def initialState(
    +      docs: RDD[(Long, Vector)],
    +      k: Int,
    +      docConcentration: Double,
    +      topicConcentration: Double,
    +      randomSeed: Long,
    +      checkpointInterval: Int): LDAOptimizer
    +
    +  private[clustering] def next(): LDAOptimizer
    +
    +  private[clustering] def getLDAModel(iterationTimes: Array[Double]): LDAModel
    +}
    +
    +/**
    + * :: Experimental ::
    + *
    + * Optimizer for EM algorithm which stores data + parameter graph, plus algorithm parameters.
    + *
    + * Currently, the underlying implementation uses Expectation-Maximization (EM), implemented
    + * according to the Asuncion et al. (2009) paper referenced below.
    + *
    + * References:
    + *  - Original LDA paper (journal version):
    + *    Blei, Ng, and Jordan.  "Latent Dirichlet Allocation."  JMLR, 2003.
    + *     - This class implements their "smoothed" LDA model.
    + *  - Paper which clearly explains several algorithms, including EM:
    + *    Asuncion, Welling, Smyth, and Teh.
    + *    "On Smoothing and Inference for Topic Models."  UAI, 2009.
    + *
    + */
    +@Experimental
    +class EMLDAOptimizer extends LDAOptimizer{
    +
    +  import LDA._
    +  /**
    --- End diff --
    
    newline between this and previous line



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96768999
  
      [Test build #30985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30985/consoleFull) for   PR 5661 at commit [`0e2e006`](https://github.com/apache/spark/commit/0e2e006645dda48d0b8c6eb32c5509a5694bc3d9).



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96850841
  
    No, I'd just wait for the test.  I think that previous test was cancelled, so I'll start a new one.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96761718
  
      [Test build #720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/720/consoleFull) for   PR 5661 at commit [`0e2e006`](https://github.com/apache/spark/commit/0e2e006645dda48d0b8c6eb32c5509a5694bc3d9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait LDAOptimizer`
      * `class EMLDAOptimizer extends LDAOptimizer`
    
     * This patch does not change any dependencies.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116118
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala ---
    @@ -42,17 +37,6 @@ import org.apache.spark.util.Utils
      *  - "token": instance of a term appearing in a document
      *  - "topic": multinomial distribution over words representing some concept
      *
    - * Currently, the underlying implementation uses Expectation-Maximization (EM), implemented
    - * according to the Asuncion et al. (2009) paper referenced below.
    - *
    - * References:
    - *  - Original LDA paper (journal version):
    --- End diff --
    
    This 1 reference should stay here.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95622405
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30843/
    Test PASSed.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116119
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala ---
    @@ -68,6 +52,8 @@ class LDA private (
       def this() = this(k = 10, maxIterations = 20, docConcentration = -1, topicConcentration = -1,
         seed = Utils.random.nextLong(), checkpointInterval = 10)
     
    +  private var ldaOptimizer: LDAOptimizer = getDefaultOptimizer("EM")
    --- End diff --
    
    I'd list this with the other private vars above and initialize it in the default constructor this()
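
As a rough illustration of this suggestion, the optimizer can become one more private constructor parameter, with its default supplied by the no-argument constructor. The parameter list below is shortened and the optimizer types are empty stand-ins, so treat it as a sketch rather than the actual `LDA` class.

```scala
// Empty stand-ins for the types introduced by the patch.
trait LDAOptimizer
class EMLDAOptimizer extends LDAOptimizer

class LDA private (
    private var k: Int,
    private var maxIterations: Int,
    private var ldaOptimizer: LDAOptimizer) {

  // All defaults, including the optimizer, are visible in one place.
  def this() = this(k = 10, maxIterations = 20, ldaOptimizer = new EMLDAOptimizer)

  def getK: Int = k
  def getMaxIterations: Int = maxIterations
  def getOptimizer: LDAOptimizer = ldaOptimizer

  def setOptimizer(optimizer: LDAOptimizer): this.type = {
    this.ldaOptimizer = optimizer
    this
  }
}
```

Keeping the field in the primary constructor avoids a separate `private var` declaration in the class body whose initializer is easy to overlook.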



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29168956
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -0,0 +1,210 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import java.util.Random
    +
    +import breeze.linalg.{DenseVector => BDV, normalize}
    +
    +import org.apache.spark.annotation.Experimental
    +import org.apache.spark.graphx._
    +import org.apache.spark.graphx.impl.GraphImpl
    +import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
    +import org.apache.spark.mllib.linalg.Vector
    +import org.apache.spark.rdd.RDD
    +
    +/**
    + * :: Experimental ::
    + *
    + * An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can
    + * hold optimizer-specific parameters for users to set.
    + */
    +@Experimental
    +trait LDAOptimizer{
    --- End diff --
    
    still need space: ```LDAOptimizer {```



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96750474
  
    @hhbyyh  Thanks for the updates!  I made a few small comments, but can you please fix them in your next PR which adds OnlineLDA?  (That way, we can go ahead and merge this one.)
    
    LGTM pending tests



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96750654
  
    Btw, there have been some issues with Jenkins recently (not starting tests or posting results automatically)



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by hhbyyh <gi...@git.apache.org>.

Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96869991
  
    The test has finished, but the result is not being posted to GitHub.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96450719
  
    @hhbyyh Thanks for the PR!  It looks good, except for 1 item on which I think we weren't clear before:
    
    I meant for us to separate the Optimizer and LearningState concepts.
    * Optimizer should be a class which stores parameters and not much else. Optimizer.initialState should return an instance of a LearningState class.
    * LearningState should have the next() and getModel() methods.
    
    Could you please refactor according to that?  It should only require moving some code around, but I think it will help clarify the distinction between the parameters and the learning state.
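
The split proposed in this comment could look roughly like the sketch below. It only illustrates the two roles; elsewhere in this thread the reviewer agrees to keep both under the Optimizer concept, so this is not the design that was merged. All names other than `initialState`, `next` and `getModel` are hypothetical, and `Seq[(Long, Array[Double])]` stands in for the real document RDD.

```scala
// Simplified stand-in for the real LDAModel hierarchy.
trait LDAModel { def k: Int }
final case class SketchModel(k: Int, iterations: Int) extends LDAModel

// Learning state: advances the algorithm and can produce a model.
trait LearningState {
  def next(): LearningState
  def getModel(): LDAModel
}

// Optimizer: stores user-settable parameters and creates the learning state.
trait Optimizer {
  def initialState(docs: Seq[(Long, Array[Double])], k: Int): LearningState
}

// Hypothetical EM state; it only counts iterations instead of running EM.
final case class EMLearningState(k: Int, iterations: Int) extends LearningState {
  def next(): LearningState = copy(iterations = iterations + 1)
  def getModel(): LDAModel = SketchModel(k, iterations)
}

class EMOptimizer extends Optimizer {
  def initialState(docs: Seq[(Long, Array[Double])], k: Int): LearningState =
    EMLearningState(k, iterations = 0)
}
```

With this shape the optimizer object stays reusable across runs, because everything that changes during training lives in the `LearningState` it returns.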



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29168965
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -0,0 +1,210 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import java.util.Random
    +
    +import breeze.linalg.{DenseVector => BDV, normalize}
    +
    +import org.apache.spark.annotation.Experimental
    +import org.apache.spark.graphx._
    +import org.apache.spark.graphx.impl.GraphImpl
    +import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
    +import org.apache.spark.mllib.linalg.Vector
    +import org.apache.spark.rdd.RDD
    +
    +/**
    + * :: Experimental ::
    + *
    + * An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can
    + * hold optimizer-specific parameters for users to set.
    + */
    +@Experimental
    +trait LDAOptimizer{
    +
    +  /*
    +    DEVELOPERS NOTE:
    +
    +    An LDAOptimizer contains an algorithm for LDA and performs the actual computation, which
    +    stores internal data structure (Graph or Matrix) and other parameters for the algorithm.
    +    The interface is isolated to improve the extensibility of LDA.
    +   */
    +
    +  /**
    +   * Initializer for the optimizer. LDA passes the common parameters to the optimizer and
    +   * the internal structure can be initialized properly.
    +   */
    +  private[clustering] def initialState(
    +      docs: RDD[(Long, Vector)],
    +      k: Int,
    +      docConcentration: Double,
    +      topicConcentration: Double,
    +      randomSeed: Long,
    +      checkpointInterval: Int): LDAOptimizer
    +
    +  private[clustering] def next(): LDAOptimizer
    +
    +  private[clustering] def getLDAModel(iterationTimes: Array[Double]): LDAModel
    +}
    +
    +/**
    + * :: Experimental ::
    + *
    + * Optimizer for EM algorithm which stores data + parameter graph, plus algorithm parameters.
    + *
    + * Currently, the underlying implementation uses Expectation-Maximization (EM), implemented
    + * according to the Asuncion et al. (2009) paper referenced below.
    + *
    + * References:
    + *  - Original LDA paper (journal version):
    + *    Blei, Ng, and Jordan.  "Latent Dirichlet Allocation."  JMLR, 2003.
    + *     - This class implements their "smoothed" LDA model.
    + *  - Paper which clearly explains several algorithms, including EM:
    + *    Asuncion, Welling, Smyth, and Teh.
    + *    "On Smoothing and Inference for Topic Models."  UAI, 2009.
    + *
    + */
    +@Experimental
    +class EMLDAOptimizer extends LDAOptimizer{
    +
    +  import LDA._
    +
    +  /**
    +   * Following fields will only be initialized through initialState method
    +   */
    +  private[clustering] var graph: Graph[TopicCounts, TokenCount] = null
    +  private[clustering] var k: Int = 0
    +  private[clustering] var vocabSize: Int = 0
    +  private[clustering] var docConcentration: Double = 0
    +  private[clustering] var topicConcentration: Double = 0
    +  private[clustering] var checkpointInterval: Int = 10
    +  private var graphCheckpointer: PeriodicGraphCheckpointer[TopicCounts, TokenCount] = null
    +
    +  /**
    +   * Compute bipartite term/doc graph.
    +   */
    +  private[clustering] override def initialState(
    --- End diff --
    
    rename: ```initialize```
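
A minimal sketch of what the suggested rename could look like. The names `StatefulOptimizer` and `CountingOptimizer` are hypothetical, and plain Scala collections stand in for the Spark types so the sketch compiles on its own. It is not the code that was merged.

```scala
// Hypothetical, simplified stand-in for the trait in the diff above.
// Spark's RDD[(Long, Vector)] is replaced by a plain Seq for illustration.
trait StatefulOptimizer {
  // Renamed from initialState, as the review comment suggests.
  def initialize(docs: Seq[(Long, Array[Double])], k: Int): StatefulOptimizer

  // Runs one iteration and returns the optimizer itself.
  def next(): StatefulOptimizer
}

// Toy implementation that only counts iterations; it performs no inference.
class CountingOptimizer extends StatefulOptimizer {
  private var k: Int = 0
  private var iterations: Int = 0

  override def initialize(docs: Seq[(Long, Array[Double])], k: Int): CountingOptimizer = {
    this.k = k
    this.iterations = 0
    this
  }

  override def next(): CountingOptimizer = {
    iterations += 1
    this
  }

  def numTopics: Int = k
  def iterationCount: Int = iterations
}
```

A verb such as `initialize` signals that the call sets up internal state, whereas `initialState` reads like an accessor that returns a value.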



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116126
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala ---
    @@ -220,6 +206,38 @@ class LDA private (
         this
       }
     
    +
    +  /** LDAOptimizer used to perform the actual calculation */
    +  def getOptimizer(): LDAOptimizer = ldaOptimizer
    +
    +  /**
    +   * LDAOptimizer used to perform the actual calculation (default = EMLDAOptimizer)
    +   */
    +  def setOptimizer(optimizer: LDAOptimizer): this.type = {
    +    this.ldaOptimizer = optimizer
    +    this
    +  }
    +
    +  /**
    +   * Set the LDAOptimizer used to perform the actual calculation by algorithm name.
    +   * Currently "EM" is supported.
    +   */
    +  def setOptimizer(optimizerName: String): this.type = {
    +    this.ldaOptimizer = getDefaultOptimizer(optimizerName)
    +    this
    +  }
    +
    +  /**
    +   * Get the default optimizer from String parameter.
    +   */
    +  private def getDefaultOptimizer(optimizerName: String): LDAOptimizer = {
    +    optimizerName match{
    +      case "EM" => new EMLDAOptimizer()
    +      case other =>
    +        throw new UnsupportedOperationException(s"Only EM are supported but got $other.")
    --- End diff --
    
    Should be IllegalArgumentException
    typo: "EM are" --> "EM is"
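
A sketch of the lookup with both suggestions applied. `NamedOptimizer`, `EMNamedOptimizer` and `OptimizerLookup` are hypothetical stand-ins for the classes in the diff, not Spark's API. An unknown name is a bad argument supplied by the caller, which is the case `IllegalArgumentException` is meant for; `UnsupportedOperationException` says that the operation itself is not available.

```scala
// Hypothetical stand-ins for LDAOptimizer and EMLDAOptimizer.
trait NamedOptimizer
final class EMNamedOptimizer extends NamedOptimizer

object OptimizerLookup {
  // Maps an algorithm name to an optimizer instance.
  def byName(optimizerName: String): NamedOptimizer = {
    optimizerName match {
      case "EM" => new EMNamedOptimizer()
      case other =>
        // The caller passed an invalid value, so report it as an argument error.
        throw new IllegalArgumentException(s"Only EM is supported but got $other.")
    }
  }
}
```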



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95559173
  
      [Test build #30836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30836/consoleFull) for   PR 5661 at commit [`0bb8400`](https://github.com/apache/spark/commit/0bb8400e70011c8f97ece31d395a8c75b15bab4f).



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116130
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -0,0 +1,201 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import java.util.Random
    +
    +import breeze.linalg.{DenseVector => BDV, normalize}
    +import org.apache.spark.annotation.Experimental
    +import org.apache.spark.graphx._
    +import org.apache.spark.graphx.impl.GraphImpl
    +import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
    +import org.apache.spark.mllib.linalg.Vector
    +import org.apache.spark.rdd.RDD
    +
    +/**
    + * :: Experimental ::
    + *
    + * An LDAOptimizer contains an algorithm for LDA and performs the actual computation, which
    --- End diff --
    
    This explanation is only needed for developers.  To users, an LDAOptimizer is a class holding parameters.  I'd state something like: "An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can hold optimizer-specific parameters for users to set."
    
    I'd keep the text you wrote here but copy it to an internal comment (not for the Scala/Java doc).
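
The split between user-facing and developer-facing text can be sketched as follows. `DocumentedOptimizer` and `EMDocumented` are hypothetical names used only for illustration. Scaladoc picks up comments that open with `/**`, while a plain `/* ... */` block stays in the source only.

```scala
/**
 * User-facing Scaladoc: states which algorithm the optimizer selects and
 * which optimizer-specific parameters users can set.
 */
trait DocumentedOptimizer {

  /*
   * DEVELOPERS NOTE (plain block comment, so it is not part of the generated docs):
   * details such as the internal data structure belong here.
   */

  // Name of the algorithm this optimizer stands for.
  def name: String
}

// Toy implementation so the sketch can be exercised.
object EMDocumented extends DocumentedOptimizer {
  val name: String = "EM"
}
```

The later revision of `LDAOptimizer.scala` quoted elsewhere in this thread follows the same pattern, with a short Scaladoc and a separate `DEVELOPERS NOTE` block comment.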



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by hhbyyh <gi...@git.apache.org>.

Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96497611
  
    Thanks @jkbradley. I think Optimizer is simpler and provides sufficient flexibility for now. I made some changes according to the other comments.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by hhbyyh <gi...@git.apache.org>.

Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96454511
  
    @jkbradley Thanks for the review. I'll send an update according to the suggestions soon.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116125
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala ---
    @@ -220,6 +206,38 @@ class LDA private (
         this
       }
     
    +
    +  /** LDAOptimizer used to perform the actual calculation */
    +  def getOptimizer(): LDAOptimizer = ldaOptimizer
    +
    +  /**
    +   * LDAOptimizer used to perform the actual calculation (default = EMLDAOptimizer)
    +   */
    +  def setOptimizer(optimizer: LDAOptimizer): this.type = {
    +    this.ldaOptimizer = optimizer
    +    this
    +  }
    +
    +  /**
    +   * Set the LDAOptimizer used to perform the actual calculation by algorithm name.
    +   * Currently "EM" is supported.
    +   */
    +  def setOptimizer(optimizerName: String): this.type = {
    +    this.ldaOptimizer = getDefaultOptimizer(optimizerName)
    +    this
    +  }
    +
    +  /**
    +   * Get the default optimizer from String parameter.
    +   */
    +  private def getDefaultOptimizer(optimizerName: String): LDAOptimizer = {
    +    optimizerName match{
    --- End diff --
    
    style: space before brace: ```match {```  (Please correct this elsewhere too.)



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5661



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96448979
  
    Sorry for the delay!  I'll review the PR now.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29168959
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -0,0 +1,210 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import java.util.Random
    +
    +import breeze.linalg.{DenseVector => BDV, normalize}
    +
    +import org.apache.spark.annotation.Experimental
    +import org.apache.spark.graphx._
    +import org.apache.spark.graphx.impl.GraphImpl
    +import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
    +import org.apache.spark.mllib.linalg.Vector
    +import org.apache.spark.rdd.RDD
    +
    +/**
    + * :: Experimental ::
    + *
    + * An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can
    + * hold optimizer-specific parameters for users to set.
    + */
    +@Experimental
    +trait LDAOptimizer{
    +
    +  /*
    +    DEVELOPERS NOTE:
    +
    +    An LDAOptimizer contains an algorithm for LDA and performs the actual computation, which
    +    stores internal data structure (Graph or Matrix) and other parameters for the algorithm.
    +    The interface is isolated to improve the extensibility of LDA.
    +   */
    +
    +  /**
    +   * Initializer for the optimizer. LDA passes the common parameters to the optimizer and
    +   * the internal structure can be initialized properly.
    +   */
    +  private[clustering] def initialState(
    +      docs: RDD[(Long, Vector)],
    +      k: Int,
    +      docConcentration: Double,
    +      topicConcentration: Double,
    +      randomSeed: Long,
    +      checkpointInterval: Int): LDAOptimizer
    +
    +  private[clustering] def next(): LDAOptimizer
    +
    +  private[clustering] def getLDAModel(iterationTimes: Array[Double]): LDAModel
    +}
    +
    +/**
    + * :: Experimental ::
    + *
    + * Optimizer for EM algorithm which stores data + parameter graph, plus algorithm parameters.
    + *
    + * Currently, the underlying implementation uses Expectation-Maximization (EM), implemented
    + * according to the Asuncion et al. (2009) paper referenced below.
    + *
    + * References:
    + *  - Original LDA paper (journal version):
    + *    Blei, Ng, and Jordan.  "Latent Dirichlet Allocation."  JMLR, 2003.
    + *     - This class implements their "smoothed" LDA model.
    + *  - Paper which clearly explains several algorithms, including EM:
    + *    Asuncion, Welling, Smyth, and Teh.
    + *    "On Smoothing and Inference for Topic Models."  UAI, 2009.
    + *
    + */
    +@Experimental
    +class EMLDAOptimizer extends LDAOptimizer{
    --- End diff --
    
    A space is needed before the brace here too.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by hhbyyh <gi...@git.apache.org>.

Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95559175
  
    Oh, thanks. I closed 7089 just now...  Can I just use 7090?



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-95561190
  
      [Test build #30836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30836/consoleFull) for   PR 5661 at commit [`0bb8400`](https://github.com/apache/spark/commit/0bb8400e70011c8f97ece31d395a8c75b15bab4f).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait LDAOptimizer`
      * `class EMLDAOptimizer extends LDAOptimizer`
    
     * This patch does not change any dependencies.



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96731457
  
      [Test build #720 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/720/consoleFull) for   PR 5661 at commit [`0e2e006`](https://github.com/apache/spark/commit/0e2e006645dda48d0b8c6eb32c5509a5694bc3d9).



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5661#discussion_r29116122
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala ---
    @@ -220,6 +206,38 @@ class LDA private (
         this
       }
     
    +
    +  /** LDAOptimizer used to perform the actual calculation */
    +  def getOptimizer(): LDAOptimizer = ldaOptimizer
    --- End diff --
    
    No parentheses are needed for a getter method
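
A small sketch of the convention being referred to, using a hypothetical `ChainableLDA` class with a string field in place of the real optimizer. In common Scala style, a side-effect-free accessor is declared and called without parentheses, while a setter that mutates state takes an argument and can return `this.type` so that calls chain.

```scala
// Hypothetical class; the String field stands in for the LDAOptimizer reference.
class ChainableLDA {
  private var optimizerName: String = "EM"

  // Pure accessor: no parameter list, so callers write lda.getOptimizerName.
  def getOptimizerName: String = optimizerName

  // Mutating setter: returns this.type to allow chained configuration calls.
  def setOptimizerName(name: String): this.type = {
    this.optimizerName = name
    this
  }
}
```

Usage then reads as `new ChainableLDA().setOptimizerName("EM").getOptimizerName`.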



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96875369
  
    OK, I'll merge it into master.  Thanks!



[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5661#issuecomment-96850902
  
      [Test build #724 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/724/consoleFull) for   PR 5661 at commit [`0e2e006`](https://github.com/apache/spark/commit/0e2e006645dda48d0b8c6eb32c5509a5694bc3d9).

