You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Jake Mannix <ja...@gmail.com> on 2011/11/27 21:37:25 UTC

Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

See MAHOUT-897


This addresses bug MAHOUT-897.
    https://issues.apache.org/jira/browse/MAHOUT-897


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1206835 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1206835 
  trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1206835 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1206835 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
  trunk/src/conf/driver.classes.props 1206835 

Diff: https://reviews.apache.org/r/2944/diff


Testing
-------

mvn clean test


Thanks,

Jake


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/
-----------------------------------------------------------

(Updated 2011-12-02 20:49:52.055735)


Review request for mahout and Ted Dunning.


Changes
-------

Updates to VectorDumper and VectorHelper


Summary
-------

See MAHOUT-897


This addresses bug MAHOUT-897.
    https://issues.apache.org/jira/browse/MAHOUT-897


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1209684 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1209684 
  trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1209684 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1209684 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION 
  trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 1209684 
  trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 1209684 
  trunk/integration/src/test/java/org/apache/mahout/utils/vectors/VectorHelperTest.java PRE-CREATION 
  trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java 1209684 
  trunk/math/src/main/java/org/apache/mahout/math/NamedVector.java 1209684 
  trunk/src/conf/driver.classes.props 1209684 

Diff: https://reviews.apache.org/r/2944/diff


Testing
-------

mvn clean test


Thanks,

Jake


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/
-----------------------------------------------------------

(Updated 2011-12-01 03:44:25.140987)


Review request for mahout and Ted Dunning.


Changes
-------

VectorDumper becomes a "top-terms" dumper as well.


Summary
-------

See MAHOUT-897


This addresses bug MAHOUT-897.
    https://issues.apache.org/jira/browse/MAHOUT-897


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208933 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208933 
  trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208933 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208933 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION 
  trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 1208933 
  trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 1208933 
  trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java 1208933 
  trunk/math/src/main/java/org/apache/mahout/math/NamedVector.java 1208933 
  trunk/src/conf/driver.classes.props 1208933 

Diff: https://reviews.apache.org/r/2944/diff


Testing
-------

mvn clean test


Thanks,

Jake


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/
-----------------------------------------------------------

(Updated 2011-11-30 18:42:21.747592)


Review request for mahout and Ted Dunning.


Changes
-------

More license headers, and some javadocs.

Also: factor out an "LDASampler" which builds a sampler based on a Matrix representation of a topic model.  


Summary
-------

See MAHOUT-897


This addresses bug MAHOUT-897.
    https://issues.apache.org/jira/browse/MAHOUT-897


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208294 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208294 
  trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208294 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208294 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION 
  trunk/src/conf/driver.classes.props 1208294 

Diff: https://reviews.apache.org/r/2944/diff


Testing
-------

mvn clean test


Thanks,

Jake


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/
-----------------------------------------------------------

(Updated 2011-11-30 08:58:25.710229)


Review request for mahout and Ted Dunning.


Changes
-------

Adds appropriate ASL headers to new files, adds a bunch of nice javadocs, some TODOs to clean up some detritus, moves MemoryUtil to a more common location.

Could use some more review if anyone has an urge, but otherwise this code is ready to go.  More updates can come in the future.  And more docs, etc.


Summary
-------

See MAHOUT-897


This addresses bug MAHOUT-897.
    https://issues.apache.org/jira/browse/MAHOUT-897


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208294 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208294 
  trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208294 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208294 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION 
  trunk/src/conf/driver.classes.props 1208294 

Diff: https://reviews.apache.org/r/2944/diff


Testing
-------

mvn clean test


Thanks,

Jake


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/
-----------------------------------------------------------

(Updated 2011-11-30 07:35:21.576144)


Review request for mahout and Ted Dunning.


Changes
-------

addressing Ted's comments


Summary
-------

See MAHOUT-897


This addresses bug MAHOUT-897.
    https://issues.apache.org/jira/browse/MAHOUT-897


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208294 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208294 
  trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208294 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208294 
  trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION 
  trunk/src/conf/driver.classes.props 1208294 

Diff: https://reviews.apache.org/r/2944/diff


Testing
-------

mvn clean test


Thanks,

Jake


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.

> On 2011-11-28 07:54:29, Ted Dunning wrote:
> > Generally this looks like pretty clean code.  Some more comments about intent would be nice.
> > 
> > My review so far is very superficial.

I'm pretty blind to places which need more docs, as it all does, and I know the code.  If you could point out places most in need of docs, I'll know where to start. :)


> On 2011-11-28 07:54:29, Ted Dunning wrote:
> > trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java, line 217
> > <https://reviews.apache.org/r/2944/diff/1/?file=60151#file60151line217>
> >
> >     Why return double?  Main ignores this.

Because any other program running this (currently: just the unit test I added) may want to know what the final converged perplexity was, so now it's available from the run() call.


> On 2011-11-28 07:54:29, Ted Dunning wrote:
> > trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java, line 103
> > <https://reviews.apache.org/r/2944/diff/1/?file=60153#file60153line103>
> >
> >     I think convention is @override on a seaparate line.

Ah yes, I'll fix that.


> On 2011-11-28 07:54:29, Ted Dunning wrote:
> > trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java, line 9
> > <https://reviews.apache.org/r/2944/diff/1/?file=60164#file60164line9>
> >
> >     Javadoc here would be nice.  Why is this sampler different from samplers we already have?
> >     
> >     Also, I don't see test code for this.

I'll add documentation like follows:

/**
   * Samples from a given discrete distribution: you provide a source of randomness and a Vector (cardinality N) which describes a distribution over [0,N), and calls to sample() sample from 0 to N 
   * using this distribution
   */

Do we already have a sampler which does this?

I can add tests, good point.


> On 2011-11-28 07:54:29, Ted Dunning wrote:
> > trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java, line 58
> > <https://reviews.apache.org/r/2944/diff/1/?file=60164#file60164line58>
> >
> >     So this looks like a multinomial sampler.  Why not fit it into what already exists?

Point me to the class!


- Jake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/#review3531
-----------------------------------------------------------


On 2011-11-27 20:37:25, Jake Mannix wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2944/
> -----------------------------------------------------------
> 
> (Updated 2011-11-27 20:37:25)
> 
> 
> Review request for mahout.
> 
> 
> Summary
> -------
> 
> See MAHOUT-897
> 
> 
> This addresses bug MAHOUT-897.
>     https://issues.apache.org/jira/browse/MAHOUT-897
> 
> 
> Diffs
> -----
> 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1206835 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1206835 
>   trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
>   trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1206835 
>   trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1206835 
>   trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
>   trunk/src/conf/driver.classes.props 1206835 
> 
> Diff: https://reviews.apache.org/r/2944/diff
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Jake
> 
>


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Ted Dunning <td...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/#review3531
-----------------------------------------------------------


Generally this looks like pretty clean code.  Some more comments about intent would be nice.

My review so far is very superficial.


trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java
<https://reviews.apache.org/r/2944/#comment7864>

    Why return double?  Main ignores this.



trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java
<https://reviews.apache.org/r/2944/#comment7865>

    I think convention is @override on a seaparate line.



trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java
<https://reviews.apache.org/r/2944/#comment7866>

    This comment is very confusing.



trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java
<https://reviews.apache.org/r/2944/#comment7867>

    Javadoc here would be nice.  Why is this sampler different from samplers we already have?
    
    Also, I don't see test code for this.



trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java
<https://reviews.apache.org/r/2944/#comment7868>

    So this looks like a multinomial sampler.  Why not fit it into what already exists?


- Ted


On 2011-11-27 20:37:25, Jake Mannix wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2944/
> -----------------------------------------------------------
> 
> (Updated 2011-11-27 20:37:25)
> 
> 
> Review request for mahout.
> 
> 
> Summary
> -------
> 
> See MAHOUT-897
> 
> 
> This addresses bug MAHOUT-897.
>     https://issues.apache.org/jira/browse/MAHOUT-897
> 
> 
> Diffs
> -----
> 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1206835 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1206835 
>   trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
>   trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1206835 
>   trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1206835 
>   trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
>   trunk/src/conf/driver.classes.props 1206835 
> 
> Diff: https://reviews.apache.org/r/2944/diff
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Jake
> 
>


Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

Posted by Jake Mannix <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2944/#review3538
-----------------------------------------------------------



trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java
<https://reviews.apache.org/r/2944/#comment7875>

    Indeed.  Will update with something more descriptive.


- Jake


On 2011-11-27 20:37:25, Jake Mannix wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2944/
> -----------------------------------------------------------
> 
> (Updated 2011-11-27 20:37:25)
> 
> 
> Review request for mahout.
> 
> 
> Summary
> -------
> 
> See MAHOUT-897
> 
> 
> This addresses bug MAHOUT-897.
>     https://issues.apache.org/jira/browse/MAHOUT-897
> 
> 
> Diffs
> -----
> 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1206835 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1206835 
>   trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION 
>   trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION 
>   trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1206835 
>   trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1206835 
>   trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION 
>   trunk/src/conf/driver.classes.props 1206835 
> 
> Diff: https://reviews.apache.org/r/2944/diff
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Jake
> 
>