You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Gilles (JIRA)" <ji...@apache.org> on 2012/11/10 04:45:13 UTC

[jira] [Commented] (MATH-817) Multivariate Normal Mixture Model Fitting by Expectation Maximization

    [ https://issues.apache.org/jira/browse/MATH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494552#comment-13494552 ] 

Gilles commented on MATH-817:
-----------------------------

Quite a few remarks, in no particular order.

In "MultivariateNormalMixtureExpectationMaximizationFitter", wouldn't it be possible to pass the distributions as arguments to the constructor (similar to what was eventually done for "MixtureMultivariateRealDistribution")?

"(Math)Arrays.copyOf" is preferred over "System.arraycopy".

"equals" in "MultivariateNormalDistribution":
* Unneeded checks (means and covariance matrix arrays cannot be null)
* Comparing entries of the sampling matrix is perhaps enough to establish the equivalence relation (?)

Why all the
{noformat}
@SuppressWarnings("unused")
{noformat}
in "MultivariateNormalMixtureExpectationMaximizationFitterTest"?

"MathArrays.copyOf" is preferred over "System.arraycopy".

It would be better to use "RealVector" and "RealMatrix" objects (in place of 1D and 2D arrays) whenever appropriate. Their methods can make the code tighter (e.g. avoiding many explicit loops).

At first sight, I don't think that defining "equals" and "hashCode" should be mandatory (i.e. those methods should not be declared in the abstract base class).

There is some odd code formatting.

Why two RNG? Wouldn't be clearer to define a list of "NormalDistribution" objects from which to sample (instead of using "nextGaussian")?

Documentation is missing for the "@param" and "@return" tags.

The "stillFitting" flag could be avoided by using a label on the outer loop (in method "fit").

"catch (Exception e)" is annoying.

I wonder whether the class would be more flexible if the "data" is not passed to the constructor: the "fit" method could be "public" and would return the fitted mixture.

Mixing data with different meanings in the same array is not a good idea:
{code}
final double[][] unsortedData = new double[data.length][numDataColumns + 1];
{code}
(where the entry at index "numDataColumns" will contain the mean of the values at index 0 .. numDataColumns - 1). If you want to associate different data, put them in a dedicated class e.g.:
{code}
private static class DataRow implements Comparable<DataRow> {
  private double[] row;
  private Double mean;

  DataRow(double[] data) {
    // Store reference.
    row = data;
    // Compute mean.
    mean = ... ;
  }

  public int compareTo(DataRow other) {
    return mean.compareTo(other.mean);
  }
}
{code}
Then, the above line would become:
{code}
final DataRow[] unsortedData = new DataRow[data.length];
{code}
and you can use "Arrays.sort" without specifying a "Comparator" and without copying data back and forth.

IIUC, the "fit" can return quietly from satisfying either of two criteria:
* convergence threshold
* number of iterations

but no exception is thrown to signal that the fit does not meet the convergence criterion.

                
> Multivariate Normal Mixture Model Fitting by Expectation Maximization
> ---------------------------------------------------------------------
>
>                 Key: MATH-817
>                 URL: https://issues.apache.org/jira/browse/MATH-817
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Jared Becksfort
>            Priority: Minor
>         Attachments: AbstractMultivariateRealDistribution.java.patch, MixtureMultivariateRealDistribution.java.patch, MultivariateNormalDistribution.java.patch, MultivariateNormalMixtureExpectationMaximizationFitter.java, MultivariateNormalMixtureExpectationMaximizationFitterTest.java
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I will submit a class for fitting Multivariate Normal Mixture Models using Expectation Maximization.
> > Hello,
> >
> > I have implemented some classes for multivariate Normal distributions, multivariate normal mixture models, and an expectation maximization fitting class for the mixture model.  I would like to submit it to Apache Commons Math.  I still have some touching up to do so that they fit the style guidelines and implement the correct interfaces.  Before I do so, I thought I would at least ask if the developers of the project are interested in me submitting them.
> >
> > Thanks,
> > Jared Becksfort
> Dear Jared,
> Yes, that would be very nice to have such an addition! Remember to also include unit tests (refer to the current ones for examples). The best would be to split a submission up into multiple minor ones, each covering a natural submission (e.g. multivariate Normal distribution in one submission), and create an issue as described at http://commons.apache.org/math/issue-tracking.html .
> If you run into any problems, please do not hesitate to ask on this mailing list.
> Cheers, Mikkel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira