Posted to issues@spark.apache.org by "Hao Ren (JIRA)" <ji...@apache.org> on 2016/11/24 17:33:59 UTC

[jira] [Created] (SPARK-18581) MultivariateGaussian returns pdf value larger than 1

Hao Ren created SPARK-18581:
-------------------------------

             Summary: MultivariateGaussian returns pdf value larger than 1
                 Key: SPARK-18581
                 URL: https://issues.apache.org/jira/browse/SPARK-18581
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 2.0.2, 1.6.2
            Reporter: Hao Ren


When training a GaussianMixtureModel, I found some probabilities much larger than 1. That led me to the fact that the value returned by MultivariateGaussian.pdf can be as large as 10^5.

After reviewing the code, I found that the problem lies in the computation of the determinant of the covariance matrix.

The computation is simplified by using the pseudo-determinant of a positive definite matrix. However, if the eigenvalues are all between 0 and 1, log(pseudo-determinant) will be a negative number such as -50. As a result, logpdf becomes positive (pdf > 1).

The related code is the following:

// In function: MultivariateGaussian.calculateCovarianceConstants()

{code}
      val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
{code}
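
To make the arithmetic concrete, here is a minimal sketch in plain Scala (no Spark or breeze dependency). It is not the MLlib implementation: the object name LogPdfSketch, both helper functions, the eigenvalues (five values of 1e-5) and the tolerance 1e-12 are assumptions chosen for illustration. It applies the same filter/log/sum as the line above to an array of eigenvalues and evaluates the log density at the mean, where the quadratic term vanishes.

```scala
object LogPdfSketch {
  // Hypothetical helper: same filter/log/sum as the MLlib line above,
  // applied to a plain array of eigenvalues instead of a breeze vector.
  def logPseudoDet(eigenvalues: Array[Double], tol: Double): Double =
    eigenvalues.filter(_ > tol).map(math.log).sum

  // Log density evaluated at x == mu, where the quadratic term is zero:
  // logpdf(mu) = -0.5 * (k * log(2 * pi) + log(pseudo-det(sigma)))
  def logPdfAtMean(eigenvalues: Array[Double], tol: Double): Double = {
    val k = eigenvalues.length
    -0.5 * (k * math.log(2.0 * math.Pi) + logPseudoDet(eigenvalues, tol))
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values only: five eigenvalues, all between 0 and 1.
    val eigenvalues = Array.fill(5)(1e-5)
    val tol = 1e-12

    val logDet = logPseudoDet(eigenvalues, tol)
    val logPdf = logPdfAtMean(eigenvalues, tol)

    println(s"log(pseudo-determinant) = $logDet") // negative
    println(s"logpdf at the mean      = $logPdf") // positive
    println(s"pdf at the mean         = ${math.exp(logPdf)}") // larger than 1
  }
}
```

With these illustrative eigenvalues, the log of the pseudo-determinant is negative and large enough in magnitude to outweigh the k * log(2 * pi) term, so logpdf at the mean is positive.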

Maybe we should just use the breeze 'det' operation on sigma to get the right but slow answer instead of a quick, wrong one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
