Posted to issues@spark.apache.org by "Hao Ren (JIRA)" <ji...@apache.org> on 2016/11/24 17:33:59 UTC
[jira] [Created] (SPARK-18581) MultivariateGaussian returns pdf value larger than 1
Hao Ren created SPARK-18581:
-------------------------------
Summary: MultivariateGaussian returns pdf value larger than 1
Key: SPARK-18581
URL: https://issues.apache.org/jira/browse/SPARK-18581
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 2.0.2, 1.6.2
Reporter: Hao Ren
While training a GaussianMixtureModel, I found some probabilities much larger than 1. This led me to the fact that the value returned by MultivariateGaussian.pdf can be as large as 10^5 or more.
After reviewing the code, I found that the problem lies in the computation of the determinant of the covariance matrix.
The computation is simplified by using the pseudo-determinant of a positive definite matrix. However, if the eigenvalues are all between 0 and 1, log(pseudo-determinant) will be a negative number such as -50. As a result, the logpdf becomes positive (pdf > 1).
The relevant code, in MultivariateGaussian.calculateCovarianceConstants(), is the following:
{code}
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
{code}
Maybe we should just use the Breeze 'det' operation on sigma to get the right but slow answer instead of a quick, wrong one.
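To make the arithmetic above concrete, here is a minimal sketch in plain Scala with no Spark or Breeze dependency. The object name LogPdfSketch, the helper names, the eigenvalues (five values of 1e-4) and the tolerance are all hypothetical and chosen only for illustration. The normalization term is the standard Gaussian one evaluated at the mean, which is an assumption about what the surrounding Spark code does and is not copied from it. When every eigenvalue exceeds tol, the filtered sum equals the log of the plain product of the eigenvalues, so the sketch prints that as well for comparison.

```scala
object LogPdfSketch {
  // Sum of the logs of the eigenvalues above a tolerance,
  // mirroring the quoted logPseudoDetSigma line.
  def logPseudoDet(eigenvalues: Seq[Double], tol: Double): Double =
    eigenvalues.filter(_ > tol).map(math.log).sum

  // Log-density evaluated at the mean, where the quadratic term is zero:
  //   logpdf(mu) = -0.5 * (k * log(2 * pi) + log|Sigma|)
  // Assumption: k is the full dimension (number of eigenvalues).
  def logPdfAtMean(eigenvalues: Seq[Double], tol: Double): Double = {
    val k = eigenvalues.size
    -0.5 * (k * math.log(2.0 * math.Pi) + logPseudoDet(eigenvalues, tol))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical spectrum: all eigenvalues lie strictly between 0 and 1.
    val eigenvalues = Seq.fill(5)(1e-4)
    val tol = 1e-12 // hypothetical tolerance, not Spark's actual value

    val logDet = logPseudoDet(eigenvalues, tol)
    val logPdf = logPdfAtMean(eigenvalues, tol)

    println(s"log(pseudo-determinant)      = $logDet")
    println(s"log(product of eigenvalues)  = ${math.log(eigenvalues.product)}")
    println(s"logpdf at the mean           = $logPdf")
    println(s"pdf at the mean              = ${math.exp(logPdf)}")
  }
}
```

With these inputs, log(pseudo-determinant) is negative and the logpdf at the mean is positive, which is the situation described above.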