Posted to issues@spark.apache.org by "Hao Ren (JIRA)" <ji...@apache.org> on 2016/11/25 08:43:58 UTC
[jira] [Updated] (SPARK-18581) MultivariateGaussian returns pdf value larger than 1
[ https://issues.apache.org/jira/browse/SPARK-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao Ren updated SPARK-18581:
----------------------------
Description:
While training a GaussianMixtureModel, I found some probabilities much larger than 1. That led me to the fact that the value returned by MultivariateGaussian.pdf can be as large as 10^5.
After reviewing the code, I found that the problem lies in the computation of the determinant of the covariance matrix.
The computation is simplified by using the pseudo-determinant of a positive definite matrix. However, if the eigenvalues are all between 0 and 1, log(pseudo-determinant) will be a negative number such as -50. As a result, the logpdf becomes positive (pdf > 1).
The related code is the following:
// In function: MultivariateGaussian.calculateCovarianceConstants()
{code}
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
{code}
d is the vector of eigenvalues here. If many of its elements are between 0 and 1, then logPseudoDetSigma can be negative.
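The arithmetic can be reproduced without Spark. The following is only a minimal sketch, not the MLlib implementation: the object name, the helper `logPdfAtMean`, and the default `tol` are hypothetical, and it evaluates the density at the mean so that the quadratic term drops out and only the constant -0.5 * (k * log(2*pi) + log|Sigma|) remains.

```scala
object LogPdfSketch {
  // Log-density of a Gaussian evaluated at its own mean, given the
  // eigenvalues of the covariance matrix. At x == mu the quadratic term
  // is zero, leaving -0.5 * (k * log(2*pi) + log|Sigma|).
  // `tol` is a hypothetical cutoff, not the value Spark computes.
  def logPdfAtMean(eigenvalues: Seq[Double], tol: Double = 1e-12): Double = {
    val k = eigenvalues.length
    // Same shape as the quoted line: sum the logs of eigenvalues above tol.
    val logPseudoDetSigma = eigenvalues.filter(_ > tol).map(math.log).sum
    -0.5 * (k * math.log(2.0 * math.Pi) + logPseudoDetSigma)
  }

  def main(args: Array[String]): Unit = {
    // Ten eigenvalues, all between 0 and 1.
    val d = Seq.fill(10)(0.01)
    val logPdf = logPdfAtMean(d)
    println(s"log pseudo-determinant = ${d.map(math.log).sum}")
    println(s"logpdf at mean         = $logPdf")
    println(s"pdf at mean            = ${math.exp(logPdf)}")
  }
}
```

With ten eigenvalues of 0.01 the log pseudo-determinant is about -46, the logpdf at the mean is about 13.8, and the pdf is on the order of 10^6, which matches the magnitude described above.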
Maybe we should just use the breeze 'det' operation on sigma to get the right but slow answer instead of a quick, wrong one.
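One way to evaluate this suggestion is to compare the two computations on a small matrix. The sketch below uses plain Scala rather than breeze, with hypothetical names and a made-up symmetric 2x2 covariance matrix [[a, b], [b, c]] whose eigenvalues have a closed form. For a full-rank symmetric matrix the determinant is the product of the eigenvalues, so when no eigenvalue falls below the tolerance the two log-determinants should agree up to rounding.

```scala
object DetVsEigenSketch {
  // Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]] in closed form.
  def eigenvalues2x2(a: Double, b: Double, c: Double): (Double, Double) = {
    val mean = (a + c) / 2.0
    val radius = math.sqrt((a - c) * (a - c) / 4.0 + b * b)
    (mean - radius, mean + radius)
  }

  // log(det) computed directly from the matrix entries.
  def logDetDirect(a: Double, b: Double, c: Double): Double =
    math.log(a * c - b * b)

  // log(det) computed as the sum of log eigenvalues (no filtering here,
  // which assumes both eigenvalues are strictly positive).
  def logDetFromEigenvalues(a: Double, b: Double, c: Double): Double = {
    val (l1, l2) = eigenvalues2x2(a, b, c)
    math.log(l1) + math.log(l2)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical covariance entries; both eigenvalues lie between 0 and 1.
    val (a, b, c) = (0.04, 0.01, 0.09)
    println(s"direct log det      = ${logDetDirect(a, b, c)}")
    println(s"eigenvalue log det  = ${logDetFromEigenvalues(a, b, c)}")
  }
}
```

Both values are negative for this matrix. The two computations would only differ when some eigenvalues are filtered out by the tolerance, which is the case the pseudo-determinant is meant to handle.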
> MultivariateGaussian returns pdf value larger than 1
> ----------------------------------------------------
>
> Key: SPARK-18581
> URL: https://issues.apache.org/jira/browse/SPARK-18581
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.6.2, 2.0.2
> Reporter: Hao Ren