Posted to issues@spark.apache.org by "Seth Hendrickson (JIRA)" <ji...@apache.org> on 2017/01/20 19:13:26 UTC
[jira] [Created] (SPARK-19313) GaussianMixture throws cryptic error when number of features is too high
Seth Hendrickson created SPARK-19313:
----------------------------------------
Summary: GaussianMixture throws cryptic error when number of features is too high
Key: SPARK-19313
URL: https://issues.apache.org/jira/browse/SPARK-19313
Project: Spark
Issue Type: Bug
Components: ML, MLlib
Reporter: Seth Hendrickson
Priority: Minor
The following fails:
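{code}
import org.apache.spark.ml.clustering.GaussianMixture
import org.apache.spark.ml.linalg.Vectors
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

// Two sparse rows with 46400 features each
val df = Seq(
  Vectors.sparse(46400, Array(0, 4), Array(3.0, 8.0)),
  Vectors.sparse(46400, Array(1, 5), Array(4.0, 9.0)))
  .map(Tuple1.apply).toDF("features")
val gm = new GaussianMixture()
gm.fit(df)  // fails with a cryptic error instead of a clear message
{code}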
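It fails because GMMs allocate an array of size {{numFeatures * numFeatures}}, and with 46400 features that product (46400 * 46400 = 2,152,960,000) exceeds {{Int.MaxValue}} (2,147,483,647), so the integer multiplication overflows. We should limit the number of features appropriately and fail early with a clear error message.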
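A minimal sketch of the overflow and of one possible up-front check, assuming the array size is computed as an {{Int}} product as described above. The names {{GmmFeatureCheck}}, {{MaxFeatures}}, {{squaredAsInt}}, {{squaredAsLong}} and {{validateNumFeatures}} are hypothetical illustrations, not Spark APIs and not the actual fix.

```scala
// Hypothetical helper object for illustration only; not part of Spark.
object GmmFeatureCheck {
  // Largest n for which n * n still fits in an Int: floor(sqrt(Int.MaxValue)) = 46340.
  val MaxFeatures: Int = math.sqrt(Int.MaxValue.toDouble).toInt

  // Int multiplication wraps around silently on overflow.
  def squaredAsInt(n: Int): Int = n * n

  // Widening to Long before multiplying gives the exact product.
  def squaredAsLong(n: Int): Long = n.toLong * n

  // Hypothetical up-front check: fail fast with a readable message instead of
  // letting numFeatures * numFeatures wrap around to a negative array size.
  def validateNumFeatures(numFeatures: Int): Unit =
    require(numFeatures <= MaxFeatures,
      s"GaussianMixture supports at most $MaxFeatures features, but the input has $numFeatures.")

  def main(args: Array[String]): Unit = {
    println(squaredAsInt(46400))   // -2142007296 (wrapped)
    println(squaredAsLong(46400))  // 2152960000 (exact)
    validateNumFeatures(100)       // passes silently
    try validateNumFeatures(46400)
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

On the JVM, using a negative value as an array length throws a {{NegativeArraySizeException}}, which says nothing about the feature count; a {{require}} check like the one above would surface the actual limit instead.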