Posted to issues@spark.apache.org by "Seth Hendrickson (JIRA)" <ji...@apache.org> on 2017/01/20 19:13:26 UTC

[jira] [Created] (SPARK-19313) GaussianMixture throws cryptic error when number of features is too high

Seth Hendrickson created SPARK-19313:
----------------------------------------

             Summary: GaussianMixture throws cryptic error when number of features is too high
                 Key: SPARK-19313
                 URL: https://issues.apache.org/jira/browse/SPARK-19313
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
            Reporter: Seth Hendrickson
            Priority: Minor


The following snippet fails with a cryptic error:

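{code}
import org.apache.spark.ml.clustering.GaussianMixture
import org.apache.spark.ml.linalg.Vectors
import spark.implicits._  // assumes a SparkSession named spark, as in spark-shell

// Two sparse rows with 46400 features each
val df = Seq(
  Vectors.sparse(46400, Array(0, 4), Array(3.0, 8.0)),
  Vectors.sparse(46400, Array(1, 5), Array(4.0, 9.0)))
  .map(Tuple1.apply).toDF("features")
val gm = new GaussianMixture()
gm.fit(df)  // fails here
{code}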

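It fails because GaussianMixture allocates an array of size {{numFeatures * numFeatures}}. With 46400 features that product is 2,152,960,000, which exceeds {{Int.MaxValue}} (2,147,483,647), so the size computation overflows. We should limit the number of features appropriately and fail with a clear error message when the limit is exceeded.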
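
The arithmetic can be checked without Spark. The sketch below is only an illustration: {{FeatureLimitSketch}}, {{MaxNumFeatures}}, {{denseCovarianceSize}} and {{checkNumFeatures}} are hypothetical names, not existing Spark API, and the limit shown is one possible choice (the largest d for which d * d still fits in an Int).

```scala
object FeatureLimitSketch {
  // Hypothetical limit, not taken from Spark: largest d such that d * d <= Int.MaxValue.
  val MaxNumFeatures: Int = math.floor(math.sqrt(Int.MaxValue.toDouble)).toInt

  // Size of a dense numFeatures x numFeatures array, computed in Long to avoid overflow.
  def denseCovarianceSize(numFeatures: Int): Long =
    numFeatures.toLong * numFeatures.toLong

  // Hypothetical guard: fail early with a readable message instead of a cryptic one.
  def checkNumFeatures(numFeatures: Int): Unit =
    require(
      numFeatures <= MaxNumFeatures,
      s"GaussianMixture cannot handle more than $MaxNumFeatures features " +
        s"because the covariance array size would overflow. Got $numFeatures.")

  def main(args: Array[String]): Unit = {
    val numFeatures: Int = 46400
    // Int multiplication silently wraps around to a negative value.
    println(s"Int product:  ${numFeatures * numFeatures}")
    // The true size needs a Long.
    println(s"Long product: ${denseCovarianceSize(numFeatures)}")
    println(s"Limit:        $MaxNumFeatures")
    // Throws IllegalArgumentException for 46400 features.
    checkNumFeatures(numFeatures)
  }
}
```

Under this assumption a check like {{checkNumFeatures}} would run at the start of {{fit}}, before any array is allocated, so the user sees the feature limit in the message rather than an allocation failure.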
