Posted to issues@spark.apache.org by "Seth Hendrickson (JIRA)" <ji...@apache.org> on 2016/04/14 00:00:27 UTC

[jira] [Created] (SPARK-14610) Remove superfluous split from random forest findSplitsForContinuousFeature

Seth Hendrickson created SPARK-14610:
----------------------------------------

             Summary: Remove superfluous split from random forest findSplitsForContinuousFeature
                 Key: SPARK-14610
                 URL: https://issues.apache.org/jira/browse/SPARK-14610
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: Seth Hendrickson


Currently, the method findSplitsForContinuousFeature in random forest produces one unnecessary split. For example, if a continuous feature has the unique values {1, 2, 3}, the splits generated by this method are:
{1|2,3}, {1,2|3} and {1,2,3|}. The last split places every value on the same side, so it cannot separate any samples. The following unit test, which expects three splits, is therefore incorrect:

{code:title=rf.scala|borderStyle=solid}
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
assert(splits.length === 3)
{code}
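
To illustrate the expected behavior, here is a minimal sketch of how the candidate thresholds could be derived so that n distinct values yield n - 1 splits. SplitSketch and candidateThresholds are hypothetical names used only for illustration. This is not Spark's implementation, and it ignores the metadata argument and any limit on the number of splits that the real method may apply. It assumes a threshold t sends samples with value <= t to the left.

{code:title=SplitSketch.scala|borderStyle=solid}
object SplitSketch {
  // Hypothetical helper, not Spark's implementation: returns one threshold per
  // boundary between adjacent distinct values. A threshold t is assumed to send
  // samples with value <= t to the left and all others to the right.
  def candidateThresholds(samples: Array[Double]): Array[Double] = {
    val distinctSorted = samples.distinct.sorted
    // Dropping the largest value avoids a split whose right side is empty.
    distinctSorted.dropRight(1)
  }

  def main(args: Array[String]): Unit = {
    val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
    val thresholds = candidateThresholds(featureSamples)
    println(thresholds.mkString(", ")) // prints "1.0, 2.0"
  }
}
{code}

Under these assumptions the same feature samples give two thresholds, which corresponds to the splits {1|2,3} and {1,2|3}.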


