Posted to issues@spark.apache.org by "Seth Hendrickson (JIRA)" <ji...@apache.org> on 2016/04/14 00:00:27 UTC
[jira] [Created] (SPARK-14610) Remove superfluous split from random forest findSplitsForContinousFeature
Seth Hendrickson created SPARK-14610:
----------------------------------------
Summary: Remove superfluous split from random forest findSplitsForContinousFeature
Key: SPARK-14610
URL: https://issues.apache.org/jira/browse/SPARK-14610
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Seth Hendrickson
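Currently, the method findSplitsForContinuousFeature in random forest produces an unnecessary split. For example, if a continuous feature has the unique values {1, 2, 3}, then the splits generated by this method are:
{1|2,3}, {1,2|3} and {1,2,3|}. The last split sends every sample to the left side and leaves the right side empty, so it separates nothing: n unique values admit only n - 1 useful splits. The following unit test is therefore incorrect, because it expects 3 splits instead of 2: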
{code:title=rf.scala|borderStyle=solid}
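// Three unique values {1, 2, 3}; fakeMetadata is assumed to be defined in the enclosing test suite
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
// Expects 3 splits, which includes the superfluous {1,2,3|}; only 2 are meaningful
assert(splits.length === 3)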
{code}
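To illustrate why only n - 1 thresholds are useful, here is a minimal sketch that is not Spark's implementation. It assumes every distinct value is a candidate threshold (the case where the number of distinct values is small, so no binning or quantile approximation is involved) and that a sample goes left when value <= threshold. The names SplitSketch, candidateThresholds and partitionSizes are hypothetical.

{code:title=SplitSketch.scala|borderStyle=solid}
// Hypothetical sketch, not Spark code. Assumptions: all distinct values are
// candidate thresholds, and a sample goes left when value <= threshold.
object SplitSketch {

  /** Sorted distinct values, minus the largest one.
    * A threshold equal to the maximum sends every sample left and leaves the
    * right side empty, so it is dropped. n distinct values give n - 1 thresholds.
    */
  def candidateThresholds(samples: Array[Double]): Array[Double] = {
    val distinctSorted = samples.distinct.sortWith(_ < _)
    distinctSorted.dropRight(1)
  }

  /** Number of samples sent (left, right) by a given threshold. */
  def partitionSizes(samples: Array[Double], threshold: Double): (Int, Int) = {
    val left = samples.count(_ <= threshold)
    (left, samples.length - left)
  }
}
{code}

For the samples in the test above, candidateThresholds returns Array(1.0, 2.0). The dropped threshold 3.0 gives partitionSizes(featureSamples, 3.0) == (11, 0), i.e. an empty right side, while thresholds 1.0 and 2.0 give (2, 9) and (10, 1).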
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)