Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/08/15 05:40:21 UTC
[jira] [Assigned] (SPARK-17057) ProbabilisticClassifierModels'
prediction more reasonable with multi zero thresholds
[ https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17057:
------------------------------------
Assignee: Apache Spark
> ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
> ------------------------------------------------------------------------------------
>
> Key: SPARK-17057
> URL: https://issues.apache.org/jira/browse/SPARK-17057
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: zhengruifeng
> Assignee: Apache Spark
>
> {code}
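> import org.apache.spark.ml.classification.RandomForestClassifier
> // Assumption: rf was not defined in the original snippet; a default classifier is used here.
> val path = "./data/mllib/sample_multiclass_classification_data.txt"
> val data = spark.read.format("libsvm").load(path)
> val rf = new RandomForestClassifier()
> val rfm = rf.fit(data)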
> scala> rfm.setThresholds(Array(0.0,0.0,0.0))
> res4: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees
> scala> rfm.transform(data).show(5)
> +-----+--------------------+--------------+-------------+----------+
> |label| features| rawPrediction| probability|prediction|
> +-----+--------------------+--------------+-------------+----------+
> | 1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]| 0.0|
> | 1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]| 0.0|
> | 1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]| 0.0|
> | 1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]| 0.0|
> | 0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]| 0.0|
> +-----+--------------------+--------------+-------------+----------+
> only showing top 5 rows
> {code}
> If multiple thresholds are set to zero, the prediction of {{ProbabilisticClassificationModel}} is the first index whose threshold is 0. In the output above, rows whose {{probability}} is [0.0,1.0,0.0] are still predicted as 0.0, even though class 1 has probability 1.0.
> In this case, it would be more reasonable to use as the {{prediction}} the index with the highest {{probability}} among the indices whose threshold is 0.
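
The difference between the two behaviours can be illustrated with a minimal, Spark-free sketch. It is not the Spark implementation: the object `ZeroThresholdSketch` and the functions `predictFirstZero` and `predictMaxAmongZero` are hypothetical names introduced only for illustration, and the sketch assumes that a zero threshold is treated as an infinitely large scaled score, which reproduces the output shown in the issue.

```scala
object ZeroThresholdSketch {
  // Index of the first maximum: on ties, the lowest index wins.
  private def argmaxFirst(values: Array[Double]): Int = {
    var best = 0
    var i = 1
    while (i < values.length) {
      if (values(i) > values(best)) best = i
      i += 1
    }
    best
  }

  // Behaviour described in the issue (assumed mechanism): a zero threshold
  // scales to +Infinity, so the first zero-threshold index is always chosen.
  def predictFirstZero(probability: Array[Double], thresholds: Array[Double]): Int = {
    val scaled = probability.zip(thresholds).map { case (p, t) =>
      if (t == 0.0) Double.PositiveInfinity else p / t
    }
    argmaxFirst(scaled)
  }

  // Behaviour proposed in the issue: among the zero-threshold indices,
  // choose the one with the highest probability; otherwise use p / t.
  def predictMaxAmongZero(probability: Array[Double], thresholds: Array[Double]): Int = {
    val zeroIdx = thresholds.indices.filter(i => thresholds(i) == 0.0)
    if (zeroIdx.nonEmpty) zeroIdx.maxBy(i => probability(i))
    else argmaxFirst(probability.zip(thresholds).map { case (p, t) => p / t })
  }
}
```

With the probability vector `[0.0, 1.0, 0.0]` from the first rows of the table and thresholds `[0.0, 0.0, 0.0]`, `predictFirstZero` picks index 0, while `predictMaxAmongZero` picks index 1.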
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)