Posted to issues@spark.apache.org by "TinaLi (Jira)" <ji...@apache.org> on 2020/09/16 22:59:00 UTC
[jira] [Created] (SPARK-32904) pyspark.mllib.evaluation.MulticlassMetrics needs to swap the results of precision( ) and recall( )
TinaLi created SPARK-32904:
------------------------------
Summary: pyspark.mllib.evaluation.MulticlassMetrics needs to swap the results of precision( ) and recall( )
Key: SPARK-32904
URL: https://issues.apache.org/jira/browse/SPARK-32904
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 3.0.1
Reporter: TinaLi
[https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html]
*The values returned by the precision() and recall() methods of this API should be swapped.*
Following are the results I got when I ran this API. It prints the confusion matrix, then the precision and recall for label 1:
from pyspark.mllib.evaluation import MulticlassMetrics

metrics = MulticlassMetrics(predictionAndLabels)
print(metrics.confusionMatrix().toArray())
print("precision: ", metrics.precision(1))
print("recall: ", metrics.recall(1))
[[36631. 2845.]
[ 3839. 1610.]]
precision: 0.3613916947250281
recall: 0.2954670581758121
predictions.select('prediction').agg({'prediction': 'sum'}).show()
+---------------+
|sum(prediction)|
+---------------+
|         5449.0|
+---------------+
As you can see, my model predicted label 1 for 5449 cases, and 1610 of those 5449 cases are true positives. Precision should therefore be 1610/5449 = 0.2954670581758121. Instead, this API returns that value from the recall() method, so the results of the two methods should be swapped.
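The arithmetic above can be checked with a short, stdlib-only Python sketch. It reuses the 2x2 matrix printed in the report.

`precision_recall` is a hypothetical helper written for illustration, not a Spark API. It takes a flag saying which axis of the matrix holds the predicted class:

- Precision divides true positives by the number of predictions of the class, TP / (TP + FP).
- Recall divides true positives by the number of actual instances of the class, TP / (TP + FN).

So the two values trade places whenever the roles of rows and columns are exchanged. For reference, the Spark API documentation describes `confusionMatrix()` as having predicted classes in its columns.

```python
# Hypothetical helper (not part of Spark): precision and recall for one class
# of a square confusion matrix, given which axis holds the predicted class.


def precision_recall(matrix, cls, rows_are_predictions):
    """Return (precision, recall) for class index `cls`.

    If rows_are_predictions is True, matrix[i][j] counts examples predicted
    as class i whose actual class is j; otherwise i is the actual class and
    j the predicted class.
    """
    n = len(matrix)
    tp = matrix[cls][cls]
    row_total = sum(matrix[cls][j] for j in range(n))
    col_total = sum(matrix[i][cls] for i in range(n))
    if rows_are_predictions:
        predicted_total, actual_total = row_total, col_total
    else:
        predicted_total, actual_total = col_total, row_total
    # TP / (TP + FP) and TP / (TP + FN); returning 0.0 for an empty
    # denominator is a convention chosen for this sketch only.
    precision = tp / predicted_total if predicted_total else 0.0
    recall = tp / actual_total if actual_total else 0.0
    return precision, recall


# Confusion matrix printed in the report.
cm = [[36631.0, 2845.0],
      [3839.0, 1610.0]]

# Reading rows as predictions: row 1 sums to 5449, the prediction count above.
print(precision_recall(cm, 1, rows_are_predictions=True))

# Reading rows as actual labels, i.e. predicted classes in the columns.
print(precision_recall(cm, 1, rows_are_predictions=False))
```

With rows read as predictions, precision is 1610/5449 and recall is 1610/4455. With rows read as actual labels, the two values are exchanged. Which reading matches `MulticlassMetrics` output depends on the matrix's axis convention and on the order of the values in each input pair.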