Posted to issues@spark.apache.org by "Siddharth Murching (JIRA)" <ji...@apache.org> on 2017/08/18 07:49:00 UTC
[jira] [Comment Edited] (SPARK-21770)
ProbabilisticClassificationModel: Improve normalization of all-zero raw
predictions
[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131868#comment-16131868 ]
Siddharth Murching edited comment on SPARK-21770 at 8/18/17 7:48 AM:
---------------------------------------------------------------------
Good question:
* Predictions on all-zero raw output don't change: they remain 0 for RandomForestClassifier and DecisionTreeClassifier, which are the only models that call normalizeToProbabilitiesInPlace().
* This proposal seeks to make predicted probabilities more interpretable when the raw model output is all-zero.
* Regardless, it currently seems impossible for normalizeToProbabilitiesInPlace() to ever be called on all-zero input, since that would mean a DecisionTree leaf node had a class count array (raw output) of all zeros.
More detail: both DecisionTreeClassifier and RandomForestClassifier inherit Classifier's [implementation of raw2prediction()|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala#L221], which just takes an argmax ([preferring earlier maximal entries|https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L176]) over the model's output vector. A raw model output of all-equal entries would result in a prediction of 0 either way.
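To illustrate the tie-breaking point, here is a minimal Python sketch of an argmax that prefers earlier maximal entries. It is not Spark's code: the name argmax_first and the -1 return value for an empty vector are assumptions made for this example only.

```python
from typing import Sequence


def argmax_first(values: Sequence[float]) -> int:
    """Return the index of the largest entry; ties go to the earliest index.

    Hypothetical helper for illustration. Returning -1 for an empty
    sequence is an assumption of this sketch.
    """
    if len(values) == 0:
        return -1
    best_index = 0
    for i in range(1, len(values)):
        # Strict '>' means a later entry only wins if it is strictly larger,
        # so the earliest maximal entry is kept on ties.
        if values[i] > values[best_index]:
            best_index = i
    return best_index


all_zero = [0.0, 0.0, 0.0]      # all-zero raw output
uniform = [1.0 / 3] * 3         # all-equal 1/n probabilities
print(argmax_first(all_zero), argmax_first(uniform))
```

Both vectors consist of equal entries, so under this tie-breaking rule index 0 is selected in either case.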
> ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
> -----------------------------------------------------------------------------------
>
> Key: SPARK-21770
> URL: https://issues.apache.org/jira/browse/SPARK-21770
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Siddharth Murching
> Priority: Minor
>
> Given an n-element raw prediction vector of all zeros, ProbabilisticClassificationModel.normalizeToProbabilitiesInPlace() should output a probability vector of all-equal 1/n entries.
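
The behavior described in the issue can be sketched in plain Python as follows. This is only an illustration of the proposal, not the Spark implementation: the function name is a Python-style stand-in, and the sketch assumes the raw entries are non-negative (for example, class counts).

```python
from typing import List


def normalize_to_probabilities_in_place(raw: List[float]) -> None:
    """Normalize a non-negative raw prediction vector to sum to 1, in place.

    Hypothetical sketch: an all-zero vector becomes uniform 1/n entries,
    as the issue proposes. Assumes all entries are non-negative.
    """
    n = len(raw)
    if n == 0:
        return  # nothing to normalize
    total = sum(raw)
    if total == 0:
        # All-zero raw output: fall back to a uniform distribution.
        for i in range(n):
            raw[i] = 1.0 / n
    else:
        for i in range(n):
            raw[i] /= total


zeros = [0.0, 0.0, 0.0, 0.0]
normalize_to_probabilities_in_place(zeros)
print(zeros)
```

With four classes, each entry of the all-zero vector becomes 1/4, while a vector with a non-zero sum is simply divided by that sum.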