Posted to issues@spark.apache.org by "Siddharth Murching (JIRA)" <ji...@apache.org> on 2017/08/18 07:49:00 UTC

[jira] [Comment Edited] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions

    [ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131868#comment-16131868 ] 

Siddharth Murching edited comment on SPARK-21770 at 8/18/17 7:48 AM:
---------------------------------------------------------------------

Good question:

* Predictions on all-zero raw output don't change: the predicted label remains 0 for RandomForestClassifier and DecisionTreeClassifier, which are the only models that call normalizeToProbabilitiesInPlace().
* This proposal seeks to make predicted probabilities more interpretable when raw model output is all-zero, by returning a uniform 1/n probability vector in that case.
* Regardless, it currently seems impossible for normalizeToProbabilitiesInPlace() to ever be called on all-zero input, since that would require a DecisionTree leaf node whose class count array (raw output) is all zeros.
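
For illustration, the proposed normalization can be sketched in plain Python. This is only a sketch of the behavior the issue asks for, not Spark's Scala implementation: the function name normalize_to_probabilities is hypothetical, it returns a new list rather than mutating in place, and rejecting negative entries is an assumption made here because class counts are non-negative.

```python
def normalize_to_probabilities(raw):
    """Return a probability vector for a list of non-negative raw scores.

    Hypothetical helper: a sketch of the proposed behavior, not Spark's API.
    """
    if not raw:
        raise ValueError("raw prediction vector must be non-empty")
    # Assumption: raw predictions are class counts, so negatives are invalid.
    if any(x < 0 for x in raw):
        raise ValueError("raw predictions must be non-negative")

    n = len(raw)
    total = sum(raw)
    if total == 0:
        # All-zero raw output: fall back to the uniform distribution 1/n.
        return [1.0 / n] * n
    # Otherwise divide each entry by the sum so the result sums to 1.
    return [x / total for x in raw]


uniform = normalize_to_probabilities([0.0, 0.0, 0.0, 0.0])
scaled = normalize_to_probabilities([1.0, 3.0])
```

Here uniform has four equal entries of 0.25, and scaled is the ordinary sum-normalized vector.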

More detail: both DecisionTreeClassifier and RandomForestClassifier inherit Classifier's [implementation of raw2prediction()|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala#L221], which just takes an argmax ([preferring earlier maximal entries|https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L176]) over the model's output vector. A raw model output of all-equal entries, all-zero included, would therefore result in a prediction of 0 with or without the proposed change.
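
The tie-breaking argument can be checked with a small plain-Python sketch. The name argmax_first is hypothetical and this is not Spark's Vectors.argmax code. It only mirrors the described behavior of preferring the earliest maximal entry, and raising on an empty input is an assumption made here.

```python
def argmax_first(values):
    """Index of the largest entry, preferring the earliest one on ties.

    Hypothetical helper used only to illustrate the tie-breaking rule.
    """
    if not values:
        raise ValueError("cannot take argmax of an empty vector")
    best_index = 0
    for i in range(1, len(values)):
        # Strict comparison: a later entry wins only if it is strictly larger,
        # so ties keep the earlier index.
        if values[i] > values[best_index]:
            best_index = i
    return best_index


all_zero = [0.0, 0.0, 0.0]
uniform = [1.0 / 3] * 3
prediction_all_zero = argmax_first(all_zero)
prediction_uniform = argmax_first(uniform)
```

Both prediction_all_zero and prediction_uniform are 0, since every entry is equal in each vector and the first index wins the tie.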



> ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-21770
>                 URL: https://issues.apache.org/jira/browse/SPARK-21770
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Siddharth Murching
>            Priority: Minor
>
> Given an n-element raw prediction vector of all zeros, ProbabilisticClassificationModel.normalizeToProbabilitiesInPlace() should output a probability vector of all-equal 1/n entries.


