Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2016/01/18 10:25:39 UTC

[jira] [Created] (SPARK-12875) Add Weight of Evidence and Information value to Spark.ml as a feature transformer

yuhao yang created SPARK-12875:
----------------------------------

             Summary: Add Weight of Evidence and Information value to Spark.ml as a feature transformer
                 Key: SPARK-12875
                 URL: https://issues.apache.org/jira/browse/SPARK-12875
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: yuhao yang
            Priority: Minor


As a feature transformer, WoE and IV enable one to:

Consider each variable’s independent contribution to the outcome.
Detect linear and non-linear relationships.
Rank variables in terms of "univariate" predictive strength.
Visualize the correlations between the predictive variables and the binary outcome.

http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/ gives a good introduction to WoE and IV.

The Weight of Evidence (WoE) value measures how well a grouping of a feature's values distinguishes between the two classes of a binary response (e.g. "good" versus "bad"). It is widely used for grouping continuous features or mapping categorical features to continuous values. It is computed from the basic odds ratio:
(Distribution of positive Outcomes) / (Distribution of negative Outcomes)
where Distribution refers to the proportion of positive or negative outcomes in the respective group, relative to the column totals. WoE is the natural logarithm of this ratio.
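
To make the ratio above concrete, here is a minimal sketch in plain Python (not the proposed Spark.ml API). It assumes WoE is the natural log of the ratio, that labels are encoded as 0/1, and that every category contains at least one positive and one negative row. The function name woe_by_category and the toy data are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def woe_by_category(categories, labels):
    """Return {category: WoE} for a categorical feature and 0/1 labels.

    Assumes WoE = ln(Distr positive / Distr negative), where each Distr is
    the share of that class's rows that fall in the category.
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    woe = {}
    for c in set(categories):
        if pos[c] == 0 or neg[c] == 0:
            # No smoothing in this sketch: the log ratio is undefined here.
            raise ValueError(f"category {c!r} lacks positive or negative rows")
        distr_pos = pos[c] / total_pos
        distr_neg = neg[c] / total_neg
        woe[c] = math.log(distr_pos / distr_neg)
    return woe


# Toy data: "a" has 2 positives and 1 negative, "b" has 1 positive and 2 negatives.
categories = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 1, 0, 0]
print(woe_by_category(categories, labels))
```

A positive WoE marks a group where positives are over-represented relative to negatives, and a negative WoE marks the opposite. A real transformer would also need a policy (for example smoothing) for groups with a zero count, which this sketch simply rejects.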

The WoE recoding of features is particularly well suited for subsequent modeling with logistic regression or a multilayer perceptron (MLP).

In addition, the Information Value (IV) can be computed from WoE. IV is a popular criterion for selecting variables in a predictive model.
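
As an illustration of how IV follows from WoE, the sketch below assumes the commonly used definition: the sum over groups of (Distr positive - Distr negative) * WoE, with WoE taken as the natural log of Distr positive / Distr negative. It is plain Python rather than the proposed Spark.ml API, and the function name information_value and the toy data are hypothetical.

```python
import math
from collections import Counter


def information_value(categories, labels):
    """Return the IV of a categorical feature against 0/1 labels.

    Assumes IV = sum over groups of (distr_pos - distr_neg) * WoE, with
    WoE = ln(distr_pos / distr_neg), and that no group has a zero count.
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    iv = 0.0
    for c in set(categories):
        if pos[c] == 0 or neg[c] == 0:
            # No smoothing in this sketch: the log ratio is undefined here.
            raise ValueError(f"category {c!r} lacks positive or negative rows")
        distr_pos = pos[c] / total_pos
        distr_neg = neg[c] / total_neg
        woe = math.log(distr_pos / distr_neg)
        # Each term is non-negative: the difference and the log share a sign.
        iv += (distr_pos - distr_neg) * woe
    return iv


# Toy data: "a" has 2 positives and 1 negative, "b" has 1 positive and 2 negatives.
categories = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 1, 0, 0]
print(information_value(categories, labels))
```

Because every term is non-negative, a feature whose groups all have the same positive and negative shares gets an IV of 0, and larger values indicate stronger univariate separation. This is what makes IV usable for ranking variables.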

TODO: Currently only categorical features are supported. Add an estimator that determines a proper grouping for continuous features.



