Posted to dev@madlib.apache.org by iyerr3 <gi...@git.apache.org> on 2018/02/06 16:37:15 UTC
[GitHub] madlib pull request #231: RF: Output non-negative importance values
GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/231
RF: Output non-negative importance values
Variable importance in RF is computed as the difference in prediction
accuracy between the original data and permuted data, both taken from the
out-of-bag (OOB) samples. Permuted data is obtained by resampling each
variable from its own distribution. The difference can be negative when a
variable has few levels and those levels are unbalanced, because the
resampling then barely changes the data. This commit shifts all the
importance values when any of them is negative, so that the lowest
importance value is 0.
Closes #231
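The shift described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this pull request: the function name `shift_to_non_negative` and the sample values are hypothetical.

```python
def shift_to_non_negative(importances):
    """Return importances shifted so the minimum is 0.

    The shift is applied only when at least one value is negative;
    otherwise the values are returned unchanged (as a new list).
    Hypothetical helper, not part of MADlib.
    """
    if not importances:
        return []
    lowest = min(importances)
    if lowest >= 0:
        return list(importances)
    # Subtracting a negative minimum moves every value up by |lowest|.
    return [value - lowest for value in importances]


# Made-up raw importance values, one per variable.
raw = [0.5, -0.25, 0.0, 1.5]
shifted = shift_to_non_negative(raw)
print(shifted)
```

Because every value moves by the same constant, the differences between variables are preserved; only the zero point changes.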
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/iyerr3/incubator-madlib bugfix/rf_neg_var_imp
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/231.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #231
----
commit f4265854dd94899145c9b40d4ce77450f34bdd78
Author: Rahul Iyer <ri...@...>
Date: 2018-02-06T16:20:49Z
RF: Output non-negative importance values
----
---
[GitHub] madlib pull request #231: RF: Output non-negative importance values
Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:
https://github.com/apache/madlib/pull/231
---
[GitHub] madlib issue #231: RF: Output non-negative importance values
Posted by fmcquillan99 <gi...@git.apache.org>.
GitHub user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/231
Does this mean, then, that all var importance values are >= 0 now, and that the largest positive value corresponds to the most "important" variable?
Also, what is the range of possible values for variable importance?
---
[GitHub] madlib issue #231: RF: Output non-negative importance values
Posted by iyerr3 <gi...@git.apache.org>.
GitHub user iyerr3 commented on the issue:
https://github.com/apache/madlib/pull/231
This change ensures that all variable importance values are non-negative
(the lowest value is 0 whenever a shift is applied). The other properties
are unchanged: the feature with the largest value is the most important,
and the values are not normalized.
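As a quick check of those two properties, here is a small Python sketch with made-up variable names and values (none of them come from MADlib). It applies a constant shift and compares the ranking before and after.

```python
# Hypothetical raw importance values keyed by variable name.
raw = {"age": -0.25, "income": 1.5, "region": 0.5}

# Offset is 0 when nothing is negative, else the size of the most negative value.
offset = -min(0.0, min(raw.values()))
shifted = {name: value + offset for name, value in raw.items()}

# Rank variables from most to least important, before and after the shift.
rank_raw = sorted(raw, key=raw.get, reverse=True)
rank_shifted = sorted(shifted, key=shifted.get, reverse=True)

print(rank_raw == rank_shifted)   # ordering is unchanged by a constant shift
print(sum(shifted.values()))      # not forced to sum to 1: values are not normalized
```

Since the values are not normalized, they are best read as a ranking within one model, not as fractions of a total.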
On Feb 6, 2018 9:34 AM, "Frank McQuillan" <no...@github.com> wrote:
> Does this mean, then, that all var importance values are >= 0 now, and
> that the largest positive value corresponds to the most "important"
> variable?
>
> Also, what is the range of possible values for variable importance?
---
[GitHub] madlib issue #231: RF: Output non-negative importance values
Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit commented on the issue:
https://github.com/apache/madlib/pull/231
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/336/
---