Posted to dev@madlib.apache.org by iyerr3 <gi...@git.apache.org> on 2018/02/06 16:37:15 UTC

[GitHub] madlib pull request #231: RF: Output non-negative importance values

GitHub user iyerr3 opened a pull request:

    https://github.com/apache/madlib/pull/231

    RF: Output non-negative importance values

    Variable importance in RF is computed as the difference in prediction
    accuracy between the original data and the permuted data from
    out-of-bag (OOB) samples. Permuted data is obtained by resampling each
    variable from its own distribution. The importance can be negative when
    a variable has only a few levels and those levels are unbalanced,
    because the resampling then barely changes the data. If any importance
    value is negative, this commit shifts all importance values so that the
    lowest importance value is 0.
    
    Closes #231
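
A minimal sketch of the computation and the shift described above, not MADlib's actual code. The names `accuracy`, `oob_importance` and `shift_non_negative` are hypothetical, the predictor is a stand-in for a trained forest, and resampling with replacement is an assumption about what "resampled from its own distribution" means:

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def oob_importance(predict, rows, labels, col, rng):
    """Accuracy on the original OOB rows minus accuracy after resampling column `col`."""
    baseline = accuracy(predict, rows, labels)
    # Assumption: draw replacement values from the column's own empirical
    # distribution, with replacement.
    resampled = rng.choices([row[col] for row in rows], k=len(rows))
    permuted = [row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, resampled)]
    return baseline - accuracy(predict, permuted, labels)


def shift_non_negative(importances):
    """If any value is negative, shift all values so that the minimum becomes 0."""
    if not importances:
        return []
    lowest = min(importances)
    if lowest >= 0:
        return list(importances)  # nothing to shift
    return [value - lowest for value in importances]


def predict_from_first_column(row):
    """Hypothetical stand-in for a trained forest: uses only column 0."""
    return row[0]


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0]]
    labels = [row[0] for row in rows]  # label depends only on column 0
    raw = [oob_importance(predict_from_first_column, rows, labels, col, rng)
           for col in range(2)]
    print(shift_non_negative(raw))
```

The shift is applied only when at least one value is negative, so a result that is already non-negative is returned unchanged.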

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/rf_neg_var_imp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #231
    
----
commit f4265854dd94899145c9b40d4ce77450f34bdd78
Author: Rahul Iyer <ri...@...>
Date:   2018-02-06T16:20:49Z

    RF: Output non-negative importance values
    
    Variable importance is computed in RF as the difference in prediction
    accuracy between original data and permuted data from out-of-bag
    samples (OOB). Permuted data is defined as each variable resampled from
    its own distribution. This value can end up being negative if the number
    of levels for a variable is small and is unbalanced, as the
    redistribution doesn't change the data much. This commit shifts all the
    importance values if some of them are negative to ensure that the lowest
    importance value is 0.
    
    Closes #231

----


---

[GitHub] madlib pull request #231: RF: Output non-negative importance values

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/231


---

[GitHub] madlib issue #231: RF: Output non-negative importance values

Posted by fmcquillan99 <gi...@git.apache.org>.
GitHub user fmcquillan99 commented on the issue:

    https://github.com/apache/madlib/pull/231
  
    Does this mean, then, that all variable importance values are >= 0 now, and that the largest positive value corresponds to the most "important" variable?
    
    Also, what is the range of possible values for variable importance?


---

[GitHub] madlib issue #231: RF: Output non-negative importance values

Posted by iyerr3 <gi...@git.apache.org>.
GitHub user iyerr3 commented on the issue:

    https://github.com/apache/madlib/pull/231
  
    This change ensures that all variable importance values are non-negative.
    The other properties are unchanged: the feature with the largest value is
    the most important, and the values are not normalized.
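
A small illustration of why the ordering is unaffected, as a sketch only: adding the same constant to every value preserves the ranking. The function names and the input numbers are made up for illustration and are not MADlib output:

```python
def shift_non_negative(importances):
    """Add a constant so the minimum is 0; leave values untouched if none is negative."""
    lowest = min(importances, default=0.0)
    offset = -lowest if lowest < 0 else 0.0
    return [value + offset for value in importances]


def ranking(importances):
    """Variable indices ordered from most to least important."""
    return sorted(range(len(importances)),
                  key=lambda i: importances[i],
                  reverse=True)


raw = [0.04, -0.02, 0.11, -0.01]    # hypothetical raw importance values
shifted = shift_non_negative(raw)

print(ranking(raw))       # [2, 0, 3, 1]
print(ranking(shifted))   # same order as before the shift
```

The shifted values are still unnormalized: they do not sum to 1, and only their relative order and their differences carry meaning.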
    



---

[GitHub] madlib issue #231: RF: Output non-negative importance values

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/231
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/336/



---