Posted to dev@mahout.apache.org by Karl Wettin <ka...@gmail.com> on 2008/05/31 16:17:19 UTC

computational power assertion

I've been thinking a bit about feature selection and have trouble making up
my mind about what sort of solutions make the most sense to initially focus
on. If people have an unlimited number of nodes, then complexity is no
longer a big problem.

My guess is that people have historically used nested subset ranking, since
the complexity is log N rather than the N of exhaustive search wrappers. My
gut tells me that the latter will produce a better result (given that you
also do a Bonferroni correction or similar), but who has an unlimited number
of nodes?
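
To make the difference concrete, here is a minimal sketch (Python/numpy,
with a toy R^2-style score as a stand-in for a real evaluation; nothing
Mahout-specific): the ranking pass scores each feature on its own, while
the wrapper has to evaluate whole subsets.

  # Toy comparison of the two strategies; the score is purely illustrative.
  import itertools
  import numpy as np

  rng = np.random.default_rng(0)
  n_features = 12
  X = rng.normal(size=(200, n_features))
  y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)    # two real effects

  def score(cols):
      cols = list(cols)
      if not cols:
          return 0.0
      A = X[:, cols]
      beta, *_ = np.linalg.lstsq(A, y, rcond=None)
      return 1.0 - np.var(y - A @ beta) / np.var(y)

  # Ranking: score each feature once, keep the best two.
  ranked = sorted(range(n_features), key=lambda j: score([j]), reverse=True)[:2]

  # Wrapper: evaluate candidate subsets directly (here just all pairs).
  best_pair = max(itertools.combinations(range(n_features), 2), key=score)

  print("ranking picks   :", sorted(ranked))
  print("exhaustive picks:", sorted(best_pair))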

The real question is hidden somewhere between the lines, and it doesn't
apply only to feature selection.



          karl

Re: computational power assertion

Posted by Ted Dunning <te...@gmail.com>.
It is very rare for variable selection to require much more than forward or
reverse step-wise refinement.  In practice, main effects and low-order
interactions are prominent enough that you can see them one at a time.  It
is very unusual for interactions to be so balanced that they entirely mask
the main effects.
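
To be concrete, forward refinement is just a greedy loop.  A rough sketch in
Python/numpy, with a toy R^2 score standing in for whatever model evaluation
you actually use (the data, the 0.01 stopping threshold and everything else
here are arbitrary illustration, not a recipe):

  # Forward step-wise refinement: greedily add whichever variable improves
  # the fit most, and stop when the improvement becomes negligible.
  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.normal(size=(500, 20))
  y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + rng.normal(size=500)

  def r_squared(cols):
      if not cols:
          return 0.0
      A = X[:, cols]
      beta, *_ = np.linalg.lstsq(A, y, rcond=None)
      return 1.0 - np.var(y - A @ beta) / np.var(y)

  selected, remaining = [], set(range(X.shape[1]))
  while remaining:
      gains = {j: r_squared(selected + [j]) - r_squared(selected)
               for j in remaining}
      j_best = max(gains, key=gains.get)
      if gains[j_best] < 0.01:          # arbitrary stopping threshold
          break
      selected.append(j_best)
      remaining.discard(j_best)

  print("selected columns:", selected)  # should recover 3 and 7

Reverse (backward) refinement is the same loop run the other way: start with
everything and greedily drop whichever variable hurts the least.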

Also, if you are using something like regression (which is what many methods
are at heart), it is instructive to build a toy problem with a few main
effects and many noise variables and then look at the posterior distribution
of the coefficients.  For non-interactive models, what you see is that the
posterior distribution of the main effects does not vary with the addition
of noise variables.  What this means is that techniques like Bonferroni
correction really lead you astray.  What is happening in variable selection
is that you gather enough data to be sure that you will find the main
effects you want.  Whatever cutoff you choose will have some probability of
selecting noise variables as well, but if you select only a few of them you
are going to be pretty well off.  The complexity of this sort of procedure
is not very high at all.
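
A quick sketch of that experiment (Python/numpy; ordinary least-squares
standard errors are used here as a rough stand-in for the posterior spread,
so treat it as an approximation of the Bayesian picture):

  # A couple of real effects plus progressively more pure-noise columns.
  # Watch the estimate and spread of a real coefficient as junk is added.
  import numpy as np

  rng = np.random.default_rng(2)
  n = 2000
  x_main = rng.normal(size=(n, 2))
  y = 1.5 * x_main[:, 0] - 0.8 * x_main[:, 1] + rng.normal(size=n)

  for n_noise in (0, 20, 200):
      X = np.hstack([x_main, rng.normal(size=(n, n_noise))])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      resid = y - X @ beta
      sigma2 = resid @ resid / (n - X.shape[1])
      se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
      print(f"{n_noise:4d} noise columns: beta[0] = {beta[0]:+.3f}"
            f" +/- {se[0]:.3f}")

The first coefficient (and its standard error) stays essentially put no
matter how many junk columns are in the model, which is the point.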

With interactions, the situation is a bit more complex because the design
matrix isn't so nicely orthogonal, but the same rough principles apply.

The crux of all of this, then, is to have as good a selection test as
possible that increases the probability of selecting terms with (truly)
non-zero coefficients and rejecting terms with (truly) zero coefficients.
For very large numbers of variables that clearly requires substantial
regularization.  SVM fans will introduce large margin techniques for this
and Bayesians will introduce priors, but the effect is essentially the same.
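
The simplest version of that on the Bayesian side is a quadratic penalty,
i.e. ridge regression, which corresponds to a Gaussian prior on the
coefficients.  A sketch with an arbitrary, untuned penalty:

  # Ridge regression: the quadratic penalty keeps all coefficient estimates
  # small and stable; the noise coefficients stay near zero while the real
  # effects remain clearly visible.
  import numpy as np

  rng = np.random.default_rng(3)
  n, n_real, n_noise = 300, 3, 100
  X = rng.normal(size=(n, n_real + n_noise))
  true_beta = np.zeros(n_real + n_noise)
  true_beta[:n_real] = [2.0, -1.0, 0.5]
  y = X @ true_beta + rng.normal(size=n)

  lam = 10.0                               # penalty strength, not tuned
  d = X.shape[1]
  beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

  print("real-effect estimates   :", np.round(beta_ridge[:n_real], 2))
  print("largest |noise| estimate:",
        np.round(np.abs(beta_ridge[n_real:]).max(), 2))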

The Breiman paper on random forests also has some lovely techniques for
selecting variables: randomize the variable under consideration and then
either re-run the training or simply re-run the model evaluation.  If the
randomization has little effect, then you can presume that the variable is
of little importance.  Random forests and all of the related techniques that
Breiman suggested have the lovely property (for us) of being very
compute-intensive.
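
A sketch of the permutation idea, using the cheap variant that only re-runs
the evaluation (not the training), and with a plain least-squares model
standing in for the forest just to keep it short:

  # Permutation importance: shuffle one column at a time in held-out data
  # and see how much the error degrades; little degradation suggests the
  # variable matters little.
  import numpy as np

  rng = np.random.default_rng(4)
  n, d = 1000, 10
  X = rng.normal(size=(n, d))
  y = 3.0 * X[:, 2] + 1.0 * X[:, 5] + rng.normal(size=n)

  train, test = slice(0, 700), slice(700, None)
  beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

  def mse(X_eval):
      return np.mean((y[test] - X_eval @ beta) ** 2)

  baseline = mse(X[test])
  for j in range(d):
      Xp = X[test].copy()
      Xp[:, j] = rng.permutation(Xp[:, j])   # destroy column j only
      print(f"column {j}: importance = {mse(Xp) - baseline:.3f}")

Columns 2 and 5 should stand out by a wide margin; everything else should
hover around zero.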

On Sat, May 31, 2008 at 7:17 AM, Karl Wettin <ka...@gmail.com> wrote:

> I've been thinking a bit about feature selection and have trouble making up
> my mind about what sort of solutions make the most sense to initially focus
> on. If people have an unlimited number of nodes, then complexity is no
> longer a big problem.
>
> My guess is that people have historically used nested subset ranking, since
> the complexity is log N rather than the N of exhaustive search wrappers. My
> gut tells me that the latter will produce a better result (given that you
> also do a Bonferroni correction or similar), but who has an unlimited number
> of nodes?
>
> The real question is hidden somewhere between the lines, and it doesn't
> apply only to feature selection.
>
>
>
>         karl
>



-- 
ted