Posted to user@mahout.apache.org by Don Pazel <dp...@adconion.com> on 2011/07/12 19:52:39 UTC

Random Forest feature types

From what I can see, the random forest implementation takes either numerical or categorical feature data.  That worked fine for me, until I tried to incorporate word or text features.  I liked the encoders used in SGD, but they don't seem to apply to random forests.  So, did I overlook something simple that would allow me to include word or text features?  If not, are there plans (assuming the core algorithm allows) to add these feature types to random forests in the future?


Thanks,
Don Pazel 

Re: Random Forest feature types

Posted by deneche abdelhakim <a_...@yahoo.fr>.
I will gladly help with any improvement to Mahout's Decision Forests.


________________________________
From: Ted Dunning <te...@gmail.com>
To: user@mahout.apache.org
Sent: Tuesday, July 12, 2011 19:00
Subject: Re: Random Forest feature types

The random forest code predates the fancy encoders, so its support for them
is limited.

I would expect that you might be able to adapt the code to improve support.
Deneche (the original implementor) would likely be willing to help.


Re: Random Forest feature types

Posted by Ted Dunning <te...@gmail.com>.
The random forest code predates the fancy encoders, so its support for them
is limited.

I would expect that you might be able to adapt the code to improve support.
Deneche (the original implementor) would likely be willing to help.
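
As a rough illustration of what adapting text for a numeric-only forest
could look like: one common approach is the hashing trick, where each word is
hashed into a fixed number of buckets and the per-bucket counts are fed in as
ordinary numerical features. The sketch below is plain Java and is not part of
Mahout, nor is it the SGD encoder API; the class name HashedTextFeatures, the
method encode, and the parameter numBuckets are all hypothetical names chosen
for this example.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not a Mahout class: turns free text into a
// fixed-length numeric vector that a numerical-only learner could consume.
public class HashedTextFeatures {

    // numBuckets is an assumed tuning parameter (the vector length).
    public static double[] encode(String text, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        double[] features = new double[numBuckets];
        if (text == null) {
            return features; // no text: all-zero vector
        }
        // Naive tokenization: lowercase, split on runs of non-letter/non-digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            // floorMod keeps the index non-negative even for negative hash codes.
            int bucket = Math.floorMod(token.hashCode(), numBuckets);
            features[bucket] += 1.0; // count occurrences per bucket
        }
        return features;
    }

    public static void main(String[] args) {
        double[] v = encode("Cheap flights, cheap hotels", 16);
        System.out.println(Arrays.toString(v));
    }
}
```

Each position of the resulting array could then be declared as a numerical
attribute alongside the existing features. Different words can collide in the
same bucket, so the bucket count trades vector width against collision noise;
whether this works well with the forest's split selection would need testing.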
