Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/11/05 14:27:18 UTC

[jira] [Commented] (MAHOUT-1153) Implement streaming random forests

    [ https://issues.apache.org/jira/browse/MAHOUT-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813912#comment-13813912 ] 

Suneel Marthi commented on MAHOUT-1153:
---------------------------------------

Hey Andy,

The GitHub link doesn't work anymore. Do you think this can be part of 0.9?

> Implement streaming random forests
> ----------------------------------
>
>                 Key: MAHOUT-1153
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1153
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Andy Twigg
>              Labels: features
>             Fix For: Backlog
>
>
> The current random forest implementations are in-core and not scalable. This issue is to add an out-of-core, scalable, streaming implementation. Initially it could be based on [1] and use mappers in a master-worker style.
> [1] http://jmlr.csail.mit.edu/papers/volume11/ben-haim10a/ben-haim10a.pdf
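For readers unfamiliar with [1]: the paper builds decision trees from fixed-size streaming histograms that workers compute over their own data and a master merges. The following is only a minimal sketch of that histogram idea as I understand it, not code from the issue or from any Mahout or Spark implementation; the class name `StreamingHistogram` and its methods are hypothetical names chosen for illustration.

```python
import bisect


class StreamingHistogram:
    """Fixed-size histogram of [centroid, count] bins, kept sorted by centroid.

    Hypothetical illustration only; names are not taken from Mahout or from [1].
    """

    def __init__(self, max_bins):
        if max_bins < 1:
            raise ValueError("max_bins must be at least 1")
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def update(self, value, count=1):
        """Add one observation, then shrink back to at most max_bins bins."""
        centroids = [b[0] for b in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += count
        else:
            self.bins.insert(i, [float(value), count])
        self._shrink()

    def merge(self, other):
        """Return a new histogram summarising both inputs (the master's step)."""
        merged = StreamingHistogram(self.max_bins)
        merged.bins = sorted([list(b) for b in self.bins + other.bins])
        merged._shrink()
        return merged

    def total(self):
        """Number of observations summarised so far."""
        return sum(count for _, count in self.bins)

    def _shrink(self):
        # Repeatedly replace the two closest bins by their weighted average.
        while len(self.bins) > self.max_bins:
            i = min(
                range(len(self.bins) - 1),
                key=lambda j: self.bins[j + 1][0] - self.bins[j][0],
            )
            (p1, m1), (p2, m2) = self.bins[i], self.bins[i + 1]
            self.bins[i:i + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]


# Each worker summarises its own shard; the master merges the summaries.
worker_a = StreamingHistogram(max_bins=3)
for v in [1, 2, 3, 10]:
    worker_a.update(v)
worker_b = StreamingHistogram(max_bins=3)
worker_b.update(20)
combined = worker_a.merge(worker_b)
```

In a master-worker setup of this kind, each mapper would keep one such histogram per feature and class, so the memory used per worker stays bounded regardless of how many rows stream through it.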



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Re: [jira] [Commented] (MAHOUT-1153) Implement streaming random forests

Posted by Andy Twigg <an...@gmail.com>.
Hi Suneel,

I spent a significant amount of effort trying to get this working against
0.8, but unfortunately it seemed a bad fit. Instead I wrote a version
against Spark, which is now available as a service - http://featurestream.io

I'm open to open-sourcing it, but I wanted to see what use cases would come
out of it first. If anyone has any good ideas, let me know.

Cheers,
Andy

--
andy.twigg@gmail.com



