Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2014/03/02 22:57:21 UTC

[jira] [Commented] (MAHOUT-1153) Implement streaming random forests

    [ https://issues.apache.org/jira/browse/MAHOUT-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917581#comment-13917581 ] 

Suneel Marthi commented on MAHOUT-1153:
---------------------------------------

[~andytwigg] I understand this has been implemented on Spark and that an implementation is available at http://featurestream.io. Do you think we should start the conversation about rolling this into Mahout?

> Implement streaming random forests
> ----------------------------------
>
>                 Key: MAHOUT-1153
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1153
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Andy Twigg
>              Labels: features
>             Fix For: Backlog
>
>
> The current random forest implementations are in-core and do not scale. This issue is to add an out-of-core, scalable, streaming implementation. Initially it could be based on [1] and use mappers in a master-worker style.
> [1] http://jmlr.csail.mit.edu/papers/volume11/ben-haim10a/ben-haim10a.pdf
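
For readers unfamiliar with [1]: the paper behind that link (Ben-Haim and Tom-Tov, "A Streaming Parallel Decision Tree Algorithm") has each worker summarize a feature with a small fixed-size histogram, which a master then merges. The following is only a rough Python sketch of that histogram idea, not the Mahout or featurestream.io code. The class name StreamingHistogram, its methods, and the max_bins parameter are made up for illustration, and the sketch skips the paper's special case of incrementing an existing bin when a value matches its centroid exactly.

```python
import bisect


class StreamingHistogram:
    """Fixed-size summary of a stream as sorted [centroid, count] bins.

    Hypothetical illustration only; names are not from Mahout or from [1].
    """

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # kept sorted by centroid

    def update(self, value):
        # Worker step: add the point as its own bin, then shrink if needed.
        bisect.insort(self.bins, [float(value), 1])
        self._shrink()

    def merge(self, other):
        # Master step: pool the bins of two summaries and shrink the result.
        merged = StreamingHistogram(self.max_bins)
        merged.bins = sorted(list(b) for b in self.bins + other.bins)
        merged._shrink()
        return merged

    def total(self):
        return sum(count for _, count in self.bins)

    def _shrink(self):
        # Repeatedly replace the two closest adjacent bins with their
        # count-weighted average until at most max_bins remain.
        while len(self.bins) > self.max_bins:
            i = min(
                range(len(self.bins) - 1),
                key=lambda k: self.bins[k + 1][0] - self.bins[k][0],
            )
            (p1, m1), (p2, m2) = self.bins[i], self.bins[i + 1]
            centroid = (p1 * m1 + p2 * m2) / (m1 + m2)
            self.bins[i:i + 2] = [[centroid, m1 + m2]]


# Two "workers" each see part of the stream; the "master" merges them.
worker_a = StreamingHistogram(max_bins=3)
for v in [1, 2, 3, 10, 11]:
    worker_a.update(v)

worker_b = StreamingHistogram(max_bins=3)
for v in [20, 21]:
    worker_b.update(v)

combined = worker_a.merge(worker_b)
```

Because each summary has bounded size regardless of how many points it has seen, the raw data never has to fit in memory on any single node, which is the out-of-core property the issue asks for.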



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: [jira] [Commented] (MAHOUT-1153) Implement streaming random forests

Posted by Andy Twigg <an...@gmail.com>.
Yes, we could also consider committing it into the current Mahout code
base. There are probably some advantages over the current implementation.
What direction are you thinking of?
