Posted to dev@mahout.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2009/08/12 12:40:15 UTC

[jira] Issue Comment Edited: (MAHOUT-145) PartialData mapreduce Random Forests

    [ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742262#action_12742262 ] 

Deneche A. Hakim edited comment on MAHOUT-145 at 8/12/09 3:39 AM:
------------------------------------------------------------------

Update: I re-ran the tests with 50 map tasks; the new results are more consistent.

KDD 25%
|| Num Map Tasks || Num Trees || Oob Error || Build Time || Step 1 || Step 1-2 || Step 2 || Step 2-2 || Step 3 ||
| 10 | 100 | 0.0194 | 0h 1m 23s 210 | 39s | 4s | 20s | 20s | 33s |
| 10 | 200 | 0.0203 | 0h 2m 16s 510 | 1m 1s | 9s | 26s | 41s | 33s |
| 10 | 400 | 0.0195 | 0h 4m 10s 9 | 1m 53s | 18s | 39s | 1m 20s | 32s |
| 20 | 100 | 0.3875 | 0h 1m 5s 288 | 20s | 2s | 18s | 25s | 31s |
| 20 | 200 | 0.3626 | 0h 1m 29s 145 | 23s | 5s | 22s | 39s | 33s |
| 20 | 400 | 0.5003 | 0h 2m 30s 789 | 35s | 8s | 28s | 1m 19s | 32s |
| 50 | 100 | 0.5041 | 0h 1m 1s 375 | 19s | 3s | 19s | 21s | 32s |
| 50 | 200 | 0.5041 | 0h 1m 19s 202 | 19s | 2s | 22s | 36s | 32s |
| 50 | 400 | 0.5041 | 0h 2m 2s 250 | 18s | 4s | 28s | 1m 12s | 33s |


was (Author: adeneche):
KDD 25%
|| Num Map Tasks || Num Trees || Oob Error || Build Time || Step 1 || Step 1-2 || Step 2 || Step 2-2 || Step 3 ||
| 10 | 100 | 0.0194 | 0h 1m 23s 210 | 39s | 4s | 20s | 20s | 33s |
| 10 | 200 | 0.0203 | 0h 2m 16s 510 | 1m 1s | 9s | 26s | 41s | 33s |
| 10 | 400 | 0.0195 | 0h 4m 10s 9 | 1m 53s | 18s | 39s | 1m 20s | 32s |
| 20 | 100 | 0.3875 | 0h 1m 5s 288 | 20s | 2s | 18s | 25s | 31s |
| 20 | 200 | 0.3626 | 0h 1m 29s 145 | 23s | 5s | 22s | 39s | 33s |
| 20 | 400 | 0.5003 | 0h 2m 30s 789 | 35s | 8s | 28s | 1m 19s | 32s |
| 50 | 100 | 0.5041 | 0h 1m 46s 717 | 14s | 2s | 1m 11s | 20s | 33s |
| 50 | 200 | 0.5041 | 0h 1m 20s 977 | 17s | 2s | 22s | 40s | 33s |
| 50 | 400 | 0.5041 | 0h 2m 1s 714 | 16s | 4s | 29s | 1m 12s | 34s |

  
> PartialData mapreduce Random Forests
> ------------------------------------
>
>                 Key: MAHOUT-145
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-145
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>         Attachments: partial_August_10.patch, partial_August_2.patch, partial_August_9.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions of the data. That loses some of the solidity of the original method, but could actually do better if the splits exposed non-stationary behavior."

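The suggestion quoted above (build trees on separate portions of the data) can be illustrated with a minimal sketch. This is not the code from the attached patches: the function name plan_partial_forest and its parameters are hypothetical, and it assumes contiguous row ranges stand in for the input split seen by each map task. It only plans which rows each tree would be trained on; the tree learner itself is left out.

```python
import random


def plan_partial_forest(num_rows, num_partitions, num_trees, seed=0):
    """Return a list of (partition_id, bootstrap_row_indices), one entry per tree.

    Hypothetical helper for illustration only: each tree draws its bootstrap
    sample from a single partition instead of from the whole dataset.
    """
    rng = random.Random(seed)
    # Contiguous row ranges, one per partition (assumed stand-in for one
    # input split per map task).
    bounds = [num_rows * p // num_partitions for p in range(num_partitions + 1)]
    plan = []
    for p in range(num_partitions):
        rows = range(bounds[p], bounds[p + 1])
        # Spread the trees as evenly as possible across the partitions.
        trees_here = num_trees // num_partitions
        if p < num_trees % num_partitions:
            trees_here += 1
        for _ in range(trees_here):
            # Bootstrap: sample with replacement, restricted to this partition.
            sample = [rng.choice(rows) for _ in rows]
            plan.append((p, sample))
    return plan


if __name__ == "__main__":
    plan = plan_partial_forest(num_rows=1000, num_partitions=10, num_trees=100)
    print(len(plan), "trees planned")
```

With more partitions each tree sees a smaller slice of the data, which is the trade-off the quoted suggestion describes as losing "some of the solidity of the original method".
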
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.