Posted to dev@mahout.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2009/08/05 11:24:14 UTC

[jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

    [ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739386#action_12739386 ] 

Deneche A. Hakim commented on MAHOUT-145:
-----------------------------------------

I'm running some tests to compare the *in-mem* and *partial* implementations. Here are the first results from my laptop (Hadoop 0.19.1 in pseudo-distributed mode on a dual-core processor):

All tests use a random seed of 1, and only one random feature is selected at a time.

KDD 1%
|| Num Map Tasks || Num trees || In-Mem build time || Partial build time || In-Mem oob error || Partial oob error ||
| 2 | 10 | 0h 0m 21s 5 | 0h 0m 31s 823 | 8.38E-4 | 0.43 |
| 2 | 100 | 0h 0m 57s 641 | 0h 0m 44s 43 | 4.45E-4 | 0.42 |
| 2 | 200 | 0h 1m 38s 307 | 0h 1m 4s 523 | 4.45E-4 | 0.43 |
| 2 | 400 | 0h 3m 5s 883 | 0h 1m 43s 852 | 4.65E-4 | 0.42 |
| 5 | 10 | 0h 0m 28s 404 | 0h 0m 33s 374 | 8.38E-4 | 0.32 |
| 5 | 100 | 0h 1m 12s 260 | 0h 0m 43s 628 | 4.65E-4 | 0.34 |
| 5 | 200 | 0h 2m 0s 293 | 0h 0m 47s 994 | 4.45E-4 | 0.34 |
| 5 | 400 | 0h 3m 28s 69 | 0h 1m 4s 351 | 4.65E-4 | 0.34 |
| 10 | 10 | 0h 0m 42s 654 | 0h 0m 49s 785 | 7.98E-4 | 0.23 |
| 10 | 100 | 0h 1m 19s 405 | 0h 0m 53s 646 | 4.45E-4 | 0.23 |
| 10 | 200 | 0h 2m 6s 375 | 0h 0m 56s 89 | 4.65E-4 | 0.23 |
| 10 | 400 | 0h 3m 33s 253 | 0h 1m 8s 29 | 4.45E-4 | 0.23 |
| 20 | 10 |  |  |  |  |
| 20 | 100 | 0h 2m 21s 762 | 0h 1m 23s 883 | 4.04E-4 | 0.23 |
| 20 | 200 | 0h 2m 32s 952 | 0h 1m 22s 12 | 4.45E-4 | 0.23 |
| 20 | 400 | 0h 4m 4s 487 | 0h 1m 31s 248 | 4.25E-4 | 0.23 |
| 50 | 10 |  |  |  |  |
| 50 | 100 | 0h 3m 15s 485 | 0h 2m 53s 70 | 4.25E-4 | 0.23 |
| 50 | 200 | 0h 4m 2s 509 | 0h 2m 51s 733 | 4.45E-4 | 0.23 |
| 50 | 400 | 0h 5m 27s 252 | 0h 3m 7s 542 | 4.25E-4 | 0.23 |
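
To compare the two build-time columns more easily, the times can be converted to a single unit and divided. The snippet below is only a rough sketch: it assumes the trailing number in each time is milliseconds (so "21s 5" would mean 21.005 seconds), and the helper names to_millis and speedup are made up for this illustration, they are not part of Mahout.

```python
import re

# Assumed format: "<h>h <m>m <s>s <ms>", with the trailing number read as milliseconds.
_TIME = re.compile(r"(\d+)h (\d+)m (\d+)s (\d+)")


def to_millis(text):
    """Convert a build time such as '0h 3m 5s 883' to milliseconds."""
    match = _TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def speedup(in_mem, partial):
    """Ratio of in-mem to partial build time (above 1 means partial was faster)."""
    return to_millis(in_mem) / to_millis(partial)


# Row "2 map tasks, 400 trees" from the KDD 1% table.
print(round(speedup("0h 3m 5s 883", "0h 1m 43s 852"), 2))  # prints 1.79
```

Under that reading of the time format, the partial build for the 2 map tasks / 400 trees row took roughly 1.8 times less wall-clock time than the in-mem build.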


> PartialData mapreduce Random Forests
> ------------------------------------
>
>                 Key: MAHOUT-145
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-145
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>         Attachments: partial_August_2.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions of the data. That loses some of the solidity of the original method, but could actually do better if the splits exposed non-stationary behavior."
