Posted to dev@mahout.apache.org by "Ikumasa Mukai (Updated) (JIRA)" <ji...@apache.org> on 2012/01/26 07:51:42 UTC

[jira] [Updated] (MAHOUT-943) Improve the way to make the split point on DF.

     [ https://issues.apache.org/jira/browse/MAHOUT-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ikumasa Mukai updated MAHOUT-943:
---------------------------------

    Attachment: MAHOUT-943.patch

I made a patch.

Following Deneche-san's advice, I added a mechanism to configure the TreeBuilder through an XML file.

{noformat}
<?xml version="1.0"?>
<configuration>
  <treeBuilder class="org.apache.mahout.classifier.df.builder.DecisionTreeBuilder">
    <igSplit class="org.apache.mahout.classifier.df.split.ClassificationSplit"/>
    <m>5</m>
  </treeBuilder>
</configuration>
{noformat}

The ClassificationSplit class is a sample splitter that uses the average value as the split point.
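
As a rough illustration of the averaging idea (this is not the code in MAHOUT-943.patch), the sketch below compares the strict choice with an averaged one for a two-class problem. It assumes that "the 2nd value" in the issue description means the candidate with the second-best IG; another reading would be the neighbouring attribute value. The class name SplitPointSketch, its methods, and the rule "value < threshold" are hypothetical and chosen only for this example.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not taken from the Mahout code base. */
final class SplitPointSketch {

  static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  /** Binary entropy (in bits) of a node holding pos positive and neg negative rows. */
  static double entropy(int pos, int neg) {
    int total = pos + neg;
    if (pos == 0 || neg == 0) {
      return 0.0; // a pure or empty node has no uncertainty
    }
    double p = (double) pos / total;
    double q = (double) neg / total;
    return -(p * log2(p) + q * log2(q));
  }

  /** Information gain of the rule "value < threshold" (assumed rule for this sketch). */
  static double informationGain(double[] values, boolean[] labels, double threshold) {
    int loPos = 0, loNeg = 0, hiPos = 0, hiNeg = 0;
    for (int i = 0; i < values.length; i++) {
      if (values[i] < threshold) {
        if (labels[i]) loPos++; else loNeg++;
      } else {
        if (labels[i]) hiPos++; else hiNeg++;
      }
    }
    double parent = entropy(loPos + hiPos, loNeg + hiNeg);
    double children = ((loPos + loNeg) * entropy(loPos, loNeg)
        + (hiPos + hiNeg) * entropy(hiPos, hiNeg)) / values.length;
    return parent - children;
  }

  /**
   * Returns {strict, averaged}: the attribute value with the best IG, and the
   * average of that value and the value with the second-best IG (assumption).
   */
  static double[] splitPoints(double[] values, boolean[] labels) {
    if (values.length == 0 || values.length != labels.length) {
      throw new IllegalArgumentException("need one label per value and at least one row");
    }
    double[] candidates = Arrays.stream(values).distinct().sorted().toArray();
    int best = -1, second = -1;
    double bestIg = -1.0, secondIg = -1.0;
    for (int i = 0; i < candidates.length; i++) {
      double ig = informationGain(values, labels, candidates[i]);
      if (ig > bestIg) {
        second = best;
        secondIg = bestIg;
        best = i;
        bestIg = ig;
      } else if (ig > secondIg) {
        second = i;
        secondIg = ig;
      }
    }
    double strict = candidates[best];
    // With a single candidate there is no second value to average with.
    double averaged = second < 0 ? strict : (strict + candidates[second]) / 2.0;
    return new double[] {strict, averaged};
  }
}
```

For the values 1, 2, 3, 4, 5 with labels false, false, false, true, true, the strict split point is 4.0 and the averaged one is 3.5. Both separate the training rows identically, but the averaged threshold sits between the observed values instead of on one of them.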

{noformat}
./hadoop jar $MAHOUT_HOME/mahout-examples-0.6-SNAPSHOT-job.jar \
org.apache.mahout.classifier.df.mapreduce.BuildForest \
-Dmapred.max.split.size=1874231 \
-d $KDD_DATA/KDDTrain.data \
-ds $KDD_DATA/KDDTrain+.info \
-c $MAHOUT_HOME/conf/df-config.xml \
-p -t 100 -o $KDD_DATA/model
{noformat}

I added a "-c" parameter to BuildForest. This parameter should point to the XML config file.
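
To show how a config file of the shape above could be read, here is a minimal sketch using only the JDK's DOM parser. It is not the parsing code from the patch; the class DfConfigSketch and its field names are hypothetical, and only the element and attribute names (treeBuilder, igSplit, m, class) come from the XML shown earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical holder for the values in the config file; not a Mahout class. */
final class DfConfigSketch {
  final String treeBuilderClass;
  final String igSplitClass;
  final int m;

  DfConfigSketch(String treeBuilderClass, String igSplitClass, int m) {
    this.treeBuilderClass = treeBuilderClass;
    this.igSplitClass = igSplitClass;
    this.m = m;
  }

  /** Parses the XML text; any parse or structure problem becomes an IllegalArgumentException. */
  static DfConfigSketch parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      // Take the first <treeBuilder> and read its children.
      Element builder = (Element) doc.getElementsByTagName("treeBuilder").item(0);
      Element split = (Element) builder.getElementsByTagName("igSplit").item(0);
      String m = builder.getElementsByTagName("m").item(0).getTextContent().trim();
      return new DfConfigSketch(
          builder.getAttribute("class"),
          split.getAttribute("class"),
          Integer.parseInt(m));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not read DF config", e);
    }
  }
}
```

The class names read this way could then be instantiated by reflection, which is one possible way to make the splitter pluggable; whether the patch does it that way is not stated here.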
                
> Improve the way to make the split point on DF.
> ----------------------------------------------
>
>                 Key: MAHOUT-943
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-943
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>            Reporter: Ikumasa Mukai
>              Labels: DecisionForest
>         Attachments: MAHOUT-943.patch
>
>
> numericalSplit() in OptIgSplit takes the attribute value with the best IG as the split point.
> I think this is a little too strict; in some situations it is better to use the average of the value with the best IG and the 2nd value.
