Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/08/17 11:36:27 UTC

[jira] [Updated] (MAHOUT-785) Universal input file format for classifier algorithms in Mahout

     [ https://issues.apache.org/jira/browse/MAHOUT-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-785:
-----------------------------

    Affects Version/s:     (was: 0.6)

I think this is a fairly open-ended item. It's not clear to me that these different algorithms operate on logically the same input, though I imagine the command line options and such could be more standardized. Do you have particular views on concrete changes to this end?

> Universal input file format for classifier algorithms in Mahout
> ---------------------------------------------------------------
>
>                 Key: MAHOUT-785
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-785
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>            Reporter: XiaoboGu
>
> I think a universal input file format would be much more convenient for users, especially command-line users, and we should even consider using some universal command-line options for the classification algorithms, such as options for target/predictor variables and their types. Users could then prepare their data once and build different models to find the best one. For now, we should consider the following:
> 1. SGD LogisticRegression
> 2. NaiveBayes
> 3. Bayes
> 4. Random Forest

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira