Posted to dev@mahout.apache.org by "Robin Anil (JIRA)" <ji...@apache.org> on 2010/07/29 00:24:18 UTC

[jira] Commented: (MAHOUT-451) Simple utility to split bayes input into training/test sets

    [ https://issues.apache.org/jira/browse/MAHOUT-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893394#action_12893394 ] 

Robin Anil commented on MAHOUT-451:
-----------------------------------

Print the option exception along with the help text. Otherwise good to commit.

Good to have: a choice of split type (random, or use a contiguous chunk, e.g. from the 1/4 to the 2/4 position of the input).
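
To illustrate the two split types suggested above, here is a minimal Python sketch. It is not taken from the MAHOUT-451 patch; the function names, the fixed seed, and the way fractions are rounded down to document indices are all assumptions made for illustration.

```python
import random


def random_test_indices(num_docs, test_size, seed=0):
    """Pick test_size document indices uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return sorted(rng.sample(range(num_docs), test_size))


def chunk_test_indices(num_docs, start_fraction=0.25, end_fraction=0.5):
    """Pick the contiguous chunk between two relative positions (hypothetical helper).

    The defaults correspond to the "1/4 to 2/4" example; fractions are
    rounded down to whole document indices, which is an assumption.
    """
    start = int(num_docs * start_fraction)
    end = int(num_docs * end_fraction)
    return list(range(start, end))


def split(docs, test_indices):
    """Return (train, test), with the test documents removed from train."""
    test_set = set(test_indices)
    train = [d for i, d in enumerate(docs) if i not in test_set]
    test = [d for i, d in enumerate(docs) if i in test_set]
    return train, test
```

With eight documents, the chunk variant selects indices 2 and 3 as the test set, while the random variant selects whichever indices the seeded generator draws.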



> Simple utility to split bayes input into training/test sets
> -----------------------------------------------------------
>
>                 Key: MAHOUT-451
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-451
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Priority: Minor
>         Attachments: MAHOUT-451.patch
>
>
> Provides a simple utility that you point at a directory containing files in Bayes classifier input format. Given the number of documents to write to the test set, for each input file it produces files in two output directories: one containing the training data with the test documents removed, and a second containing the test documents.
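
The behaviour described in the issue can be sketched roughly as follows in Python. This is not the code in MAHOUT-451.patch: the function name `split_directory` is hypothetical, the input is assumed to hold one document per line, and choosing the test documents at random with a fixed seed is an assumption, since the description does not say how they are selected.

```python
import random
from pathlib import Path


def split_directory(input_dir, train_dir, test_dir, test_size, seed=0):
    """Split every file in input_dir into a training file and a test file.

    Assumes one document per line. For each input file, up to test_size
    documents go to test_dir and the remaining ones go to train_dir,
    both under the original file name.
    """
    input_dir, train_dir, test_dir = Path(input_dir), Path(train_dir), Path(test_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility

    for path in sorted(input_dir.iterdir()):
        if not path.is_file():
            continue
        docs = path.read_text(encoding="utf-8").splitlines()
        # Never ask for more test documents than the file contains.
        n_test = min(test_size, len(docs))
        test_idx = set(rng.sample(range(len(docs)), n_test))

        train = [d for i, d in enumerate(docs) if i not in test_idx]
        test = [d for i, d in enumerate(docs) if i in test_idx]

        (train_dir / path.name).write_text(
            "".join(d + "\n" for d in train), encoding="utf-8")
        (test_dir / path.name).write_text(
            "".join(d + "\n" for d in test), encoding="utf-8")
```

Keeping the original file name in both output directories makes it easy to see which training file and which test file came from the same input.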

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.