Posted to dev@mahout.apache.org by "Drew Farris (JIRA)" <ji...@apache.org> on 2010/10/01 05:10:33 UTC

[jira] Commented: (MAHOUT-451) Simple utility to split bayes input into training/test sets

    [ https://issues.apache.org/jira/browse/MAHOUT-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916762#action_12916762 ] 

Drew Farris commented on MAHOUT-451:
------------------------------------

I have a patch that changes this to work using the Hadoop FileSystem API. I plan to get it posted and tested by Monday for inclusion in 0.4.

> Simple utility to split bayes input into training/test sets
> -----------------------------------------------------------
>
>                 Key: MAHOUT-451
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-451
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Drew Farris
>            Priority: Minor
>         Attachments: MAHOUT-451.patch, MAHOUT-451.patch
>
>
> Provides a simple utility that you point at a directory containing files in Bayes classifier input format. Given the number of documents to write to the test set, for each input file it produces files in two output directories: one containing the training data with the test documents removed, and a second containing the test documents.
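
The split described in the issue can be sketched roughly as follows. This is an illustrative Python sketch, not the attached patch: it assumes each input file holds one document per line, and it picks the test documents at random with a fixed seed, since the description does not say how they are chosen. The names split_file and split_directory are hypothetical.

```python
import random
from pathlib import Path


def split_file(input_file, train_dir, test_dir, test_count, seed=0):
    """Split one input file into a training file and a test file.

    Assumption: one document per line. test_count lines are chosen at
    random (the issue does not specify the selection method).
    Returns (number of training documents, number of test documents).
    """
    lines = Path(input_file).read_text(encoding="utf-8").splitlines(keepends=True)
    test_count = min(test_count, len(lines))
    test_idx = set(random.Random(seed).sample(range(len(lines)), test_count))

    train_dir, test_dir = Path(train_dir), Path(test_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)

    # Each output file keeps the name of the input file it came from.
    name = Path(input_file).name
    with open(train_dir / name, "w", encoding="utf-8") as train, \
         open(test_dir / name, "w", encoding="utf-8") as test:
        for i, line in enumerate(lines):
            (test if i in test_idx else train).write(line)
    return len(lines) - test_count, test_count


def split_directory(input_dir, train_dir, test_dir, test_count, seed=0):
    """Apply split_file to every regular file directly under input_dir."""
    counts = {}
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file():
            counts[path.name] = split_file(path, train_dir, test_dir, test_count, seed)
    return counts
```

Calling split_directory("input", "train", "test", 100) would write up to 100 documents from each input file into test/ and the remainder into train/. The sketch uses the local filesystem only; it does not touch the Hadoop FileSystem API mentioned in the comment.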

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.