Posted to dev@mahout.apache.org by "Joe Prasanna Kumar (JIRA)" <ji...@apache.org> on 2010/10/08 02:12:54 UTC

[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier

    [ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919111#action_12919111 ] 

Joe Prasanna Kumar commented on MAHOUT-509:
-------------------------------------------

Gangadhar,

Thanks for testing and for your feedback.

For 1 (modifying testClassifier.props), we would have to make the parameters in TestClassifier optional rather than providing a default value in the property file. I checked TestClassifier and see that I missed making these parameters optional. The patch I submitted took care of the default values but did not change the parameters to optional. I'll create a patch for this tonight and post it here.
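
To illustrate what "optional with a default" means here, this is a minimal standalone sketch in plain Java. It is not the option-building code that TrainClassifier or TestClassifier actually use; the class name OptionalOptionSketch and the helper optionOrDefault are hypothetical, and the option names and defaults (classifierType = bayes, datasource = hdfs) are the ones from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionalOptionSketch {

    // Hypothetical helper: returns the supplied value for an option, or the
    // documented default when the option is absent or blank.
    static String optionOrDefault(Map<String, String> parsed, String name, String defaultValue) {
        String value = parsed.get(name);
        return (value == null || value.trim().isEmpty()) ? defaultValue : value.trim();
    }

    public static void main(String[] args) {
        // Stand-in for the parsed command line: only classifierType was given.
        Map<String, String> parsed = new HashMap<>();
        parsed.put("classifierType", "cbayes");

        // Supplied option wins over the default.
        System.out.println(optionOrDefault(parsed, "classifierType", "bayes"));
        // Missing option falls back to the default instead of failing.
        System.out.println(optionOrDefault(parsed, "datasource", "hdfs"));
    }
}
```

The point is only that a missing option should fall back to its default instead of being rejected as a missing required argument.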

For 2 (trimming spaces), I believe this should be a common fix and not specific to TestClassifier. Any class that takes its arguments from a property file and expects an integer value will suffer from this issue. I dug a little and found that the fix should be in MahoutDriver. The current code is {code}argMap.put(longArg, new String[] {mainProps.getProperty(key)});{code} and it should be changed to {code}argMap.put(longArg, new String[] {mainProps.getProperty(key).trim()});{code}
I believe this should be a separate JIRA issue. If so, you could create one, verify my proposal, and submit a patch.
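
A minimal standalone sketch of the problem, assuming the stray whitespace is a trailing space after the value in the property file: java.util.Properties keeps trailing whitespace in values, so Integer.parseInt rejects the raw string until it is trimmed. The class name, the helper trimmedProperty, and the key someIntOption are hypothetical and are not taken from MahoutDriver.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class TrimPropertySketch {

    // Hypothetical helper: null-safe variant of getProperty(key).trim().
    static String trimmedProperty(Properties props, String key) {
        String value = props.getProperty(key);
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Note the trailing space after the value; Properties.load keeps it.
        props.load(new StringReader("someIntOption = 1 \n"));

        String raw = props.getProperty("someIntOption");
        try {
            Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // The untrimmed value cannot be parsed as an integer.
            System.out.println("raw value fails to parse: [" + raw + "]");
        }

        // After trimming, the same value parses cleanly.
        System.out.println(Integer.parseInt(trimmedProperty(props, "someIntOption")));
    }
}
```

The helper also returns null for a missing key, whereas calling trim() directly on the result of getProperty would throw a NullPointerException in that case.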

regards,
Joe.


> Options in Bayes TrainClassifier and TestClassifier
> ---------------------------------------------------
>
>                 Key: MAHOUT-509
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-509
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>            Reporter: Joe Prasanna Kumar
>            Assignee: Robin Anil
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-509.patch, MAHOUT-509_1.patch
>
>
> Hi all,
> As I was going through the Wikipedia example, I encountered a situation with TrainClassifier wherein some of the options with default values are actually mandatory.
> The documentation / command line help says that
> the default source (--datasource) is hdfs, but TrainClassifier calls withRequired(true) while building the --datasource option. We check whether the dataSourceType is hbase and otherwise set it to hdfs, so ideally withRequired should be set to false.
> the default --classifierType is bayes, but withRequired is set to true and we have code like
> if ("bayes".equalsIgnoreCase(classifierType)) {
>   log.info("Training Bayes Classifier");
>   trainNaiveBayes(inputPath, outputPath, params);
> } else if ("cbayes".equalsIgnoreCase(classifierType)) {
>   log.info("Training Complementary Bayes Classifier");
>   // setup the HDFS and copy the files there, then run the trainer
>   trainCNaiveBayes(inputPath, outputPath, params);
> }
> which should be changed to
> if ("cbayes".equalsIgnoreCase(classifierType)) {
>   log.info("Training Complementary Bayes Classifier");
>   trainCNaiveBayes(inputPath, outputPath, params);
> } else {
>   log.info("Training  Bayes Classifier");
>   // setup the HDFS and copy the files there, then run the trainer
>   trainNaiveBayes(inputPath, outputPath, params);
> }
> Please let me know if this looks valid and I'll submit a patch for a JIRA issue.
> reg
> Joe.
