Posted to dev@mahout.apache.org by "Joe Prasanna Kumar (JIRA)" <ji...@apache.org> on 2010/09/22 06:44:34 UTC
[jira] Created: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Options in Bayes TrainClassifier and TestClassifier
---------------------------------------------------
Key: MAHOUT-509
URL: https://issues.apache.org/jira/browse/MAHOUT-509
Project: Mahout
Issue Type: Bug
Components: Classification
Reporter: Joe Prasanna Kumar
Priority: Minor
Fix For: 0.4
Hi all,
While going through the Wikipedia example, I ran into a situation in TrainClassifier where some options that have default values are actually mandatory.
The documentation and command-line help say the following:
The default data source (--datasource) is hdfs, but TrainClassifier calls withRequired(true) when building the --datasource option. The code checks whether dataSourceType is hbase and otherwise sets it to hdfs, so withRequired should really be false.
The default --classifierType is bayes, but withRequired is also set to true, and the code looks like this:
if ("bayes".equalsIgnoreCase(classifierType)) {
  log.info("Training Bayes Classifier");
  trainNaiveBayes(inputPath, outputPath, params);
} else if ("cbayes".equalsIgnoreCase(classifierType)) {
  log.info("Training Complementary Bayes Classifier");
  // setup the HDFS and copy the files there, then run the trainer
  trainCNaiveBayes(inputPath, outputPath, params);
}
which should be changed to
if ("cbayes".equalsIgnoreCase(classifierType)) {
  log.info("Training Complementary Bayes Classifier");
  trainCNaiveBayes(inputPath, outputPath, params);
} else {
  log.info("Training Bayes Classifier");
  // setup the HDFS and copy the files there, then run the trainer
  trainNaiveBayes(inputPath, outputPath, params);
}
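A minimal sketch of the proposed behavior, written against plain JDK maps rather than the option-builder code TrainClassifier actually uses: an omitted option falls back to its documented default (bayes, hdfs), and only an explicit cbayes or hbase selects the alternative. The class TrainerChoice, its methods and the option keys are hypothetical; the returned strings only name which trainer method would be called.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the map stands in for parsed command-line options.
// This is not Mahout's or any CLI library's API.
final class TrainerChoice {

  // Returns the user-supplied value, or the documented default when the option was omitted.
  static String valueOrDefault(Map<String, String> parsed, String name, String defaultValue) {
    String value = parsed.get(name);
    return value == null ? defaultValue : value;
  }

  // Mirrors the proposed branching: only "cbayes" selects Complementary Bayes;
  // any other value, including an omitted option, falls back to plain Bayes.
  static String chooseTrainer(Map<String, String> parsed) {
    String classifierType = valueOrDefault(parsed, "classifierType", "bayes");
    return "cbayes".equalsIgnoreCase(classifierType) ? "trainCNaiveBayes" : "trainNaiveBayes";
  }

  // Mirrors the datasource check described above: hbase only when asked for, otherwise hdfs.
  static String chooseDataSource(Map<String, String> parsed) {
    String source = valueOrDefault(parsed, "dataSource", "hdfs");
    return "hbase".equalsIgnoreCase(source) ? "hbase" : "hdfs";
  }

  public static void main(String[] args) {
    Map<String, String> parsed = new HashMap<>();
    // No options given: prints "trainNaiveBayes hdfs"
    System.out.println(chooseTrainer(parsed) + " " + chooseDataSource(parsed));
    parsed.put("classifierType", "CBayes");
    // equalsIgnoreCase makes the match case-insensitive: prints "trainCNaiveBayes"
    System.out.println(chooseTrainer(parsed));
  }
}
```

Because of the else branch, an unrecognized value such as a misspelled "cbays" silently trains plain Bayes; rejecting unknown values would be a stricter alternative.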
Please let me know if this looks valid, and I'll submit a patch for a JIRA issue.
Regards,
Joe.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Joe Prasanna Kumar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe Prasanna Kumar updated MAHOUT-509:
--------------------------------------
Attachment: MAHOUT-509-fix-TestClassifier.patch
I have fixed TestClassifier by making the ngram, classifierType and datasource parameters optional, and verified that this works. To test a classifier (say, with the Wikipedia example), the command is now
{code}
$MAHOUT_HOME/bin/mahout testclassifier -m wikipediamodel -d wikipediainput -method mapreduce
{code}
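To illustrate what "optional with defaults" means for the command above, here is a hypothetical sketch in plain Java (not TestClassifier's actual parsing code): flags the user passes override a table of defaults, so -m, -d and -method are enough. The class name and option keys are assumptions, and so is the gramSize default of 1; bayes and hdfs are the defaults named in this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "-flag value" parsing with defaults.
final class TestClassifierArgs {

  // bayes and hdfs come from the documented help text; gramSize = 1 is an assumption.
  static Map<String, String> defaults() {
    Map<String, String> d = new HashMap<>();
    d.put("classifierType", "bayes");
    d.put("dataSource", "hdfs");
    d.put("gramSize", "1");
    return d;
  }

  // Copies the defaults, then overwrites them with any "-flag value" pairs the user supplied.
  static Map<String, String> parse(String[] args, Map<String, String> defaults) {
    if (args.length % 2 != 0) {
      throw new IllegalArgumentException("Every flag needs a value");
    }
    Map<String, String> values = new HashMap<>(defaults);
    for (int i = 0; i < args.length; i += 2) {
      if (!args[i].startsWith("-")) {
        throw new IllegalArgumentException("Expected a flag but got: " + args[i]);
      }
      // Strip one or more leading dashes so "-m" and "--m" map to the same key.
      values.put(args[i].replaceFirst("^-+", ""), args[i + 1]);
    }
    return values;
  }

  public static void main(String[] args) {
    String[] cmd = {"-m", "wikipediamodel", "-d", "wikipediainput", "-method", "mapreduce"};
    Map<String, String> values = parse(cmd, defaults());
    int gramSize = Integer.parseInt(values.get("gramSize"));
    // Omitted options keep their defaults: prints "bayes hdfs 1"
    System.out.println(values.get("classifierType") + " " + values.get("dataSource") + " " + gramSize);
  }
}
```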
[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921225#action_12921225 ]
Hudson commented on MAHOUT-509:
-------------------------------
Integrated in Mahout-Quality #399 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/399/])
MAHOUT-509 make some parameters optional
[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914964#action_12914964 ]
Hudson commented on MAHOUT-509:
-------------------------------
Integrated in Mahout-Quality #331 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/331/])
MAHOUT-509
[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Joe Prasanna Kumar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919111#action_12919111 ]
Joe Prasanna Kumar commented on MAHOUT-509:
-------------------------------------------
Gangadhar,
Thanks for testing and for your feedback.
For 1 (modifying testClassifier.props), we should make the parameters in TestClassifier optional rather than providing default values in the property file. I checked TestClassifier and see that I missed making these parameters optional: the patch I submitted took care of the default values but did not mark the parameters as optional. I'll create a patch for this tonight and post it here.
For 2 (trimming spaces), I believe this should be a common fix rather than one specific to TestClassifier: any class that reads an integer value from a property file will hit the same issue. I dug in a little and found that the fix belongs in MahoutDriver. The current code is
{code}
argMap.put(longArg, new String[] {mainProps.getProperty(key)});
{code}
and it should be changed to
{code}
argMap.put(longArg, new String[] {mainProps.getProperty(key).trim()});
{code}
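A small sketch of why the trim() matters, assuming mainProps is a java.util.Properties loaded from a props file: Properties.load() skips whitespace before a value but keeps trailing whitespace, and Integer.parseInt() rejects "1 ". The ArgMapTrim class and the "--" + key naming of longArg are hypothetical; only the trim() call mirrors the proposed change.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class ArgMapTrim {

  // Loads properties from text; StringReader cannot really fail, so IOException is wrapped.
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Hypothetical version of the driver step quoted above: the "--" + key naming is an
  // assumption; the trim() call is the proposed fix.
  static Map<String, String[]> toArgMap(Properties mainProps) {
    Map<String, String[]> argMap = new HashMap<>();
    for (String key : mainProps.stringPropertyNames()) {
      String longArg = "--" + key;
      argMap.put(longArg, new String[] {mainProps.getProperty(key).trim()});
    }
    return argMap;
  }

  public static void main(String[] args) {
    // Note the trailing space after "1"; load() keeps it in the value.
    Properties props = load("gramSize = 1 \n");
    System.out.println("[" + props.getProperty("gramSize") + "]"); // prints [1 ]
    // Without trim(), Integer.parseInt("1 ") would throw NumberFormatException.
    System.out.println(Integer.parseInt(toArgMap(props).get("--gramSize")[0])); // prints 1
  }
}
```

Trimming once in the driver means every job configured from a props file gets clean values, instead of each class (such as TestClassifier) trimming its own integer options.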
I believe this should be a separate JIRA issue. If so, perhaps you could create one, verify my proposal and submit a patch.
Regards,
Joe.
[jira] Updated: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Gangadhar Nittala (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gangadhar Nittala updated MAHOUT-509:
-------------------------------------
Attachment: MAHOUT-509_1.patch
Handles two issues:
1. testClassifier.props must contain the datasource, classifier type and ngram size.
2. If the ngram size has leading or trailing spaces, a NumberFormatException is thrown. The patch handles this by trimming the spaces.
[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Gangadhar Nittala (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919518#action_12919518 ]
Gangadhar Nittala commented on MAHOUT-509:
------------------------------------------
I tested MAHOUT-509-fix-TestClassifier.patch, and the classifier works without the extra parameters (the commands on the wiki work as written). I am not sure whether this will be committed for the 0.4 release or deferred to 0.5.
I checked the code you mentioned in MahoutDriver.java, and it does seem to be the right place to trim the property values that are read. I will dig into this a bit more and create a JIRA issue for the 0.5 release.
[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921210#action_12921210 ]
Ted Dunning commented on MAHOUT-509:
------------------------------------
Committed MAHOUT-509-fix-TestClassifier.patch.
I also accidentally committed a few additional items that I had been holding back. These are generally new code, so they shouldn't have any impact on users, and all tests should still pass.
[jira] Issue Comment Edited: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Gangadhar Nittala (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919075#action_12919075 ]
Gangadhar Nittala edited comment on MAHOUT-509 at 10/7/10 5:42 PM:
-------------------------------------------------------------------
The attached patch handles two issues:
1. testClassifier.props must contain the datasource, classifier type and ngram size.
2. If the ngram size has leading or trailing spaces, a NumberFormatException is thrown. The patch handles this by trimming the spaces.
was (Author: gangadhar):
Handles two issues:
1. testClassifier.props must contain the datasource, classifier type and ngram size.
2. If the ngram size has leading or trailing spaces, a NumberFormatException is thrown. The patch handles this by trimming the spaces.
[jira] Updated: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Joe Prasanna Kumar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe Prasanna Kumar updated MAHOUT-509:
--------------------------------------
Attachment: MAHOUT-509.patch
The patch contains changes to:
1. TrainClassifier: sets default values for classifierType, dataSource, ngram and mindf.
2. TestClassifier: sets default values for classifierType and dataSource, and rearranges the code that sets the default values.
3. driver.classes.props: adds entries for WikipediaXmlSplitter and WikipediaDatasetCreatorDriver so they can be executed with the mahout command-line utility.
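Roughly speaking, an entry in driver.classes.props lets the mahout launcher map a short command name to a driver class. The sketch below illustrates that kind of lookup; it assumes each entry has the form className = shortName : description, and both that format and every name used here are assumptions rather than details taken from this issue.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; the "className = shortName : description" format is assumed.
final class DriverTable {

  // Maps each short command name to the fully qualified driver class registered for it.
  static Map<String, String> parse(String propsText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propsText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, String> byShortName = new HashMap<>();
    for (String className : props.stringPropertyNames()) {
      String value = props.getProperty(className);
      // The short name is everything before the first ':'; the rest is a description.
      int colon = value.indexOf(':');
      String shortName = (colon >= 0 ? value.substring(0, colon) : value).trim();
      byShortName.put(shortName, className);
    }
    return byShortName;
  }

  public static void main(String[] args) {
    // Hypothetical entries, not Mahout's actual class or command names.
    String props = "com.example.SplitterDriver = splitter : Splits an XML dump into chunks\n"
        + "com.example.DatasetDriver = datasetcreator : Builds a labeled dataset\n";
    System.out.println(parse(props).get("splitter")); // prints com.example.SplitterDriver
  }
}
```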
[jira] Commented: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Joe Prasanna Kumar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921207#action_12921207 ]
Joe Prasanna Kumar commented on MAHOUT-509:
-------------------------------------------
Was anyone able to apply MAHOUT-509-fix-TestClassifier.patch, so that classifierType, dataSource and ngram are all optional parameters in TestClassifier?
Thanks,
Joe.
[jira] Resolved: (MAHOUT-509) Options in Bayes TrainClassifier and TestClassifier
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-509.
------------------------------
Assignee: Robin Anil
Resolution: Fixed
Committed, seems reasonable.