Posted to dev@mahout.apache.org by "XiaoboGu (JIRA)" <ji...@apache.org> on 2011/08/20 04:19:27 UTC
[jira] [Created] (MAHOUT-789) testclassifier seems does not work using kdd data set
testclassifier seems does not work using kdd data set
-----------------------------------------------------
Key: MAHOUT-789
URL: https://issues.apache.org/jira/browse/MAHOUT-789
Project: Mahout
Issue Type: Bug
Components: Classification
Affects Versions: 0.6
Environment: CentOS 5.5, Hadoop 0.20.203, and latest Mahout 0.6 snapshot.
Reporter: XiaoboGu
I am testing the trainclassifier and testclassifier commands in
Mahout. I prepared nbdf-train.csv and nbdf-test.csv files with the
following R commands:
library(foreign)  # assumption: read.arff is taken from the foreign package
df <- read.arff(file = "d:/temp/kdd/KDDTrain+.arff")
nbdf <- data.frame(class=df["class"],
protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-train.csv", row.names=FALSE,
col.names=FALSE, quote=FALSE, sep="\t")
df <- read.arff(file = "d:/temp/kdd/KDDTest+.arff")
nbdf <- data.frame(class=df["class"],
protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-test.csv", row.names=FALSE,
col.names=FALSE, quote=FALSE, sep="\t")
I then put them under nbtest/train and nbtest/test in HDFS and ran:
mahout trainclassifier --input nbtest/train --output nbtest/output
mahout testclassifier --testDir nbtest/test --model nbtest/output
trainclassifier seems to succeed, but testclassifier fails with this output:
[gpadmin@linuxsvr2 mahtest]$ mahout testclassifier --testDir nbtest/test --model nbtest/output
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.6-SNAPSHOT-job.jar
11/08/15 18:06:20 WARN driver.MahoutDriver: No testclassifier.props found on classpath, will use command-line arguments only
11/08/15 18:06:20 INFO bayes.TestClassifier: Loading model from: {basePath=nbtest/output, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=nbtest/test}
11/08/15 18:06:20 INFO bayes.TestClassifier: Testing Bayes Classifier
11/08/15 18:06:20 INFO bayes.SequenceFileModelReader: 77319.90481464032
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: normal
-213.05542661827678 442.8886516970405 -0.48105867197522617
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: anomaly
-442.8886516970405 442.8886516970405 -1.0
11/08/15 18:06:20 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 0
Incorrectly Classified Instances : 0
Total Classified Instances : 0
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c <--Classified as
0 0 0 | 0 a = normal
0 0 0 | 0 b = anomaly
0 0 0 | 0 c = unknown
Default Category: unknown: 2
11/08/15 18:06:20 INFO driver.MahoutDriver: Program took 746 ms
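One way to rule out a data problem before rerunning testclassifier is to inspect the two exported files locally. The sketch below is only a sanity check under stated assumptions: check_tsv is a hypothetical helper name, and it assumes one record per line with the class label in the first tab-separated field, as written by the R commands above. It says nothing about the input format the Mahout classifier actually expects. Because the files were written on Windows, it also counts lines that end in a carriage return.

```shell
# check_tsv FILE: hypothetical helper, not part of Mahout or Hadoop.
check_tsv() {
  file="$1"
  # Total number of records (one per line).
  echo "lines: $(wc -l < "$file" | tr -d ' ')"
  # How many lines have each tab-separated field count; ideally a single value.
  echo "field counts:"
  awk -F '\t' '{ print NF }' "$file" | sort -n | uniq -c
  # Frequency of each class label (first field).
  echo "labels:"
  cut -f1 "$file" | sort | uniq -c
  # Lines ending in a carriage return (Windows line endings).
  echo "CR line endings: $(awk '/\r$/ { n++ } END { print n + 0 }' "$file")"
}
```

It could be run as check_tsv nbdf-train.csv and check_tsv nbdf-test.csv on local copies of the files before they are uploaded to HDFS.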
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAHOUT-789) testclassifier seems does not work using kdd data set
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-789.
------------------------------
Resolution: Cannot Reproduce
I don't think this is enough information, or at least it is not narrowed down nearly enough to point to a problem in the classifier. What have you tried in debugging? I'd want some indication that you've ruled out problems in your environment or data. Reopen if so.
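As a possible starting point for that kind of debugging, the sketch below lists the directory that testclassifier is pointed at and prints its first few records. It assumes the hadoop fs -ls and hadoop fs -cat shell commands are available on the client. inspect_hdfs_dir is a hypothetical helper name, not something shipped with Hadoop or Mahout.

```shell
# inspect_hdfs_dir DIR: hypothetical helper, not part of Mahout or Hadoop.
inspect_hdfs_dir() {
  dir="$1"
  # List the directory to confirm the expected files exist and are non-empty.
  hadoop fs -ls "$dir"
  # Print the first few records; the quoted glob is passed through to Hadoop
  # rather than being expanded by the local shell.
  hadoop fs -cat "$dir/*" | head -n 5
}
```

With the paths from the report, this would be run as inspect_hdfs_dir nbtest/train and inspect_hdfs_dir nbtest/test.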