Posted to dev@mahout.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/08/15 16:20:34 UTC

testclassifier does not seem to work with the KDD data set

Hi,

I am now testing the trainclassifier and testclassifier commands in
Mahout. I prepared nbdf-train.csv and nbdf-test.csv files with the
following R commands:

library(foreign)  # read.arff() comes from the foreign package (RWeka also provides one)

df <- read.arff(file = "d:/temp/kdd/KDDTrain+.arff")
nbdf <- data.frame(class=df["class"],
                   protocol_type=df["protocol_type"], service=df["service"],
                   flag=df["flag"], land=df["land"], logged_in=df["logged_in"],
                   is_host_login=df["is_host_login"],
                   is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-train.csv", row.names=FALSE,
            col.names=FALSE, quote=FALSE, sep="\t")

df <- read.arff(file = "d:/temp/kdd/KDDTest+.arff")
nbdf <- data.frame(class=df["class"],
                   protocol_type=df["protocol_type"], service=df["service"],
                   flag=df["flag"], land=df["land"], logged_in=df["logged_in"],
                   is_host_login=df["is_host_login"],
                   is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-test.csv", row.names=FALSE,
            col.names=FALSE, quote=FALSE, sep="\t")

I then put them under nbtest/train and nbtest/test in HDFS and ran:

mahout trainclassifier --input nbtest/train --output nbtest/output
mahout testclassifier --testDir nbtest/test --model nbtest/output
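Before running these commands, the two files can be sanity-checked for the layout the R code is meant to produce: one record per line with 8 tab-separated fields, the class label first. The sketch below is only an illustration under that assumption. check_tsv is a hypothetical helper (not part of Mahout or Hadoop) built on plain POSIX awk, and the sample rows are made up. It also flags lines ending in a carriage return, since R running on Windows may write CRLF line endings.

```shell
#!/bin/sh
# check_tsv FILE N: hypothetical helper, not part of Mahout or Hadoop.
# Prints every line whose tab-separated field count is not N, and every
# line that ends in a carriage return (CRLF line endings).
# Exits 0 and prints "ok: <count> line(s)" when nothing is wrong.
check_tsv() {
  awk -F '\t' -v n="$2" '
    NF != n { printf "line %d: %d fields\n", NR, NF; bad++ }
    /\r$/   { printf "line %d: ends with CR\n", NR; bad++ }
    END     { if (bad) exit 1; print "ok: " NR " line(s)" }
  ' "$1"
}

# Made-up sample rows in the layout the R code writes
# (class, protocol_type, service, flag, land, logged_in,
#  is_host_login, is_guest_login); not taken from the real data set.
# The second row is deliberately one field short.
sample=$(mktemp)
printf 'normal\ttcp\tftp_data\tSF\t0\t0\t0\t0\n' > "$sample"
printf 'anomaly\tudp\tprivate\tSF\t0\t0\t0\n' >> "$sample"

if check_tsv "$sample" 8; then
  echo "format ok"
else
  echo "format problems found"
fi
rm -f "$sample"
```

On the real files this would be, for example, check_tsv nbdf-train.csv 8 and check_tsv nbdf-test.csv 8, run wherever the files were copied before uploading them.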

trainclassifier seems to succeed, but testclassifier fails with this output:

[gpadmin@linuxsvr2 mahtest]$ mahout testclassifier --testDir nbtest/test --model nbtest/output
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.6-SNAPSHOT-job.jar
11/08/15 18:06:20 WARN driver.MahoutDriver: No testclassifier.props found on classpath, will use command-line arguments only
11/08/15 18:06:20 INFO bayes.TestClassifier: Loading model from: {basePath=nbtest/output, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=nbtest/test}
11/08/15 18:06:20 INFO bayes.TestClassifier: Testing Bayes Classifier
11/08/15 18:06:20 INFO bayes.SequenceFileModelReader: 77319.90481464032
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: normal -213.05542661827678 442.8886516970405 -0.48105867197522617
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: anomaly -442.8886516970405 442.8886516970405 -1.0
11/08/15 18:06:20 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0
Incorrectly Classified Instances        :          0
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       <--Classified as
0       0       0        |  0           a     = normal
0       0       0        |  0           b     = anomaly
0       0       0        |  0           c     = unknown
Default Category: unknown: 2


11/08/15 18:06:20 INFO driver.MahoutDriver: Program took 746 ms