Posted to dev@mahout.apache.org by "XiaoboGu (JIRA)" <ji...@apache.org> on 2011/08/20 04:19:27 UTC

[jira] [Created] (MAHOUT-789) testclassifier seems does not work using kdd data set

testclassifier seems does not work using kdd data set
-----------------------------------------------------

                 Key: MAHOUT-789
                 URL: https://issues.apache.org/jira/browse/MAHOUT-789
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.6
         Environment: CentOS 5.5, Hadoop 0.20.203, and latest Mahout 0.6 snapshot.
            Reporter: XiaoboGu


I am testing the trainclassifier and testclassifier commands in
Mahout. I prepared nbdf-train.csv and nbdf-test.csv files with the
following R commands:

df <- read.arff(file = "d:/temp/kdd/KDDTrain+.arff")
nbdf <- data.frame(class=df["class"],
protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-train.csv", row.names=FALSE,
col.names=FALSE, quote=FALSE, sep="\t")

df <- read.arff(file = "d:/temp/kdd/KDDTest+.arff")
nbdf <- data.frame(class=df["class"],
protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
nbdf[["land"]] <- as.factor(nbdf[["land"]])
write.table(nbdf, file="D:/nbdf-test.csv", row.names=FALSE,
col.names=FALSE, quote=FALSE, sep="\t")

I put them under nbtest/train and nbtest/test in HDFS, then ran:

mahout trainclassifier --input nbtest/train --output nbtest/output
mahout testclassifier --testDir nbtest/test --model nbtest/output

trainclassifier seems to succeed, but testclassifier fails with this output:

[gpadmin@linuxsvr2 mahtest]$ mahout testclassifier --testDir
nbtest/test --model nbtest/output
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.6-SNAPSHOT-job.jar
11/08/15 18:06:20 WARN driver.MahoutDriver: No testclassifier.props
found on classpath, will use command-line arguments only
11/08/15 18:06:20 INFO bayes.TestClassifier: Loading model from:
{basePath=nbtest/output, classifierType=bayes, alpha_i=1.0,
dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8,
defaultCat=unknown, testDirPath=nbtest/test}
11/08/15 18:06:20 INFO bayes.TestClassifier: Testing Bayes Classifier
11/08/15 18:06:20 INFO bayes.SequenceFileModelReader: 77319.90481464032
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: normal
-213.05542661827678 442.8886516970405 -0.48105867197522617
11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: anomaly
-442.8886516970405 442.8886516970405 -1.0
11/08/15 18:06:20 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0
Incorrectly Classified Instances        :          0
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       <--Classified as
0       0       0        |  0           a     = normal
0       0       0        |  0           b     = anomaly
0       0       0        |  0           c     = unknown
Default Category: unknown: 2


11/08/15 18:06:20 INFO driver.MahoutDriver: Program took 746 ms
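
The summary above reports zero classified instances, so any accuracy derived from the confusion matrix is 0/0 and therefore undefined. The sketch below recomputes the summary counts from a confusion matrix to make that explicit. It is a generic illustration, not Mahout's own reporting code, and the function name `summarize` is hypothetical.

```python
import math

def summarize(confusion):
    """Return (correct, incorrect, total, accuracy_percent) for a square matrix.

    Rows and columns must use the same class order. Accuracy is NaN when no
    instances were classified, because 0/0 is undefined.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    accuracy = (100.0 * correct / total) if total else math.nan
    return correct, total - correct, total, accuracy

if __name__ == "__main__":
    # The all-zero matrix from the report: classes normal, anomaly, unknown.
    reported = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(summarize(reported))
```

An all-zero matrix means the classifier never scored a single test instance, which points at how the test input was read rather than at the quality of the model.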


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Resolved] (MAHOUT-789) testclassifier seems does not work using kdd data set

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-789.
------------------------------

    Resolution: Cannot Reproduce

I don't think this is enough info, or at least, it is not nearly narrowed down enough to point to a problem in the classifier. What have you tried in debugging? I'd want some indication that you've ruled out problems in your environment or data. Reopen if so.
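
One way to start ruling out data problems is to sanity-check the exported files before they go into HDFS. The sketch below assumes only what the R commands in the report imply: no header row, tab-separated fields, and eight columns with the class label first. The function name `check_tsv`, the column-count constant, and the sample rows are illustrative assumptions and not part of Mahout. Whether this layout is what trainclassifier and testclassifier expect is a separate question to check against the Mahout documentation.

```python
from collections import Counter
import csv
import io

EXPECTED_COLUMNS = 8  # assumption: class label plus the 7 attributes kept in R

def check_tsv(lines, expected_columns=EXPECTED_COLUMNS):
    """Count class labels and collect line numbers of malformed rows.

    `lines` is any iterable of text lines, such as an open file. A row is
    malformed if it has the wrong number of fields or an empty field.
    """
    label_counts = Counter()
    bad_lines = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if len(row) != expected_columns or any(field == "" for field in row):
            bad_lines.append(line_number)
            continue
        label_counts[row[0]] += 1  # first column holds the class label
    return label_counts, bad_lines

if __name__ == "__main__":
    # Made-up rows in the assumed layout; the third row is truncated on purpose.
    sample = io.StringIO(
        "normal\ttcp\thttp\tSF\t0\t1\t0\t0\n"
        "anomaly\ttcp\tprivate\tS0\t0\t0\t0\t0\n"
        "anomaly\ttcp\tprivate\n"
    )
    counts, bad = check_tsv(sample)
    print(dict(counts), bad)
```

For the real files, the same function can be called on a file opened with `open("nbdf-test.csv", newline="")`. Seeing only the labels normal and anomaly and no malformed lines would suggest the export itself is sound.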
                
