Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2010/06/08 10:55:06 UTC

Re: problem in using apache mahout

Hi Rakesh, the classifier cannot read ARFF at the moment. The input is a
file in which each line is tab-separated as key<TAB>value. The key is the
class label and the value is a list of space-separated tokens. The same
format is used for classification. Let me know if you can get this running
correctly. I will be updating the code to make it run using vectors, but for
now you will have to use this format. See the twenty newsgroups example.
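
To make the key<TAB>value layout described above concrete, here is a minimal Python sketch that writes and reads such lines. The helper names (to_training_line, parse_training_line) and the example tokens are hypothetical and are not part of Mahout; only the line layout itself comes from the description above.

```python
# Hypothetical helpers (not part of Mahout). They only illustrate the
# "label<TAB>space-separated tokens" line layout described above.

def to_training_line(label, tokens):
    """Return one line: class label, a tab, then space-separated tokens."""
    if "\t" in label or "\n" in label:
        raise ValueError("label must not contain tabs or newlines")
    # Drop empty tokens and reject tokens that contain whitespace.
    cleaned = [t for t in (tok.strip() for tok in tokens) if t]
    for t in cleaned:
        if any(c.isspace() for c in t):
            raise ValueError(f"token contains whitespace: {t!r}")
    return label + "\t" + " ".join(cleaned)


def parse_training_line(line):
    """Split a line back into (label, tokens); raise if there is no tab."""
    label, sep, value = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError("expected key<TAB>value")
    return label, value.split()


# Made-up documents; the labels are borrowed from the twenty newsgroups data.
examples = [
    ("rec.autos", ["engine", "brakes", "dealer"]),
    ("sci.space", ["orbit", "launch", "shuttle"]),
]
lines = [to_training_line(label, tokens) for label, tokens in examples]
```

Writing these lines to a text file, one per document, gives input in the shape described above; the parse helper is a quick way to check an existing file before running a job on it.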

Robin

PS: Subscribe by sending email to dev-subscribe@mahout.apache.org and reply
to the confirmation email that comes.


*Forwarded message:*

Dear Robin,

First of all, my sincere apologies for emailing you directly about a
problem with Mahout. I have been trying to subscribe to the Apache Mahout
mailing list, but the mailer daemon is not responding. I would really
appreciate it if you could help me find a solution to my problem.

I am trying to use Mahout's Bayesian classifier on the iris dataset.
Please note that I am using mahout-0.3. These are the steps I followed:

*Step 1*: convert the iris.arff file to mahout's vector format. I used the
following command

    java -cp /home/rakesh/mahout/mahout-0.3/utils/target/mahout-utils-0.3.jar:$(echo /home/rakesh/mahout/mahout-0.3/utils/target/dependency/*.jar . | sed 's/ /:/g') \
        org.apache.mahout.utils.vectors.arff.Driver \
        -d /home/rakesh/workspace/mahout/input/ \
        -o /home/rakesh/workspace/mahout/output/ \
        -t /home/rakesh/workspace/mahout/output/dict.txt


This created the iris.arff.mvc file, but dict.txt was empty. Nevertheless,
I went ahead with the training step.

*Step 2*: training

    $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job \
        org.apache.mahout.classifier.bayes.TrainClassifier \
        -i output -o model -type bayes --gramSize 1 -source hdfs

This step also went through and I did not get any exceptions.

*Step 3*: Test over the input dataset

    $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job \
        org.apache.mahout.classifier.bayes.TestClassifier \
        -m model -d output -ng 1 -type bayes -source hdfs -method sequential --verbose

This command gives the following error

    rakesh@ubuntu:~/workspace/mahout$ $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.3.job \
        org.apache.mahout.classifier.bayes.TestClassifier \
        -m model -d output -ng 1 -type bayes -source hdfs -method sequential --verbose
    10/06/08 03:19:09 INFO bayes.TestClassifier: Loading model from: {basePath=model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=true, encoding=UTF-8, defaultCat=unknown, testDirPath=output}
    10/06/08 03:19:09 INFO bayes.TestClassifier: Testing Bayes Classifier
    10/06/08 03:19:09 INFO bayes.TestClassifier: --------------
    10/06/08 03:19:09 INFO bayes.TestClassifier: Testing: output/iris.arff.mvc
    java.lang.NullPointerException
        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:100)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:116)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:120)
        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:88)
        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:256)
        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:176)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

I am getting a NullPointerException and I have no clue how to proceed.
Could you please let me know whether I am doing something wrong or whether
it's a bug? I have attached the iris.arff file and the iris.arff.mvc file
for your use.

thanks,
rakesh

Re: problem in using apache mahout

Posted by "k6.amruta" <k6...@gmail.com>.
Hi,

I am a student, new to machine learning, and I want to use naive Bayes for
graph mining.
The graph data has the following information:

Nodes and edges both have values associated with them.
Nodes and edges both have a number of attributes associated with them.

I read that this algorithm takes key,value pairs as input. Is that right, or
is there another input format?

My question is: how can I convert such graph data into key,value pairs or
any other required data format?
Thanks in advance!






Re: problem in using apache mahout

Posted by Ted Dunning <te...@gmail.com>.
I think you will have problems there. The iris data set has four continuous
variables, and naive Bayes really only likes sparse binary features.
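
One possible workaround, sketched here in Python purely as an illustration, is to bucket each continuous value and emit one token per bucket, so that every row becomes a handful of binary "this token is present" features. Everything named in the sketch is an assumption: the attribute names, the cut points, and the helpers bin_token and row_to_line are made up, and no claim is made about how well a classifier would do on the result.

```python
# Illustrative only: hypothetical helpers and made-up cut points.
# Each continuous value is replaced by a token naming the bucket it falls in.

def bin_token(name, value, edges):
    """Return a token such as 'petal_length=bin1'.

    edges is an ascending list of cut points; the bucket index is the
    number of cut points that are less than or equal to the value.
    """
    index = sum(1 for e in edges if value >= e)
    return f"{name}=bin{index}"


def row_to_line(label, features, cut_points):
    """Build one 'label<TAB>token token ...' line from a dict of features."""
    tokens = [bin_token(n, v, cut_points[n]) for n, v in features.items()]
    return label + "\t" + " ".join(tokens)


# Assumed cut points, chosen by hand for the example.
cut_points = {"petal_length": [2.5, 5.0], "petal_width": [0.8, 1.7]}

small_flower = {"petal_length": 1.4, "petal_width": 0.2}
large_flower = {"petal_length": 5.1, "petal_width": 1.8}

line_small = row_to_line("Iris-setosa", small_flower, cut_points)
line_large = row_to_line("Iris-virginica", large_flower, cut_points)
```

The output lines follow the label<TAB>tokens layout used elsewhere in this thread. How many buckets to use and where to cut are left open here.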

There is a patch at https://issues.apache.org/jira/browse/MAHOUT-228 that
gives you the beginnings of an online logistic regression classifier (you
will need to be ambitious to use that).

The random forest implementation already in mahout can handle continuous
variables as well.

Neither of these other two implementations is nearly as polished (yet) as
the Naive Bayes stuff.

On Tue, Jun 8, 2010 at 1:55 AM, Robin Anil <ro...@gmail.com> wrote:

> I am trying to use mahout's bayesian classifier over the iris dataset.