Posted to user@mahout.apache.org by Ramprakash Ramamoorthy <yo...@gmail.com> on 2011/12/16 10:10:41 UTC

Input to a bayes classifier

Dear all,

           I am doing my final year project using Apache Mahout. I have
tried working with the Bayesian classifier and it works fine for a single
.txt input. My use case is this: I need to input a folder path
(which contains the .txt files to be classified), and the output should be:
file name, classification category, score.

           This might sound silly to you, but I am a complete beginner with
Mahout, and with Java as well. Kindly help me with this!

-- 
With Thanks and Regards,
Ramprakash R,
B.Tech ICT,
SASTRA University.

Re: Input to a bayes classifier

Posted by Lance Norskog <go...@gmail.com>.
Try replicating the 20newsgroups shell script where it runs Naive
Bayes. It applies a fairly ornate range of transformations and filters; I
understand only part of it.

On Fri, Dec 16, 2011 at 7:50 AM, JAGANADH G <ja...@gmail.com> wrote:
>> @Jagan
>>
>> My pre-processed folder has three folders in it (Sigma_j, Sigma_k and
>> Sigma_kSigma_j), and each has part files in it.
>>
>> Do you mean I should check those files?
>>
>
> @Ramaprakash
>
> I think you misunderstood my point. I am not asking you to inspect the model.
> You have to preprocess the text before it is passed to the classifier.



-- 
Lance Norskog
goksron@gmail.com

Re: Input to a bayes classifier

Posted by JAGANADH G <ja...@gmail.com>.
> @Jagan
>
> My pre-processed folder has three folders in it (Sigma_j, Sigma_k and
> Sigma_kSigma_j), and each has part files in it.
>
> Do you mean I should check those files?
>

@Ramaprakash

I think you misunderstood my point. I am not asking you to inspect the model.
You have to preprocess the text before it is passed to the classifier.
-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in

Re: Input to a bayes classifier

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
On Fri, Dec 16, 2011 at 7:07 PM, JAGANADH G <ja...@gmail.com> wrote:

> > Also, the command line (bin/mahout) gives me a proper response but the
> > same thing through Java does not; that is the problem now. Is there a
> > problem with my Java code by any chance?
> >
> @Ramaprakash
>
> That is what I meant.
> After reading the file contents, do the preprocessing, then pass the result
> to the classifier. Change your code accordingly.

@Jagan

My pre-processed folder has three folders in it (Sigma_j, Sigma_k and
Sigma_kSigma_j), and each has part files in it.

Do you mean I should check those files?

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
B.Tech ICT,
SASTRA University.
+91 9626975420

Re: Input to a bayes classifier

Posted by JAGANADH G <ja...@gmail.com>.
> Also, the command line (bin/mahout) gives me a proper response but the
> same thing through Java does not; that is the problem now. Is there a
> problem with my Java code by any chance?
>
@Ramaprakash

That is what I meant.
After reading the file contents, do the preprocessing, then pass the result
to the classifier. Change your code accordingly.
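A minimal Java sketch of that order of operations (read the whole file, preprocess, then classify), assuming the model was trained on lower-cased, whitespace-separated tokens. The class and method names are hypothetical, and the preprocessing must be adjusted to match whatever was applied during training. The point it illustrates is that the whole file becomes one token array, i.e. one document, instead of each line being classified on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

public class ReadWholeDocument {

    // Reads the whole file into ONE string, so that all of its lines are
    // classified together as a single document (not one line at a time).
    static String readWholeFile(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Preprocesses the text before classification: lower-cases it and splits
    // it on whitespace. Assumption: this mirrors the training-time preprocessing.
    static String[] preprocess(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        return normalized.isEmpty() ? new String[0] : normalized.split("\\s+");
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("input", ".txt");
        Files.write(tmp, "Great movie!\nLoved the ACTING.".getBytes(StandardCharsets.UTF_8));
        String[] document = preprocess(readWholeFile(tmp));
        // This single array is what would be handed to the classifier in one call.
        System.out.println(String.join(" ", document));
        Files.delete(tmp);
    }
}
```

The resulting array is the kind of String[] that the classifyDocument call in the code posted in this thread passes to the classifier.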
-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in

Re: Input to a bayes classifier

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
On Fri, Dec 16, 2011 at 6:58 PM, JAGANADH G <ja...@gmail.com> wrote:

> @Ramaprakash
> When Mahout trains a model, it applies some normalisations to the training
> set, such as converting to lower case and removing stop words.
> When you give input text to the classifier, first make sure that you
> convert the entire text to lower case, remove stop words, etc.
>

@Jagan

Yes, I have considered all of those. Also, I prepared the trained model via
the command line. My trained model is going to be a static one; I am not
going to change it.

Also, the command line (bin/mahout) gives me a proper response but the
same thing through Java does not; that is the problem now. Is there a
problem with my Java code by any chance?

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
B.Tech ICT,
SASTRA University.
+91 9626975420

Re: Input to a bayes classifier

Posted by JAGANADH G <ja...@gmail.com>.
@Ramaprakash
When Mahout trains a model, it applies some normalisations to the training
set, such as converting to lower case and removing stop words.
When you give input text to the classifier, first make sure that you
convert the entire text to lower case, remove stop words, etc.
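A minimal Java sketch of this kind of normalisation, assuming lower-casing, splitting on non-letter/non-digit characters, and a small stop-word list. The stop-word list below is hypothetical and only illustrative; the real list (and tokenizer) should be the same one that was used when the model was trained, otherwise the input features will not line up with the model's features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class Normalizer {

    // Hypothetical, deliberately tiny stop-word list for illustration only.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "is", "of", "the", "to", "was"));

    // Lower-cases the text, splits it on anything that is not a letter or digit,
    // and drops empty tokens and stop words.
    static List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [plot, great, acting, superb]
        System.out.println(normalize("The plot was GREAT and the acting is superb."));
    }
}
```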


-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in

Re: Input to a bayes classifier

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
On Fri, Dec 16, 2011 at 3:44 PM, Ramprakash Ramamoorthy <
youngestachiever@gmail.com> wrote:

>
>
> On Fri, Dec 16, 2011 at 3:40 PM, JAGANADH G <ja...@gmail.com> wrote:
>
>> > OK. So I will basically write a Java file that calls the classifier
>> > function, with the folder path as a parameter. Should I write it in
>> > mahout-core? If not, where should I write the file?
>> >
>>
>>
>> @Ramaprakash
>>
>> It can be done in your classifier Java code itself.
>> Create a method called listDir which returns all the .txt files in the
>> directory. Iterate over the list, open each file and pass it to the
>> classifier. That is all. There is no need to go into mahout-core etc. If
>> you still find it hard, please show your code.
>>
>>
>
> @Jagan
>
>        That is great news. Will go ahead. Thanks :)
>
@Jagan

So far I have been running my classifier only through the command line,
that is, through bin/mahout.

I just attempted to write this Java file, which takes a single file as input.

package org.apache.mahout.classifier.bayes;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.classifier.ClassifierResult;
import org.apache.mahout.classifier.bayes.algorithm.BayesAlgorithm;
import org.apache.mahout.classifier.bayes.common.BayesParameters;
import org.apache.mahout.classifier.bayes.datastore.InMemoryBayesDatastore;
import org.apache.mahout.classifier.bayes.exceptions.InvalidDatastoreException;
import org.apache.mahout.classifier.bayes.interfaces.Algorithm;
import org.apache.mahout.classifier.bayes.interfaces.Datastore;
import org.apache.mahout.classifier.bayes.model.ClassifierContext;
import org.apache.mahout.common.nlp.NGrams;

public class ramSample {

  /**
   * @param args
   * @throws IOException
   * @throws InvalidDatastoreException
   */
  public static void main(String[] args) throws IOException,
      InvalidDatastoreException {
    // Classifier parameters; these should match the ones used for training.
    final BayesParameters params = new BayesParameters();
    params.setGramSize(1);
    params.setBasePath("/home/ramprakash-pt09/mahout-distribution-0.5/examples/src/main/java/org/apache/mahout/classifier/bayes/bayes-model");
    params.set("verbose", "false");
    params.set("classifierType", "bayes");
    params.set("defaultCat", "OTHER");
    params.set("encoding", "UTF-8");
    params.set("alpha_i", "1.0");
    params.set("dataSource", "hdfs");

    // Set up the in-memory datastore for the trained model and wrap it,
    // together with the Bayes algorithm, in a classifier context.
    Algorithm algorithm = new BayesAlgorithm();
    Datastore datastore = new InMemoryBayesDatastore(params);
    ClassifierContext classifier = new ClassifierContext(algorithm, datastore);
    classifier.initialize();

    File file = new File("/home/ramprakash-pt09/mahout-distribution-0.5/examples/src/main/java/org/apache/mahout/classifier/bayes/input.txt");

    final BufferedReader reader = new BufferedReader(new FileReader(file));
    String entry = reader.readLine();

    // Note: each LINE of the file is classified as a separate document, and
    // this code does not lower-case the text or remove stop words first.
    while (entry != null) {
      List<String> document = new NGrams(entry,
          Integer.parseInt(params.get("gramSize")))
          .generateNGramsWithoutLabel();

      // The result is computed here but never printed or otherwise used.
      ClassifierResult result = classifier.classifyDocument(
          document.toArray(new String[document.size()]),
          params.get("defaultCat"));

      entry = reader.readLine();
    }
    reader.close();
  }
}


When I compile and run this code, I get the following output:

16 Dec, 2011 6:37:05 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 57425.12741460857
16 Dec, 2011 6:37:06 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: pos -374948.0234153431 374948.0234153431 -1.0
16 Dec, 2011 6:37:06 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: neg -236477.77478425365 374948.0234153431 -0.630694816391388
I have two categories: pos and neg, but this output lists both. When I
classify the same input through bin/mahout, I get the following output:

INFO: Category for examples/ACTUAL/input.txt is ClassifierResult{category='pos', score=35.42897640254213}
16 Dec, 2011 6:43:18 PM org.slf4j.impl.JCLLoggerAdapter info
I can do the input-folder parsing with Java IO, but running the
classifier from a Java file now seems to be the bigger problem. Sorry for
bothering you, and thanks for your response.


-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
B.Tech ICT,
SASTRA University.
+91 9626975420

Re: Input to a bayes classifier

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
On Fri, Dec 16, 2011 at 3:40 PM, JAGANADH G <ja...@gmail.com> wrote:

> > OK. So I will basically write a Java file that calls the classifier
> > function, with the folder path as a parameter. Should I write it in
> > mahout-core? If not, where should I write the file?
> >
>
>
> @Ramaprakash
>
> It can be done in your classifier Java code itself.
> Create a method called listDir which returns all the .txt files in the
> directory. Iterate over the list, open each file and pass it to the
> classifier. That is all. There is no need to go into mahout-core etc. If
> you still find it hard, please show your code.
>

@Jagan

       That is great news. Will go ahead. Thanks :)



-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
B.Tech ICT,
SASTRA University.
+91 9626975420

Re: Input to a bayes classifier

Posted by JAGANADH G <ja...@gmail.com>.
> OK. So I will basically write a Java file that calls the classifier
> function, with the folder path as a parameter. Should I write it in
> mahout-core? If not, where should I write the file?
>


@Ramaprakash

It can be done in your classifier Java code itself.
Create a method called listDir which returns all the .txt files in the
directory. Iterate over the list, open each file and pass it to the
classifier. That is all. There is no need to go into mahout-core etc. If
you still find it hard, please show your code.
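A minimal sketch of such a listDir helper using only java.io.File. The method name comes from the advice above; the signature, the case-insensitive ".txt" check, skipping sub-directories, and sorting by name are assumptions made for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;

public class ListDir {

    // Returns the regular files ending in ".txt" directly inside dir, sorted by
    // path name. Returns an empty array if dir is missing or not a directory.
    static File[] listDir(File dir) {
        File[] files = dir.listFiles((parent, name) ->
                name.toLowerCase(Locale.ROOT).endsWith(".txt") && new File(parent, name).isFile());
        if (files == null) {
            return new File[0]; // listFiles returns null for a non-directory or on I/O error
        }
        Arrays.sort(files);
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File file : listDir(dir)) {
            System.out.println(file.getName());
        }
    }
}
```

Iterating over the returned array, reading each file and passing its preprocessed contents to the classifier gives the per-file loop described above.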



-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in

Re: Input to a bayes classifier

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
On Fri, Dec 16, 2011 at 3:10 PM, JAGANADH G <ja...@gmail.com> wrote:

> @Ramaprakash
>
> I assume that you have already created your classifier and tested it with
> one file.
> Reading each file from a folder and predicting the sentiment of each file
> is a simple matter.
> It can be done with java.io.*.
> Just specify the folder name, get the files, read each file and pass it to
> the classifier method, then return the class along with the file name.
> Quite simple.


OK. So I will basically write a Java file that calls the classifier
function, with the folder path as a parameter. Should I write it in
mahout-core? If not, where should I write the file?



-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
B.Tech ICT,
SASTRA University.
+91 9626975420

Re: Input to a bayes classifier

Posted by JAGANADH G <ja...@gmail.com>.
@Ramaprakash

I assume that you have already created your classifier and tested it with
one file.
Reading each file from a folder and predicting the sentiment of each file
is a simple matter.
It can be done with java.io.*.
Just specify the folder name, get the files, read each file and pass it to
the classifier method, then return the class along with the file name.
Quite simple.
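A minimal Java sketch of the whole folder loop, producing one "file name,category,score" line per .txt file as asked at the start of the thread. DocumentClassifier and Result are hypothetical stand-ins, NOT Mahout APIs; in real code classify() would wrap the actual classifier call (for example the ClassifierContext set up in the code posted in this thread). The lower-casing and whitespace tokenization are assumptions and should match training.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ClassifyFolder {

    // Hypothetical stand-in for the real classifier call; NOT a Mahout API.
    interface DocumentClassifier {
        Result classify(String[] tokens);
    }

    // Hypothetical label/score pair returned by the stand-in classifier.
    static final class Result {
        final String label;
        final double score;
        Result(String label, double score) { this.label = label; this.score = score; }
    }

    // Classifies every .txt file in dir (sorted by name) and returns one
    // "fileName,category,score" line per file.
    static List<String> classifyFolder(File dir, DocumentClassifier classifier) throws IOException {
        List<String> lines = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> name.endsWith(".txt"));
        if (files == null) {
            return lines; // not a directory, or it could not be read
        }
        Arrays.sort(files);
        for (File file : files) {
            // Whole file = one document; lower-case and split on whitespace (assumed preprocessing).
            String text = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
            String trimmed = text.toLowerCase(Locale.ROOT).trim();
            String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            Result r = classifier.classify(tokens);
            lines.add(file.getName() + "," + r.label + "," + r.score);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Toy classifier for demonstration only: "pos" if the token "good" appears.
        DocumentClassifier toy = tokens -> Arrays.asList(tokens).contains("good")
                ? new Result("pos", 1.0) : new Result("neg", 1.0);
        for (String line : classifyFolder(new File(args.length > 0 ? args[0] : "."), toy)) {
            System.out.println(line);
        }
    }
}
```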
-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in