Posted to user@mahout.apache.org by Jyoti Gupta <jy...@gmail.com> on 2011/04/01 07:56:14 UTC

Naive Bayes score comparison across multiple classifiers

Hi,

I am using the Naive Bayes classifier to classify my input into one of N
categories. I am creating N binary classifiers using a one-vs-all approach.
The training set size is different for each classifier, and the prior
probability of each category is the same (1/N).

Can I compare the scores across these classifiers to get a final category
as output? Or can you suggest a way to normalize them?

Also, while testing I found that the label returned by the
ClassifierContext.classify method has a lower score than the other
label.
e.g. there are two categories, X and Non-X:
classifier.classify(input) returns (X, score1)
and classifier.classify(input, 2) returns a list [{X, score1}, {Non-X, score2}]
Here I found that score1 < score2. I did not go into the implementation, but
I thought that a greater score meant a greater probability.

Thanks,
Jyoti
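To make the normalization question concrete, here is a minimal Python sketch (with made-up scores, not Mahout output) of one rough heuristic: turn each binary classifier's two log scores into probabilities with a softmax, then compare the positive-class probabilities across classifiers. Raw log scores from separately trained models are not calibrated, so this is only a heuristic.

```python
import math

def softmax(scores):
    """Convert a list of log scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw log scores from two separately trained one-vs-all
# classifiers; the absolute values are on different scales, so the raw
# numbers cannot be compared directly across classifiers.
scores_clf_a = {"X": -120.3, "Non-X": -125.9}   # classifier for X
scores_clf_b = {"Y": -310.7, "Non-Y": -309.2}   # classifier for Y

p_x = softmax(list(scores_clf_a.values()))[0]   # P(X) within classifier A
p_y = softmax(list(scores_clf_b.values()))[0]   # P(Y) within classifier B

# Compare the per-classifier positive-class probabilities instead of the
# raw scores, and pick the category with the highest one.
best = max([("X", p_x), ("Y", p_y)], key=lambda t: t[1])
print(best[0])
```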

Re: Naive Bayes score comparison across multiple classifiers

Posted by Robin Anil <ro...@gmail.com>.
Inspect the confusion matrix for NB and CNB. It will tell you where the
majority of the errors are happening and which categories are misclassified
the most. You might need more samples or better features to improve from 48%.
Did you try higher n-grams, say 2 or 3? But be careful: higher n-grams
generate many useless features, which may show higher accuracy on the
training set but actually perform worse on the test set.


Robin
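A minimal sketch of this kind of confusion-matrix inspection (plain Python on toy labels, not Mahout's ConfusionMatrix class):

```python
from collections import Counter

# Toy (actual, predicted) label pairs standing in for a real test run.
results = [("X", "X"), ("X", "Y"), ("Y", "Y"), ("Y", "X"),
           ("Y", "X"), ("Z", "Z"), ("Z", "X"), ("X", "X")]

confusion = Counter(results)  # (actual, predicted) -> count

# Off-diagonal cells are errors; sort them to see which category pairs
# are confused the most.
errors = [(pair, n) for pair, n in confusion.items() if pair[0] != pair[1]]
errors.sort(key=lambda e: -e[1])
for (actual, predicted), n in errors:
    print(f"{actual} -> {predicted}: {n}")
```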



Re: Naive Bayes score comparison across multiple classifiers

Posted by Robin Anil <ro...@gmail.com>.
Scores are not probabilities, so no, they are not comparable.
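A small illustration of why: through smoothing, the log-likelihood score depends on training-corpus size and vocabulary size, so two separately trained models can assign quite different scores to the same per-token evidence. (Hypothetical numbers, not Mahout internals.)

```python
import math

# Laplace-smoothed token log-probability under a naive Bayes model.
def token_log_prob(count, total, vocab):
    return math.log((count + 1) / (total + vocab))

doc = ["spam", "offer", "free"]

# Model A: trained on a small corpus (each token seen 5 times in 100,
# vocabulary of 1,000 types).
model_a = {t: token_log_prob(5, 100, 1_000) for t in doc}
# Model B: same 5% relative frequency, but a much larger corpus and
# vocabulary (500 in 10,000, vocabulary of 50,000 types).
model_b = {t: token_log_prob(500, 10_000, 50_000) for t in doc}

score_a = sum(model_a[t] for t in doc)
score_b = sum(model_b[t] for t in doc)

# Identical relative evidence per token, yet the raw log scores differ
# because of corpus and vocabulary size -- so comparing score_a with
# score_b across models is meaningless.
print(score_a, score_b)
```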



Re: Naive Bayes score comparison across multiple classifiers

Posted by Daniel McEnnis <dm...@gmail.com>.
Especially with only 1K entries per category, 48% is a fairly typical
result over 14 categories. It is *much* higher than chance.

Daniel.
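For reference, the chance baseline works out as:

```python
# Random guessing over 14 equally likely categories.
n_categories = 14
chance = 1 / n_categories
print(f"chance accuracy: {chance:.1%}")        # about 7.1%
print(f"48% is {0.48 / chance:.1f}x chance")   # about 6.7x
```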


Re: Naive Bayes score comparison across multiple classifiers

Posted by Jyoti Gupta <jy...@gmail.com>.
I was using the latest 0.6-SNAPSHOT version. It worked fine with the 0.4
version but still gave the same accuracy of 48%.
It might be because my categories have some features in common, meaning a
sample can belong to both categories at the same time.

But regarding my original question, are the scores from different
classifiers comparable?




-- 
"Be the change you want to see"

Re: Naive Bayes score comparison across multiple classifiers

Posted by Robin Anil <ro...@gmail.com>.
You might have to check the input format and the generated model; I cannot
tell otherwise. Does it work when you change the method from sequential to
mapreduce during classification?



Re: Naive Bayes score comparison across multiple classifiers

Posted by Jyoti Gupta <jy...@gmail.com>.
It's plain-text classification, and I used 1K samples for each category.

Btw, I tried again with the cbayes algorithm, and on testing all the samples
got classified as the Unknown category.

The command I used to train is:
./bin/mahout trainclassifier -i <path to the train directory>
-o <path to the model directory> -type cbayes -ng 1 -source hdfs

To test the classifier:
./bin/mahout testclassifier -m <path to the model directory>
-d <path to the test directory> -type cbayes -ng 1 -source hdfs -method
sequential

Is anything wrong with what I am doing?




-- 
"Be the change you want to see"

Re: Naive Bayes score comparison across multiple classifiers

Posted by Robin Anil <ro...@gmail.com>.
It depends on the size of the data. The NB implementation works well for a
lot of data and large records (especially text). If you are trying other
types of data, like attribute enums and dense features, it might not work
as well.

Robin


Re: Naive Bayes score comparison across multiple classifiers

Posted by Ted Dunning <te...@gmail.com>.
Sorry... didn't see that you had said that.

A thousand training examples per category is very small for NB. Try the SGD
framework.

Also, with such small data you may prefer to use non-scalable learning
techniques such as those available in R.
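As a rough illustration of what an SGD learner does, here is a toy binary logistic regression trained by stochastic gradient steps in plain Python; this is only a sketch of the technique, not Mahout's SGD API.

```python
import math
import random

def sgd_logistic(data, epochs=50, lr=0.1):
    """Train binary logistic regression by SGD.
    data: list of (features, label), features a dict, label 0 or 1."""
    w = {}
    rng = random.Random(0)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)              # one pass = one shuffled epoch
        for x, y in data:
            z = sum(w.get(f, 0.0) * v for f, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = y - p                  # gradient of the log-likelihood
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + lr * g * v
    return w

# Hypothetical toy data: the token "free" indicates class 1.
data = [({"free": 1.0}, 1), ({"hello": 1.0}, 0),
        ({"free": 1.0, "offer": 1.0}, 1), ({"hi": 1.0}, 0)]
w = sgd_logistic(data)
print(w["free"] > 0)   # the indicative feature ends up with positive weight
```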


Re: Naive Bayes score comparison across multiple classifiers

Posted by Ted Dunning <te...@gmail.com>.
How much data do you have?


Re: Naive Bayes score comparison across multiple classifiers

Posted by Jyoti Gupta <jy...@gmail.com>.
I have tried that previously, but it was not giving good accuracy. I got
around 50% accuracy for 14 categories.




-- 
"Be the change you want to see"

Re: Naive Bayes score comparison across multiple classifiers

Posted by Ted Dunning <te...@gmail.com>.
Why not just use the multi-class capability of the Naive Bayes categorizers?
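For contrast with the one-vs-all setup: a single multi-class multinomial NB scores every label with one shared model, so the scores live on one scale and the argmax is well defined. A toy Python sketch of the idea (not Mahout's implementation):

```python
import math
from collections import Counter, defaultdict

def train_multiclass_nb(docs):
    """docs: list of (tokens, label). Returns a per-label model of
    (log prior, smoothed token log-probabilities)."""
    token_counts = defaultdict(Counter)   # label -> token counts
    label_counts = Counter()
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs, v = sum(label_counts.values()), len(vocab)
    model = {}
    for label, counts in token_counts.items():
        total = sum(counts.values())
        model[label] = (
            math.log(label_counts[label] / n_docs),   # log prior
            {t: math.log((counts[t] + 1) / (total + v)) for t in vocab},
        )
    return model

def classify(model, tokens):
    """Score every label with the one shared model; unseen tokens are
    skipped. The scores share a scale, so argmax is meaningful."""
    def score(label):
        log_prior, log_probs = model[label]
        return log_prior + sum(log_probs[t] for t in tokens if t in log_probs)
    return max(model, key=score)

docs = [(["free", "offer"], "spam"), (["free", "cash"], "spam"),
        (["meeting", "agenda"], "work"), (["agenda", "notes"], "work")]
model = train_multiclass_nb(docs)
print(classify(model, ["free", "agenda", "offer"]))
```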

