You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mahout.apache.org by Timothy Mann <ma...@gmail.com> on 2012/10/22 06:16:08 UTC

classify vs. classifyFull in AbstractVectorClassifier

It seems strange to me that the classify method declared in
AbstractVectorClassifier returns a vector with n-1 scores, where n is the
number of categories. I understand that this decision was made for
efficiency reasons, but it seems like classify is the first place where
people will look in the API. Instead classifyFull provides the
implementation that a user may find more intuitive. Furthermore,
classifyFull does not require the assumption that the scores over all
categories represent probabilities that sum to one, and is therefore more
general. In fact, classify is not even implemented for the Naive Bayes
implementations but classifyFull is, which was initially confusing until I
understood what classify actually does. Any thoughts on this?

-Timothy Mann

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Ted Dunning <te...@gmail.com>.

Your idea sound good.  Better java docs are always good and never enough.  

Sent from my iPhone

On Oct 22, 2012, at 12:59 PM, Timothy Mann <ma...@gmail.com> wrote:

> Yes, I understand. Interfaces are almost impossible to get right the first
> time and equally impossible to change once they have been made public. My
> interest is in improving the javadoc to help point new users in the right
> direction. I was thinking about adding a note in the description of
> AbstractVectorClassifier that simply emphasizes the importance of
> classifyFull as a starting place.
> 
> I also plan on adding javadoc comments to methods where classify throws an
> UnsupportedOperationException to indicate this instead of allowing default
> copying of the superclass javadoc comment (which does not indicate that the
> method is unsupported).
> 
> Any other ideas?
> 
> -Timothy Mann
> 
> On Sun, Oct 21, 2012 at 11:20 PM, Ted Dunning <te...@gmail.com> wrote:
> 
>> Yes.
>> 
>> It seems stupid in retrospect.  Changing these things is very painful,
>> however, because we have no idea how many people will be affected.
>> 
>> On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <mann.timothy@gmail.com
>>> wrote:
>> 
>>> It seems strange to me that the classify method declared in
>>> AbstractVectorClassifier returns a vector with n-1 scores, where n is the
>>> number of categories. I understand that this decision was made for
>>> efficiency reasons, but it seems like classify is the first place where
>>> people will look in the API. Instead classifyFull provides the
>>> implementation that a user may find more intuitive. Furthermore,
>>> classifyFull does not require the assumption that the scores over all
>>> categories represent probabilities that sum to one, and is therefore more
>>> general. In fact, classify is not even implemented for the Naive Bayes
>>> implementations but classifyFull is, which was initially confusing until
>> I
>>> understood what classify actually does. Any thoughts on this?
>>> 
>>> -Timothy Mann
>>> 
>>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Ted Dunning <te...@gmail.com>.

UOE also sounds like a good idea.  Lately I have adjusted my default method template in IntelliJ to just throw UOE in order to increase the likelihood I remember to adjust things.  

Sent from my iPhone

On Oct 22, 2012, at 12:59 PM, Timothy Mann <ma...@gmail.com> wrote:

> I also plan on adding javadoc comments to methods where classify throws an
> UnsupportedOperationException to indicate this instead of allowing default
> copying of the superclass javadoc comment (which does not indicate that the
> method is unsupported

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Timothy Mann <ma...@gmail.com>.

Yes, I understand. Interfaces are almost impossible to get right the first
time and equally impossible to change once they have been made public. My
interest is in improving the javadoc to help point new users in the right
direction. I was thinking about adding a note in the description of
AbstractVectorClassifier that simply emphasizes the importance of
classifyFull as a starting place.

I also plan on adding javadoc comments to methods where classify throws an
UnsupportedOperationException to indicate this instead of allowing default
copying of the superclass javadoc comment (which does not indicate that the
method is unsupported).

Any other ideas?

-Timothy Mann

On Sun, Oct 21, 2012 at 11:20 PM, Ted Dunning <te...@gmail.com> wrote:

> Yes.
>
> It seems stupid in retrospect.  Changing these things is very painful,
> however, because we have no idea how many people will be affected.
>
> On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <mann.timothy@gmail.com
> >wrote:
>
> > It seems strange to me that the classify method declared in
> > AbstractVectorClassifier returns a vector with n-1 scores, where n is the
> > number of categories. I understand that this decision was made for
> > efficiency reasons, but it seems like classify is the first place where
> > people will look in the API. Instead classifyFull provides the
> > implementation that a user may find more intuitive. Furthermore,
> > classifyFull does not require the assumption that the scores over all
> > categories represent probabilities that sum to one, and is therefore more
> > general. In fact, classify is not even implemented for the Naive Bayes
> > implementations but classifyFull is, which was initially confusing until
> I
> > understood what classify actually does. Any thoughts on this?
> >
> > -Timothy Mann
> >
>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Ted Dunning <te...@gmail.com>.

Ah.

As a convenience method, it makes sense.  I do see value in resolving the
confusion you pointed out.  I am not sure how much value this brings.

As part of a bunch of small improvements, it might be more persuasive.

On Thu, Oct 25, 2012 at 11:54 PM, Timothy Mann <ma...@gmail.com>wrote:

> My point is just that a method named 'score' may convey the message of what
> 'classifyFull' does better than the name classify and it can easily be
> added to AbstractVectorClassifier without modifying any additional code
> beyond:
>
> public Vector score(Vector instance){
>      return classifyFull(instance);
> }
>
> I'm not necessarily pushing for this, I'm just generating discussion.
>
> -Timothy Mann
>
> On Tue, Oct 23, 2012 at 12:33 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > Classification *is* regression.  You can always ask the result for the
> > index of the largest score.
> >
> > On Tue, Oct 23, 2012 at 7:02 AM, Timothy Mann <mann.timothy@gmail.com
> > >wrote:
> >
> > > It also seems strange that the classify method is being used for
> > > regression. To me classification is the act of selecting a category
> > > according to some rule. Here what classification does is calculate
> scores
> > > for an instance in each category. It may make sense to add a method,
> for
> > > example,
> > >
> > > public Vector scores(Vector); or maybe public Vector evaluate(Vector);,
> > > etc.
> > >
> > > Adding a method wouldn't break older code, but it also wouldn't resolve
> > > strange use of classifier.
> > >
> > > -Timothy Mann
> > >
> > > On Tue, Oct 23, 2012 at 5:32 AM, Grant Ingersoll <gsingers@apache.org
> > > >wrote:
> > >
> > > >
> > > > On Oct 22, 2012, at 12:20 AM, Ted Dunning wrote:
> > > >
> > > > > Yes.
> > > > >
> > > > > It seems stupid in retrospect.  Changing these things is very
> > painful,
> > > > > however, because we have no idea how many people will be affected.
> > > >
> > > > That being said, we are still pre 1.0.  Better to change now than to
> > bake
> > > > it in 1.0?
> > > >
> > > > >
> > > > > On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <
> > mann.timothy@gmail.com
> > > > >wrote:
> > > > >
> > > > >> It seems strange to me that the classify method declared in
> > > > >> AbstractVectorClassifier returns a vector with n-1 scores, where n
> > is
> > > > the
> > > > >> number of categories. I understand that this decision was made for
> > > > >> efficiency reasons, but it seems like classify is the first place
> > > where
> > > > >> people will look in the API. Instead classifyFull provides the
> > > > >> implementation that a user may find more intuitive. Furthermore,
> > > > >> classifyFull does not require the assumption that the scores over
> > all
> > > > >> categories represent probabilities that sum to one, and is
> therefore
> > > > more
> > > > >> general. In fact, classify is not even implemented for the Naive
> > Bayes
> > > > >> implementations but classifyFull is, which was initially confusing
> > > > until I
> > > > >> understood what classify actually does. Any thoughts on this?
> > > > >>
> > > > >> -Timothy Mann
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Timothy Mann <ma...@gmail.com>.

My point is just that a method named 'score' may convey the message of what
'classifyFull' does better than the name classify and it can easily be
added to AbstractVectorClassifier without modifying any additional code
beyond:

public Vector score(Vector instance){
     return classifyFull(instance);
}

I'm not necessarily pushing for this, I'm just generating discussion.

-Timothy Mann

On Tue, Oct 23, 2012 at 12:33 PM, Ted Dunning <te...@gmail.com> wrote:

> Classification *is* regression.  You can always ask the result for the
> index of the largest score.
>
> On Tue, Oct 23, 2012 at 7:02 AM, Timothy Mann <mann.timothy@gmail.com
> >wrote:
>
> > It also seems strange that the classify method is being used for
> > regression. To me classification is the act of selecting a category
> > according to some rule. Here what classification does is calculate scores
> > for an instance in each category. It may make sense to add a method, for
> > example,
> >
> > public Vector scores(Vector); or maybe public Vector evaluate(Vector);,
> > etc.
> >
> > Adding a method wouldn't break older code, but it also wouldn't resolve
> > strange use of classifier.
> >
> > -Timothy Mann
> >
> > On Tue, Oct 23, 2012 at 5:32 AM, Grant Ingersoll <gsingers@apache.org
> > >wrote:
> >
> > >
> > > On Oct 22, 2012, at 12:20 AM, Ted Dunning wrote:
> > >
> > > > Yes.
> > > >
> > > > It seems stupid in retrospect.  Changing these things is very
> painful,
> > > > however, because we have no idea how many people will be affected.
> > >
> > > That being said, we are still pre 1.0.  Better to change now than to
> bake
> > > it in 1.0?
> > >
> > > >
> > > > On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <
> mann.timothy@gmail.com
> > > >wrote:
> > > >
> > > >> It seems strange to me that the classify method declared in
> > > >> AbstractVectorClassifier returns a vector with n-1 scores, where n
> is
> > > the
> > > >> number of categories. I understand that this decision was made for
> > > >> efficiency reasons, but it seems like classify is the first place
> > where
> > > >> people will look in the API. Instead classifyFull provides the
> > > >> implementation that a user may find more intuitive. Furthermore,
> > > >> classifyFull does not require the assumption that the scores over
> all
> > > >> categories represent probabilities that sum to one, and is therefore
> > > more
> > > >> general. In fact, classify is not even implemented for the Naive
> Bayes
> > > >> implementations but classifyFull is, which was initially confusing
> > > until I
> > > >> understood what classify actually does. Any thoughts on this?
> > > >>
> > > >> -Timothy Mann
> > > >>
> > >
> > >
> >
>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Ted Dunning <te...@gmail.com>.

Classification *is* regression.  You can always ask the result for the
index of the largest score.

On Tue, Oct 23, 2012 at 7:02 AM, Timothy Mann <ma...@gmail.com>wrote:

> It also seems strange that the classify method is being used for
> regression. To me classification is the act of selecting a category
> according to some rule. Here what classification does is calculate scores
> for an instance in each category. It may make sense to add a method, for
> example,
>
> public Vector scores(Vector); or maybe public Vector evaluate(Vector);,
> etc.
>
> Adding a method wouldn't break older code, but it also wouldn't resolve
> strange use of classifier.
>
> -Timothy Mann
>
> On Tue, Oct 23, 2012 at 5:32 AM, Grant Ingersoll <gsingers@apache.org
> >wrote:
>
> >
> > On Oct 22, 2012, at 12:20 AM, Ted Dunning wrote:
> >
> > > Yes.
> > >
> > > It seems stupid in retrospect.  Changing these things is very painful,
> > > however, because we have no idea how many people will be affected.
> >
> > That being said, we are still pre 1.0.  Better to change now than to bake
> > it in 1.0?
> >
> > >
> > > On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <mann.timothy@gmail.com
> > >wrote:
> > >
> > >> It seems strange to me that the classify method declared in
> > >> AbstractVectorClassifier returns a vector with n-1 scores, where n is
> > the
> > >> number of categories. I understand that this decision was made for
> > >> efficiency reasons, but it seems like classify is the first place
> where
> > >> people will look in the API. Instead classifyFull provides the
> > >> implementation that a user may find more intuitive. Furthermore,
> > >> classifyFull does not require the assumption that the scores over all
> > >> categories represent probabilities that sum to one, and is therefore
> > more
> > >> general. In fact, classify is not even implemented for the Naive Bayes
> > >> implementations but classifyFull is, which was initially confusing
> > until I
> > >> understood what classify actually does. Any thoughts on this?
> > >>
> > >> -Timothy Mann
> > >>
> >
> >
>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Timothy Mann <ma...@gmail.com>.

It also seems strange that the classify method is being used for
regression. To me classification is the act of selecting a category
according to some rule. Here what classification does is calculate scores
for an instance in each category. It may make sense to add a method, for
example,

public Vector scores(Vector); or maybe public Vector evaluate(Vector);, etc.

Adding a method wouldn't break older code, but it also wouldn't resolve
strange use of classifier.

-Timothy Mann

On Tue, Oct 23, 2012 at 5:32 AM, Grant Ingersoll <gs...@apache.org>wrote:

>
> On Oct 22, 2012, at 12:20 AM, Ted Dunning wrote:
>
> > Yes.
> >
> > It seems stupid in retrospect.  Changing these things is very painful,
> > however, because we have no idea how many people will be affected.
>
> That being said, we are still pre 1.0.  Better to change now than to bake
> it in 1.0?
>
> >
> > On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <mann.timothy@gmail.com
> >wrote:
> >
> >> It seems strange to me that the classify method declared in
> >> AbstractVectorClassifier returns a vector with n-1 scores, where n is
> the
> >> number of categories. I understand that this decision was made for
> >> efficiency reasons, but it seems like classify is the first place where
> >> people will look in the API. Instead classifyFull provides the
> >> implementation that a user may find more intuitive. Furthermore,
> >> classifyFull does not require the assumption that the scores over all
> >> categories represent probabilities that sum to one, and is therefore
> more
> >> general. In fact, classify is not even implemented for the Naive Bayes
> >> implementations but classifyFull is, which was initially confusing
> until I
> >> understood what classify actually does. Any thoughts on this?
> >>
> >> -Timothy Mann
> >>
>
>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Grant Ingersoll <gs...@apache.org>.

On Oct 22, 2012, at 12:20 AM, Ted Dunning wrote:

> Yes.
> 
> It seems stupid in retrospect.  Changing these things is very painful,
> however, because we have no idea how many people will be affected.

That being said, we are still pre 1.0.  Better to change now than to bake it in 1.0?

> 
> On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <ma...@gmail.com>wrote:
> 
>> It seems strange to me that the classify method declared in
>> AbstractVectorClassifier returns a vector with n-1 scores, where n is the
>> number of categories. I understand that this decision was made for
>> efficiency reasons, but it seems like classify is the first place where
>> people will look in the API. Instead classifyFull provides the
>> implementation that a user may find more intuitive. Furthermore,
>> classifyFull does not require the assumption that the scores over all
>> categories represent probabilities that sum to one, and is therefore more
>> general. In fact, classify is not even implemented for the Naive Bayes
>> implementations but classifyFull is, which was initially confusing until I
>> understood what classify actually does. Any thoughts on this?
>> 
>> -Timothy Mann
>>

Re: classify vs. classifyFull in AbstractVectorClassifier

Posted by Ted Dunning <te...@gmail.com>.

Yes.

It seems stupid in retrospect.  Changing these things is very painful,
however, because we have no idea how many people will be affected.

On Sun, Oct 21, 2012 at 9:16 PM, Timothy Mann <ma...@gmail.com>wrote:

> It seems strange to me that the classify method declared in
> AbstractVectorClassifier returns a vector with n-1 scores, where n is the
> number of categories. I understand that this decision was made for
> efficiency reasons, but it seems like classify is the first place where
> people will look in the API. Instead classifyFull provides the
> implementation that a user may find more intuitive. Furthermore,
> classifyFull does not require the assumption that the scores over all
> categories represent probabilities that sum to one, and is therefore more
> general. In fact, classify is not even implemented for the Naive Bayes
> implementations but classifyFull is, which was initially confusing until I
> understood what classify actually does. Any thoughts on this?
>
> -Timothy Mann
>