Posted to user@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2014/08/03 08:23:52 UTC

Re: OnlineLogisticRegression sgd, calculating confidence value

Can you provide your data?




On Wed, Jul 30, 2014 at 8:56 AM, Nicholas Demusz <ni...@gmail.com>
wrote:

> I ran the Iris data set through my code. I'm essentially just running a
> setup of LogisticModelParameters from the examples to keep track of the
> categories and features. For testing and training, I am using the
> CsvRecordFactory from LogisticModelParameters to vectorize the data.
> Values in the classifyFull return vector do respond the way I'd expect
> for the Iris data set when I change lambda. I did not have an intercept
> term and have since included one, which made a huge difference on the Iris
> data set but no difference on mine. This is something that I had
> overlooked.
>
> My guess is that this most likely points to a problem with my data, which
> is very possible. If it is too noisy or has a 'leak' in it, then I should
> probably re-evaluate my features and try again.
>
> The accuracy of the classifier trained on my data, measured on 330
> held-out samples, is around 70% across the 5 categories, but the
> classifyFull return vector has zero response to lambda's value: it is
> always a 1 in whatever category the classifier picks. Also, logLikelihood()
> is either (+/-) 0.00 or -100.00 during training, unlike the Iris data set,
> where logLikelihood varies across the full range from 0.00 to -100.00.
>
>
> On Mon, Jul 28, 2014 at 4:39 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> >
> > Your impression is correct for classifyFull. This behavior indicates that
> > the classifier has extremely high confidence.
> >
> > Increasing lambda should eventually make the scores degrade to equal
> > scores for each category.
> >
> > Since that isn't happening, I think that there may be something else going
> > on. Have you tested with synthetic data? Can you post sample code?
> >
> > Sent from my iPhone
> >
> > > On Jul 28, 2014, at 13:53, Nicholas Demusz <ni...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > > I am trying to do some classification with Mahout's
> > > OnlineLogisticRegression. I've built a model and trained it on 5
> > > categories of interest to me. However, I was under the impression that
> > > the classify() and classifyFull() methods would return a vector of
> > > floats that totaled 1.0. Instead, I get a vector back that only has a 1
> > > in the index position of the category it thinks the item belongs to. Is
> > > this the normal behavior? I have about 500 training items for each
> > > category. I've played with the value of lambda some, but the output
> > > doesn't change.
> > >
> > > If this is the intended outcome, could someone point me to a way to
> > > provide a confidence value for items that I classify, or should I be
> > > looking at a recommender?
> > >
> > > My goal is to have some sort of confidence score to indicate the level
> > > of certainty that this is what it says it is, as well as to put the
> > > exemplar data into a category.
> >
>
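
The symptoms described in this thread (a classifyFull vector holding a
single 1, and a logLikelihood of either 0.00 or -100.00) are what a
saturated softmax would produce. The sketch below is plain Java with no
Mahout dependency, and it is not Mahout's implementation. The class name,
the example scores, the scale helper, and the -100 floor are all assumptions
made for illustration; the floor simply mirrors the range reported above.

```java
import java.util.Arrays;

// Illustrative only: a hand-rolled softmax, not Mahout code.
class SoftmaxSaturation {

    // Numerically stable softmax: subtract the largest score before exponentiating.
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().getAsDouble();
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            p[i] = Math.exp(scores[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // Log of the probability assigned to the true category,
    // floored at -100 (an assumed floor, matching the range in the thread).
    static double clampedLogLikelihood(double[] p, int actual) {
        return Math.max(-100.0, Math.log(p[actual]));
    }

    // Multiply every score by the same factor: a crude stand-in for the
    // weight shrinkage that stronger regularization would cause.
    static double[] scale(double[] scores, double factor) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical raw scores for 5 categories; the first one dominates.
        double[] scores = {900.0, 20.0, -50.0, 10.0, 0.0};
        for (double factor : new double[] {1.0, 0.01, 0.0}) {
            double[] p = softmax(scale(scores, factor));
            System.out.println("factor=" + factor
                + " p=" + Arrays.toString(p)
                + " ll(correct)=" + clampedLogLikelihood(p, 0)
                + " ll(wrong)=" + clampedLogLikelihood(p, 1));
        }
    }
}
```

With the unscaled scores, the other exponentials underflow to zero in double
precision, so the output is exactly one-hot and the clamped log-likelihood
can only be 0 or -100. Scaling the scores all the way to zero gives 0.2 per
category, the "equal scores" end state mentioned earlier in the thread. If
real scores stay saturated no matter how lambda is set, that is consistent
with the suspicion above that something else is going on.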