Posted to dev@mahout.apache.org by "Drew Farris (JIRA)" <ji...@apache.org> on 2010/07/22 05:44:51 UTC

[jira] Updated: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques

     [ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Farris updated MAHOUT-228:
-------------------------------

    Attachment: TrainLogisticTest.patch

I played with this a bit tonight to see how it works, and I was able to get the donut example working fine. I had the idea to use the text in ClassifierData.DATA as test input to TrainLogistic, a la BayesClassifierSelfTest. Attached is a patch that includes the simple test.

This input has two columns, 'label' and 'text', which are assigned to the target and predictors arguments respectively. 'text' is processed by the TextValueEncoder.

I had to modify TextValueEncoder to override setTraceDictionary so that it passes the dictionary reference to the wordEncoder.

Once I did this I could train, but I ran into a problem producing the final output. Near line 85 in TrainLogistic, the predictorWeight method is called with the original column name 'text', not the predictor names generated by TextValueEncoder. Do you have any thoughts on the best way to modify the code so that the proper predictor names are used?

Once that's fixed, predictorWeight will need to be modified to properly extract the weight for a predictor generated by WordValueEncoder from the lr's beta matrix. I can tell that the traceDictionary's entry points to the positions in the vector where the word's weight is stored, but I'm not sure where to go from there.
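
The lookup described above might be sketched roughly as follows. This is a self-contained toy, not Mahout code: ToyWordEncoder, weightFor, the "text=word" key format, the hashing scheme, and the choice to sum the coefficients at the traced positions are all assumptions made for illustration. The only idea taken from the message is that the trace dictionary maps a generated predictor name to the vector positions where that word's weight lives.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy sketch only; every name here is hypothetical and none of it is Mahout's API. */
class TraceWeightSketch {

    /** Hashes words into a fixed-size vector and records where each word landed. */
    static final class ToyWordEncoder {
        private final String name;
        private final int probes;
        private Map<String, Set<Integer>> traceDictionary;

        ToyWordEncoder(String name, int probes) {
            this.name = name;
            this.probes = probes;
        }

        /** A wrapping text encoder would forward its dictionary to this method. */
        void setTraceDictionary(Map<String, Set<Integer>> traceDictionary) {
            this.traceDictionary = traceDictionary;
        }

        /** Adds 1.0 at each hashed position and traces the positions under "name=word". */
        void addToVector(String word, double[] vector) {
            String tracedName = name + "=" + word;
            for (int probe = 0; probe < probes; probe++) {
                int position = Math.floorMod((tracedName + "#" + probe).hashCode(), vector.length);
                vector[position] += 1.0;
                if (traceDictionary != null) {
                    traceDictionary.computeIfAbsent(tracedName, k -> new TreeSet<>()).add(position);
                }
            }
        }
    }

    /**
     * Assumed rule: the weight of a traced predictor is the sum of the model
     * coefficients at the positions recorded for it. Hash collisions with other
     * words make this an approximation. Unknown names yield 0.0.
     */
    static double weightFor(String tracedName,
                            Map<String, Set<Integer>> traceDictionary,
                            double[] betaRow) {
        Set<Integer> positions = traceDictionary.get(tracedName);
        if (positions == null) {
            return 0.0;
        }
        double sum = 0.0;
        for (int position : positions) {
            sum += betaRow[position];
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> trace = new HashMap<>();
        ToyWordEncoder encoder = new ToyWordEncoder("text", 2);
        encoder.setTraceDictionary(trace);

        double[] features = new double[16];
        encoder.addToVector("hello", features);

        // Stand-in for one row of a trained beta matrix.
        double[] betaRow = new double[16];
        java.util.Arrays.fill(betaRow, 0.5);

        // Report weights per generated predictor name rather than per input column.
        for (String tracedName : trace.keySet()) {
            System.out.println(tracedName + " " + weightFor(tracedName, trace, betaRow));
        }
    }
}
```

Iterating over the trace dictionary's keys, rather than over the original column names, is one way the final report could list the generated predictor names instead of just 'text'.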


> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.4
>
>         Attachments: logP.csv, MAHOUT-228-3.patch, MAHOUT-228.patch, MAHOUT-228.patch, MAHOUT-228.patch, MAHOUT-228.patch, r.csv, sgd-derivation.pdf, sgd-derivation.tex, sgd.csv, TrainLogisticTest.patch
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a reasonable place to start.
