Posted to dev@mahout.apache.org by "Hector Yee (JIRA)" <ji...@apache.org> on 2011/05/19 04:24:47 UTC
[jira] [Created] (MAHOUT-703) Implement Gradient machine
Implement Gradient machine
--------------------------
Key: MAHOUT-703
URL: https://issues.apache.org/jira/browse/MAHOUT-703
Project: Mahout
Issue Type: New Feature
Components: Classification
Affects Versions: 0.6
Reporter: Hector Yee
Priority: Minor
Implement a gradient machine (a.k.a. a neural network) that can be used for classification or auto-encoding.
It will have just an input layer; an identity, sigmoid, or tanh hidden layer; and an output layer.
Training will be done by stochastic gradient descent (possibly mini-batch later).
Sparsity will optionally be enforced by tweaking the bias in the hidden unit.
For now it will go in classifier/sgd, and the auto-encoder will wrap it in the filter unit later on.
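In rough outline, the machine described here could look like the following sketch: one sigmoid hidden layer, a linear output unit, and plain per-example SGD on squared error. All names are illustrative and do not reflect Mahout's actual API.

```java
// Minimal gradient-machine sketch: input -> sigmoid hidden layer -> linear output,
// trained by stochastic gradient descent on squared error. Illustrative only.
public class GradientMachineSketch {
    static double[][] w1; // input -> hidden weights
    static double[] b1;   // hidden biases
    static double[] w2;   // hidden -> output weights
    static double b2;     // output bias

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    static double[] hidden(double[] x) {
        double[] h = new double[b1.length];
        for (int j = 0; j < h.length; j++) {
            double a = b1[j];
            for (int i = 0; i < x.length; i++) { a += w1[i][j] * x[i]; }
            h[j] = sigmoid(a);
        }
        return h;
    }

    static double predict(double[] x) {
        double[] h = hidden(x);
        double y = b2;
        for (int j = 0; j < h.length; j++) { y += w2[j] * h[j]; }
        return y;
    }

    // One SGD step on loss 0.5 * (predict(x) - target)^2.
    static void train(double[] x, double target, double rate) {
        double[] h = hidden(x);
        double y = b2;
        for (int j = 0; j < h.length; j++) { y += w2[j] * h[j]; }
        double err = y - target; // dLoss/dy
        for (int j = 0; j < h.length; j++) {
            double gradW2 = err * h[j];
            double gradA = err * w2[j] * h[j] * (1.0 - h[j]); // via sigmoid'
            w2[j] -= rate * gradW2;
            for (int i = 0; i < x.length; i++) { w1[i][j] -= rate * gradA * x[i]; }
            b1[j] -= rate * gradA;
        }
        b2 -= rate * err;
    }

    public static void main(String[] args) {
        // Tiny 2-input, 3-hidden toy problem: learn y = x0 + x1 on {0,1}^2.
        java.util.Random rnd = new java.util.Random(42);
        w1 = new double[2][3];
        b1 = new double[3];
        w2 = new double[3];
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 3; j++) { w1[i][j] = rnd.nextGaussian() * 0.1; }
        }
        for (int j = 0; j < 3; j++) { w2[j] = rnd.nextGaussian() * 0.1; }
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (int epoch = 0; epoch < 5000; epoch++) {
            for (double[] x : xs) { train(x, x[0] + x[1], 0.1); }
        }
        System.out.println(predict(new double[]{1, 1})); // close to 2.0 after training
    }
}
```

Mini-batch training would simply accumulate the per-example gradients before applying a single update.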
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Hector Yee <he...@gmail.com>.
I would punt it to MAHOUT-670, then.
Sent from my iPad
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Ted Dunning <te...@gmail.com>.
Looks like a standard 3-clause BSD license to me. Worth doing a line-by-line
comparison to be sure.
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Sean Owen <sr...@gmail.com>.
We'll have to be a wee bit careful with that -- I am not clear whether
libsvm's license is Apache-compatible, and we'd need to confirm that before
we could take code or tests from it. It looks compatible, but I'm not a lawyer.
http://www.csie.ntu.edu.tw/~cjlin/libsvm/COPYRIGHT
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Hector Yee <he...@gmail.com>.
Libsvm has a few I can check in
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Lance Norskog <go...@gmail.com>.
What is a good regression test for this? Not a unit test, but
something that demonstrates the algorithms in action at the amount of
data where they become useful?
Preferably from a small dataset.
--
Lance Norskog
goksron@gmail.com
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Ted Dunning <te...@gmail.com>.
On Sat, May 21, 2011 at 4:25 PM, Hector Yee <he...@gmail.com> wrote:
> Sure, or I can wait till you submit patches before working on the next one?
>
I think that submit == commit.
But in any case, don't wait for anything. Find ways forward. We are in the
middle of a release cycle right now so nothing new is going to be committed
for a little while (another week, possibly).
> How would the github repo work? I just clone the apache git version and
> check it in there?
>
Yes. Exactly. And if you want me to help rebasing to track trunk, give me
a committer bit. That won't be very necessary, of course, while trunk is
frozen.
Then periodically, you can use [git diff --no-prefix trunk] to dump a patch
that can be added to the JIRA. That will allow non-git users to track
progress as well.
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Hector Yee <he...@gmail.com>.
Sure, or I can wait till you submit patches before working on the next one?
How would the github repo work? I just clone the apache git version and
check it in there?
--
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by Ted Dunning <te...@gmail.com>.
Hector,
You are working on a variety of things here that have interdependencies.
What would you think about a github repo where you can keep track of them
with multiple branches and we can all avoid problems with patches not
applying.
If you like, I can help out keeping your branches up to date relative to
trunk.
[jira] [Issue Comment Edited] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044391#comment-13044391 ]
Hector Yee edited comment on MAHOUT-703 at 6/4/11 9:00 PM:
-----------------------------------------------------------
Thanks! I'll fix this and submit a new patch. (Edit: whoops, looks like it was committed already; scratch that. Thanks for fixing the style.)
was (Author: hector.yee):
Thanks! I'll fix this and submit a new patch.
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036303#comment-13036303 ]
Ted Dunning commented on MAHOUT-703:
------------------------------------
Do you have a reference for this bias tweaking trick?
Is it bias as in the bias unit?
Or bias as in bias-variance?
[jira] [Updated] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hector Yee updated MAHOUT-703:
------------------------------
Attachment: MAHOUT-703.patch
Working ranking neural net, minus the sparsity-enforcing part. I would appreciate it if someone could check the math. Unit tests pass.
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042463#comment-13042463 ]
Hector Yee commented on MAHOUT-703:
-----------------------------------
Any news on this patch? I need it to implement an autoencoder.
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036636#comment-13036636 ]
Hector Yee commented on MAHOUT-703:
-----------------------------------
Sure, it's here: http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
Bias as in the bias unit.
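The bias trick from the linked notes can be sketched as follows: keep a running average rhoHat of a hidden unit's activation and nudge the unit's bias along the KL-divergence sparsity gradient until rhoHat reaches a small target rho. This is an illustrative toy (the input contribution to the unit is fixed at zero, and the sigmoid-derivative factor is folded into the step size), not Mahout's actual implementation.

```java
// Toy demonstration of enforcing sparsity by tweaking a hidden unit's bias.
public class SparsityBiasTweak {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static double run() {
        double rho = 0.05;    // target average activation
        double rate = 0.01;   // step size for the bias nudge
        double decay = 0.9;   // running-average decay
        double bias = 0.0;
        double rhoHat = 0.5;  // start at the sigmoid's midpoint
        for (int step = 0; step < 10000; step++) {
            double activation = sigmoid(bias); // input contribution fixed at 0
            rhoHat = decay * rhoHat + (1 - decay) * activation;
            // Sparsity gradient from KL(rho || rhoHat), pushed onto the bias:
            double grad = -rho / rhoHat + (1 - rho) / (1 - rhoHat);
            bias -= rate * grad;
        }
        return rhoHat; // settles near rho = 0.05
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The gradient is positive while rhoHat exceeds rho, so the bias drifts negative until the unit's average activation matches the sparsity target.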
[jira] [Updated] (MAHOUT-703) Implement Gradient machine
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-703:
-------------------------------
Fix Version/s: 0.6
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044365#comment-13044365 ]
Sean Owen commented on MAHOUT-703:
----------------------------------
Another good one, Hector; hearing no grunts of objection from Ted, let's put it in. I have a few small style points for your patches.
- We'll need to use the standard Apache license header
- The class description can/should go in the class javadoc, not above the package statement
- Java variable naming is camelCase rather than camel_case
- Careful with the javadoc -- it has to start with /** to be read as such
- Go ahead and use braces and a newline with every control-flow statement, including ifs
- In train(), outputActivation is not used?
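These style points can be illustrated with a short hypothetical class (the license header is abridged here):

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. (Standard Apache header, abridged.)
 */

/**
 * The class description lives in the class-level javadoc rather than above
 * the package statement; names use camelCase; javadoc starts with the
 * double-asterisk form; every control-flow statement gets braces.
 */
public class StyleExample {
  /** Returns the larger of two hidden-layer sizes. */
  public static int maxHiddenSize(int firstSize, int secondSize) {
    if (firstSize > secondSize) {
      return firstSize;
    }
    return secondSize;
  }

  public static void main(String[] args) {
    System.out.println(maxHiddenSize(10, 20));
  }
}
```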
[jira] [Updated] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hector Yee updated MAHOUT-703:
------------------------------
Status: Patch Available (was: Open)
Working ranking neural net with one hidden sigmoid layer.
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037289#comment-13037289 ]
Hector Yee commented on MAHOUT-703:
-----------------------------------
Note: This patch requires MAHOUT-702 for the OnlineBaseTest.
[jira] [Updated] (MAHOUT-703) Implement Gradient machine
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-703:
-----------------------------
Resolution: Fixed
Assignee: Ted Dunning
Status: Resolved (was: Patch Available)
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044391#comment-13044391 ]
Hector Yee commented on MAHOUT-703:
-----------------------------------
Thanks! I'll fix this and submit a new patch.
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036009#comment-13036009 ]
Ted Dunning commented on MAHOUT-703:
------------------------------------
It comes almost for free with SGD neural net code to put L1 and L2 penalties in as well. I would recommend it.
The trick is that you can't depend on the gradient being sparse, so you can't use lazy regularization. Léon Bottou describes a stochastic full regularization with an adjusted learning rate which should perform comparably. He mostly talks about weight decay (which is L2 regularization), which can be handled cleverly by keeping a multiplier and a vector. I think L1 is important, but it requires something like truncated constant decay, which can't be done with a multiplier.
See http://leon.bottou.org/projects/sgd
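The multiplier-and-vector trick mentioned above might look like this. A sketch under stated assumptions: names are hypothetical, and the update order (decay, then gradient on the touched coordinates) is one common convention, not Mahout's confirmed implementation.

```java
// L2 weight decay in O(1) per step: store w = scale * v, shrink the whole
// vector by updating only the scalar, and apply sparse gradients in the
// unscaled coordinates.
class LazyDecay {
    double[] v;
    double scale = 1.0;

    LazyDecay(int dim) {
        v = new double[dim];
    }

    // One SGD step on a sparse example: idx holds the nonzero feature
    // indices, grad the corresponding gradient entries.
    void step(int[] idx, double[] grad, double eta, double lambda) {
        scale *= (1.0 - eta * lambda);          // decays every weight at once
        for (int n = 0; n < idx.length; n++) {
            v[idx[n]] -= eta * grad[n] / scale; // divide so scale*v moves by -eta*grad
        }
    }

    double weight(int i) {
        return scale * v[i];
    }
}
```

As the comment thread notes, this works for L2 because decay is a uniform rescaling; truncated L1 decay clips each coordinate toward zero and cannot be expressed as a single multiplier.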
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042598#comment-13042598 ]
Ted Dunning commented on MAHOUT-703:
------------------------------------
You can make the auto-encoder depend on this bug and just go forward. It might even help to drop things on GitHub so that you can keep rebasing two branches.
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044406#comment-13044406 ]
Hudson commented on MAHOUT-703:
-------------------------------
Integrated in Mahout-Quality #861 (See [https://builds.apache.org/hudson/job/Mahout-Quality/861/])
MAHOUT-703 implement Gradient machine classifier
srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1131481
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/GradientMachine.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/classifier/sgd/GradientMachineTest.java
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/function/Functions.java
[jira] [Commented] (MAHOUT-703) Implement Gradient machine
Posted by "Hector Yee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036022#comment-13036022 ]
Hector Yee commented on MAHOUT-703:
-----------------------------------
Yeah, I was planning to do L2 regularization first. L1 can be tricky due to edge cases like crossing / following the simplex, so I'll enforce sparsity with Andrew Ng's bias-tweaking trick first.
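One way to read the bias-tweaking idea is: track each hidden unit's running mean activation and push its bias down when the unit fires more often than a target rate. The sketch below is my interpretation for illustration only; the class, field names, and update rule are hypothetical, not the patch's actual code.

```java
// Encourage sparse hidden activations by adjusting biases instead of
// adding an L1 penalty to the weights.
class SparsityBias {
    final double[] bias;
    final double[] meanAct;     // exponential moving average of activations
    final double target;        // desired average activation, e.g. 0.05
    final double emaDecay;      // smoothing for the running mean
    final double rate;          // how aggressively to move the bias

    SparsityBias(int units, double target, double emaDecay, double rate) {
        this.bias = new double[units];
        this.meanAct = new double[units];
        this.target = target;
        this.emaDecay = emaDecay;
        this.rate = rate;
    }

    // Called after each example with the hidden-layer activations.
    void update(double[] activations) {
        for (int j = 0; j < bias.length; j++) {
            meanAct[j] = emaDecay * meanAct[j] + (1.0 - emaDecay) * activations[j];
            bias[j] -= rate * (meanAct[j] - target); // over-active unit => lower bias
        }
    }
}
```

This sidesteps the L1 edge cases mentioned above because the bias update is smooth: there is no clipping at zero.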