Posted to dev@mahout.apache.org by "Christian Herta (Created) (JIRA)" <ji...@apache.org> on 2012/02/07 21:41:03 UTC

[jira] [Created] (MAHOUT-976) Implement Multilayer Perceptron

Implement Multilayer Perceptron
-------------------------------

                 Key: MAHOUT-976
                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
             Project: Mahout
          Issue Type: New Feature
    Affects Versions: 0.7
            Reporter: Christian Herta
            Priority: Minor


Implement a multilayer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient Backprop"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by numerical gradient checking (see the sketch below)
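A rough idea of the numerical gradient check mentioned above (a sketch only; the
CostFunction interface and the class name are placeholders, not Mahout API): perturb
each weight by a small epsilon and compare a central finite-difference estimate with
the gradient produced by backprop.

// Sketch of a finite-difference gradient check (illustrative, not Mahout API).
// cost.costAt(weights) stands for the network cost on a fixed mini-batch.
public final class GradientCheck {

  interface CostFunction {
    double costAt(double[] weights);
  }

  /** Largest absolute difference between the analytic (backprop) gradient and
   *  a central finite-difference estimate. */
  static double maxError(CostFunction cost, double[] weights,
                         double[] analyticGradient, double epsilon) {
    double maxError = 0.0;
    for (int i = 0; i < weights.length; i++) {
      double original = weights[i];
      weights[i] = original + epsilon;
      double costPlus = cost.costAt(weights);
      weights[i] = original - epsilon;
      double costMinus = cost.costAt(weights);
      weights[i] = original;  // restore the weight before moving on
      double numeric = (costPlus - costMinus) / (2.0 * epsilon);
      maxError = Math.max(maxError, Math.abs(numeric - analyticGradient[i]));
    }
    return maxError;
  }
}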
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done in batch learning by:
 1 Partitioning of the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2 (see the sketch below)
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 
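To illustrate step 3 (a sketch under the assumption that every chunk returns an
accumulated gradient of the same shape; class and method names are made up, not part
of the patch):

// Illustrative combine step for distributed batch learning (not part of the patch).
// Each chunk returns its accumulated gradient (summed over its examples); the driver
// averages them and takes one descent step before going back to step 2.
public final class BatchCombineStep {

  /** weights and each chunkGradients[c] are flattened weight matrices of equal length. */
  static void applyAveragedUpdate(double[] weights, double[][] chunkGradients,
                                  double learningRate) {
    int chunks = chunkGradients.length;
    for (int i = 0; i < weights.length; i++) {
      double sum = 0.0;
      for (double[] gradient : chunkGradients) {
        sum += gradient[i];
      }
      weights[i] -= learningRate * (sum / chunks);
    }
  }
}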

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Herta, Christian" <Ch...@htw-berlin.de>.
Hello Ted,
 
thanks for the fast reply.
Maybe I did not express myself clearly. In the first case (n mutually exclusive
classes) classify and the current implementation of classifyFull in
AbstractVectorClassifier make sense. The implementation uses the assumption
sum_i p_i = 1. Here the assumption is valid.
   
But in the second case (n independent decisions) only classifyFull(..) can be
applied, because sum_i p_i = 1 (p_i being the probability of class i) does not hold.
That's what I wanted to express by "makes no sense".
Simultaneously solving this kind of problem with one classifier is specific to
MLP neural networks. E.g., to solve this kind of problem with support vector
machines (SVMs) you would train n SVMs independently.
This kind of problem is interesting for autoencoders or feature learning. The
activation pattern of the last hidden layer can be interpreted as a feature
representation of the input pattern. With this feature representation the n
independent problems can be solved with a linear model. The mapping of this
feature representation (the last hidden layer) to the output units is the same as
logistic regression.
Here the activation function of the output units is the sigmoid (logistic function).
In the first case (n mutually exclusive classes) softmax is used as the activation
function. The softmax formula contains a normalization factor which guarantees
that the sum over all outputs is 1 (sum_i p_i = 1).
For each of the cases there is a corresponding cost function that should be used.
 
The boolean flag "mutuallyExclusiveClasses" is just a switch between the two
classification cases, so the user doesn't need to know which activation function
and which cost function to use for his problem. Depending on the problem, the
matching pair of cost function and output activation function (conjugate link
functions) is chosen automatically; a short sketch follows the list:
 - classification (n independent classes): activation function: sigmoid
(logistic); cost function: cross entropy: sum_i t_i ln y_i + (1-t_i) ln (1-y_i)
 - classification (n mutually exclusive classes): activation function: softmax;
cost function: cross entropy: sum_i t_i ln y_i
 - (to be complete) the regression case: cost function: sum of squared errors;
activation function: identity (no squashing)
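To make the pairing concrete, here is a small sketch of the two classification cases
(class and method names are illustrative, not the patch itself); with the matching
cross-entropy cost, the output delta reduces to y - t in both cases:

// Sketch of the two output-layer configurations (illustrative names, not the patch).
public final class OutputActivations {

  /** Case 1 - n independent classes: element-wise sigmoid (logistic) outputs. */
  static double[] sigmoid(double[] netInput) {
    double[] y = new double[netInput.length];
    for (int i = 0; i < y.length; i++) {
      y[i] = 1.0 / (1.0 + Math.exp(-netInput[i]));
    }
    return y;
  }

  /** Case 2 - n mutually exclusive classes: softmax, normalized so that sum_i y_i = 1. */
  static double[] softmax(double[] netInput) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : netInput) {
      max = Math.max(max, v);              // shift for numerical stability
    }
    double sum = 0.0;
    double[] y = new double[netInput.length];
    for (int i = 0; i < y.length; i++) {
      y[i] = Math.exp(netInput[i] - max);
      sum += y[i];
    }
    for (int i = 0; i < y.length; i++) {
      y[i] /= sum;
    }
    return y;
  }

  /** With the matching cross-entropy cost, the output delta is simply y - t. */
  static double[] outputDelta(double[] y, double[] t) {
    double[] delta = new double[y.length];
    for (int i = 0; i < y.length; i++) {
      delta[i] = y[i] - t[i];
    }
    return delta;
  }
}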
   
I hope I have expressed myself more clearly now.
 
Cheers
 Christian 
 


Ted Dunning <te...@gmail.com> wrote on 12 February 2012 at 15:56:

> On Sun, Feb 12, 2012 at 5:14 AM, Christian Herta (Commented) (JIRA) <
> jira@apache.org> wrote:
>
> > ....
> > The implementation of public Vector classifyFull(Vector r, Vector
> > instance)  in AbstractVectorClassifier assumes that the probabilities of
> > the n elements of the output vector sum to 1. This is only valid if there
> > are n mutually exclusive classes. e.g. for the target vectors like (0 0 1
> > 0), (0 0 0 1), (1 0 0 0), ....
> >
>
> Fine.  That assumption is based on the fact that we only really had
> classifiers that had this property.  Over-ride it and comment that the
> assumption doesn't hold.
>
>
> > The other posibility is, that there are n (here 4)independent targets
> > like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1)
> > Here the method "Vector classify(..)" and the implementation "public
> > Vector classifyFull(Vector r, Vector instance)"  of AbstractVectorClassfier
> > makes no sense. Therefore using "Vector classify(..)" should throw an
> > exception and "Vector classifyFull" must be overwritten.
> >
>
> The method classify makes a lot of sense.  ClassifyFull becomes the
> primitive and classify() just adds a maxIndex to find the largest value. It
> is true that finding the largest value doesn't make sense for some
> problems, but you can say the same thing of addition.  The classify()
> method definitely does make sense for some problems.
>
>
> > P.S.: Depending on the "flag" the cost function and the activation
> > function for the output units will be set, to get probabilities as outputs
> > e.g. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2.
>
>
> I am on the road and I don't have my copy of Bishop handy and others
> haven't read it.
>
> Do you mean you will offset the activation function to avoid negative
> values and L_1 normalize the result?
>
>
> > Also, this simplifies the implementation because the natural pairing
> > between cost and activation function yields for the output deltas "y - t".
> >
>
> This sounds like an implementation detail.  Implementation details should
> not be exposed to users, even indirectly.  If there is a user expectation
> of a certain behavior, then it is fine to expose some behavior.  But if the
> user expectation conflicts with the simple implementation then you really
> need to do the translation internally so that the user has the easier time
> of it.

Re: [jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by Ted Dunning <te...@gmail.com>.
Christian,

All of what you say makes reasonable sense, but I think that you put too
much weight on the current uses of the API which are warped by the initial
logistic regression implementation.

The heart is classifyFull.  It returns scores which by convention are large
for the 1-of-n category for 1-of-n problems.  Whether these scores
represent a discrete distribution is not specified.

The rest of the classify* methods are convenience functions which may
allocate less memory or require less code on the part of the caller.  For
instance, in binary logistic regression, returning a single score is
sufficient.  Similarly, getting back n-1 scores may be slightly cheaper
than returning all n (for logistic regression).

With MLP, the classifyFull call makes lots of sense.  Whether you normalize
and return a distribution is your business.  It is nice to be as flexible
as you say.  It is also nice to have the convenience method that picks the
largest score.  If you are providing scores that are probabilities, then it
makes things a bit more familiar for folks if you support the methods that
return n-1 scores, but throwing an UnsupportedOperationException is probably just
fine as well.

I really think that you are worrying too much here.


On Sun, Feb 12, 2012 at 8:11 AM, Herta, Christian <
Christian.Herta@htw-berlin.de> wrote:

> Hello Ted,
>
> thanks for the fast reply.
> Maybe I expressed myself not clearly. In the first case (n mutually
> exclusive
> classes) classify and the current implementation ofclassifyFullin
> AbstractVectorClassfier make sense. The implementation use the assumption
> sum_i
> p_i = 1. Here the assumption is valid.
>
> But in the second case (n independent decision) only classifyFull(..) can
> be
> applied, because sum_i p_i = 1 (p_i probability of class i) doesn't apply.
> That's what I wanted to express by "makes no sense".
> Simultaneouslysolving this kind of problem with one classifier is specific
> for
> mlp neural networks. E.g. solving this kind of problem with support vector
> machines (svm) you will train n svms independently.
> This kind of problem is interesting for autoencoder or feature learning.
> The
> activation pattern of the last hidden units can be interpreted as a feature
> representation of the input pattern. With this feature representation the n
> independent problems can be solved with a linear model. The mapping of this
> feature representation (last hidden units) to the output units is the same
> as
> logistic regression.
> Here the activation function of the output is sigmoid (logistic function).
> In the fist case (n mutually exclusive classes) as activation function
> softmax
> is used. In the softmax formula there exist a normalization factor, which
> guarantees that the sum over all outputs is 1 (sum_i p_i = 1).
> For each of the cases there exists a cost function which should be used.
>
> The boolean flag "mutuallyExclusiveClasses" is just a switch between the
> two
> classification cases. So the user doesn't need to know which activation
> function
> and which cost functions he should use for his problem. Depending on the
> problem
> it will be chosen automatically:
>  cost function and corresponding activation function of the output units
> (conjugate link functions)
>  - classification (n independent classes): activation function: Sigmoid
> (logistic) - cost function: cross entropy: sum t_i ln y_i +(1-t_i) ln
> (1-y_i)
>  - classification () : activation function: softmax -- cost function: cross
> entropy: sum_i t_i ln y_i
>  - (to be complete) the regression case: cost function: Sum of squared
> errors;
> activation function: identity (no squashing)
>
> I hope now I expressed myself more clearly.
>
> Cheers
>  Christian
>
>
>
> Ted Dunning <te...@gmail.com> hat am 12. Februar 2012 um 15:56
> geschrieben:
>
> > On Sun, Feb 12, 2012 at 5:14 AM, Christian Herta (Commented) (JIRA) <
> > jira@apache.org> wrote:
> >
> > > ....
> > > The implementation of public Vector classifyFull(Vector r, Vector
> > > instance)  in AbstractVectorClassifier assumes that the probabilities
> of
> > > the n elements of the output vector sum to 1. This is only valid if
> there
> > > are n mutually exclusive classes. e.g. for the target vectors like (0
> 0 1
> > > 0), (0 0 0 1), (1 0 0 0), ....
> > >
> >
> > Fine.  That assumption is based on the fact that we only really had
> > classifiers that had this property.  Over-ride it and comment that the
> > assumption doesn't hold.
> >
> >
> > > The other posibility is, that there are n (here 4)independent targets
> > > like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1)
> > > Here the method "Vector classify(..)" and the implementation "public
> > > Vector classifyFull(Vector r, Vector instance)"  of
> AbstractVectorClassfier
> > > makes no sense. Therefore using "Vector classify(..)" should throw an
> > > exception and "Vector classifyFull" must be overwritten.
> > >
> >
> > The method classify makes a lot of sense.  ClassifyFull becomes the
> > primitive and classify() just adds a maxIndex to find the largest value.
> It
> > is true that finding the largest value doesn't make sense for some
> > problems, but you can say the same thing of addition.  The classify()
> > method definitely does make sense for some problems.
> >
> >
> > > P.S.: Depending on the "flag" the cost function and the activation
> > > function for the output units will be set, to get probabilities as
> outputs
> > > e.g. C. Bishop: "Pattern Recognition and Machine Learning", chapter
> 5.2.
> >
> >
> > I am on the road and I don't have my copy of Bishop handy and others
> > haven't read it.
> >
> > Do you mean you will offset the activation function to avoid negative
> > values and L_1 normalize the result?
> >
> >
> > > Also, this simplifies the implementation because the natural pairing
> > > between cost and activation function yields for the output deltas "y -
> t".
> > >
> >
> > This sounds like an implementation detail.  Implementation details should
> > not be exposed to users, even indirectly.  If there is a user expectation
> > of a certain behavior, then it is fine to expose some behavior.  But if
> the
> > user expectation conflicts with the simple implementation then you really
> > need to do the translation internally so that the user has the easier
> time
> > of it.

Re: [jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Feb 12, 2012 at 5:14 AM, Christian Herta (Commented) (JIRA) <
jira@apache.org> wrote:

> ....
> The implementation of public Vector classifyFull(Vector r, Vector
> instance)  in AbstractVectorClassifier assumes that the probabilities of
> the n elements of the output vector sum to 1. This is only valid if there
> are n mutually exclusive classes. e.g. for the target vectors like (0 0 1
> 0), (0 0 0 1), (1 0 0 0), ....
>

Fine.  That assumption is based on the fact that we only really had
classifiers that had this property.  Over-ride it and comment that the
assumption doesn't hold.


> The other posibility is, that there are n (here 4)independent targets
> like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1)
> Here the method "Vector classify(..)" and the implementation "public
> Vector classifyFull(Vector r, Vector instance)"  of AbstractVectorClassfier
> makes no sense. Therefore using "Vector classify(..)" should throw an
> exception and "Vector classifyFull" must be overwritten.
>

The method classify makes a lot of sense.  ClassifyFull becomes the
primitive and classify() just adds a maxIndex to find the largest value. It
is true that finding the largest value doesn't make sense for some
problems, but you can say the same thing of addition.  The classify()
method definitely does make sense for some problems.
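In code, that relationship could look roughly like the following sketch of the idea
(hypothetical class and method names, not the actual AbstractVectorClassifier
signatures): classifyFull is the primitive, and the convenience method just picks the
index of the largest score.

// Illustrative only: classifyFull as the primitive, with a convenience method on top.
public abstract class SketchClassifier {

  /** The primitive: one score per category; whether the scores form a
   *  distribution is not specified here. */
  public abstract double[] classifyFull(double[] instance);

  /** Convenience: index of the largest score (the 1-of-n decision). */
  public int classifyMostLikely(double[] instance) {
    double[] scores = classifyFull(instance);
    int best = 0;
    for (int i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) {
        best = i;
      }
    }
    return best;
  }
}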


> P.S.: Depending on the "flag" the cost function and the activation
> function for the output units will be set, to get probabilities as outputs
> e.g. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2.


I am on the road and I don't have my copy of Bishop handy and others
haven't read it.

Do you mean you will offset the activation function to avoid negative
values and L_1 normalize the result?


> Also, this simplifies the implementation because the natural pairing
> between cost and activation function yields for the output deltas "y - t".
>

This sounds like an implementation detail.  Implementation details should
not be exposed to users, even indirectly.  If there is a user expectation
of a certain behavior, then it is fine to expose some behavior.  But if the
user expectation conflicts with the simple implementation then you really
need to do the translation internally so that the user has the easier time
of it.

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Viktor Gal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202759#comment-13202759 ] 

Viktor Gal commented on MAHOUT-976:
-----------------------------------

Although it's not the same (but again a NN), and AFAIK the learning is sequential, it's worth checking out the restricted Boltzmann machine implementation that has just been submitted in MAHOUT-968.
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
> Later (new jira issues):
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>    
> Distribution of learning can be done in batch learning by:
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Attachment: MAHOUT-976.patch

 - momentum term included (see the sketch below)
 - read-write mlp (completely untested)
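For reference, the classical momentum update has roughly this shape (a generic sketch,
not necessarily the code in the attached patch):

// Generic momentum update (illustration only): keep a velocity per weight,
// mix in the new gradient, and apply the velocity to the weights.
public final class MomentumUpdate {

  static void step(double[] weights, double[] gradient, double[] velocity,
                   double learningRate, double momentum) {
    for (int i = 0; i < weights.length; i++) {
      velocity[i] = momentum * velocity[i] - learningRate * gradient[i];
      weights[i] += velocity[i];
    }
  }
}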
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Attachment: MAHOUT-976.patch

incomplete and completely untested
should only compile
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Remaining Estimate: 80h  (was: 336h)
     Original Estimate: 80h  (was: 336h)
    
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by numerically gradient checking 
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
> Later (new jira issues):
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>    
> Distribution of learning can be done in batch learning by:
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe is procedure can be done with random parts of the chunks (distributed quasi online learning) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240231#comment-13240231 ] 

Christian Herta commented on MAHOUT-976:
----------------------------------------

So far I have implemented only the autoencoder code that is specific to multilayer perceptrons: the weight changes for the "sparse autoencoder" constraint. 
I agree that it's a good idea to have a general autoencoder framework, e.g. for stacking autoencoders and for training them.
When I have some time I will look at how to bring the RBM and MLP autoencoders together.
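For readers unfamiliar with the sparsity constraint: one common formulation adds a
KL-divergence penalty on the mean hidden activations, which contributes an extra term
to the hidden-unit deltas during backprop. Whether the patch uses exactly this form is
an assumption; the sketch below (rho = target mean activation, rhoHat_j = empirical
mean activation of hidden unit j over the batch, beta = penalty weight) only
illustrates the idea.

// One common form of the "sparse autoencoder" constraint (a sketch; not necessarily
// what the attached patch does).
public final class SparsityPenalty {

  /** Extra term added to the hidden-unit deltas:
   *  beta * (-rho/rhoHat + (1 - rho)/(1 - rhoHat)). */
  static double[] hiddenDeltaTerm(double[] meanActivations, double rho, double beta) {
    double[] term = new double[meanActivations.length];
    for (int j = 0; j < term.length; j++) {
      double rhoHat = meanActivations[j];  // mean activation of hidden unit j over the batch
      term[j] = beta * (-rho / rhoHat + (1.0 - rho) / (1.0 - rhoHat));
    }
    return term;
  }
}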
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210206#comment-13210206 ] 

Christian Herta commented on MAHOUT-976:
----------------------------------------

Hello Ted,

thanks for the hints. I will have a look at the algorithms. 

First, I would suggest implementing a simple distributed batch learning method. This can serve as a baseline for other procedures and algorithms.

Before implementing a more sophisticated solution (state of the art in distributed learning for MLPs) I need to study the literature in detail. 

Further literature as a starting point:  
 - "On optimization methods for deep learning" www.icml-2011.org/papers/210_icmlpaper.pdf
 - "Deep Learning via Hessian-free Optimization" www.icml2010.org/papers/458.pdf

Cheers Christian
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] 

Christian Herta edited comment on MAHOUT-976 at 2/12/12 1:15 PM:
-----------------------------------------------------------------

The implementation of public Vector classifyFull(Vector r, Vector instance)  in AbstractVectorClassifier assumes that the probabilities of the n elements of the output vector sum to 1. This is only valid if there are n mutually exclusive classes. e.g. for the target vectors like (0 0 1 0), (0 0 0 1), (1 0 0 0), .... 

The other possibility is that there are n (here 4) independent target classes like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1), (0 0 1 0). 
Here the method "Vector classify(..)" and the implementation "public Vector classifyFull(Vector r, Vector instance)" of AbstractVectorClassifier make no sense. Therefore using "Vector classify(..)" should throw an exception and "Vector classifyFull" must be overridden.  
 
P.S.: Depending on the "flag", the cost function and the activation function for the output units will be set so as to get probabilities as outputs; see e.g. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2. Also, this simplifies the implementation because the natural pairing between cost and activation function yields "y - t" for the output deltas.


                
      was (Author: chrisberlin):
    The implementation of public Vector classifyFull(Vector r, Vector instance)  in AbstractVectorClassifier assumes that the probabilities of the n elements of the output vector sum to 1. This is only valid if there are n mutually exclusive classes. e.g. for the target vectors like (0 0 1 0), (0 0 0 1), (1 0 0 0), .... 

The other posibility is, that there are n (here 4) independent target classes like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1), (0 0 1 0) 
Here the method "Vector classify(..)" and the implementation "public Vector classifyFull(Vector r, Vector instance)"  of AbstractVectorClassfier makes no sense. Therefore using "Vector classify(..)" should throw an exception and "Vector classifyFull" must be overwritten.  
 
P.S.: Depending on the "flag" the cost function and the activation function for the output units will be set, to get probabilities as outputs e.g. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2. Also, this simplifies the implementation because the natural pairing between cost and activation function yields for the output deltas "y - t".


                  
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209446#comment-13209446 ] 

Ted Dunning commented on MAHOUT-976:
------------------------------------

Also, John has had very good results in Vowpal Wabbit with an allreduce operation in his learning system.  The way that this works is that he launches a map-only learning task which reads inputs repeatedly and propagates the gradient vector every pass over the data using an all-reduce operation.  All reduce applies an associative aggregation to a data structure in a tree structure imposed on the network.  The result of the aggregation is passed back down the tree to all nodes.

This allows fast iteration of learning and could also speed up our k-means codes massively.  Typically, this improves speeds by about 2 orders of magnitude because the horrid costs of Hadoop job starts go away.
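A toy illustration of the allreduce idea (names and the array-based tree layout are
made up for the sketch; this is not the Vowpal Wabbit implementation): each node's
gradient vector is summed up the tree, and the total is broadcast back down so that
every node ends up holding the same aggregate.

// Toy allreduce over a binary tree laid out in an array (node i has children 2i+1, 2i+2).
public final class TreeAllReduce {

  /** Sums the per-node gradient vectors up the tree, then broadcasts the total back
   *  down, so that afterwards every node holds the same aggregated gradient. */
  static void allReduce(double[][] nodeGradients) {
    int n = nodeGradients.length;
    // Reduce: iterate bottom-up; each node adds its (already aggregated) vector
    // into its parent.
    for (int i = n - 1; i > 0; i--) {
      int parent = (i - 1) / 2;
      for (int k = 0; k < nodeGradients[i].length; k++) {
        nodeGradients[parent][k] += nodeGradients[i][k];
      }
    }
    // Broadcast: copy the root's total back to every node.
    for (int i = 1; i < n; i++) {
      System.arraycopy(nodeGradients[0], 0, nodeGradients[i], 0, nodeGradients[0].length);
    }
  }
}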

Would you be interested in experimenting with this in your parallel implementation here?

                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Dirk Weißenborn (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218499#comment-13218499 ] 

Dirk Weißenborn commented on MAHOUT-976:
----------------------------------------

I saw that you are also planning to implement autoencoders! If I understood them right, they work exactly like RBMs except that they are not stochastically driven. The RBM implementation I mentioned already provides this functionality (through a method called setProbabilitiesAsActivation() in the Layer interface), and it is only necessary to write a training algorithm. I think it would be easier to take a look there before implementing everything again (MAHOUT-968).
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209440#comment-13209440 ] 

Ted Dunning commented on MAHOUT-976:
------------------------------------

Christian,

Does confidence-weighted learning help with MLPs?

Also, should we move forward with an L-BFGS implementation?  I have heard from John Langford that it is very useful to start with stochastic gradient descent (aka back-prop) and use L-BFGS for finishing off the learning.  That same approach should work reasonably well with MLPs as well, although it may take a bit longer to get into the region where L-BFGS wins.
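Schematically, the schedule described here could be driven like the following sketch
(all interfaces and names are placeholders, not existing Mahout or Vowpal Wabbit
classes): a few cheap SGD passes first, then the batch optimizer finishes from the
weights SGD reached.

// Placeholder two-phase training driver (illustration of the schedule only).
public final class TwoPhaseTrainer {

  interface Learner {
    void setWeights(double[] w);
    double[] weights();
    void trainEpoch(Iterable<double[]> examples);  // one pass over the data
  }

  static double[] train(Learner sgd, Learner batchOptimizer, Iterable<double[]> data,
                        int sgdEpochs, int batchEpochs) {
    for (int e = 0; e < sgdEpochs; e++) {
      sgd.trainEpoch(data);
    }
    batchOptimizer.setWeights(sgd.weights());  // finish from where SGD left off
    for (int e = 0; e < batchEpochs; e++) {
      batchOptimizer.trainEpoch(data);
    }
    return batchOptimizer.weights();
  }
}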
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Status: Patch Available  (was: Open)

incomplete and completely untested
should only compile
  
                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209422#comment-13209422 ] 

Christian Herta edited comment on MAHOUT-976 at 2/16/12 3:11 PM:
-----------------------------------------------------------------

new patch available:

 - Basic implementation 
 - simple tests for propagation and learning by backprop
                
      was (Author: chrisberlin):
    16. Feb. 2012
some tests of learning
                  
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Dirk Weißenborn (Issue Comment Edited JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218511#comment-13218511 ] 

Dirk Weißenborn edited comment on MAHOUT-976 at 2/28/12 7:24 PM:
-----------------------------------------------------------------

You can also take a look at the training itself in this patch since it is actually also a batch learning algorithm. I also implemented a non-map/reduce-based approach using multiple threads. I think you can save a lot of time by reusing already tested code since it is pretty similar to this task.
                
      was (Author: dirk.weissenborn):
    You can also take a look at the training itself in this patch since it is actually also a batch learning algorithm. I also implemented a not map/reduce based approach using multiple threads. I think you can take you can save a lot of time reusing already tested code since it is pretty similar to this task.
                  
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storeable) as part of the model
>  
> First:
>  * implementation "stocastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimazation like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partioning of the data in x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrixes and update of the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Description: 
Implement a multilayer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient Backprop"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine
 * simple gradient descent 

Later (new jira issues):
 * momentum for better and faster learning  
 * advanced cost minimization like 2nd order methods, conjugate gradient etc.  
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done by (batch learning):
 1 Partitioning of the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 

  was:
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficent Backprop"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done by (batch learning):
 1 Partioning of the data in x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrixes and update of the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 

    

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Dirk Weißenborn (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203084#comment-13203084 ] 

Dirk Weißenborn commented on MAHOUT-976:
----------------------------------------

I already implemented backpropagation for special multilayer ANNs (deep Boltzmann machines). Still, you can use the layer implementations from MAHOUT-968, which should (hopefully) provide everything you need for backprop. The backprop there is still naive because it is not the most important part of that training, but it would still be nice to have an optimized backprop. 
                

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] 

Christian Herta commented on MAHOUT-976:
----------------------------------------

The implementation of public Vector classifyFull(Vector r, Vector instance) in AbstractVectorClassifier assumes that the probabilities of the n elements of the output vector sum to 1. This is only valid if there are n mutually exclusive classes, e.g. for target vectors like (0 0 1 0), (0 0 0 1), (1 0 0 0), .... 

The other possibility is that there are n (here 4) independent targets like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1). 
In that case the method "Vector classify(..)" and the implementation "public Vector classifyFull(Vector r, Vector instance)" of AbstractVectorClassifier make no sense. Therefore calling "Vector classify(..)" should throw an exception, and "Vector classifyFull" must be overridden.  
 
P.S.: Depending on the "flag", the cost function and the activation function for the output units will be set so that the outputs are probabilities, cf. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2. This also simplifies the implementation because the natural pairing between cost and activation function yields "y - t" for the output deltas (see the sketch after this comment).
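
A small illustration of the two output settings (plain Java, hypothetical names - not the AbstractVectorClassifier API): softmax outputs sum to 1 and fit n mutually exclusive classes, per-unit sigmoid outputs are n independent probabilities, and with the naturally paired cross-entropy cost both settings give "y - t" as the output delta.

public final class OutputLayerSketch {

  /** Mutually exclusive classes: softmax outputs sum to 1. */
  static double[] softmax(double[] netInput) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : netInput) { max = Math.max(max, v); }
    double[] y = new double[netInput.length];
    double sum = 0.0;
    for (int i = 0; i < y.length; i++) { y[i] = Math.exp(netInput[i] - max); sum += y[i]; }
    for (int i = 0; i < y.length; i++) { y[i] /= sum; }
    return y;
  }

  /** Independent targets: each output is a separate probability in (0, 1). */
  static double[] sigmoid(double[] netInput) {
    double[] y = new double[netInput.length];
    for (int i = 0; i < y.length; i++) { y[i] = 1.0 / (1.0 + Math.exp(-netInput[i])); }
    return y;
  }

  /** With the matching cross-entropy cost, the delta at the output units is simply y - t. */
  static double[] outputDelta(double[] y, double[] t) {
    double[] delta = new double[y.length];
    for (int i = 0; i < y.length; i++) { delta[i] = y[i] - t[i]; }
    return delta;
  }
}
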


                

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] 

Christian Herta edited comment on MAHOUT-976 at 2/12/12 1:14 PM:
-----------------------------------------------------------------

The implementation of public Vector classifyFull(Vector r, Vector instance) in AbstractVectorClassifier assumes that the probabilities of the n elements of the output vector sum to 1. This is only valid if there are n mutually exclusive classes, e.g. for target vectors like (0 0 1 0), (0 0 0 1), (1 0 0 0), .... 

The other possibility is that there are n (here 4) independent target classes like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1), (0 0 1 0). 
In that case the method "Vector classify(..)" and the implementation "public Vector classifyFull(Vector r, Vector instance)" of AbstractVectorClassifier make no sense. Therefore calling "Vector classify(..)" should throw an exception, and "Vector classifyFull" must be overridden.  
 
P.S.: Depending on the "flag", the cost function and the activation function for the output units will be set so that the outputs are probabilities, cf. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2. This also simplifies the implementation because the natural pairing between cost and activation function yields "y - t" for the output deltas.


                
      was (Author: chrisberlin):
    The implementation of public Vector classifyFull(Vector r, Vector instance) in AbstractVectorClassifier assumes that the probabilities of the n elements of the output vector sum to 1. This is only valid if there are n mutually exclusive classes, e.g. for target vectors like (0 0 1 0), (0 0 0 1), (1 0 0 0), .... 

The other possibility is that there are n (here 4) independent targets like: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1). 
In that case the method "Vector classify(..)" and the implementation "public Vector classifyFull(Vector r, Vector instance)" of AbstractVectorClassifier make no sense. Therefore calling "Vector classify(..)" should throw an exception, and "Vector classifyFull" must be overridden.  
 
P.S.: Depending on the "flag", the cost function and the activation function for the output units will be set so that the outputs are probabilities, cf. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2. This also simplifies the implementation because the natural pairing between cost and activation function yields "y - t" for the output deltas.


                  

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Attachment: MAHOUT-976.patch

16. Feb. 2012
some tests of learning
                

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Comment: was deleted

(was: incomplete and completely untested
should only compile 
  )
    

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Description: 
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine
 * simple gradient descent incl. momentum

Later (new jira issues):  
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
 * advanced cost minimization like 2nd-order methods, conjugate gradient, etc.

Distribution of learning can be done by (batch learning):
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning). 
Batch learning with delta-bar-delta heuristics for adapting the learning rates (see the sketch below).    
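
A sketch of the delta-bar-delta heuristic mentioned in the last line, with illustrative names (not part of the patch): every weight keeps its own learning rate, which grows additively while the gradient sign stays stable and shrinks multiplicatively when it flips.

public final class DeltaBarDeltaSketch {

  /** One batch step; kappa > 0, 0 < phi < 1 and 0 < theta < 1 are the usual delta-bar-delta parameters. */
  static void step(double[] weights, double[] gradient, double[] barGradient, double[] rates,
                   double kappa, double phi, double theta) {
    for (int i = 0; i < weights.length; i++) {
      if (gradient[i] * barGradient[i] > 0) {
        rates[i] += kappa;                      // consistent sign: increase the rate additively
      } else if (gradient[i] * barGradient[i] < 0) {
        rates[i] *= phi;                        // sign flip: decrease the rate multiplicatively
      }
      weights[i] -= rates[i] * gradient[i];     // batch gradient step with the per-weight rate
      barGradient[i] = (1 - theta) * gradient[i] + theta * barGradient[i];   // exponential trace ("bar")
    }
  }
}
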
 

  was:
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine
 * simple gradient descent 

Later (new jira issues):
 * momentum for better and faster learning  
 * advanced cost minimization like 2nd-order methods, conjugate gradient, etc.  
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done by (batch learning):
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 

    

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Dirk Weißenborn (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218511#comment-13218511 ] 

Dirk Weißenborn commented on MAHOUT-976:
----------------------------------------

You can also take a look at the training itself in this patch, since it is also a batch learning algorithm. I also implemented an approach that is not map/reduce based, using multiple threads. I think you can save a lot of time by reusing already tested code, since it is pretty similar to this task.
                

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206345#comment-13206345 ] 

Ted Dunning commented on MAHOUT-976:
------------------------------------

I would leave classify as it stands.  If the user wants that sort of function, it can be very convenient.  As you point out, classifyFull is probably more useful for lots of applications.

Why do you need the flag?  Why not just let the user decide how to use their model?
                

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205471#comment-13205471 ] 

Christian Herta commented on MAHOUT-976:
----------------------------------------

The AbstractVectorClassifier method classify(..) assumes that in general there are n mutually exclusive classes. This is also the standard characteristic of the convenience function classifyFull(..). For a multilayer perceptron this is not necessarily the case. In the current work-in-progress implementation this is configured in the constructor of the MLP by a boolean "mutuallyExclusiveClasses"; see the sketch after this comment. 

I could override classifyFull and throw an UnsupportedOperationException if classify is used with "mutuallyExclusiveClasses = false". But I assume that would be confusing for the user.

Is there a better solution? 
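
A hypothetical sketch of the configuration discussed above; class and method names are illustrative and are neither the actual patch nor the Mahout AbstractVectorClassifier API.

public abstract class MlpClassifierSketch {

  private final boolean mutuallyExclusiveClasses;   // softmax outputs vs. independent sigmoid outputs

  protected MlpClassifierSketch(boolean mutuallyExclusiveClasses) {
    this.mutuallyExclusiveClasses = mutuallyExclusiveClasses;
  }

  /** All n outputs: class probabilities that sum to 1, or n independent probabilities. */
  public abstract double[] classifyFull(double[] instance);

  /** Only meaningful when the n outputs are mutually exclusive and sum to 1. */
  public double[] classify(double[] instance) {
    if (!mutuallyExclusiveClasses) {
      throw new UnsupportedOperationException(
          "outputs are independent probabilities; use classifyFull instead");
    }
    double[] full = classifyFull(instance);
    // drop the first component, analogous to the n-1 scores returned by AbstractVectorClassifier
    return java.util.Arrays.copyOfRange(full, 1, full.length);
  }
}
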
                

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Description: 
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 * normalization of the inputs (storeable) as part of the model
 
First:
 * implementation "stocastic gradient descent" like gradient machine
 * simple gradient descent incl. momentum

Later (new jira issues):  
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
 * advanced cost minimization like 2nd-order methods, conjugate gradient, etc.

Distribution of learning can be done by (batch learning):
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
 

  was:
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine
 * simple gradient descent incl. momentum

Later (new jira issues):  
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
 * advanced cost minimization like 2nd-order methods, conjugate gradient, etc.

Distribution of learning can be done by (batch learning):
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning). 
Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
 

    

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Description: 
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done by (batch learning):
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 

  was:
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done in batch learning by:
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi online learning) 

    

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Description: 
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done in batch learning by:
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning) 

  was:
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done in batch learning by:
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning) 

    

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Description: 
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done in batch learning by:
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning) 

  was:
Implement a multi layer perceptron

 * via Matrix Multiplication
 * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
 * arbitrary number of hidden layers (also 0  - just the linear model)
 * connection between proximate layers only 
 * different cost and activation functions (different activation function in each layer) 
 * test of backprop by numerical gradient checking 
 
First:
 * implementation "stocastic gradient descent" like gradient machine

Later (new jira issues):
 * Distributed Batch learning (see below)  
 * "Stacked (Denoising) Autoencoder" - Feature Learning
   

Distribution of learning can be done in batch learning by:
 1 Partitioning the data into x chunks 
 2 Learning the weight changes as matrices in each chunk
 3 Combining the matrices and updating the weights - back to 2
Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning) 

    

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206446#comment-13206446 ] 

Christian Herta edited comment on MAHOUT-976 at 2/12/12 4:37 PM:
-----------------------------------------------------------------

patch MAHOUT-976
incomplete and completely untested
should only compile
                
      was (Author: chrisberlin):
    incomplete and completely untested
should only compile
                  

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Herta updated MAHOUT-976:
-----------------------------------

    Attachment: MAHOUT-976.patch

- tests added (incl. gradient checking; see the sketch below)
- bug fixed 
- experimental sparse autoencoder (subclass of the MLP - not tested) 
- minor improvements
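
A sketch of the kind of numerical gradient check used in such tests, with illustrative names (not the code in the patch): perturb one weight by +/- epsilon, take the central-difference slope of the cost, and compare it with the backpropagated gradient.

import java.util.function.DoubleUnaryOperator;

public final class GradientCheckSketch {

  /** costOfWeight maps one weight's value to the network cost with all other weights held fixed. */
  static boolean agrees(double weightValue, DoubleUnaryOperator costOfWeight,
                        double backpropGradient, double epsilon, double tolerance) {
    double costPlus = costOfWeight.applyAsDouble(weightValue + epsilon);
    double costMinus = costOfWeight.applyAsDouble(weightValue - epsilon);
    double numerical = (costPlus - costMinus) / (2.0 * epsilon);   // central difference
    return Math.abs(numerical - backpropGradient) <= tolerance;
  }
}
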

                