Posted to dev@mahout.apache.org by "Christian Herta (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/12 14:14:59 UTC

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

    [ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] 

Christian Herta edited comment on MAHOUT-976 at 2/12/12 1:14 PM:
-----------------------------------------------------------------

The implementation of public Vector classifyFull(Vector r, Vector instance) in AbstractVectorClassifier assumes that the probabilities of the n elements of the output vector sum to 1. This is only valid if there are n mutually exclusive classes, e.g. for target vectors like (0 0 1 0), (0 0 0 1), (1 0 0 0), ...

The other possibility is that the n (here 4) target classes are independent, e.g.: (1 0 0 1), (0 0 0 0), (0 1 1 1), (1 1 1 1), (0 0 1 0)
In that case the method "Vector classify(..)" and the default implementation of "public Vector classifyFull(Vector r, Vector instance)" in AbstractVectorClassifier make no sense. Therefore calling "Vector classify(..)" should throw an exception, and "Vector classifyFull" must be overridden.
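A minimal sketch of what that override could look like (the class name, the single sigmoid output layer, and the weight matrix are assumptions for illustration; only AbstractVectorClassifier and its contract are Mahout's):

    import org.apache.mahout.classifier.AbstractVectorClassifier;
    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.Vector;

    // Hypothetical MLP whose n output units are independent (non-exclusive)
    // targets, so the classify()/classifyFull() contract has to be adjusted.
    public class IndependentTargetsMlp extends AbstractVectorClassifier {

      private final Matrix outputWeights; // assumed single output layer

      public IndependentTargetsMlp(Matrix outputWeights) {
        this.outputWeights = outputWeights;
      }

      @Override
      public int numCategories() {
        return outputWeights.numRows();
      }

      @Override
      public Vector classify(Vector instance) {
        // classify() returns n - 1 probabilities and assumes that, together
        // with the implicit category 0, they sum to 1. With independent
        // targets that assumption is wrong, so fail loudly.
        throw new UnsupportedOperationException(
            "outputs are independent probabilities; use classifyFull()");
      }

      @Override
      public double classifyScalar(Vector instance) {
        throw new UnsupportedOperationException("not a two-class model");
      }

      @Override
      public Vector classifyFull(Vector r, Vector instance) {
        // Overridden: each output unit gets its own sigmoid probability;
        // the n values deliberately do not sum to 1.
        Vector activation = outputWeights.times(instance);
        for (int k = 0; k < numCategories(); k++) {
          r.setQuick(k, 1.0 / (1.0 + Math.exp(-activation.getQuick(k))));
        }
        return r;
      }
    }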
 
P.S.: Depending on this "flag", the cost function and the activation function for the output units will be chosen so that the outputs can be interpreted as probabilities, cf. C. Bishop: "Pattern Recognition and Machine Learning", chapter 5.2. This also simplifies the implementation, because the natural pairing of cost and activation function yields "y - t" for the output deltas.
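For reference, a sketch of that pairing (notation as in Bishop: y output, t target, a pre-activation of an output unit):

    % independent targets: logistic outputs with cross-entropy cost
    y_k = \sigma(a_k), \qquad
    E = -\sum_k \bigl[ t_k \ln y_k + (1 - t_k) \ln(1 - y_k) \bigr], \qquad
    \frac{\partial E}{\partial a_k}
      = \frac{y_k - t_k}{y_k (1 - y_k)} \, y_k (1 - y_k)
      = y_k - t_k

    % mutually exclusive classes: softmax outputs with cross-entropy cost
    y_k = \frac{e^{a_k}}{\sum_j e^{a_j}}, \qquad
    E = -\sum_k t_k \ln y_k, \qquad
    \frac{\partial E}{\partial a_k} = y_k - t_k

In both cases the output delta is simply y_k - t_k, so backpropagation can start from the same expression regardless of the flag.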


                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multilayer perceptron
>  * via matrix multiplication
>  * learning by backpropagation; implementing tricks by Yann LeCun et al.: "Efficient BackProp"
>  * arbitrary number of hidden layers (also 0 - just the linear model)
>  * connections between adjacent layers only
>  * different cost and activation functions (a different activation function in each layer is possible)
>  * test of backprop by gradient checking (see the sketch below the quoted issue)
>  
> First:
>  * implementation of "stochastic gradient descent", like the gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimization such as 2nd-order methods, conjugate gradient, etc.
> Distribution of learning can be done by (batch learning):
>  1 Partitioning the data into x chunks
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrices and updating the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning).
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.    
>  
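Below is a minimal sketch of the gradient check mentioned in the issue (all names are hypothetical; the interface stands in for whatever cost/gradient methods the real implementation will expose). It compares the backprop gradient against a central finite-difference approximation of the cost:

    // Hypothetical gradient check: the relative difference between the
    // analytic (backprop) gradient and a central finite-difference
    // approximation should be tiny if backprop is implemented correctly.
    public final class GradientCheck {

      // Assumed interface; the real MLP would expose something equivalent.
      interface DifferentiableCost {
        double cost(double[] weights);       // E(w) on a fixed batch
        double[] gradient(double[] weights); // dE/dw via backprop
      }

      // Returns the worst relative difference over all weights.
      static double check(DifferentiableCost f, double[] w, double eps) {
        double[] analytic = f.gradient(w);
        double worst = 0.0;
        for (int i = 0; i < w.length; i++) {
          double saved = w[i];
          w[i] = saved + eps;
          double plus = f.cost(w);
          w[i] = saved - eps;
          double minus = f.cost(w);
          w[i] = saved; // restore the weight
          double numeric = (plus - minus) / (2.0 * eps);
          double denom = Math.max(Math.abs(analytic[i]) + Math.abs(numeric), 1e-12);
          worst = Math.max(worst, Math.abs(analytic[i] - numeric) / denom);
        }
        return worst; // expect roughly 1e-7 or less for eps around 1e-4
      }
    }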

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira