Posted to dev@mahout.apache.org by Maciej Mazur <ma...@gmail.com> on 2014/03/16 21:27:04 UTC

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

I have one final question.

I've mixed feelings about this discussion.
You are saying that there is no point in doing a MapReduce implementation of
neural networks (with pretraining).
Then you say that a non-MapReduce version would be of substantial interest.
On the other hand, you say that it would be easy and that it defeats the
purpose of doing it in Mahout (because it is not an MR version).
Finally, you are saying that building something simple and working is a good
thing.

I do not really know what to think about it.
Could you give me some advice on whether I should write a proposal or not?
(And if I should: should I propose a MapReduce or a non-MapReduce version?
There is already an NN algorithm, but without pretraining.)

Thanks,
Maciej Mazur





On Fri, Feb 28, 2014 at 5:44 AM, peng <pc...@uowmail.edu.au> wrote:

> Oh, thanks a lot, I missed that one :)
> +1 on implementing the easiest one first. I hadn't thought about the
> difficulty issue; I need to read more about the YARN extension.
>
> Yours Peng
>
>
> On Thu 27 Feb 2014 08:06:27 PM EST, Yexi Jiang wrote:
>
>> Hi, Peng,
>>
>> Do you mean the MultilayerPerceptron? There are three 'train' methods, and
>> only one (the one without the trackingKey and groupKey parameters) is
>> implemented. In the current implementation, they are not used.
>>
>> Regards,
>> Yexi
>>
>>
>> 2014-02-27 19:31 GMT-05:00 Ted Dunning <te...@gmail.com>:
>>
>>>  Generally for training models like this, there is an assumption that fault
>>> tolerance is not particularly necessary, because the low risk of failure
>>> trades against algorithmic speed.  For a reasonably small chance of failure,
>>> simply re-running the training is just fine.  If there is a high risk of
>>> failure, simply checkpointing the parameter server is sufficient to allow
>>> restarts without redundancy.
>>>
>>> Sharding the parameters is quite possible and is reasonable when the
>>> parameter vector exceeds 10's or 100's of millions of parameters, but it
>>> isn't likely to be necessary below that.
>>>
>>> The asymmetry is similarly not a big deal.  The traffic to and from the
>>> parameter server isn't enormous.
>>>
>>>
>>> Building something simple and working first is a good thing.
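As a rough sketch of the checkpointing idea above (plain Java with invented
names, not Mahout code), the whole fault-tolerance story can be as small as
periodically serializing the parameter vector and reloading it on restart:

    import java.io.*;

    // Sketch: checkpoint the parameter vector between passes so training can
    // be restarted after a failure without any redundant parameter server.
    class CheckpointingTrainer {
      private double[] weights = new double[1_000_000];   // model parameters
      private final File checkpoint = new File("weights.ckpt");

      void train(int totalEpochs) throws Exception {
        for (int epoch = restore(); epoch < totalEpochs; epoch++) {
          // ... one pass of gradient updates over the training data ...
          save(epoch + 1);                                 // cheap insurance
        }
      }

      private void save(int nextEpoch) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(checkpoint))) {
          out.writeInt(nextEpoch);
          out.writeObject(weights);
        }
      }

      private int restore() throws Exception {
        if (!checkpoint.exists()) return 0;                // fresh start
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(checkpoint))) {
          int nextEpoch = in.readInt();
          weights = (double[]) in.readObject();
          return nextEpoch;
        }
      }
    }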
>>>
>>>
>>> On Thu, Feb 27, 2014 at 3:56 PM, peng <pc...@uowmail.edu.au> wrote:
>>>
>>>>  With pleasure! The original downpour paper proposes a parameter server
>>>> from which subnodes download shards of the old model and upload gradients.
>>>> So if the parameter server is down, the process has to be delayed. It also
>>>> requires that all model parameters be stored and atomically updated on
>>>> (and fetched from) a single machine, imposing asymmetric HDD and bandwidth
>>>> requirements. This design is necessary only because each -=delta operation
>>>> has to be atomic, which cannot be ensured across the network (e.g. on
>>>> HDFS).
>>>>
>>>> But that doesn't mean the operation cannot be decentralized: parameters
>>>> can be sharded across multiple nodes, and multiple accumulator instances
>>>> can handle parts of the vector subtraction. This should be easy if you
>>>> create a buffer for the stream of gradients and allocate proper numbers of
>>>> producers and consumers on each machine to make sure it doesn't overflow.
>>>> Obviously this is far from the MR framework, but at least it can be made
>>>> homogeneous and slightly faster (because sparse data can be distributed in
>>>> a way that minimizes overlap, so gradients don't have to cross the network
>>>> as frequently).
>>>>
>>>> If we instead use a centralized architecture, then there must be >=1
>>>> backup parameter server for mission-critical training.
>>>>
>>>> Yours Peng
>>>>
>>>> e.g. we can simply use a producer/consumer pattern
>>>>
>>>> If we use a producer/consumer pattern for all gradients,
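A minimal sketch of that producer/consumer idea, with an in-memory bounded
queue standing in for the network transport (the class and method names below
are made up for illustration):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch of a sharded gradient accumulator: producers push gradient
    // pieces for one shard of the parameter vector, a consumer applies them.
    // The bounded queue is the "buffer for the stream of gradients" --
    // producers block instead of overflowing it.
    class ShardAccumulator implements Runnable {
      private final double[] shard;                     // one shard of the parameters
      private final BlockingQueue<double[]> gradients;  // gradient pieces for this shard

      ShardAccumulator(int shardSize, int capacity) {
        this.shard = new double[shardSize];
        this.gradients = new ArrayBlockingQueue<>(capacity);
      }

      // Called by producers (workers computing gradients on local data).
      void offerGradient(double[] g) throws InterruptedException {
        gradients.put(g);                               // blocks while the buffer is full
      }

      // Consumer loop: the -=delta happens inside one JVM, so no cross-network
      // atomicity is needed.
      public void run() {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            double[] g = gradients.take();
            synchronized (shard) {
              for (int i = 0; i < shard.length; i++) {
                shard[i] -= g[i];
              }
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }

Each node would run one such accumulator per locally held shard, with however
many producer and consumer threads keep the queue from overflowing.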
>>>>
>>>> On Thu 27 Feb 2014 05:09:52 PM EST, Yexi Jiang wrote:
>>>>
>>>>  Peng,
>>>>>
>>>>> Can you provide more details about your thought?
>>>>>
>>>>> Regards,
>>>>>
>>>>>
>>>>> 2014-02-27 16:00 GMT-05:00 peng <pc...@uowmail.edu.au>:
>>>>>
>>>>>>  That should be easy. But that defeats the purpose of using Mahout, as
>>>>>> there are already enough implementations of single-node backpropagation
>>>>>> (in which case a GPU is much faster).
>>>>>>
>>>>>> Yexi:
>>>>>>
>>>>>> Regarding downpour SGD and sandblaster, may I suggest that the
>>>>>> implementation had better have no parameter server? It's obviously a
>>>>>> single point of failure and, in terms of bandwidth, a bottleneck. I heard
>>>>>> that MLlib on top of Spark has a functional implementation (I have never
>>>>>> read or tested it), and it's possible to build the workflow on top of
>>>>>> YARN. None of those frameworks has a heterogeneous topology.
>>>>>>
>>>>>> Yours Peng
>>>>>>
>>>>>>
>>>>>> On Thu 27 Feb 2014 09:43:19 AM EST, Maciej Mazur (JIRA) wrote:
>>>>>>
>>>>>>
>>>>>>>         [ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ]
>>>>>>>
>>>>>>> Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:
>>>>>>> ---------------------------------------------------------------
>>>>>>>
>>>>>>> I've read the papers. I didn't think about a distributed network. I had
>>>>>>> in mind a network that will fit into memory but will require a
>>>>>>> significant amount of computation.
>>>>>>>
>>>>>>> I understand that there are better options for neural networks than map
>>>>>>> reduce.
>>>>>>> How about a non-map-reduce version?
>>>>>>> I see that you think it is something that would make sense. (Doing a
>>>>>>> non-map-reduce neural network in Mahout would be of substantial
>>>>>>> interest.)
>>>>>>> Do you think it would be a valuable contribution?
>>>>>>> Is there a need for this type of algorithm?
>>>>>>> I am thinking about multi-threaded batch gradient descent with
>>>>>>> pretraining (RBM and/or Autoencoders).
>>>>>>>
>>>>>>> I have looked into these old JIRAs. The RBM patch was withdrawn:
>>>>>>> "I would rather like to withdraw that patch, because by the time i
>>>>>>> implemented it i didn't know that the learning algorithm is not suited
>>>>>>> for MR, so I think there is no point including the patch."
>>>>>>>
>>>>>>>
>>>>>>> was (Author: maciejmazur):
>>>>>>> I've read the papers. I didn't think about a distributed network. I had
>>>>>>> in mind a network that will fit into memory but will require a
>>>>>>> significant amount of computation.
>>>>>>>
>>>>>>> I understand that there are better options for neural networks than map
>>>>>>> reduce.
>>>>>>> How about a non-map-reduce version?
>>>>>>> I see that you think it is something that would make sense.
>>>>>>> Do you think it would be a valuable contribution?
>>>>>>> Is there a need for this type of algorithm?
>>>>>>> I am thinking about multi-threaded batch gradient descent with
>>>>>>> pretraining (RBM and/or Autoencoders).
>>>>>>>
>>>>>>> I have looked into these old JIRAs. The RBM patch was withdrawn:
>>>>>>> "I would rather like to withdraw that patch, because by the time i
>>>>>>> implemented it i didn't know that the learning algorithm is not suited
>>>>>>> for MR, so I think there is no point including the patch."
>>>>>>>
>>>>>>>    GSOC 2013 Neural network algorithms
>>>>>>>
>>>>>>>  -----------------------------------
>>>>>>>>
>>>>>>>>                    Key: MAHOUT-1426
>>>>>>>>                    URL: https://issues.apache.org/jira/browse/MAHOUT-1426
>>>>>>>>                Project: Mahout
>>>>>>>>             Issue Type: Improvement
>>>>>>>>             Components: Classification
>>>>>>>>               Reporter: Maciej Mazur
>>>>>>>>
>>>>>>>> I would like to ask about the possibilities of implementing neural
>>>>>>>> network algorithms in Mahout during GSOC.
>>>>>>>> There is a classifier.mlp package with a neural network.
>>>>>>>> I can see neither RBM nor Autoencoder in these classes.
>>>>>>>> There is only one mention of Autoencoders, in the NeuralNetwork class.
>>>>>>>> As far as I know, Mahout doesn't support convolutional networks.
>>>>>>>> Is it a good idea to implement one of these algorithms?
>>>>>>>> Is it a reasonable amount of work?
>>>>>>>> How hard is it to get into GSOC with Mahout?
>>>>>>>> Did anyone succeed last year?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> This message was sent by Atlassian JIRA
>>>>>>> (v6.1.5#6160)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>>

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Yexi Jiang <ye...@gmail.com>.
Hi, Ted,

I am currently working on that issue with Suneel.

Yexi


2014-03-19 19:44 GMT-04:00 Ted Dunning <te...@gmail.com>:

> On Wed, Mar 19, 2014 at 3:19 PM, Maciej Mazur <maciejmazurx@gmail.com
> >wrote:
>
> > I'm not going to propose this project.
> > Now this issue can be closed.
> >
>
> Proposing the downpour would be a good thing to do.
>
> It won't be that difficult.
>
> Please don't take my comments as discouraging.
>



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Mar 19, 2014 at 3:19 PM, Maciej Mazur <ma...@gmail.com>wrote:

> I'm not going to propose this project.
> Now this issue can be closed.
>

Proposing the downpour would be a good thing to do.

It won't be that difficult.

Please don't take my comments as discouraging.

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Maciej Mazur <ma...@gmail.com>.
Ok, I think you are right.
Although it would be a valuable experience, I will have to leave it.
Thanks for your feedback.
I understand that it is not the best use of map reduce.
I'm not going to propose this project.
Now this issue can be closed.


On Wed, Mar 19, 2014 at 11:01 PM, Ted Dunning <te...@gmail.com> wrote:

> I really think that a true downpour architecture is actually easier than
> what you suggest and much better for the purpose.
>
>
>
>
> On Wed, Mar 19, 2014 at 1:28 PM, Maciej Mazur <maciejmazurx@gmail.com> wrote:
>
> > Any comments?
> > I think it will work: I could run one long-lasting job, hack the file
> > system from the mapper in order to repeatedly update weights, perform
> > mini-batch GD, and store the updates in some folder.
> > In the background I could call small jobs for gathering gradients and
> > updating weights.
> >
> >
> > On Tue, Mar 18, 2014 at 10:11 PM, Maciej Mazur <maciejmazurx@gmail.com> wrote:
> >
> > > I'll say what I think about it.
> > >
> > > I know that Mahout is currently heading in a different direction. You are
> > > working on refactoring, improving the existing API and migrating to
> > > Spark. I know that there is a great deal of work to do there. I would
> > > also like to help with that.
> > >
> > > I am impressed by the results achieved with Neural Networks. Generally
> > > speaking, I think that NNs give a significant advantage over other
> > > methods in a wide range of problems. They beat other state-of-the-art
> > > algorithms in various areas. I think that in the future this algorithm
> > > will play an even greater role.
> > > That's why I came up with the idea to implement neural networks.
> > >
> > > When it comes to functionality: pretraining (RBM), training (SGD/minibatch
> > > gradient descent + backpropagation + momentum) and classification.
> > >
> > > Unfortunately, mapreduce is ill-suited for NNs.
> > > The biggest problem is how to reduce the number of iterations.
> > > It is possible to divide the data and use momentum applied to the edges -
> > > it helps a little, but doesn't solve the problem.
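(For reference, the per-edge momentum update mentioned above amounts to
something like the following sketch; the variable names are made up.)

    // w = edge weights, v = per-edge velocities, g = mini-batch averaged
    // gradient, mu = momentum coefficient, lr = learning rate.
    static void momentumStep(double[] w, double[] v, double[] g,
                             double mu, double lr) {
      for (int i = 0; i < w.length; i++) {
        v[i] = mu * v[i] - lr * g[i];
        w[i] += v[i];
      }
    }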
> > >
> > > I have some idea of a not-exactly-mapreduce implementation, but I am not
> > > sure whether it is possible using this infrastructure. For sure it is not
> > > plain map reduce.
> > > In other distributed NN implementations there are asynchronous
> > > operations. Is it possible to take advantage of asynchrony?
> > > At first I would separate the data, some subset on every node.
> > > On each node I would use a number of files (directories) for storing
> > > weights.
> > > Each machine would use these files to compute the cost function and
> > > update the gradient.
> > > In the background, multiple reduce jobs would average the gradients for
> > > some subsets of the weights (one file each), and then asynchronously
> > > update that subset of the weights (from one file).
> > > In a way this idea is similar to Downpour SGD from
> > > http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf
> > >
> > > There are a couple of problems here. Is it a feasible solution?
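A very rough sketch of that file-based scheme, using a plain shared directory
as a stand-in for HDFS (every path and name below is invented for
illustration):

    import java.io.*;
    import java.nio.file.*;

    // Sketch: workers drop per-mini-batch gradient files for one weight shard
    // into a directory; a background job averages them and folds the result
    // into that shard's weight file, asynchronously from the workers.
    class GradientFolder {
      static void averageAndApply(Path gradientDir, Path weightFile, double lr)
          throws IOException, ClassNotFoundException {
        double[] weights = readVector(weightFile);
        double[] sum = new double[weights.length];
        int n = 0;
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(gradientDir, "*.grad")) {
          for (Path f : files) {
            double[] g = readVector(f);
            for (int i = 0; i < sum.length; i++) sum[i] += g[i];
            n++;
            Files.delete(f);                       // this gradient is consumed
          }
        }
        if (n == 0) return;
        for (int i = 0; i < weights.length; i++) {
          weights[i] -= lr * sum[i] / n;           // apply the averaged gradient
        }
        writeVector(weightFile, weights);
      }

      static double[] readVector(Path p) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(p))) {
          return (double[]) in.readObject();
        }
      }

      static void writeVector(Path p, double[] v) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(p))) {
          out.writeObject(v);
        }
      }
    }

Real code would also have to cope with partially written files and with the
lack of atomic in-place updates on HDFS, which is exactly the difficulty the
downpour parameter-server design works around.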
> > >
> > >
> > > A parallel implementation is very complex. It's hard to design something
> > > that uses mapreduce but is not a mapreduce algorithm.
> > > You are definitely more experienced than me and I'll need a lot of help;
> > > I may not be aware of some limitations.
> > >
> > > From my perspective it would be a great experience, even if I ended up
> > > doing something other than NNs. Frankly speaking, I think I'll stay here
> > > regardless of whether my proposal is accepted. It'll be a great
> > > opportunity to learn.
> > >
> > >
> > >
> > >
> > > On Mon, Mar 17, 2014 at 5:27 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >
> > >> I would suggest looking at deeplearning4j.org (they went public very
> > >> recently) and seeing how they have utilized Iterative Reduce for
> > >> implementing Neural Nets.
> > >>
> > >> I am not sure, given the present state of flux on the project, if we
> > >> should even be considering adding any new algorithms. The existing ones
> > >> can be refactored to be more API driven (for both clustering and
> > >> classification), and that's no trivial effort and could definitely use a
> > >> lot of help.
> > >>
> > >> How is what you are proposing going to be any better than similar
> > >> existing implementations that Mahout already has, both in terms of
> > >> functionality and performance/scaling? Are there users who would prefer
> > >> what you are proposing as opposed to using what already exists in
> > >> Mahout?
> > >>
> > >> We did purge a lot of the unmaintained and non-functional code for the
> > >> 0.9 release and are down to where we are today. There's still room for
> > >> improvement in what presently exists, and the project could definitely
> > >> use some help there.
> > >>
> > >> With the emphasis now on supporting Spark ASAP, any new implementations
> > >> would not make the task any easier.  There's still stuff in Mahout Math
> > >> that can be redone to be more flexible, like the present Named Vector
> > >> (see MAHOUT-1236). That's a very high priority for the next release, and
> > >> it is going to impact existing implementations once finalized. The
> > >> present codebase is very heavily dependent on M/R; decoupling the
> > >> relevant pieces from the MR api and being able to offer a potential
> > >> Mahout user the choice of different execution engines (Spark or MR) is
> > >> no trivial task.
> > >>
> > >> IMO, the emphasis should now be more on stabilizing, refactoring and
> > >> cleaning up the existing implementations (which is technical debt that's
> > >> building up) and porting stuff to Spark.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Sunday, March 16, 2014 4:39 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >>
> > >> OK.
> > >>
> > >> I am confused now as well.
> > >>
> > >> Even so, I would recommend that you propose a non-map-reduce but still
> > >> parallel version.
> > >>
> > >> Some of the confusion may stem from the fact that you can design some
> > >> non-map-reduce programs to run in such a way that a map-reduce execution
> > >> framework like Hadoop thinks that they are doing map-reduce.  Instead,
> > >> these programs are doing whatever they feel like and just pretending to
> > >> be map-reduce programs in order to get a bunch of processes launched.
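For what it's worth, the trick described here usually looks something like
the following: a map-only job whose mappers ignore their input and simply run
long-lived worker code. This is only a sketch against the standard Hadoop
mapper API; the worker itself is left abstract:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch: the "map-reduce" job exists only to get one worker process
    // launched per input split; the mapper never emits anything.
    public class WorkerMapper
        extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

      @Override
      public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        // Ignore the input split entirely and run arbitrary worker logic,
        // e.g. a training loop that talks to other workers directly.
        runTrainingWorker(context.getConfiguration());
        cleanup(context);
      }

      private void runTrainingWorker(Configuration conf) {
        // ... long-lived, non-map-reduce work goes here ...
      }
    }

The driver would set the number of reduce tasks to zero and arrange one small
dummy input split per worker it wants launched.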

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Ted Dunning <te...@gmail.com>.
I really think that a true downpour architecture is actually easier than
what you suggest and much better for the purpose.
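Roughly, the downpour architecture referred to here boils down to workers
that asynchronously pull current parameters from a (possibly sharded)
parameter service and push gradients back to it. A compressed sketch, with an
invented interface standing in for the actual transport:

    // ParameterService is an invented interface: fetch() returns the current
    // weights for a shard, push() sends a gradient that the server applies
    // asynchronously.
    interface ParameterService {
      double[] fetch(int shardId);
      void push(int shardId, double[] gradient);
    }

    // A downpour-style worker: no global barrier, and slightly stale weights
    // are tolerated.
    class DownpourWorker implements Runnable {
      private final ParameterService params;
      private final int shardId;
      private final Iterable<double[]> localMiniBatches;  // this worker's data

      DownpourWorker(ParameterService params, int shardId,
                     Iterable<double[]> localMiniBatches) {
        this.params = params;
        this.shardId = shardId;
        this.localMiniBatches = localMiniBatches;
      }

      public void run() {
        for (double[] miniBatch : localMiniBatches) {
          double[] w = params.fetch(shardId);         // possibly slightly stale
          double[] g = computeGradient(w, miniBatch); // backprop on local data
          params.push(shardId, g);                    // fire and forget
        }
      }

      private double[] computeGradient(double[] w, double[] miniBatch) {
        // ... forward + backward pass for this mini-batch ...
        return new double[w.length];
      }
    }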




On Wed, Mar 19, 2014 at 1:28 PM, Maciej Mazur <ma...@gmail.com>wrote:

> Any comments?
> I think it will work: I could run one long-lasting job, hack the file
> system from the mapper in order to repeatedly update weights, perform
> mini-batch GD, and store the updates in some folder.
> In the background I could call small jobs for gathering gradients and
> updating weights.

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Maciej Mazur <ma...@gmail.com>.
Any comments?
I think it will work: I could run one long-lasting job, hack the file
system from the mapper in order to repeatedly update weights, perform
mini-batch GD, and store the updates in some folder.
In the background I could call small jobs for gathering gradients and
updating weights.


>>  Mazur
>> > >>>>>>>>
>> > >>>>>>>> I would like to ask about possibilites of implementing neural
>> > >>>>>>>> network
>> > >>>>>>>> algorithms in mahout during GSOC.
>> > >>>>>>>> There is a classifier.mlp package with neural network.
>> > >>>>>>>> I can't see neighter RBM  nor Autoencoder in these classes.
>> > >>>>>>>> There is only one word about Autoencoders in NeuralNetwork
>> class.
>> > >>>>>>>> As far as I know Mahout doesn't support convolutional networks.
>> > >>>>>>>> Is it a good idea to implement one of these algorithms?
>> > >>>>>>>> Is it a
>>  reasonable amount of work?
>> > >>>>>>>> How hard is it to get GSOC in Mahout?
>> > >>>>>>>> Did anyone succeed last year?
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>
>> > >>>>>>> --
>> > >>>>>>> This message was sent by Atlassian JIRA
>> > >>>>>>> (v6.1.5#6160)
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>
>> > >>>>>
>> > >>>>>
>> > >>>
>> >
>>  >>
>> > >>
>> > >>
>> >
>>
>
>

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Maciej Mazur <ma...@gmail.com>.
I'll say what I think about it.

I know that Mahout is currently heading in a different direction. You are
working on refactoring, improving the existing API and migrating to Spark. I
know that there is a great deal of work to do there, and I would also like to
help with that.

I am impressed by the results achieved with neural networks. Generally
speaking, I think NNs give a significant advantage over other methods in a
wide range of problems and beat other state-of-the-art algorithms in various
areas. I think that in the future they will play an even greater role.
That's why I came up with the idea of implementing neural networks.

When it comes to functionality, I have in mind: pretraining (RBM), training
(SGD/mini-batch gradient descent + backpropagation + momentum) and classification.
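
By "training" I mean the usual momentum update, roughly this per mini-batch (a
sketch only; the method name is made up and this is not existing Mahout code):

    // One mini-batch step of gradient descent with momentum (sketch).
    // weights, velocity and gradient are equally sized arrays;
    // learningRate and momentum are the usual hyper-parameters.
    static void momentumStep(double[] weights, double[] velocity, double[] gradient,
                             double learningRate, double momentum) {
      for (int i = 0; i < weights.length; i++) {
        velocity[i] = momentum * velocity[i] - learningRate * gradient[i];
        weights[i] += velocity[i];
      }
    }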

Unfortunately MapReduce is ill-suited for NNs.
The biggest problem is how to reduce the number of iterations.
It is possible to divide the data and apply momentum to the edge weights - it
helps a little, but doesn't solve the problem.

I have an idea for a not-exactly-MapReduce implementation, but I am not sure
whether it is possible on this infrastructure. For sure it is not plain
map-reduce.
Other distributed NN implementations rely on asynchronous operations.
Is it possible to take advantage of asynchrony here?
At first I would partition the data, with some subset on every node.
On each node I would use a number of files (directories) for storing the
weights.
Each machine would use these files to compute the cost function and the
gradient.
In the background, multiple reduce jobs would average the gradients for some
subsets of the weights (one file each).
Then that subset of weights (from one file) would be updated asynchronously.
In a way this idea is similar to Downpour SGD from
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf
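
To make that more concrete, here is a minimal in-memory sketch of the sharded,
asynchronously updated weight store I have in mind (Java, sketch only; the class
and method names are made up and this is not an existing Mahout API - in the real
version each shard would live in its own file/directory as described above):

    import java.util.concurrent.locks.ReentrantLock;

    /** Sketch: weights are split into shards; each shard is updated atomically and independently. */
    class ShardedWeightStore {
      private final double[][] shards;      // shards[s] holds the weights of shard s
      private final ReentrantLock[] locks;  // one lock per shard, so different shards don't block each other

      ShardedWeightStore(int numShards, int shardSize) {
        shards = new double[numShards][shardSize];
        locks = new ReentrantLock[numShards];
        for (int s = 0; s < numShards; s++) {
          locks[s] = new ReentrantLock();
        }
      }

      /** A worker takes a consistent snapshot of one shard before computing gradients against it. */
      double[] readShard(int s) {
        locks[s].lock();
        try {
          return shards[s].clone();
        } finally {
          locks[s].unlock();
        }
      }

      /** A background "reducer" applies an averaged gradient to one shard only. */
      void applyAveragedGradient(int s, double[] avgGradient, double learningRate) {
        locks[s].lock();
        try {
          for (int i = 0; i < avgGradient.length; i++) {
            shards[s][i] -= learningRate * avgGradient[i];
          }
        } finally {
          locks[s].unlock();
        }
      }
    }

Workers would keep producing per-shard gradients into a queue, and a small pool of
consumers would average them and call applyAveragedGradient - essentially the
producer/consumer pattern mentioned earlier in this thread.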

There are a couple of problems here. Is it a feasible solution?


A parallel implementation is very complex. It's hard to design something that
uses MapReduce while not being a MapReduce algorithm.
You are definitely more experienced than I am, and I'll need a lot of help; I
may not be aware of some limitations.

From my perspective it would be a great experience, even if I ended up doing
something other than NNs. Frankly speaking, I think I'll stay here
regardless of whether my proposal is accepted. It'll be a great
opportunity to learn.





Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Suneel Marthi <su...@yahoo.com>.
I would suggest looking at deeplearning4j.org (they went public very recently) to see how they have utilized Iterative Reduce for implementing neural nets.

Given the present state of flux on the project, I am not sure we should even be considering adding any new algorithms. The existing ones can be refactored to be more API-driven (for both clustering and classification); that's no trivial effort and could definitely use a lot of help.

How is what you are proposing going to be any better than the similar implementations Mahout
already has, in terms of functionality, performance and scaling? Are there users who
would prefer what you are proposing over what already exists in Mahout?

We purged a lot of the unmaintained and non-functional code for the 0.9 release and are down to where we are today. There's still room for improvement in what presently exists, and the project could definitely use some help there.

With the emphasis now on supporting Spark ASAP, any new implementations would not make that task any easier. There's still stuff in Mahout Math that can be redone to be more flexible, like the present NamedVector (see MAHOUT-1236). That's a very high priority for the next release and is going to impact existing implementations once finalized. The present codebase is very heavily dependent on M/R; decoupling the relevant pieces from the MR API and being able to offer a potential Mahout user the choice of different execution engines (Spark or MR) is no trivial task.
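
To illustrate the kind of decoupling I mean (a sketch only - these interfaces do not exist in Mahout today and the names are made up): the algorithm code would be written against an engine-neutral seam, and the MR- or Spark-specific pieces would live behind it.

    // Hypothetical engine-neutral seam; not existing Mahout code.
    interface ExecutionEngine {
      /** Run one distributed pass over the input and return the location of the output. */
      String runPass(String inputPath, String outputPath);
    }

    class MapReduceEngine implements ExecutionEngine {
      public String runPass(String inputPath, String outputPath) {
        // submit the existing M/R job here
        return outputPath;
      }
    }

    class SparkEngine implements ExecutionEngine {
      public String runPass(String inputPath, String outputPath) {
        // run the equivalent Spark job here
        return outputPath;
      }
    }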

IMO, the emphasis should now be on stabilizing, refactoring and cleaning up the existing implementations (technical debt that keeps building up) and porting stuff to Spark.






Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

Posted by Ted Dunning <te...@gmail.com>.
OK.

I am confused now as well.

Even so, I would recommend that you propose a non-map-reduce but still
parallel version.

Some of the confusion may stem from the fact that you can design some
non-map-reduce programs to run in such a way that a map-reduce execution
framework like Hadoop thinks that they are doing map-reduce.  Instead,
these programs are doing whatever they feel like and just pretending to be
map-reduce programs in order to get a bunch of processes launched.
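
A minimal sketch of the "pretending" trick - a map-only Hadoop job whose mapper
ignores its input split and just runs whatever worker code you like (this assumes
the standard Hadoop MapReduce API; the worker logic itself is left as a stub):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class PretendMapReduce {

      /** Each map task ignores its split and simply runs arbitrary worker code. */
      public static class WorkerMapper
          extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) {
          // deliberately ignore the input records
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
          // arbitrary worker logic goes here, e.g. a training loop or talking to peer tasks
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "not really map-reduce");
        job.setJarByClass(PretendMapReduce.class);
        job.setMapperClass(WorkerMapper.class);
        job.setNumReduceTasks(0);                              // map-only: no shuffle, no reduce
        job.setOutputFormatClass(NullOutputFormat.class);      // nothing is written
        FileInputFormat.addInputPath(job, new Path(args[0]));  // dummy input, only to get tasks launched
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }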


