Posted to dev@spark.apache.org by Bert Greevenbosch <Be...@huawei.com> on 2014/06/27 02:46:29 UTC

Artificial Neural Network in Spark?

Hello all,

I was wondering whether Spark/MLlib supports Artificial Neural Networks (ANNs)?

If not, I am currently working on an implementation of it. I re-use the code for linear regression and gradient descent as much as possible.

Would the community be interested in such an implementation? Or maybe somebody is already working on it?

Best regards,
Bert

RE: Artificial Neural Network in Spark?

Posted by Bert Greevenbosch <Be...@huawei.com>.
Hi Debasish, all,

Thanks for your feedback. I have submitted the code to GitHub and created a Jira ticket (links below).

The ANN uses back-propagation with steepest gradient descent, so it follows the SGD-style update rather than CG.
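
As a rough illustration of back-propagation with plain gradient descent for a network with one hidden layer, here is a minimal Python sketch. It is not the Scala code from the pull request: the sigmoid activation, the squared-error loss, the absence of bias terms and all names (`forward`, `loss`, `backprop_step`, `w_hidden`, `w_out`) are assumptions chosen for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, outputs) for one input vector x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, out


def loss(x, target, w_hidden, w_out):
    """Half the squared error between the network output and the target."""
    _, out = forward(x, w_hidden, w_out)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def backprop_step(x, target, w_hidden, w_out, step_size):
    """One gradient-descent update; returns the new (w_hidden, w_out)."""
    hidden, out = forward(x, w_hidden, w_out)
    # Error term of each output unit: d(loss)/d(pre-activation).
    delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
    # Propagate the error back through the (old) output weights.
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    new_w_out = [
        [w - step_size * delta_out[k] * hidden[j] for j, w in enumerate(row)]
        for k, row in enumerate(w_out)
    ]
    new_w_hidden = [
        [w - step_size * delta_hidden[j] * x[i] for i, w in enumerate(row)]
        for j, row in enumerate(w_hidden)
    ]
    return new_w_hidden, new_w_out
```

Calling `backprop_step` repeatedly on the same sample should drive `loss` down; a distributed version would sum the gradients over all samples before applying the update.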

Best regards,
Bert

https://github.com/apache/spark/pull/1290
https://issues.apache.org/jira/browse/SPARK-2352


 
> -----Original Message-----
> From: Debasish Das [mailto:debasish.das83@gmail.com]
> Sent: 01 July 2014 12:21
> To: dev@spark.apache.org
> Subject: Re: Artificial Neural Network in Spark?
> 
> I will let Xiangrui comment on the PR process for adding the code to
> MLlib, but I would love to look into your initial version if you push
> it to GitHub...
>
> As far as I remember, Quoc got his best ANN results using the
> back-propagation algorithm solved with CG... Do you have those
> features, or are you using an SGD-style update?
> 
> 
> 

Re: Artificial Neural Network in Spark?

Posted by Debasish Das <de...@gmail.com>.
I will let Xiangrui comment on the PR process for adding the code to MLlib,
but I would love to look into your initial version if you push it to
GitHub...

As far as I remember, Quoc got his best ANN results using the back-propagation
algorithm solved with CG... Do you have those features, or are you using
an SGD-style update?



On Mon, Jun 30, 2014 at 8:13 PM, Bert Greevenbosch <Bert.Greevenbosch@huawei.com> wrote:

> Hi Debasish, Alexander, all,
>
> Indeed, I found the OpenDL project through the Powered by Spark page. I'll
> need some time to look into the code, but at first sight it looks quite
> well developed. I'll contact the author about this too.
>
> My own implementation (in Scala) works for multiple inputs and multiple
> outputs. It implements a single hidden layer whose number of nodes can be
> specified.
>
> The implementation is a general ANN implementation. As such, it should be
> useable for an autoencoder too, since that is just an ANN with some special
> input/output constraints.
>
> As mentioned before, the implementation builds upon the linear regression
> model and the gradient descent implementation. However, it did require some
> tweaks:
>
> - The linear regression model only supports a single output "label" (as
> Double). Since the ANN can have multiple outputs, it ignores the "label"
> attribute, but for training divides the input vector into two parts, the
> first part being the genuine input vector, the second the target output
> vector.
>
> - The concatenation of input and target output vectors happens only
> internally; the training function takes as input an RDD of tuples of two
> Vectors, one for the input and one for the output.
>
> - The GradientDescent optimizer is re-used without modification.
>
> - I have made an even simpler updater than the SimpleUpdater, leaving out
> the division by the square root of the current iteration number. The
> SimpleUpdater can also be used, but I created this simpler one because I
> like to plot the result every now and then, and then continue the
> calculations. For this, I also wrote a training function that takes the
> weights from the previous training session as input.
>
> - I created a ParallelANNModel similar to the LinearRegressionModel.
>
> - I created a new GeneralizedSteepestDescendAlgorithm class similar to the
> GeneralizedLinearAlgorithm class.
>
> - Created some example code to test with 2D (1 input 1 output), 3D (2
> inputs 1 output) and 4D (1 input 3 outputs) functions.
>
> If there is interest, I would be happy to release the code. What would be
> the best way to do this? Is there some kind of review process?
>
> Best regards,
> Bert
>
>

RE: Artificial Neural Network in Spark?

Posted by Bert Greevenbosch <Be...@huawei.com>.
Hi Alexander, all,

I have now uploaded the code (see links below) and look forward to learning about the outcome of your experiments!
 
Best regards,
Bert

---
https://github.com/apache/spark/pull/1290
https://issues.apache.org/jira/browse/SPARK-2352


> -----Original Message-----
> From: Ulanov, Alexander [mailto:alexander.ulanov@hp.com]
> Sent: 01 July 2014 18:17
> To: dev@spark.apache.org
> Subject: RE: Artificial Neural Network in Spark?
> 
> Hi Bert,
> 
> There is a specific pull request process if you wish to share the code:
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
>
> I would be glad to benchmark your ANN implementation by running some of
> the experiments that we run with other ANN toolkits. I am also interested
> in autoencoders and plan to implement one for MLlib in the near future.
> 
> Best regards, Alexander
> 

RE: Artificial Neural Network in Spark?

Posted by "Ulanov, Alexander" <al...@hp.com>.
Hi Bert,

There is a specific pull request process if you wish to share the code: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

I would be glad to benchmark your ANN implementation by running some of the experiments that we run with other ANN toolkits. I am also interested in autoencoders and plan to implement one for MLlib in the near future.

Best regards, Alexander

-----Original Message-----
From: Bert Greevenbosch [mailto:Bert.Greevenbosch@huawei.com] 
Sent: Tuesday, July 01, 2014 7:14 AM
To: dev@spark.apache.org
Subject: RE: Artificial Neural Network in Spark?

Hi Debasish, Alexander, all,

Indeed, I found the OpenDL project through the Powered by Spark page. I'll need some time to look into the code, but at first sight it looks quite well developed. I'll contact the author about this too.

My own implementation (in Scala) works for multiple inputs and multiple outputs. It implements a single hidden layer whose number of nodes can be specified.

The implementation is a general ANN implementation. As such, it should be useable for an autoencoder too, since that is just an ANN with some special input/output constraints.

As mentioned before, the implementation builds upon the linear regression model and the gradient descent implementation. However, it did require some tweaks:

- The linear regression model only supports a single output "label" (as Double). Since the ANN can have multiple outputs, it ignores the "label" attribute, but for training divides the input vector into two parts, the first part being the genuine input vector, the second the target output vector.

- The concatenation of input and target output vectors happens only internally; the training function takes as input an RDD of tuples of two Vectors, one for the input and one for the output.

- The GradientDescent optimizer is re-used without modification.

- I have made an even simpler updater than the SimpleUpdater, leaving out the division by the square root of the current iteration number. The SimpleUpdater can also be used, but I created this simpler one because I like to plot the result every now and then, and then continue the calculations. For this, I also wrote a training function that takes the weights from the previous training session as input.

- I created a ParallelANNModel similar to the LinearRegressionModel.

- I created a new GeneralizedSteepestDescendAlgorithm class similar to the GeneralizedLinearAlgorithm class.

- Created some example code to test with 2D (1 input 1 output), 3D (2 inputs 1 output) and 4D (1 input 3 outputs) functions.

If there is interest, I would be happy to release the code. What would be the best way to do this? Is there some kind of review process?

Best regards,
Bert



RE: Artificial Neural Network in Spark?

Posted by Bert Greevenbosch <Be...@huawei.com>.
Hi Debasish, Alexander, all,

Indeed, I found the OpenDL project through the Powered by Spark page. I'll need some time to look into the code, but at first sight it looks quite well developed. I'll contact the author about this too.

My own implementation (in Scala) works for multiple inputs and multiple outputs. It implements a single hidden layer whose number of nodes can be specified.
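
To make the shape of such a network concrete, the following Python sketch maps an input vector to an output vector through one hidden layer of configurable size, with all weights stored in one flat list. It is only an illustration under stated assumptions, not the Scala implementation: the tanh hidden units, linear outputs, bias terms, the flat weight layout and the names `num_weights`, `init_weights` and `predict` are all hypothetical.

```python
import math
import random


def num_weights(n_in, n_hidden, n_out):
    """Weights plus one bias per hidden node and per output node."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)


def init_weights(n_in, n_hidden, n_out, seed=0):
    """Small random starting weights in one flat list."""
    rng = random.Random(seed)
    count = num_weights(n_in, n_hidden, n_out)
    return [rng.uniform(-0.5, 0.5) for _ in range(count)]


def predict(x, weights, n_in, n_hidden, n_out):
    """Map an n_in-vector to an n_out-vector through one hidden layer."""
    assert len(x) == n_in
    assert len(weights) == num_weights(n_in, n_hidden, n_out)
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        # Each hidden node owns n_in weights followed by its bias.
        row, bias = weights[pos:pos + n_in], weights[pos + n_in]
        hidden.append(math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias))
        pos += n_in + 1
    out = []
    for _ in range(n_out):
        # Each output node owns n_hidden weights followed by its bias.
        row, bias = weights[pos:pos + n_hidden], weights[pos + n_hidden]
        out.append(sum(w * h for w, h in zip(row, hidden)) + bias)
        pos += n_hidden + 1
    return out
```

For example, `predict([0.3], init_weights(1, 5, 3), 1, 5, 3)` evaluates a network with 1 input, 5 hidden nodes and 3 outputs. Keeping the weights in a single flat vector is one way to hand them to an optimizer that only knows about one weight vector.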

The implementation is a general ANN implementation. As such, it should be useable for an autoencoder too, since that is just an ANN with some special input/output constraints.
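
One way to read the "special input/output constraints" is that the target equals the input and the output layer is as wide as the input layer. The Python sketch below shows just that data preparation; the helper names `autoencoder_pairs` and `autoencoder_topology` are hypothetical and do not come from the implementation discussed here.

```python
def autoencoder_pairs(inputs):
    """Turn unlabeled vectors into (input, target) pairs with target == input."""
    return [(list(x), list(x)) for x in inputs]


def autoencoder_topology(n_features, n_hidden):
    """Layer sizes (n_in, n_hidden, n_out); the output width equals the input width."""
    if n_features <= 0 or n_hidden <= 0:
        raise ValueError("layer sizes must be positive")
    return (n_features, n_hidden, n_features)
```

The resulting pairs could then be fed to any trainer that accepts (input, target) tuples.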

As mentioned before, the implementation builds upon the linear regression model and the gradient descent implementation. However, it did require some tweaks:

- The linear regression model only supports a single output "label" (as Double). Since the ANN can have multiple outputs, it ignores the "label" attribute, but for training divides the input vector into two parts, the first part being the genuine input vector, the second the target output vector.

- The concatenation of input and target output vectors happens only internally; the training function takes as input an RDD of tuples of two Vectors, one for the input and one for the output.

- The GradientDescent optimizer is re-used without modification.

- I have made an even simpler updater than the SimpleUpdater, leaving out the division by the square root of the current iteration number. The SimpleUpdater can also be used, but I created this simpler one because I like to plot the result every now and then, and then continue the calculations. For this, I also wrote a training function that takes the weights from the previous training session as input.

- I created a ParallelANNModel similar to the LinearRegressionModel.

- I created a new GeneralizedSteepestDescendAlgorithm class similar to the GeneralizedLinearAlgorithm class.

- Created some example code to test with 2D (1 input 1 output), 3D (2 inputs 1 output) and 4D (1 input 3 outputs) functions.
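
Two of the tweaks in the list above can be sketched in a few lines of Python: packing the input and target into one vector and splitting them again, and the difference between a step size divided by the square root of the iteration number and a constant one. This is an illustration only; the function names (`pack`, `split`, `decaying_step`, `constant_step`, `update`) are hypothetical, and the assumption that iterations count from 1 is mine.

```python
import math


def pack(input_vec, target_vec):
    """Concatenate input and target into the single vector the optimizer sees."""
    return list(input_vec) + list(target_vec)


def split(packed, n_in):
    """Recover (input, target); the first n_in entries are the input."""
    return packed[:n_in], packed[n_in:]


def decaying_step(step_size, iteration):
    """Step size divided by sqrt(iteration), with iteration counted from 1."""
    return step_size / math.sqrt(iteration)


def constant_step(step_size, iteration):
    """The simpler rule: the same step size in every iteration."""
    return step_size


def update(weights, gradient, step):
    """Plain gradient-descent update of a flat weight vector."""
    return [w - step * g for w, g in zip(weights, gradient)]
```

A plausible reason the constant rule suits stop-and-continue training: with the decaying rule, a resumed run starts again at iteration 1 and the step size jumps back up, whereas with a constant step two runs of 50 iterations behave like one run of 100 when the second run starts from the first run's weights.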

If there is interest, I would be happy to release the code. What would be the best way to do this? Is there some kind of review process?

Best regards,
Bert


> -----Original Message-----
> From: Debasish Das [mailto:debasish.das83@gmail.com]
> Sent: 27 June 2014 14:02
> To: dev@spark.apache.org
> Subject: Re: Artificial Neural Network in Spark?
> 
> Look into the Powered by Spark page... I found a project there which used
> autoencoder functions... It hasn't been updated for a long time now!
> 

Re: Artificial Neural Network in Spark?

Posted by Debasish Das <de...@gmail.com>.
Look into the Powered by Spark page... I found a project there which used
autoencoder functions... It hasn't been updated for a long time now!

On Thu, Jun 26, 2014 at 10:51 PM, Ulanov, Alexander <alexander.ulanov@hp.com> wrote:

> Hi Bert,
>
> It would be extremely interesting. Do you plan to implement autoencoder as
> well? It would be great to have deep learning in Spark.
>
> Best regards, Alexander
>

Re: Artificial Neural Network in Spark?

Posted by "Ulanov, Alexander" <al...@hp.com>.
Hi Bert,

It would be extremely interesting. Do you plan to implement autoencoder as well? It would be great to have deep learning in Spark.

Best regards, Alexander

On 27.06.2014, at 4:47, "Bert Greevenbosch" <Be...@huawei.com> wrote:

> Hello all,
> 
> I was wondering whether Spark/MLlib supports Artificial Neural Networks (ANNs)?
> 
> If not, I am currently working on an implementation of it. I re-use the code for linear regression and gradient descent as much as possible.
> 
> Would the community be interested in such an implementation? Or maybe somebody is already working on it?
> 
> Best regards,
> Bert