Posted to dev@spark.apache.org by n1kt0 <ni...@googlemail.com> on 2017/02/23 12:23:01 UTC

Re: Implementation of RNN/LSTM in Spark

Hi,
Can anyone tell me what the current status of RNNs in Spark is?



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Implementation-of-RNN-LSTM-in-Spark-tp14866p21060.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Implementation of RNN/LSTM in Spark

Posted by Michael Allman <mi...@videoamp.com>.
Hi Yuhao,

BigDL looks very promising and it's a framework we're considering using. It seems the general approach to high performance DL is via GPUs. Your project mentions performance on a Xeon comparable to that of a GPU, but where does this claim come from? Can you provide benchmarks?

Thanks,

Michael

> On Feb 27, 2017, at 11:11 PM, Yuhao Yang <hh...@gmail.com> wrote:
> 
> You're welcome to try out and contribute to our BigDL project: https://github.com/intel-analytics/BigDL
> 
> It runs natively on Spark and is fast because it leverages Intel MKL.


Re: Implementation of RNN/LSTM in Spark

Posted by Yuhao Yang <hh...@gmail.com>.
You're welcome to try out and contribute to our BigDL project:
https://github.com/intel-analytics/BigDL

It runs natively on Spark and is fast because it leverages Intel MKL.


RE: Implementation of RNN/LSTM in Spark

Posted by Joeri Hermans <jo...@cern.ch>.
Hi Nikita,

We are actively working on this: https://github.com/cerndb/dist-keras This will allow you to run Keras on Spark (with distributed optimization algorithms) through PySpark. I recommend checking the examples at https://github.com/cerndb/dist-keras/tree/master/examples. However, be aware that distributed optimization is still a research topic, with several approaches and caveats to consider. If you would like more information on this topic, I wrote a blog post about it: https://db-blog.web.cern.ch/blog/joeri-hermans/2017-01-distributed-deep-learning-apache-spark-and-keras
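
To give a rough idea of what "distributed optimization" means here, below is a minimal sketch of one common approach, synchronous gradient averaging, simulated in a single process with plain Python. It is only an illustration under stated assumptions: it does not use the dist-keras API or Spark, the toy linear model stands in for a real network, and the names gradient, train, and partitions are hypothetical.

```python
# Hypothetical illustration only: synchronous data-parallel SGD with
# gradient averaging, simulated in one process. This is NOT dist-keras code.

def gradient(w, b, batch):
    """Mean-squared-error gradient of the model y ~ w*x + b on one partition."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def train(partitions, lr=0.05, steps=2000):
    """Run synchronous SGD: one averaged update per step."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each simulated "worker" computes a gradient on its own partition.
        grads = [gradient(w, b, part) for part in partitions]
        # The simulated "driver" averages the gradients and updates the model.
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data generated from y = 3x + 1, split across two simulated workers.
data = [(x / 10.0, 3 * (x / 10.0) + 1) for x in range(20)]
partitions = [data[:10], data[10:]]
w, b = train(partitions)
print(w, b)
```

In this synchronous variant every worker waits for the others before each update. Asynchronous schemes drop that wait, which is one source of the caveats mentioned above (for example, workers applying updates computed from stale parameters).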

However, if you don't want to use a distributed optimization algorithm, we also support a "sequential trainer" which allows you to train a model on Spark dataframes.

Kind regards,

Joeri

Re: Implementation of RNN/LSTM in Spark

Posted by Liang-Chi Hsieh <vi...@gmail.com>.
Yeah, I'd agree with Nick.

To have an implementation of RNN/LSTM in Spark, you would need a comprehensive
abstraction of neural networks that is general enough to represent the
computation (think of Torch, Keras, TensorFlow, MXNet, Caffe, etc.), and you
would need to modify the current computation engine to work with various
computing units such as GPUs. I don't think we will have such a thing in
Spark in the near future.

There are many efforts to integrate Spark with the specialized frameworks
that handle this abstraction and parallel computation well. I think the best
approach is to look at these efforts and contribute to them if possible.


-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Implementation-of-RNN-LSTM-in-Spark-tp14866p21094.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: Implementation of RNN/LSTM in Spark

Posted by Nick Pentreath <ni...@gmail.com>.
The short answer is that there is none, and it is highly unlikely that there
will be one inside Spark MLlib any time in the near future.

The best bet is to look at other DL libraries that run on Spark. For the JVM
there are Deeplearning4J and BigDL (there are others, but these seem to be
the most comprehensive I have come across). There are also various flavours
of TensorFlow / Caffe on Spark, and of course libraries such as Torch, Keras,
TensorFlow, MXNet, and Caffe. Some of them have Java or Scala APIs and some
form of Spark integration out there in the community (in varying states of
development).

Integrations with Spark are a bit patchy currently but include the
"XOnSpark" flavours mentioned above and TensorFrames (again, there may be
others).
