Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2017/11/22 18:02:43 UTC

does "Deep Learning Pipelines" scale out linearly?

I am starting a new deep learning project. Currently we do all of our work on
a single machine using a combination of Keras and TensorFlow.
https://databricks.github.io/spark-deep-learning/site/index.html looks very
promising. Any idea how performance is likely to improve as I add machines
to my cluster?

Kind regards

Andy


P.S. Is user@spark.apache.org the best place to ask questions about this
package?

Re: does "Deep Learning Pipelines" scale out linearly?

Posted by Nick Pentreath <ni...@gmail.com>.

For that package specifically, it’s best to see if they have a mailing list
and, if not, perhaps ask on GitHub issues.

Having said that, perhaps the folks involved in that package will reply here
too.

Re: does "Deep Learning Pipelines" scale out linearly?

Posted by Tim Hunter <ti...@databricks.com>.

Hello Andy,
regarding your question, this will depend a lot on the specific task:
 - tasks that are "easy" to distribute, such as inference (scoring),
hyper-parameter tuning or cross-validation, will take full advantage
of the cluster, and performance should improve more or less linearly
 - for training a single model across multiple machines with a
distributed dataset, you are currently better off with a
dedicated solution such as TensorFlowOnSpark or dist-keras. We are
working on addressing this issue in a future release.
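
A minimal sketch of why the first kind of task scales roughly linearly,
assuming the work splits into independent tasks of equal cost and ignoring
scheduling, data-transfer and startup overhead. The function names and the
figure of 24 hyper-parameter combinations are hypothetical; none of this is
part of Deep Learning Pipelines or Spark.

```python
import math


def ideal_wall_clock(n_tasks: int, task_seconds: float, n_workers: int) -> float:
    """Idealized wall-clock time for independent, equal-cost tasks.

    Assumption: no scheduling, data-transfer or startup overhead, which a
    real cluster will always add on top of this figure.
    """
    if n_tasks < 1 or n_workers < 1:
        raise ValueError("need n_tasks >= 1 and n_workers >= 1")
    # Each worker runs its tasks one after another, so the job ends when the
    # busiest worker ends: that takes ceil(n_tasks / n_workers) rounds.
    rounds = math.ceil(n_tasks / n_workers)
    return rounds * task_seconds


def ideal_speedup(n_tasks: int, n_workers: int) -> float:
    """Speedup over a single worker under the same idealized assumptions."""
    serial = ideal_wall_clock(n_tasks, 1.0, 1)
    parallel = ideal_wall_clock(n_tasks, 1.0, n_workers)
    return serial / parallel


if __name__ == "__main__":
    n_tasks = 24  # hypothetical: 24 hyper-parameter combinations to evaluate
    for workers in (1, 2, 4, 8, 16, 32):
        print(workers, "workers ->", ideal_speedup(n_tasks, workers), "x")
```

Under these assumptions the speedup tracks the worker count while the workers
divide the tasks evenly, falls short when they do not (16 workers still need
two rounds for 24 tasks), and stops growing once there are more workers than
tasks.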

Also, we have opened a mailing list dedicated to Deep Learning Pipelines,
to which I will copy this answer. Feel free to reply there:

https://groups.google.com/forum/#!forum/dl-pipelines-users/


Tim

