Posted to user@spark.apache.org by "second_comet@yahoo.com.INVALID" <se...@yahoo.com.INVALID> on 2023/04/29 12:30:10 UTC

Tensorflow on Spark CPU

Has anyone successfully run native TensorFlow on Spark? I tested the example at https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor on Kubernetes using CPUs only, running it across the CPUs of multiple workers. I do not see any speedup in training time when I change the number of slots from 1 to 10; the training time stays the same. Has anyone tested TensorFlow training on distributed Spark workers with CPUs? Could you share a working example?

 




Re: Tensorflow on Spark CPU

Posted by Sean Owen <sr...@gmail.com>.
There is a large overhead to distributing this type of workload. I imagine
that for a small problem, the overhead dominates. A problem of this size does
not come close to needing distribution, so more workers are probably just worse.
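
For illustration only (this is not from the thread), the overhead argument can be sketched with a toy cost model in Python. It assumes that compute time divides evenly across workers and that coordination cost grows linearly with the number of workers. The function names and every number below are hypothetical and are not fitted to the timings reported in this thread.

```python
def estimated_time(total_compute_s, n_workers, fixed_overhead_s, sync_cost_per_worker_s):
    """Toy estimate of wall-clock time for synchronous distributed training.

    Assumption (hypothetical model): compute splits evenly across workers,
    while coordination cost grows linearly with each extra worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    compute = total_compute_s / n_workers
    overhead = fixed_overhead_s + sync_cost_per_worker_s * (n_workers - 1)
    return compute + overhead


def best_worker_count(total_compute_s, candidates, fixed_overhead_s, sync_cost_per_worker_s):
    """Return the candidate worker count with the lowest estimated time."""
    return min(
        candidates,
        key=lambda n: estimated_time(
            total_compute_s, n, fixed_overhead_s, sync_cost_per_worker_s
        ),
    )


# Hypothetical workloads: a small job (30 s of compute) and a large one (30,000 s).
CANDIDATES = [1, 5, 20]
FIXED_OVERHEAD_S = 5.0
SYNC_COST_PER_WORKER_S = 10.0

for label, compute_s in [("small", 30.0), ("large", 30000.0)]:
    for n in CANDIDATES:
        t = estimated_time(compute_s, n, FIXED_OVERHEAD_S, SYNC_COST_PER_WORKER_S)
        print(f"{label} job, {n:2d} workers: {t:9.1f} s (estimated)")
```

Under these assumed costs the small job is fastest on a single worker, while the large job benefits from all 20. The crossover point depends entirely on the real compute and communication costs, which this sketch does not measure.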

On Sun, Apr 30, 2023 at 1:46 AM second_comet@yahoo.com <
second_comet@yahoo.com> wrote:

> I re-tested with the CIFAR-10 example; the results are below. Can you advise
> why fewer slots are faster than more slots?
>
> num_slots=20: 231 seconds
> num_slots=5: 52 seconds
> num_slots=1: 34 seconds
>
> The code is at
> https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32
>
> Do you have an example of TensorFlow with a big dataset that I can test?

Re: Tensorflow on Spark CPU

Posted by "second_comet@yahoo.com.INVALID" <se...@yahoo.com.INVALID>.
I re-tested with the CIFAR-10 example; the results are below. Can you advise why fewer slots are faster than more slots?

num_slots=20: 231 seconds
num_slots=5: 52 seconds
num_slots=1: 34 seconds

The code is at
https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32

Do you have an example of TensorFlow with a big dataset that I can test?






On Saturday, April 29, 2023 at 08:44:04 PM GMT+8, Sean Owen <sr...@gmail.com> wrote:

You don't want to use CPUs with Tensorflow. If it's not scaling, you may have a problem that is far too small to distribute.

Re: Tensorflow on Spark CPU

Posted by Sean Owen <sr...@gmail.com>.
You don't want to use CPUs with Tensorflow.
If it's not scaling, you may have a problem that is far too small to
distribute.

On Sat, Apr 29, 2023 at 7:30 AM second_comet@yahoo.com.INVALID
<se...@yahoo.com.invalid> wrote:

> Has anyone successfully run native TensorFlow on Spark? I tested the example at
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
> on Kubernetes using CPUs only, running it across the CPUs of multiple workers.
> I do not see any speedup in training time when I change the number of slots
> from 1 to 10; the training time stays the same. Has anyone tested TensorFlow
> training on distributed Spark workers with CPUs? Could you share a working
> example?