Posted to user@spark.apache.org by Vijayant Kumar <Vi...@mavenir.com.INVALID> on 2022/02/23 03:27:04 UTC

RE: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Thanks for your response, Sean!

I want to add some more background here.

I am using Spark 3.0+ with TensorFlow 2.0+.
My use case is not image data but time-series data, where I am using LSTMs and transformers for forecasting.

I evaluated the SparkFlow and spark_tensorflow_distributor libraries, but neither has seen major development recently. I ran into version-dependency conflicts with both and had a hard time resolving the library incompatibilities. Hence a few questions:

  *   Does Horovod have any dependencies?
  *   Is there any other library suitable for my use case?
  *   Any example code would be a great help.

Thanks,
Vijayant
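
For readers following the time-series use case above, here is a minimal sketch of how a series is commonly cut into fixed-length windows before being fed to an LSTM. The function name make_windows, the window length, and the toy series are illustrative assumptions, not something taken from this thread.

```python
def make_windows(series, window, horizon=1):
    """Split a 1-D sequence into (inputs, target) pairs for forecasting.

    Each input is `window` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    pairs = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Toy series (hypothetical data): predict the next value from the last 3.
series = [10, 11, 12, 13, 14, 15]
windows = make_windows(series, window=3)
for inputs, target in windows:
    print(inputs, "->", target)
```

In a Spark pipeline the same windowing would more likely be done with lag columns on a DataFrame; the plain-Python version is only meant to show the shape of the training examples.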

From: Sean Owen <sr...@gmail.com>
Sent: Wednesday, February 23, 2022 8:40 AM
To: Vijayant Kumar <Vi...@mavenir.com.invalid>
Cc: user @spark <us...@spark.apache.org>
Subject: [E] COMMERCIAL BULK: Re: TensorFlow on Spark


Sure, Horovod is commonly used on Spark for this:
https://horovod.readthedocs.io/en/stable/spark_include.html

On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <Vi...@mavenir.com.invalid> wrote:
Hi All,

Is anyone using Apache Spark with TensorFlow to build models? My requirement is to run distributed TensorFlow model training across the Spark executors.
Please point me to some resources or sample code.

Thanks,
Vijayant
________________________________
This e-mail message may contain confidential or proprietary information of Mavenir Systems, Inc. or its affiliates and is intended solely for the use of the intended recipient(s). If you are not the intended recipient of this message, you are hereby notified that any review, use or distribution of this information is absolutely prohibited and we request that you delete all copies in your control and contact us by e-mailing to security@mavenir.com<ma...@mavenir.com>. This message contains the views of its author and may not necessarily reflect the views of Mavenir Systems, Inc. or its affiliates, who employ systems to monitor email messages, but make no representation that such messages are authorized, secure, uncompromised, or free from computer viruses, malware, or other defects. Thank You

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Sean Owen <sr...@gmail.com>.
Dependencies? Sure, like any Python library. What are you asking about
there?

I don't know of a modern alternative on Spark.

Did you read the docs or search? There are plenty of examples.
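
For anyone landing on this thread looking for a starting point, a minimal sketch of the pattern described in the Horovod-on-Spark docs linked earlier might look like the following. It assumes Horovod installed with TensorFlow and Spark support, TensorFlow 2.x, and an already running SparkSession. The model, the synthetic data, and the helper names (scaled_learning_rate, train_fn, launch) are hypothetical, and which versions work together should be checked against the Horovod docs.

```python
def scaled_learning_rate(base_lr, num_workers):
    """Linear scaling rule often used with synchronous data-parallel training."""
    return base_lr * num_workers


def train_fn(base_lr=0.001, epochs=1):
    # Imports live inside the function because it runs on the Spark executors.
    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    # Synthetic stand-in data: 256 windows of 24 time steps, 1 feature each.
    x = np.random.rand(256, 24, 1).astype("float32")
    y = np.random.rand(256, 1).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(24, 1)),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
    ])

    # Wrap the optimizer so gradients are averaged across workers each step.
    opt = tf.keras.optimizers.Adam(scaled_learning_rate(base_lr, hvd.size()))
    opt = hvd.DistributedOptimizer(opt)
    model.compile(optimizer=opt, loss="mse")

    # Start every worker from the same initial weights (those of rank 0).
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    history = model.fit(
        x, y,
        batch_size=32,
        epochs=epochs,
        callbacks=callbacks,
        verbose=1 if hvd.rank() == 0 else 0,
    )
    return float(history.history["loss"][-1])


def launch(num_proc=2):
    # Requires an active SparkSession; returns one result per Horovod rank.
    import horovod.spark
    return horovod.spark.run(
        train_fn,
        kwargs={"base_lr": 0.001, "epochs": 1},
        num_proc=num_proc,
    )
```

Calling launch() from a PySpark driver would start num_proc training processes as Spark tasks. In a real job each worker would read its own shard of the data rather than generate random arrays.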

On Tue, Feb 22, 2022, 9:27 PM Vijayant Kumar <Vi...@mavenir.com>
wrote:

> Thanks Sean for your response. !!
>
>
>
> Want to add some more background here.
>
>
>
> I am using Spark3.0+ version with Tensorflow 2.0+.
>
> My use case is not for the image data but for the Time-series data where I
> am using LSTM and transformers to forecast.
>
>
>
> I evaluated *SparkFlow* and *spark_tensorflow_distributor *libraries, and
> there has been no major development recently on those libraries. I faced
> the issue of version dependencies on those and had a hard time fixing the
> library compatibilities. Hence a couple of below doubts:-
>
>
>
>    - Does *Horovod* have any dependencies?
>    - Any other library which is suitable for my use case.?
>    - Any example code would really be of great help to understand.
>
>
>
> Thanks,
>
> Vijayant
>
>
>
> *From:* Sean Owen <sr...@gmail.com>
> *Sent:* Wednesday, February 23, 2022 8:40 AM
> *To:* Vijayant Kumar <Vi...@mavenir.com.invalid>
> *Cc:* user @spark <us...@spark.apache.org>
> *Subject:* [E] COMMERCIAL BULK: Re: TensorFlow on Spark
>
>
>
> Sure, Horovod is commonly used on Spark for this:
>
> https://horovod.readthedocs.io/en/stable/spark_include.html
>
>
>
> On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <
> Vijayant.Kumar@mavenir.com.invalid> wrote:
>
> Hi All,
>
>
>
> Anyone using Apache spark with TensorFlow for building models. My
> requirement is to use TensorFlow distributed model training across the
> Spark executors.
>
> Please help me with some resources or some sample code.
>
>
>
> Thanks,
>
> Vijayant

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Dear Sean,

I do agree with you to a certain extent; that makes sense. Perhaps I am wrong
to ask for native integrations rather than depend on over-engineered
external solutions, which have their own performance issues and bottlenecks
in live production environments. But asking and stating one's opinion should
be fine, I think.

Just as we went for Koalas in spite of having Pandas UDFs, Spark-native
integrations with deep learning frameworks that are lightweight and easy to
use and extend perhaps make sense, in my view.

Regards,
Gourav Sengupta

On Thu, Feb 24, 2022 at 2:06 PM Sean Owen <sr...@gmail.com> wrote:

> On the contrary, distributed deep learning is not data parallel. It's
> dominated by the need to share parameters across workers.
> Gourav, I don't understand what you're looking for. Have you looked at
> Petastorm and Horovod? they _use Spark_, not another platform like Ray. Why
> recreate this which has worked for years? what would it matter if it were
> in the Spark project? I think you're on a limb there.
> One goal of Spark is very much not to build in everything that could exist
> as a library, and distributed deep learning remains an important but niche
> use case. Instead it provides the infra for these things, like barrier mode.
>
> On Thu, Feb 24, 2022 at 7:21 AM Bitfox <bi...@bitfox.top> wrote:
>
>> I have been using tensorflow for a long time, it's not hard to implement
>> a distributed training job at all, either by model parallelization or data
>> parallelization. I don't think there is much need to develop spark to
>> support tensorflow jobs. Just my thoughts...
>>
>>
>> On Thu, Feb 24, 2022 at 4:36 PM Gourav Sengupta <
>> gourav.sengupta@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I do not think that there is any reason for using over engineered
>>> platforms like Petastorm and Ray, except for certain use cases.
>>>
>>> What Ray is doing, except for certain use cases, could have been easily
>>> done by SPARK, I think, had the open source community got that steer. But
>>> maybe I am wrong and someone should be able to explain why the SPARK open
>>> source community cannot develop the capabilities which are so natural to
>>> almost all use cases of data processing in SPARK where the data gets
>>> consumed by deep learning frameworks and we are asked to use Ray or
>>> Petastorm?
>>>
>>> For those of us who are asking what does native integrations means
>>> please try to compare delta between release 2.x and 3.x and koalas before
>>> 3.2 and after 3.2.
>>>
>>> I am sure that the SPARK community can push for extending the dataframes
>>> from SPARK to deep learning and other frameworks by natively integrating
>>> them.
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>>

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Sean Owen <sr...@gmail.com>.
On the contrary, distributed deep learning is not data parallel. It's
dominated by the need to share parameters across workers.
Gourav, I don't understand what you're looking for. Have you looked at
Petastorm and Horovod? They _use Spark_, not another platform like Ray. Why
recreate something that has worked for years? What would it matter if it were
in the Spark project? I think you're out on a limb there.
One goal of Spark is very much not to build in everything that could exist
as a library, and distributed deep learning remains an important but niche
use case. Instead, Spark provides the infra for these things, like barrier mode.
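
To make the point about sharing parameters across workers concrete, here is a toy, single-process sketch of synchronous SGD. Each simulated worker computes a gradient on its own shard, and an averaging step (standing in for the allreduce that a library such as Horovod performs) gives every worker the same update. All names and the toy data are illustrative assumptions; nothing here uses Spark or Horovod.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up holding the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train(shards, w=0.0, lr=0.01, steps=200):
    for _ in range(steps):
        # Each worker computes a gradient on its own data only.
        grads = [local_gradient(w, shard) for shard in shards]
        # Synchronization point: workers exchange and average gradients.
        synced = allreduce_mean(grads)
        # All workers apply the identical update, so weights stay in sync.
        w -= lr * synced[0]
    return w


# Toy data generated from y = 3 * x, split evenly across two workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0:4], data[4:8]]
w_final = train(shards)
print(round(w_final, 6))
```

The exchange inside the loop happens on every training step, which is why the workers have to run at the same time and talk to each other, unlike independent Spark tasks.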

On Thu, Feb 24, 2022 at 7:21 AM Bitfox <bi...@bitfox.top> wrote:

> I have been using tensorflow for a long time, it's not hard to implement a
> distributed training job at all, either by model parallelization or data
> parallelization. I don't think there is much need to develop spark to
> support tensorflow jobs. Just my thoughts...
>
>
> On Thu, Feb 24, 2022 at 4:36 PM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I do not think that there is any reason for using over engineered
>> platforms like Petastorm and Ray, except for certain use cases.
>>
>> What Ray is doing, except for certain use cases, could have been easily
>> done by SPARK, I think, had the open source community got that steer. But
>> maybe I am wrong and someone should be able to explain why the SPARK open
>> source community cannot develop the capabilities which are so natural to
>> almost all use cases of data processing in SPARK where the data gets
>> consumed by deep learning frameworks and we are asked to use Ray or
>> Petastorm?
>>
>> For those of us who are asking what does native integrations means please
>> try to compare delta between release 2.x and 3.x and koalas before 3.2 and
>> after 3.2.
>>
>> I am sure that the SPARK community can push for extending the dataframes
>> from SPARK to deep learning and other frameworks by natively integrating
>> them.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>>

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Bitfox,

Yes, distributed training using PyTorch and TensorFlow is really superb, and
you are spot on. There is actually absolutely no need for
solutions like Ray, Petastorm, etc.

But if I want to preprocess data in Spark and push the results to
these deep learning libraries, what do we do? Creating
professional-quality data loaders is a very big job, so these
solutions try to occupy that space as an entry point.


Regards,
Gourav Sengupta



On Thu, Feb 24, 2022 at 1:21 PM Bitfox <bi...@bitfox.top> wrote:

> I have been using tensorflow for a long time, it's not hard to implement a
> distributed training job at all, either by model parallelization or data
> parallelization. I don't think there is much need to develop spark to
> support tensorflow jobs. Just my thoughts...
>
>
> On Thu, Feb 24, 2022 at 4:36 PM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I do not think that there is any reason for using over engineered
>> platforms like Petastorm and Ray, except for certain use cases.
>>
>> What Ray is doing, except for certain use cases, could have been easily
>> done by SPARK, I think, had the open source community got that steer. But
>> maybe I am wrong and someone should be able to explain why the SPARK open
>> source community cannot develop the capabilities which are so natural to
>> almost all use cases of data processing in SPARK where the data gets
>> consumed by deep learning frameworks and we are asked to use Ray or
>> Petastorm?
>>
>> For those of us who are asking what does native integrations means please
>> try to compare delta between release 2.x and 3.x and koalas before 3.2 and
>> after 3.2.
>>
>> I am sure that the SPARK community can push for extending the dataframes
>> from SPARK to deep learning and other frameworks by natively integrating
>> them.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>>
>> On Wed, Feb 23, 2022 at 4:42 PM Dennis Suhari <d....@icloud.com.invalid>
>> wrote:
>>
>>> Currently we are trying AnalyticsZoo and Ray
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On 23.02.2022 at 04:53, Bitfox <bi...@bitfox.top> wrote:
>>>
>>> tensorflow itself can implement the distributed computing via a
>>> parameter server. Why did you want spark here?
>>>
>>> regards.
>>>
>>> On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar
>>> <Vi...@mavenir.com.invalid> wrote:
>>>
>>>> Thanks Sean for your response. !!
>>>>
>>>>
>>>>
>>>> Want to add some more background here.
>>>>
>>>>
>>>>
>>>> I am using Spark3.0+ version with Tensorflow 2.0+.
>>>>
>>>> My use case is not for the image data but for the Time-series data
>>>> where I am using LSTM and transformers to forecast.
>>>>
>>>>
>>>>
>>>> I evaluated *SparkFlow* and *spark_tensorflow_distributor *libraries, and
>>>> there has been no major development recently on those libraries. I faced
>>>> the issue of version dependencies on those and had a hard time fixing the
>>>> library compatibilities. Hence a couple of below doubts:-
>>>>
>>>>
>>>>
>>>>    - Does *Horovod* have any dependencies?
>>>>    - Any other library which is suitable for my use case.?
>>>>    - Any example code would really be of great help to understand.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Vijayant
>>>>
>>>>
>>>>
>>>> *From:* Sean Owen <sr...@gmail.com>
>>>> *Sent:* Wednesday, February 23, 2022 8:40 AM
>>>> *To:* Vijayant Kumar <Vi...@mavenir.com.invalid>
>>>> *Cc:* user @spark <us...@spark.apache.org>
>>>> *Subject:* [E] COMMERCIAL BULK: Re: TensorFlow on Spark
>>>>
>>>>
>>>>
>>>> Sure, Horovod is commonly used on Spark for this:
>>>>
>>>> https://horovod.readthedocs.io/en/stable/spark_include.html
>>>>
>>>>
>>>>
>>>> On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <
>>>> Vijayant.Kumar@mavenir.com.invalid> wrote:
>>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> Anyone using Apache spark with TensorFlow for building models. My
>>>> requirement is to use TensorFlow distributed model training across the
>>>> Spark executors.
>>>>
>>>> Please help me with some resources or some sample code.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Vijayant
>>>

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Bitfox <bi...@bitfox.top>.
I have been using TensorFlow for a long time. It's not hard to implement a
distributed training job at all, whether with model parallelism or data
parallelism. I don't think there is much need to extend Spark to
support TensorFlow jobs. Just my thoughts...
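
As a rough illustration of what setting this up by hand involves, TensorFlow's multi-worker strategies read the cluster layout from a TF_CONFIG environment variable that has to be set differently for every process. The sketch below only builds those JSON values; the helper name build_tf_config, the host names, and the port are hypothetical.

```python
import json


def build_tf_config(worker_hosts, task_index):
    """Return the TF_CONFIG JSON string for one worker in the cluster."""
    if not 0 <= task_index < len(worker_hosts):
        raise IndexError("task_index out of range")
    return json.dumps({
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    })


# Hypothetical two-node cluster; every process gets the same cluster
# description but its own task index.
workers = ["node-0.example.internal:12345", "node-1.example.internal:12345"]
configs = [build_tf_config(workers, i) for i in range(len(workers))]
for cfg in configs:
    print(cfg)
```

Each process would export its own value as TF_CONFIG before creating a tf.distribute.MultiWorkerMirroredStrategy. Elsewhere in this thread the advantage claimed for Horovod on Spark is that Spark starts and manages these processes instead.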


On Thu, Feb 24, 2022 at 4:36 PM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
>
> I do not think that there is any reason for using over engineered
> platforms like Petastorm and Ray, except for certain use cases.
>
> What Ray is doing, except for certain use cases, could have been easily
> done by SPARK, I think, had the open source community got that steer. But
> maybe I am wrong and someone should be able to explain why the SPARK open
> source community cannot develop the capabilities which are so natural to
> almost all use cases of data processing in SPARK where the data gets
> consumed by deep learning frameworks and we are asked to use Ray or
> Petastorm?
>
> For those of us who are asking what does native integrations means please
> try to compare delta between release 2.x and 3.x and koalas before 3.2 and
> after 3.2.
>
> I am sure that the SPARK community can push for extending the dataframes
> from SPARK to deep learning and other frameworks by natively integrating
> them.
>
>
> Regards,
> Gourav Sengupta
>
>
> On Wed, Feb 23, 2022 at 4:42 PM Dennis Suhari <d....@icloud.com.invalid>
> wrote:
>
>> Currently we are trying AnalyticsZoo and Ray
>>
>>
>> Sent from my iPhone
>>
>> On 23.02.2022 at 04:53, Bitfox <bi...@bitfox.top> wrote:
>>
>> tensorflow itself can implement the distributed computing via a
>> parameter server. Why did you want spark here?
>>
>> regards.
>>
>> On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar
>> <Vi...@mavenir.com.invalid> wrote:
>>
>>> Thanks Sean for your response. !!
>>>
>>>
>>>
>>> Want to add some more background here.
>>>
>>>
>>>
>>> I am using Spark3.0+ version with Tensorflow 2.0+.
>>>
>>> My use case is not for the image data but for the Time-series data where
>>> I am using LSTM and transformers to forecast.
>>>
>>>
>>>
>>> I evaluated *SparkFlow* and *spark_tensorflow_distributor *libraries, and
>>> there has been no major development recently on those libraries. I faced
>>> the issue of version dependencies on those and had a hard time fixing the
>>> library compatibilities. Hence a couple of below doubts:-
>>>
>>>
>>>
>>>    - Does *Horovod* have any dependencies?
>>>    - Any other library which is suitable for my use case.?
>>>    - Any example code would really be of great help to understand.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Vijayant
>>>
>>>
>>>
>>> *From:* Sean Owen <sr...@gmail.com>
>>> *Sent:* Wednesday, February 23, 2022 8:40 AM
>>> *To:* Vijayant Kumar <Vi...@mavenir.com.invalid>
>>> *Cc:* user @spark <us...@spark.apache.org>
>>> *Subject:* [E] COMMERCIAL BULK: Re: TensorFlow on Spark
>>>
>>>
>>>
>>> Sure, Horovod is commonly used on Spark for this:
>>>
>>> https://horovod.readthedocs.io/en/stable/spark_include.html
>>>
>>>
>>>
>>> On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <
>>> Vijayant.Kumar@mavenir.com.invalid> wrote:
>>>
>>> Hi All,
>>>
>>>
>>>
>>> Anyone using Apache spark with TensorFlow for building models. My
>>> requirement is to use TensorFlow distributed model training across the
>>> Spark executors.
>>>
>>> Please help me with some resources or some sample code.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Vijayant
>>

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

I do not think there is any reason to use over-engineered platforms
like Petastorm and Ray, except for certain use cases.

What Ray is doing, except for certain use cases, could easily have been
done by Spark, I think, had the open source community been steered that way.
But maybe I am wrong, and someone should be able to explain why the Spark
open source community cannot develop capabilities that are so natural to
almost all Spark data processing use cases where the data gets
consumed by deep learning frameworks, so that we are instead asked to use
Ray or Petastorm.

For those asking what native integration means, please
compare the delta between releases 2.x and 3.x, and Koalas before and
after 3.2.

I am sure that the Spark community can push to extend Spark DataFrames
to deep learning and other frameworks by natively integrating
them.


Regards,
Gourav Sengupta


On Wed, Feb 23, 2022 at 4:42 PM Dennis Suhari <d....@icloud.com.invalid>
wrote:

> Currently we are trying AnalyticsZoo and Ray
>
>
> Sent from my iPhone
>
> On 23.02.2022 at 04:53, Bitfox <bi...@bitfox.top> wrote:
>
> tensorflow itself can implement the distributed computing via a
> parameter server. Why did you want spark here?
>
> regards.
>
> On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar
> <Vi...@mavenir.com.invalid> wrote:
>
>> Thanks Sean for your response. !!
>>
>>
>>
>> Want to add some more background here.
>>
>>
>>
>> I am using Spark3.0+ version with Tensorflow 2.0+.
>>
>> My use case is not for the image data but for the Time-series data where
>> I am using LSTM and transformers to forecast.
>>
>>
>>
>> I evaluated *SparkFlow* and *spark_tensorflow_distributor *libraries, and
>> there has been no major development recently on those libraries. I faced
>> the issue of version dependencies on those and had a hard time fixing the
>> library compatibilities. Hence a couple of below doubts:-
>>
>>
>>
>>    - Does *Horovod* have any dependencies?
>>    - Any other library which is suitable for my use case.?
>>    - Any example code would really be of great help to understand.
>>
>>
>>
>> Thanks,
>>
>> Vijayant
>>
>>
>>
>> *From:* Sean Owen <sr...@gmail.com>
>> *Sent:* Wednesday, February 23, 2022 8:40 AM
>> *To:* Vijayant Kumar <Vi...@mavenir.com.invalid>
>> *Cc:* user @spark <us...@spark.apache.org>
>> *Subject:* [E] COMMERCIAL BULK: Re: TensorFlow on Spark
>>
>>
>>
>> Sure, Horovod is commonly used on Spark for this:
>>
>> https://horovod.readthedocs.io/en/stable/spark_include.html
>>
>>
>>
>> On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <
>> Vijayant.Kumar@mavenir.com.invalid> wrote:
>>
>> Hi All,
>>
>>
>>
>> Anyone using Apache spark with TensorFlow for building models. My
>> requirement is to use TensorFlow distributed model training across the
>> Spark executors.
>>
>> Please help me with some resources or some sample code.
>>
>>
>>
>> Thanks,
>>
>> Vijayant
>

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Dennis Suhari <d....@icloud.com.INVALID>.
Currently we are trying AnalyticsZoo and Ray.


Sent from my iPhone

> On 23.02.2022 at 04:53, Bitfox <bi...@bitfox.top> wrote:
> 
> TensorFlow itself can implement distributed computing via a parameter server. Why do you want Spark here?
> 
> regards.
> 
>> On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar <Vi...@mavenir.com.invalid> wrote:
>> Thanks Sean for your response. !!
>> 
>>  
>> 
>> Want to add some more background here.
>> 
>>  
>> 
>> I am using Spark3.0+ version with Tensorflow 2.0+.
>> 
>> My use case is not for the image data but for the Time-series data where I am using LSTM and transformers to forecast.
>> 
>>  
>> 
>> I evaluated SparkFlow and spark_tensorflow_distributor libraries, and there has been no major development recently on those libraries. I faced the issue of version dependencies on those and had a hard time fixing the library compatibilities. Hence a couple of below doubts:-
>> 
>>  
>> 
>> Does Horovod have any dependencies?
>> Any other library which is suitable for my use case.?
>> Any example code would really be of great help to understand.
>>  
>> 
>> Thanks,
>> 
>> Vijayant
>> 
>>  
>> 
>> From: Sean Owen <sr...@gmail.com> 
>> Sent: Wednesday, February 23, 2022 8:40 AM
>> To: Vijayant Kumar <Vi...@mavenir.com.invalid>
>> Cc: user @spark <us...@spark.apache.org>
>> Subject: [E] COMMERCIAL BULK: Re: TensorFlow on Spark
>> 
>>  
>> 
>> Sure, Horovod is commonly used on Spark for this:
>> 
>> https://horovod.readthedocs.io/en/stable/spark_include.html
>> 
>>  
>> 
>> On Tue, Feb 22, 2022 at 8:51 PM Vijayant Kumar <Vi...@mavenir.com.invalid> wrote:
>> 
>> Hi All,
>> 
>>  
>> 
>> Anyone using Apache spark with TensorFlow for building models. My requirement is to use TensorFlow distributed model training across the Spark executors.
>> 
>> Please help me with some resources or some sample code.
>> 
>>  
>> 
>> Thanks,
>> 
>> Vijayant
>> 

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Sean Owen <sr...@gmail.com>.
Petastorm (https://github.com/uber/petastorm) does that, in the sense that
it feeds Spark DataFrames to those frameworks for distributed training.
I'm not sure what you mean by native integration, or how it would differ.
These tools do just what you are talking about, and have for a while.
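
A minimal sketch of that flow, assuming Petastorm's Spark converter API
(make_spark_converter / make_tf_dataset) and a DataFrame with two numeric
columns. The column names `x` and `y`, the cache directory and the batch
size are hypothetical placeholders, and details may vary between versions:

```python
def train_from_spark_df(spark, df, cache_dir="file:///tmp/petastorm_cache"):
    """Feed a Spark DataFrame to a Keras model through Petastorm.

    Assumes pyspark, petastorm and tensorflow are installed, and that `df`
    has numeric columns `x` (feature) and `y` (target); both names are
    hypothetical.
    """
    # Imports are local so the module loads even without the libraries.
    import tensorflow as tf
    from petastorm.spark import SparkDatasetConverter, make_spark_converter

    batch_size = 64

    # Petastorm materializes the DataFrame as Parquet under this directory.
    spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, cache_dir)
    converter = make_spark_converter(df)

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    # make_tf_dataset yields batches as namedtuples keyed by column name.
    with converter.make_tf_dataset(batch_size=batch_size) as dataset:
        dataset = dataset.map(
            lambda batch: (tf.reshape(batch.x, [-1, 1]), batch.y)
        )
        # The dataset repeats by default, so bound each epoch explicitly.
        steps = max(1, len(converter) // batch_size)
        model.fit(dataset, steps_per_epoch=steps, epochs=1)

    converter.delete()  # remove the cached Parquet copy
    return model
```

In a distributed job the same converter would be read inside each training
process, with each worker taking its own shard of the data.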

On Wed, Feb 23, 2022 at 7:06 AM Gourav Sengupta <
gourav.sengupta.developer@gmail.com> wrote:

> Hi,
>
> I am sure those who have actually built a data processing pipeline whose
> contents then have to be delivered to TensorFlow or PyTorch (not for a POC,
> or writing a blog to get clicks, or resolving symptomatic bugs, but in a
> real-life end-to-end application) will perhaps understand some of the
> issues, because Spark DataFrames do not natively integrate with TensorFlow
> or PyTorch.
>
> But perhaps I am wrong.
>
> My point in mentioning Ray is simple: if Spark were able to natively scale
> out and distribute data to TensorFlow or PyTorch, then there would be
> competition between Ray and Spark.
>
> Regards,
> Gourav Sengupta

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

I am sure those who have actually built a data processing pipeline whose
contents then have to be delivered to TensorFlow or PyTorch (not for a POC,
or writing a blog to get clicks, or resolving symptomatic bugs, but in a
real-life end-to-end application) will perhaps understand some of the
issues, because Spark DataFrames do not natively integrate with TensorFlow
or PyTorch.

But perhaps I am wrong.

My point in mentioning Ray is simple: if Spark were able to natively scale
out and distribute data to TensorFlow or PyTorch, then there would be
competition between Ray and Spark.

Regards,
Gourav Sengupta

On Wed, Feb 23, 2022 at 12:35 PM Sean Owen <sr...@gmail.com> wrote:

> Spark does do distributed ML, but not TensorFlow. Barrier execution mode
> is one element that things like Horovod use. I'm not sure what you are
> getting at.
> Ray is not Spark.
> As I say, Horovod does this already. The upside over TF distributed is
> that Spark sets up and manages the daemon processes, rather than you doing
> it by hand.

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Sean Owen <sr...@gmail.com>.
Spark does do distributed ML, but not TensorFlow. Barrier execution mode is
one element that things like Horovod use. I'm not sure what you are getting
at.
Ray is not Spark.
As I say, Horovod does this already. The upside over TF distributed is
that Spark sets up and manages the daemon processes, rather than you doing
it by hand.
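
A rough sketch of what that can look like for a small LSTM forecaster,
assuming Horovod is installed with Spark and TensorFlow support.
horovod.spark.run starts the function on `num_proc` Spark tasks; the model,
the synthetic data and the hyperparameters below are placeholders, and some
details (e.g. compile options) vary across TensorFlow/Horovod versions:

```python
def train(epochs=1):
    """Runs once in each Horovod process started on a Spark executor."""
    # Local imports: resolved on the executors, not when this file loads.
    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    # Synthetic time-series windows: (samples, timesteps, features).
    x = np.random.rand(256, 10, 1).astype("float32")
    y = x.sum(axis=1)  # toy target with shape (samples, 1)

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(16, input_shape=(10, 1)),
        tf.keras.layers.Dense(1),
    ])

    # Scale the learning rate by the worker count, then wrap the optimizer
    # so that gradients are averaged across workers with allreduce.
    opt = hvd.DistributedOptimizer(
        tf.keras.optimizers.Adam(0.001 * hvd.size())
    )
    model.compile(optimizer=opt, loss="mse")

    model.fit(
        x, y, batch_size=32, epochs=epochs,
        # Start every worker from rank 0's initial weights.
        callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
        verbose=1 if hvd.rank() == 0 else 0,
    )
    return model.evaluate(x, y, verbose=0)


def run_on_spark(num_proc=2):
    """Launch `train` on Spark tasks; needs an active SparkSession."""
    import horovod.spark

    # Returns a list with one result per Horovod rank.
    return horovod.spark.run(train, kwargs={"epochs": 1}, num_proc=num_proc)
```

In a real job each rank would read its own shard of the data instead of
generating it locally.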


On Wed, Feb 23, 2022 at 2:43 AM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
>
> The Spark community should have been able to build distributed ML
> capabilities, and as far as I remember that was the initial idea behind the
> Spark 3.x roadmap (barrier execution mode,
> https://issues.apache.org/jira/browse/SPARK-24579).
>
> Ray, another project out of Berkeley like Spark, is trying to capture that
> market space.
>
> I am not sure whether there is any steer from the Spark community leaders
> to seriously prioritise building those capabilities at all. But I am sure
> that if the brilliant and fantastic minds behind Spark actually wanted to
> build those capabilities, they could easily do so :)
>
> I would sincerely request that the open source Spark community prioritise
> building Spark's capabilities to scale ML applications.
>
>
>
> Thanks and Regards,
> Gourav Sengupta

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

The Spark community should have been able to build distributed ML
capabilities, and as far as I remember that was the initial idea behind the
Spark 3.x roadmap (barrier execution mode,
https://issues.apache.org/jira/browse/SPARK-24579).

Ray, another project out of Berkeley like Spark, is trying to capture that
market space.

I am not sure whether there is any steer from the Spark community leaders
to seriously prioritise building those capabilities at all. But I am sure
that if the brilliant and fantastic minds behind Spark actually wanted to
build those capabilities, they could easily do so :)

I would sincerely request that the open source Spark community prioritise
building Spark's capabilities to scale ML applications.
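
For reference, barrier execution mode is exposed in PySpark through
RDD.barrier() and BarrierTaskContext. A minimal sketch, assuming pyspark is
installed and the cluster has enough free slots to start every task at once
(the function names are illustrative only):

```python
def barrier_partition(iterator):
    """Runs inside each task of a barrier stage."""
    # Local import: resolved on the executors, not when this file loads.
    from pyspark import BarrierTaskContext

    ctx = BarrierTaskContext.get()
    # Block until every task in the stage has reached this point.
    ctx.barrier()
    # Addresses of all tasks in the stage, which a training framework
    # could use to bootstrap its own worker-to-worker communication.
    peers = [info.address for info in ctx.getTaskInfos()]
    yield ctx.partitionId(), peers


def run_barrier_stage(spark, num_tasks=2):
    """Launch `num_tasks` tasks that are scheduled together or not at all."""
    rdd = spark.sparkContext.parallelize(range(num_tasks), num_tasks)
    return rdd.barrier().mapPartitions(barrier_partition).collect()
```

Gang scheduling of this kind is what lets an allreduce-style trainer assume
that all of its workers are alive at the same time.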



Thanks and Regards,
Gourav Sengupta

On Wed, Feb 23, 2022 at 3:53 AM Bitfox <bi...@bitfox.top> wrote:

> TensorFlow itself can do distributed training via a parameter server.
> Why do you want Spark here?
>
> regards.

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

Posted by Bitfox <bi...@bitfox.top>.
TensorFlow itself can do distributed training via a parameter server.
Why do you want Spark here?
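
For context, TensorFlow's own multi-worker strategies learn the cluster
layout from a TF_CONFIG environment variable that has to be set separately
for every process. A minimal sketch of building it, with hypothetical host
names and ports (depending on the strategy, a "chief" entry may also be
needed):

```python
import json


def make_tf_config(workers, ps, task_type, task_index):
    """Build the TF_CONFIG JSON that one process in the cluster would read."""
    cluster = {"worker": list(workers)}
    if ps:
        cluster["ps"] = list(ps)
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


# Hypothetical hosts: two workers and one parameter server.
workers = ["host1:12345", "host2:12345"]
ps = ["host3:12345"]

# Every process is started with its own TF_CONFIG in its environment.
configs = {
    ("worker", 0): make_tf_config(workers, ps, "worker", 0),
    ("worker", 1): make_tf_config(workers, ps, "worker", 1),
    ("ps", 0): make_tf_config(workers, ps, "ps", 0),
}
```

Starting one process per entry on the right host, each with its own value
exported as TF_CONFIG, is the manual step that a launcher otherwise handles.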

regards.

On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar
<Vi...@mavenir.com.invalid> wrote:

> Thanks Sean for your response!
>
> I want to add some more background here.
>
> I am using Spark 3.0+ with TensorFlow 2.0+.
>
> My use case is not image data but time-series data, where I am using
> LSTMs and transformers to forecast.
>
> I evaluated the *SparkFlow* and *spark_tensorflow_distributor* libraries,
> but there has been no major development on them recently. I ran into
> version dependency issues with them and had a hard time fixing the library
> compatibilities. Hence a few questions:
>
>    - Does *Horovod* have any dependencies?
>    - Is there any other library that is suitable for my use case?
>    - Any example code would be of great help.
>
>
>
> Thanks,
>
> Vijayant