Posted to user@spark.apache.org by Emmanouil Kritharakis <kr...@gmail.com> on 2023/03/14 16:04:11 UTC

Question related to asynchronously map transformation using java spark structured streaming

Hello,

I hope this email finds you well!

I have a simple dataflow in which I read from a Kafka topic, perform a map
transformation, and then write the result to another topic. Based on your
documentation here
<https://spark.apache.org/docs/3.3.2/structured-streaming-kafka-integration.html#content>,
I need to work with Dataset data structures. Even though my solution works,
I need to apply the map transformation asynchronously. So my question is:
how can I call a map transformation asynchronously on a Dataset in a Java
Structured Streaming environment? Can you please share a working
example?

I am looking forward to hearing from you as soon as possible. Thanks in
advance!

Kind regards

------------------------------------------------------------------

Emmanouil (Manos) Kritharakis

Ph.D. candidate in the Department of Computer Science
<https://sites.bu.edu/casp/people/ekritharakis/>

Boston University

Re: Question related to asynchronously map transformation using java spark structured streaming

Posted by Mich Talebzadeh <mi...@gmail.com>.

Agreed. How does asynchronous communication relate to Spark Structured
Streaming?

In your previous post, you ran Spark on the driver in a single JVM. You
then tried to increase the number of executors to 3 after the job had been
submitted, which (as Sean alluded to) would not work. So if you want to
improve the performance of your Spark job, you will need to submit it
along the lines below (illustration only), specifying your configuration
(number of executors, etc.) at submission time:

         spark-submit --verbose \
           --deploy-mode client \
             .....
           --conf "spark.driver.memory"=4G \
           --conf "spark.executor.memory"=4G \
           --conf "spark.executor.instances"=4 \
           --conf "spark.executor.cores"=2 \
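
Separately from executor sizing: if "asynchronously" means overlapping slow
per-record work (for example, a blocking call made for every record), one
possible pattern is to process a whole partition at a time and run the
per-record function on a small thread pool. The sketch below is plain Java
with no Spark dependency, so it only illustrates the pattern. The class name
AsyncPartitionMap, the helper mapPartitionAsync, and the parallelism
parameter are all hypothetical. In a Spark job, the assumption is that a
helper like this would be called from inside a Dataset.mapPartitions
function, which hands you an Iterator over one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of Spark: applies slowFn to every record of
// one partition concurrently and returns the results in input order.
public class AsyncPartitionMap {

    public static <T, R> List<R> mapPartitionAsync(
            Iterator<T> records, Function<T, R> slowFn, int parallelism) {
        // One pool per call keeps the sketch self-contained; a real job
        // would more likely reuse a pool per executor JVM.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit every record without waiting for the previous one.
            List<CompletableFuture<R>> futures = new ArrayList<>();
            while (records.hasNext()) {
                T record = records.next();
                futures.add(CompletableFuture.supplyAsync(
                        () -> slowFn.apply(record), pool));
            }
            // Collect results in submission order; join() blocks until each
            // future completes and rethrows any failure from slowFn.
            List<R> results = new ArrayList<>(futures.size());
            for (CompletableFuture<R> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c");
        List<String> out =
                mapPartitionAsync(partition.iterator(), s -> s.toUpperCase(), 2);
        System.out.println(out); // prints [A, B, C]
    }
}
```

Note that this version holds the whole partition's futures and results in
memory before returning, so it assumes partitions (micro-batch slices) are
small enough for that.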

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited



On Sun, 26 Mar 2023 at 16:47, Sean Owen <sr...@gmail.com> wrote:

> What do you mean by asynchronously here?
>
> On Sun, Mar 26, 2023, 10:22 AM Emmanouil Kritharakis <
> kritharakismanolis@gmail.com> wrote:
>
>> Hello again,
>>
>> Do we have any news for the above question?
>> I would really appreciate it.
>>
>> Thank you,
>>
>> ------------------------------------------------------------------
>>
>> Emmanouil (Manos) Kritharakis
>>
>> Ph.D. candidate in the Department of Computer Science
>> <https://sites.bu.edu/casp/people/ekritharakis/>
>>
>> Boston University
>>

Re: Question related to asynchronously map transformation using java spark structured streaming

Posted by Sean Owen <sr...@gmail.com>.

What do you mean by asynchronously here?

On Sun, Mar 26, 2023, 10:22 AM Emmanouil Kritharakis <
kritharakismanolis@gmail.com> wrote:

> Hello again,
>
> Do we have any news for the above question?
> I would really appreciate it.
>
> Thank you,
>
> ------------------------------------------------------------------
>
> Emmanouil (Manos) Kritharakis
>
> Ph.D. candidate in the Department of Computer Science
> <https://sites.bu.edu/casp/people/ekritharakis/>
>
> Boston University
>

Re: Question related to asynchronously map transformation using java spark structured streaming

Posted by Emmanouil Kritharakis <kr...@gmail.com>.

Hello again,

Do we have any news for the above question?
I would really appreciate it.

Thank you,

------------------------------------------------------------------

Emmanouil (Manos) Kritharakis

Ph.D. candidate in the Department of Computer Science
<https://sites.bu.edu/casp/people/ekritharakis/>

Boston University

