
Posted to user@spark.apache.org by Srinivas V <sr...@gmail.com> on 2020/05/02 18:37:01 UTC

Re: Spark structured streaming - performance tuning

Hi Alex, I read the book. It is a good one, but I don't see the things that I
most want to understand.
You are right about the partitions and tasks.
1. How do I use coalesce with Spark Structured Streaming?

I also want to ask a few more questions.
2. How do I restrict the number of executors in Structured Streaming?
 Is --num-executors the minimum?
To cap the maximum, can I use spark.dynamicAllocation.maxExecutors?

3. Do the other streaming properties also apply to Structured Streaming,
such as spark.streaming.dynamicAllocation.enabled?
If not, which ones does it take into consideration?

4. Does Structured Streaming 2.4.5 allow dynamic allocation of
executors/cores? In the case of a Kafka consumer, when the cluster has to
scale down, does it reconfigure the mapping of executor cores to Kafka
partitions?

5. Why is the Spark Structured Streaming web UI (SQL tab) not as informative
as the Streaming tab of Spark Streaming?

It would be great if these questions could be answered; otherwise the only
option left is to go through the Spark code and figure it out.
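
A minimal sketch of what questions 1 and 2 refer to, assuming PySpark with the Kafka source package available. The function names, broker address and topic names are hypothetical and do not come from this thread, and the sketch does not settle whether --num-executors acts as a minimum. It only shows where `coalesce` and the `spark.dynamicAllocation.*` keys would go:

```python
# Sketch only: helper names, broker address and topic names are hypothetical.


def dynamic_allocation_conf(min_executors, max_executors):
    """Return core dynamic-allocation settings as Spark config key/value strings."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        # Upper bound on executors, as asked about in question 2.
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Assumption: an external shuffle service is running on the cluster.
        "spark.shuffle.service.enabled": "true",
    }


def kafka_source_options(bootstrap_servers, topics):
    """Build the options for the Kafka source from a broker list and topic names."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": ",".join(topics),
    }


def coalesced_kafka_stream(spark, bootstrap_servers, topics, num_partitions):
    """Read the topics as a streaming DataFrame and coalesce its partitions.

    `spark` is an existing SparkSession. By default the Kafka source yields one
    Spark partition per Kafka partition; coalesce() merges them into at most
    `num_partitions`, so fewer tasks run per micro-batch.
    """
    source = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topics))
        .load()
    )
    return source.coalesce(num_partitions)
```

Each key/value pair from `dynamic_allocation_conf` can be passed to `SparkSession.builder.config(key, value)` before the session is created. The DataFrame returned by `coalesced_kafka_stream` still needs a `writeStream` sink and a `start()` call before anything runs.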

On Sat, Apr 18, 2020 at 1:09 PM Alex Ott <al...@gmail.com> wrote:

> Just to clarify - I didn't write this explicitly in my answer. When you're
> working with Kafka, every partition in Kafka is mapped to a Spark
> partition, and in Spark, every partition is mapped to a task. But you can
> use `coalesce` to decrease the number of Spark partitions, so you'll have
> fewer tasks...
>
> Srinivas V  at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:
>  SV> Thank you Alex. I will check it out and let you know if I have any
>  SV> questions
>
>  SV> On Fri, Apr 17, 2020 at 11:36 PM Alex Ott <al...@gmail.com> wrote:
>
>  SV>     http://shop.oreilly.com/product/0636920047568.do has quite good
>  SV>     information on it.  For Kafka, you need to start with the
>  SV>     approximation that processing each partition is a separate task
>  SV>     that needs to be executed, so you need to plan the number of cores
>  SV>     correspondingly.
>  SV>
>  SV>     Srinivas V  at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:
>  SV>      SV> Hello,
>  SV>      SV> Can someone point me to a good video or document which talks
>  SV>      SV> about performance tuning for a structured streaming app?
>  SV>      SV> I am looking especially at listening to Kafka topics, say 5
>  SV>      SV> topics each with 100 partitions.
>  SV>      SV> Trying to figure out the best cluster size and the number of
>  SV>      SV> executors and cores required.
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
>
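
The sizing approximation in the quoted thread can be written out as simple arithmetic. This is a rough sketch, assuming one task per Kafka partition per micro-batch and one core per task; the function names and the executor and core counts used below are made up for illustration and are not recommendations:

```python
import math


def tasks_per_batch(num_topics, partitions_per_topic, coalesce_to=None):
    """Tasks per micro-batch, assuming one task per Kafka partition.

    If `coalesce_to` is given, the count is capped at that value, because
    coalesce() can only reduce the number of partitions.
    """
    kafka_partitions = num_topics * partitions_per_topic
    if coalesce_to is None:
        return kafka_partitions
    return min(kafka_partitions, coalesce_to)


def task_waves(num_tasks, num_executors, cores_per_executor):
    """Number of sequential waves needed to run all tasks on the given cores."""
    slots = num_executors * cores_per_executor  # assumes one core per task
    return math.ceil(num_tasks / slots)
```

For 5 topics with 100 partitions each, `tasks_per_batch(5, 100)` gives 500 tasks per micro-batch. With a hypothetical 10 executors of 5 cores each, `task_waves(500, 10, 5)` gives 10 waves, while coalescing to 50 partitions would fit in a single wave at the cost of each task reading more data.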

Re: Spark structured streaming - performance tuning

Posted by Srinivas V <sr...@gmail.com>.

Can anyone else answer the questions from my previous message on performance
tuning Structured Streaming?
@Jacek?
