Posted to user@spark.apache.org by Prasad Bhalerao <pr...@gmail.com> on 2022/01/25 12:09:42 UTC

Bottlenecks in spark application (Spark version 3.0)

Hello,
I am looking at some 3rd-party project code which is taking a very long
time to complete. I am going through this code to understand the business
logic and find the reason for the slowness, but to get some quick details I
decided to profile it first.
This Spark application is spending too much time serializing (16%) and
deserializing (26%) tasks. If you look at the readObject stack, the
recursion goes very deep. I first profiled with the Uber JVM Profiler and
created a flame graph, but it showed very little time being spent in the
actual business logic, so I switched to JProfiler. Please see the following
snapshot of JProfiler's CPU views for details.


[image: image.png]

Please check the following snapshot as well. Class loading takes another
11% of the total execution time, so 16% + 26% + 11% = 53% of the total
execution time is spent just on task deserialization, task serialization,
and class loading.

[image: image.png]
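For context (this is a generic illustration, not the project's actual
code): task serialization time often balloons when a map/filter closure
accidentally captures a large driver-side object, because that object is
then serialized into every task and deserialized on the executors. Below is
a minimal sketch of the anti-pattern and the usual broadcast-variable fix,
with a hypothetical lookupTable standing in for whatever gets captured:

import org.apache.spark.sql.SparkSession

object ClosureCaptureSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("closure-capture-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical large driver-side structure (e.g. a lookup table).
    val lookupTable: Map[Int, String] = (1 to 1000000).map(i => i -> s"value-$i").toMap

    val rdd = sc.parallelize(1 to 100)

    // Anti-pattern: lookupTable is captured by the closure, so it is
    // serialized with every task and deserialized on the executors.
    val slow = rdd.map(i => lookupTable.getOrElse(i, "missing"))

    // Usual fix: ship it once per executor as a broadcast variable; the
    // closure then only captures the small broadcast handle.
    val bcTable = sc.broadcast(lookupTable)
    val fast = rdd.map(i => bcTable.value.getOrElse(i, "missing"))

    println(s"counts: ${slow.count()} / ${fast.count()}")
    spark.stop()
  }
}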

Re: Bottlenecks in spark application (Spark version 3.0)

Posted by Prasad Bhalerao <pr...@gmail.com>.
Hi,

I tried Kryo as well, but did not see any improvement.
Is there any way to list the sizes of all closures? I was debugging the
org.apache.spark.scheduler.DAGScheduler.submitMissingTasks() method and
inspecting taskBinaryBytes to check the sizes of a few tasks, but I could
not find the exact reason for the latency. I am still working on it, but
any pointer would be helpful. What would be an ideal closure size? I am new
to Spark, so please bear with me.
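As a rough, standalone check (not the application's own code), the sketch
below measures how large a candidate captured object becomes under plain
Java serialization and compares that with Spark's in-memory SizeEstimator;
the candidate map is just a placeholder for whatever the closures actually
capture. Spark's TaskSetManager should also log a warning in the driver log
when a serialized task gets very large (on the order of 1000 KiB), which is
another cheap thing to grep for.

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import org.apache.spark.util.SizeEstimator

object SerializedSizeCheck {
  // Size of what Java serialization writes for this object -- a proxy for
  // how much a closure capturing it would add to every task.
  def javaSerializedSize(obj: AnyRef): Long = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size().toLong
  }

  def main(args: Array[String]): Unit = {
    // Placeholder for an object that a task closure might capture.
    val candidate = (1 to 100000).map(i => i -> s"v$i").toMap
    println(s"Java-serialized size      : ${javaSerializedSize(candidate)} bytes")
    println(s"SizeEstimator (in-memory) : ${SizeEstimator.estimate(candidate)} bytes")
  }
}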

Regards,
Prasad

On Tue, Jan 25, 2022 at 7:06 PM Sean Owen <sr...@gmail.com> wrote:

> Please don't ping the list.
> This says you are spending a lot of time in Java serialization. You
> haven't changed the default serialization from Kryo, right? It's possible
> you have very large closures in your app. It's really hard to say just
> from this.
>
> On Tue, Jan 25, 2022 at 6:18 AM Prasad Bhalerao <
> prasadbhalerao1983@gmail.com> wrote:
>
>> Hello,
>>
>> Can someone please help me to identify the exact problem?
>>
>> I am looking at some 3rd-party project code which is taking a very long
>> time to complete. I am going through this code to understand the business
>> logic and find the reason for the slowness, but to get some quick details
>> I decided to profile it first.
>> This Spark application is spending too much time serializing (16%) and
>> deserializing (26%) tasks. If you look at the readObject stack, the
>> recursion goes very deep. I first profiled with the Uber JVM Profiler and
>> created a flame graph, but it showed very little time being spent in the
>> actual business logic, so I switched to JProfiler. Please see the
>> following snapshot of JProfiler's CPU views for details.
>>
>>
>> [image: image.png]
>>
>> Please check the following snapshot as well. Class loading takes another
>> 11% of the total execution time, so 16% + 26% + 11% = 53% of the total
>> execution time is spent just on task deserialization, task serialization,
>> and class loading.
>>
>> [image: image.png]
>>
>

Re: Bottlenecks in spark application (Spark version 3.0)

Posted by Sean Owen <sr...@gmail.com>.
Please don't ping the list.
This says you are spending a lot of time in Java serialization. You haven't
changed the default serialization from Kryo, right? It's possible you have
very large closures in your app. It's really hard to say just from this.
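For reference, a minimal sketch of how Kryo is usually enabled and classes
registered (the registered classes below are just placeholders, not from
this application). Note that task closures are, as far as I know, always
serialized with Java serialization regardless of this setting, so Kryo
mainly helps with shuffled and cached data rather than the task-serialization
time shown in the profile.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Typical Kryo setup; the registered classes are placeholders.
val conf = new SparkConf()
  .setAppName("kryo-config-sketch")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[Array[Int]], classOf[Array[String]]))

val spark = SparkSession.builder().config(conf).getOrCreate()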

On Tue, Jan 25, 2022 at 6:18 AM Prasad Bhalerao <
prasadbhalerao1983@gmail.com> wrote:

> Hello,
>
> Can someone please help me to identify the exact problem?
>
> I am looking at some 3rd-party project code which is taking a very long
> time to complete. I am going through this code to understand the business
> logic and find the reason for the slowness, but to get some quick details
> I decided to profile it first.
> This Spark application is spending too much time serializing (16%) and
> deserializing (26%) tasks. If you look at the readObject stack, the
> recursion goes very deep. I first profiled with the Uber JVM Profiler and
> created a flame graph, but it showed very little time being spent in the
> actual business logic, so I switched to JProfiler. Please see the
> following snapshot of JProfiler's CPU views for details.
>
>
> [image: image.png]
>
> Please check the following snapshot as well. Class loading takes another
> 11% of the total execution time, so 16% + 26% + 11% = 53% of the total
> execution time is spent just on task deserialization, task serialization,
> and class loading.
>
> [image: image.png]
>

Bottlenecks in spark application (Spark version 3.0)

Posted by Prasad Bhalerao <pr...@gmail.com>.
Hello,

Can someone please help me to identify the exact problem?

I am looking at some 3rd-party project code which is taking a very long
time to complete. I am going through this code to understand the business
logic and find the reason for the slowness, but to get some quick details I
decided to profile it first.
This Spark application is spending too much time serializing (16%) and
deserializing (26%) tasks. If you look at the readObject stack, the
recursion goes very deep. I first profiled with the Uber JVM Profiler and
created a flame graph, but it showed very little time being spent in the
actual business logic, so I switched to JProfiler. Please see the following
snapshot of JProfiler's CPU views for details.


[image: image.png]

Please check the following snapshot as well. Class loading takes another
11% of the total execution time, so 16% + 26% + 11% = 53% of the total
execution time is spent just on task deserialization, task serialization,
and class loading.

[image: image.png]