Posted to user@spark.apache.org by patcharee <Pa...@uni.no> on 2016/07/21 09:35:56 UTC

What contributes to Task Deserialization Time

Hi,

I'm running a simple job (reading a sequential file and collecting the data at 
the driver) in yarn-client mode. In the history server UI, the Task 
Deserialization Time varies widely across tasks (5 ms to 5 s). What 
contributes to Task Deserialization Time?

Thank you in advance!

Patcharee



---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: What contributes to Task Deserialization Time

Posted by Silvio Fiorito <si...@granturing.com>.
Are you referencing member variables or other objects of your driver in your transformations? Those would have to be serialized and shipped to each executor when that job kicks off.
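
To illustrate the point about captured driver state, here is a minimal stand-alone Python sketch. It is not Spark code: it uses the standard `pickle` module as a stand-in for Spark's closure serialization, and the `Driver` and `AddOffset` classes, the `roundtrip` helper, and the table size are all hypothetical. It compares a task function that is a method of a large driver-side object with one that carries only the value it needs.

```python
import pickle
import time


class Driver:
    """Hypothetical driver-side object holding a large member variable."""

    def __init__(self, n):
        self.lookup = list(range(n))  # large field the task never uses
        self.offset = 10              # the only value the task needs

    def add_offset(self, x):
        # Bound method: pickling it pickles the whole Driver instance.
        return x + self.offset


class AddOffset:
    """Small callable that carries only the value it needs."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, x):
        return x + self.offset


def roundtrip(task):
    """Serialize a task, then time how long deserializing it takes."""
    payload = pickle.dumps(task)
    start = time.perf_counter()
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed, restored


driver = Driver(200_000)
heavy_size, heavy_secs, heavy_task = roundtrip(driver.add_offset)
light_size, light_secs, light_task = roundtrip(AddOffset(driver.offset))

print(f"bound method:   {heavy_size} bytes, {heavy_secs:.6f} s to deserialize")
print(f"small callable: {light_size} bytes, {light_secs:.6f} s to deserialize")
```

In this sketch the bound method drags the whole `Driver` instance, including the unused `lookup` list, into the payload, so it is far larger and takes longer to deserialize. In a Spark job the usual analogue is to copy the needed field into a local variable before using it in a transformation, or to use a broadcast variable for large read-only data.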

On 7/22/16, 8:54 AM, "Jacek Laskowski" <ja...@japila.pl> wrote:

Hi,

I can't specifically answer your question, but my understanding of
Task Deserialization Time is that it's the time an executor spends
deserializing a serialized task received from the driver before it
runs. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L236
and the lines that follow.





Re: What contributes to Task Deserialization Time

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

I can't specifically answer your question, but my understanding of
Task Deserialization Time is that it's the time an executor spends
deserializing a serialized task received from the driver before it
runs. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L236
and the lines that follow.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

