Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2018/02/04 20:34:53 UTC
High task deserialization time

Hi,

Attached is a plot of the high task deserialization time from a treeAggregate stage.
Any pointers on how to reduce it further would be a great help.

I checked whether the jars are being downloaded on the executors, and they
are:

18/02/04 19:36:24 INFO executor.Executor: Adding
file:/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./lucene-core-6.5.1.jar
to class loader
18/02/04 19:36:24 INFO executor.Executor: Fetching
http://146.1.180.8:58268/jars/trapezium-dal-1.3.0.jar with timestamp
1517772929263
18/02/04 19:36:24 INFO util.Utils: Fetching
http://146.1.180.8:58268/jars/trapezium-dal-1.3.0.jar to
/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/spark-31f30235-1dd0-445b-b787-bb09b46a0494/fetchFileTemp1627909873930687004.tmp
18/02/04 19:36:24 INFO util.Utils: Copying
/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/spark-31f30235-1dd0-445b-b787-bb09b46a0494/-9651954071517772929263_cache
to /data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./trapezium-dal-1.3.0.jar
18/02/04 19:36:24 INFO executor.Executor: Adding
file:/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./trapezium-dal-1.3.0.jar
to class loader
18/02/04 19:36:24 INFO executor.Executor: Fetching
http://146.1.180.8:58268/jars/lucene-queryparser-6.5.1.jar with
timestamp 1517772929262
18/02/04 19:36:24 INFO util.Utils: Fetching
http://146.1.180.8:58268/jars/lucene-queryparser-6.5.1.jar to
/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/spark-31f30235-1dd0-445b-b787-bb09b46a0494/fetchFileTemp2483104863263290828.tmp
18/02/04 19:36:24 INFO util.Utils: Copying
/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/spark-31f30235-1dd0-445b-b787-bb09b46a0494/11283393001517772929262_cache
to /data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./lucene-queryparser-6.5.1.jar
18/02/04 19:36:24 INFO executor.Executor: Adding
file:/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./lucene-queryparser-6.5.1.jar
to class loader
18/02/04 19:36:24 INFO executor.Executor: Fetching
http://146.1.180.8:58268/jars/lucene-analyzers-common-6.5.1.jar with
timestamp 1517772929252
18/02/04 19:36:24 INFO util.Utils: Fetching
http://146.1.180.8:58268/jars/lucene-analyzers-common-6.5.1.jar to
/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/spark-31f30235-1dd0-445b-b787-bb09b46a0494/fetchFileTemp1225559656466540565.tmp
18/02/04 19:36:24 INFO util.Utils: Copying
/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/spark-31f30235-1dd0-445b-b787-bb09b46a0494/-14518785141517772929252_cache
to /data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./lucene-analyzers-common-6.5.1.jar
18/02/04 19:36:24 INFO executor.Executor: Adding
file:/data0/yarn/nm/usercache/palomar/appcache/application_1513365768637_11452/container_1513365768637_11452_01_000011/./lucene-analyzers-common-6.5.1.jar
to class loader
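
The same check can be scripted across several executor logs. Below is a
minimal Python sketch, assuming the log lines have the same shape as the
excerpt above. The names summarize_jar_loading and jar_name are hypothetical
helpers for illustration, not part of Spark.

```python
import re

# "Executor: Fetching <url> with timestamp <ts>", after whitespace is collapsed.
FETCH_RE = re.compile(r"executor\.Executor: Fetching (\S+) with timestamp (\d+)")
# "Executor: Adding <file:path> to class loader", after whitespace is collapsed.
ADD_RE = re.compile(r"executor\.Executor: Adding (\S+) to class loader")


def jar_name(location):
    """Return the last path component, e.g. 'lucene-core-6.5.1.jar'."""
    return location.rstrip("/").rsplit("/", 1)[-1]


def summarize_jar_loading(log_text):
    """Return (fetched, added): jar names fetched and jar names added to the class loader."""
    # Mail clients wrap long log lines, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    fetched = [jar_name(url) for url, _ts in FETCH_RE.findall(flat)]
    added = [jar_name(path) for path in ADD_RE.findall(flat)]
    return fetched, added
```

A jar that shows up in the fetched list but not in the added list would
suggest a class loader problem rather than a download problem.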


For each closure in the code, does the executor need to scan the jar every
time it deserializes the class? Would it help if I bundled these jars as
part of the Spark libraries, for example?


I am using spark-as-a-service, and the flow is similar to Spark SQL
queries. The details of the flow are here:


https://www.slideshare.net/SparkSummit/realtime-analytical-query-processing-and-predictive-model-building-on-high-dimensional-document-datasets-with-timestamps-talk-by-debasish-das


Thanks.

Deb