Posted to user@spark.apache.org by Prashant Sharma <pr...@plume.com> on 2018/09/06 04:13:57 UTC
Spark Streaming RDD Cleanup too slow
I have a Spark Streaming job that takes too long to delete temporary RDDs.
I collect about 4 million telemetry metrics per minute and do minor
aggregations in the streaming job.
I am using Amazon R4 instances. The driver RPC call, although asynchronous,
is (I believe) slow to get the handle for the future object at the
"askAsync" call.
Here is the Spark code that does the cleanup:
https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala#L125
Has anyone else encountered a similar issue with their streaming jobs?
About 20% of our time (~60 seconds) is spent cleaning up the temporary RDDs.
best,
Prashant