Posted to issues@spark.apache.org by "Oleksii Kostyliev (JIRA)" <ji...@apache.org> on 2015/04/29 13:39:07 UTC

[jira] [Commented] (SPARK-7233) ClosureCleaner#clean blocks concurrent job submitter threads

    [ https://issues.apache.org/jira/browse/SPARK-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519169#comment-14519169 ] 

Oleksii Kostyliev commented on SPARK-7233:
------------------------------------------

To illustrate the issue, I ran a test against a local Spark instance.
Attached is a screenshot of the Threads view in the YourKit profiler.
The test generated only 20 concurrent requests.
As you can see, the job submitter threads spend most of their time blocked on one another.
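The serialization effect itself is easy to model outside Spark. The sketch below (illustrative names, no Spark involved) funnels 20 submitter threads through a single shared lock, standing in for the monitor that serializes the {{Class.forName}} call in {{ClosureCleaner#clean}}; the aggregate wait time grows with the thread count:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class SerializedSubmitters {
    // Stand-in for the monitor that serializes the Class.forName call.
    private static final Object sharedLock = new Object();

    /** Runs `threads` submitters through the shared lock; returns total ms spent waiting. */
    static long runSimulation(int threads) throws InterruptedException {
        AtomicLong waitedNanos = new AtomicLong();
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                long start = System.nanoTime();
                synchronized (sharedLock) {            // every "job submission" funnels through here
                    waitedNanos.addAndGet(System.nanoTime() - start);
                    try { Thread.sleep(10); }          // stand-in for the closure-cleaning work
                    catch (InterruptedException ignored) { }
                }
                done.countDown();
            }).start();
        }
        done.await();
        return waitedNanos.get() / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("aggregate wait for 20 submitters: "
                + runSimulation(20) + " ms");
    }
}
```

With 20 threads each holding the lock for ~10 ms, later arrivals queue behind earlier ones, so the aggregate wait is far larger than any single submission's work.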

> ClosureCleaner#clean blocks concurrent job submitter threads
> ------------------------------------------------------------
>
>                 Key: SPARK-7233
>                 URL: https://issues.apache.org/jira/browse/SPARK-7233
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1, 1.4.0
>            Reporter: Oleksii Kostyliev
>         Attachments: blocked_threads_closurecleaner.png
>
>
> {{org.apache.spark.util.ClosureCleaner#clean}} contains logic to determine whether Spark is running in interpreter mode: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L120
> While this check is valuable in interpreter scenarios, it also causes concurrent submitter threads to block on a native call to {{java.lang.Class#forName0}}, since it appears only one thread at a time can make the call.
> This becomes a major issue when multiple threads concurrently submit short-lived jobs. This is one of the patterns in which we use Spark in production, and the number of parallel requests is expected to be quite high, up to a couple of thousand at a time.
> A typical stacktrace of a blocked thread looks like:
> {code}
> http-bio-8091-exec-14 [BLOCKED] [DAEMON]
> java.lang.Class.forName0(String, boolean, ClassLoader, Class) Class.java (native)
> java.lang.Class.forName(String) Class.java:260
> org.apache.spark.util.ClosureCleaner$.clean(Object, boolean) ClosureCleaner.scala:122
> org.apache.spark.SparkContext.clean(Object, boolean) SparkContext.scala:1623
> org.apache.spark.rdd.RDD.reduce(Function2) RDD.scala:883
> org.apache.spark.rdd.RDD.takeOrdered(int, Ordering) RDD.scala:1240
> org.apache.spark.api.java.JavaRDDLike$class.takeOrdered(JavaRDDLike, int, Comparator) JavaRDDLike.scala:586
> org.apache.spark.api.java.AbstractJavaRDDLike.takeOrdered(int, Comparator) JavaRDDLike.scala:46
> ...
> {code}
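Besides a profiler, the BLOCKED state shown in such stacktraces can also be confirmed programmatically with the standard {{java.lang.management.ThreadMXBean}} API. A minimal self-contained sketch (not Spark code; names are illustrative), where one thread parks on a monitor held by another:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadCheck {
    public static void main(String[] args) throws Exception {
        Object monitor = new Object();
        Thread contender;
        synchronized (monitor) {
            // This thread will park on `monitor`, which main() already holds.
            contender = new Thread(() -> {
                synchronized (monitor) { /* no-op; the lock is released immediately */ }
            }, "contender");
            contender.start();
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);                  // wait until it is actually parked
            }
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            ThreadInfo info = mx.getThreadInfo(contender.getId());
            System.out.println(info.getThreadName() + " is " + info.getThreadState()
                    + " waiting on " + info.getLockName());
        }
        contender.join();
    }
}
```

The same {{ThreadMXBean}} query against a live driver would show the submitter threads parked on the monitor guarding the {{Class.forName}} call.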



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
