Posted to issues@spark.apache.org by "Jeffrey Shmain (JIRA)" <ji...@apache.org> on 2016/11/10 20:45:58 UTC

[jira] [Created] (SPARK-18404) RPC call from executor to driver blocks when getting map output locations (Netty Only)

Jeffrey Shmain created SPARK-18404:
--------------------------------------

             Summary: RPC call from executor to driver blocks when getting map output locations (Netty Only)
                 Key: SPARK-18404
                 URL: https://issues.apache.org/jira/browse/SPARK-18404
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: Jeffrey Shmain


I compared an identical application run on Spark 1.5 and Spark 1.6 and noticed that jobs became slower. On closer inspection, 75% of tasks finished in the same time or faster, while 25% had significant delays (unrelated to data skew or GC).

After more debugging, I noticed that the executors block for a few seconds (sometimes as long as 25) on this call:

https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L199

        logInfo("Doing the fetch; tracker endpoint = " + trackerEndpoint)
        // This try-finally prevents hangs due to timeouts:
        try {
          val fetchedBytes = askTracker[Array[Byte]](GetMapOutputStatuses(shuffleId))
          fetchedStatuses = MapOutputTracker.deserializeMapStatuses(fetchedBytes)
          logInfo("Got the output locations")
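
One way to confirm where the time goes is to wrap the blocking call in a small timing helper and log any call that exceeds a threshold. The following is only a minimal sketch, not code from Spark: `timed`, `fakeAskTracker`, and the 1000 ms threshold are hypothetical names and values chosen for illustration, and `fakeAskTracker` merely sleeps in place of the real `askTracker` round trip to the driver.

```scala
object TimedCall {
  /** Runs `body` and returns its result together with the elapsed wall-clock time in milliseconds. */
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  // Hypothetical stand-in for the blocking askTracker call: it only sleeps,
  // then returns a one-byte payload derived from the shuffle id.
  def fakeAskTracker(shuffleId: Int, delayMs: Long): Array[Byte] = {
    Thread.sleep(delayMs)
    Array[Byte](shuffleId.toByte)
  }

  def main(args: Array[String]): Unit = {
    val slowThresholdMs = 1000L // assumed cutoff for reporting a call as slow
    val (bytes, elapsedMs) = timed(fakeAskTracker(shuffleId = 0, delayMs = 50L))
    if (elapsedMs > slowThresholdMs) {
      println(s"Slow fetch of map output locations: $elapsedMs ms")
    } else {
      println(s"Fetch took $elapsedMs ms and returned ${bytes.length} byte(s)")
    }
  }
}
```

Comparing the distribution of these timings between the two Spark versions, under the same concurrency, would show whether the delay sits in this call or elsewhere in the task.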

So the regression seems to be related to changing the default RPC implementation from Akka to Netty.
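
A possible way to test this hypothesis on Spark 1.6.x is to rerun the same application with the RPC implementation switched back to Akka and compare task times. The fragment below is a sketch under assumptions: it relies on the `spark.rpc` setting that Spark 1.6 reads when choosing its RPC environment (values `akka` or `netty`), which should be checked against the exact version in use and does not apply to Spark 2.0 and later, where the Akka RPC implementation was removed. The object name and application name are hypothetical.

```scala
import org.apache.spark.SparkConf

object RpcRegressionCheck {
  // Assumption: Spark 1.6.x, where "spark.rpc" selects the RPC implementation.
  // With "akka", the application should use the pre-1.6 default RPC layer.
  val conf: SparkConf = new SparkConf()
    .setAppName("rpc-regression-check") // hypothetical application name
    .set("spark.rpc", "akka")
}
```

If the delayed 25% of tasks disappear with this setting, that would support the Netty RPC path as the source of the slowdown; if they remain, the cause is likely elsewhere.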

This was an application working with RDDs, submitting 10 concurrent queries at a time.  


