Posted to issues@spark.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/03/23 05:50:25 UTC

[jira] [Created] (SPARK-14091) Consider improving performance of SparkContext.getCallSite()

Rajesh Balamohan created SPARK-14091:
----------------------------------------

             Summary: Consider improving performance of SparkContext.getCallSite()
                 Key: SPARK-14091
                 URL: https://issues.apache.org/jira/browse/SPARK-14091
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Rajesh Balamohan


Currently, SparkContext.getCallSite() unconditionally calls Utils.getCallSite() as its first step:

{noformat}
  private[spark] def getCallSite(): CallSite = {
    val callSite = Utils.getCallSite()
    CallSite(
      Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
    )
  }
{noformat}

However, some code paths invoke Utils.withDummyCallSite(sc) to avoid the expensive thread dumps inside getCallSite(). Because Utils.getCallSite() is evaluated eagerly, before the local properties are consulted, the thread dump is computed anyway. This becomes noticeable when many RDDs are created (e.g., close to 3-7 seconds are spent when 1000+ RDDs are present), which is significant when the entire query runtime is on the order of 10-20 seconds.

Creating this JIRA to consider evaluating Utils.getCallSite() only when it is actually needed.
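
One possible shape for this, as a minimal self-contained sketch rather than a patch against Spark: wrap the expensive lookup in a lazy val so that it only runs if one of the local properties is missing. Everything in the object below (LazyCallSiteSketch, the property map, expensiveCallSite, the stackWalks counter and the two key constants) is a hypothetical stand-in for SparkContext, Utils.getCallSite() and CallSite.SHORT_FORM / CallSite.LONG_FORM, introduced only for illustration.

{noformat}
object LazyCallSiteSketch {
  final case class CallSite(shortForm: String, longForm: String)

  // Hypothetical stand-ins for CallSite.SHORT_FORM / CallSite.LONG_FORM.
  val ShortForm = "callSite.short"
  val LongForm = "callSite.long"

  // Hypothetical stand-in for SparkContext's local properties.
  private val localProps = scala.collection.mutable.Map.empty[String, String]
  def setLocalProperty(key: String, value: String): Unit = localProps(key) = value
  def clearLocalProperties(): Unit = localProps.clear()
  // Returns null when the key is absent, like java.util.Properties.
  def getLocalProperty(key: String): String = localProps.getOrElse(key, null)

  // Counts how often the expensive path is taken (for demonstration only).
  var stackWalks = 0

  // Hypothetical stand-in for Utils.getCallSite(): walks the current stack.
  def expensiveCallSite(): CallSite = {
    stackWalks += 1
    val frames = Thread.currentThread().getStackTrace
    CallSite(
      frames.headOption.map(_.toString).getOrElse("<unknown>"),
      frames.take(20).mkString("\n"))
  }

  def getCallSite(): CallSite = {
    // lazy: not evaluated unless a getOrElse default below is forced.
    lazy val callSite = expensiveCallSite()
    CallSite(
      // getOrElse takes its default by name, so callSite is only touched
      // when the corresponding local property is not set.
      Option(getLocalProperty(ShortForm)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(LongForm)).getOrElse(callSite.longForm))
  }
}
{noformat}

With both properties set (which is what a dummy call site amounts to in this sketch), getCallSite() returns without walking the stack; with neither set, the stack is walked once and the result is reused for both fields.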


