Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2015/10/26 15:01:27 UTC

[jira] [Created] (SPARK-11316) isEmpty before coalesce seems to cause huge performance issue in setupGroups

Thomas Graves created SPARK-11316:
-------------------------------------

             Summary: isEmpty before coalesce seems to cause huge performance issue in setupGroups
                 Key: SPARK-11316
                 URL: https://issues.apache.org/jira/browse/SPARK-11316
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.1
            Reporter: Thomas Graves


I haven't fully debugged this yet, but I'm reporting what I'm seeing and what I think might be going on.

I have a graph processing job that is seeing a huge slowdown in setupGroups, in the location iterator where it's getting the preferred locations for the coalesce.  They are coalescing from 2400 partitions down to 1200 and it's taking 17+ hours to do the calculation.  I killed it at that point, so I don't know the total time.

It appears that the job is doing an isEmpty call, a bunch of other transformations, then a coalesce (which is where it takes so long), more transformations, and then finally a count to trigger it.

It appears that there is only one node that it's finding in the setupGroups call, and to get to that node it first has to go through the while loop:

    while (numCreated < targetLen && tries < expectedCoupons2) {
where expectedCoupons2 is around 19000.  It finds very few new nodes, or none, in this loop.
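
For reference, the ~19000 figure is consistent with a coupon-collector style bound of the form 2 * (ln(n)*n + n + 0.5).toInt with n = targetLen. I believe that is how CoalescedRDD computed expectedCoupons2 at the time, but the formula below is written from memory and should be treated as an assumption, not a quote of the source. The object and method names are made up for illustration.

```scala
object ExpectedCouponsSketch {
  // Assumed formula (not copied from the Spark source): twice the
  // coupon-collector estimate for drawing `targetLen` distinct items.
  def expectedCoupons2(targetLen: Int): Int =
    2 * (math.log(targetLen) * targetLen + targetLen + 0.5).toInt

  def main(args: Array[String]): Unit = {
    // targetLen = 1200 is the coalesce target reported in this issue.
    println(expectedCoupons2(1200))
  }
}
```

With targetLen = 1200 this evaluates to 19416, which matches the "around 19000" seen here.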

Then it does the second loop:

    while (numCreated < targetLen) {  // if we don't have enough partition groups, create duplicates
      var (nxt_replica, nxt_part) = rotIt.next()
      val pgroup = PartitionGroup(nxt_replica)
      groupArr += pgroup
      groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
      var tries = 0
      while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // ensure at least one part
        nxt_part = rotIt.next()._2
        tries += 1
      }
      numCreated += 1
    }

That loop has an inner while loop, and both of them are running 1200 times.  That is 1200*1200 iterations.  This is taking a very long time.
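
To put a number on that, here is a small standalone sketch that counts how many times rotIt.next() would be called by the second loop in the worst case. It assumes addPartToPGroup never succeeds and that no groups were created by the first loop (numCreated starts at 0). Both are assumptions made for illustration, and the object and method names are hypothetical. It does not touch Spark at all.

```scala
object DuplicateGroupLoopCost {
  // Counts rotIt.next() calls made by the second loop, assuming the
  // worst case where addPartToPGroup always returns false.
  def worstCaseNextCalls(targetLen: Int): Long = {
    var numCreated = 0
    var nextCalls = 0L
    while (numCreated < targetLen) {
      nextCalls += 1              // stands in for: rotIt.next() for the new group
      var tries = 0
      while (tries < targetLen) { // addPartToPGroup assumed to fail every time
        nextCalls += 1            // stands in for: rotIt.next()._2
        tries += 1
      }
      numCreated += 1
    }
    nextCalls
  }

  def main(args: Array[String]): Unit = {
    println(worstCaseNextCalls(1200))
  }
}
```

For targetLen = 1200 that is 1200 * (1 + 1200) = 1441200 calls, on top of the tries already spent in the first loop. The total time then depends on how expensive each rotIt.next() is.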

The user can work around the issue by adding a count() call shortly after the isEmpty call, before the coalesce is called.  I also tried putting a take(10000) right before the isEmpty call, and that also seems to work around the issue.
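
A rough sketch of where the count() workaround sits relative to the other calls. This is not the user's job: the data, the map step standing in for the "other transformations", the partition counts and the helper name are all placeholders chosen so it runs in local mode. It only shows the ordering of isEmpty, count and coalesce and is not expected to reproduce the slowdown.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CoalesceWorkaroundSketch {
  // Hypothetical helper mirroring the order of calls described above.
  def coalesceAfterCount(input: RDD[Int], targetPartitions: Int): Option[RDD[Int]] = {
    if (input.isEmpty()) {
      None
    } else {
      input.count()                 // workaround: extra action soon after isEmpty
      val transformed = input.map(_ + 1) // placeholder for the other transformations
      Some(transformed.coalesce(targetPartitions))
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-workaround-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(1 to 2400, 24) // placeholder data and partitioning
      val result = coalesceAfterCount(input, 12)
      result.foreach(rdd => println(rdd.count())) // final action triggers the job
    } finally {
      sc.stop()
    }
  }
}
```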


