You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2014/04/01 00:35:55 UTC

[GitHub] spark pull request: Do not re-use objects in the EdgePartition/Edg...

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/276#issuecomment-39150913
  
    Thanks for sending this in. It's something that is extremely confusing in graphx and we need to fix it. However, I am not sure if taking out the object reuse in edges is the way to fix this problem. 
    
    This is actually hard to test in micro benchmarks, because you rarely get GCs in micro benchmarks. In these cases where an edge/triplet is returned by an iterator, JVM's escape analysis cannot capture the scope and cannot do on-stack allocation. As a result, there are lots of temporary objects allocated in the heap.
    
    Allocations of short-lived objects are supposed to be cheap. The most expensive objects to gc are medium-lived objects. However, the more temporary objects we allocate, the more frequent a young gen gc happens. The more frequent young gen gc happens, the more likely for random objects to become medium lived. 
    
    Maybe a better way to fix this is to leave the object reuse as is, but in all places where we return the object to the user, we should make sure it copies it.
    
    I just looked at the code, and I think we can accomplish that by just adding a copy to EdgeRDD.compute, i.e.
    ```scala
      override def compute(part: Partition, context: TaskContext): Iterator[Edge[ED]] = {
        firstParent[(PartitionID, EdgePartition[ED])].iterator(part, context).next._2.iterator.map(_.copy())
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---