You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by advancedxy <gi...@git.apache.org> on 2015/04/27 16:54:45 UTC

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96685117
  
    Ping @shivaram, @rxin. I finally got some time to look at the code.
    The ExternalSorterSuite kept failing because there was no spilling when we just insert 100000 elements. 
    I don't know how 100000 was calculated, maybe you guys can give some related info. Before this pr, The size of  an integer instance is 24 bytes on a 64-bit JVM with UseCompressedOops on. Now, It's a correct 16 bytes. The ExternalSorterSuit inserts integers into buffers, so a reduced size of integer class results a smaller total estimated size, hence no spilling. 
    
    Another Note: When I was digging the ExternalSorterSuite and ExternalSorter, I noticed something wrong. The ExternalSorter uses the SizeTrackingPairBuffer, SizeTrackingPairBuffer[(Int, Int), Int] to be specifically in the test suite. The SizeTrackingPairBuffer[(Int, Int), Int] increases only 40 bytes per insertion(estimated by SizeEstimator). The 40 bytes  consisted of 24 bytes of Tuple2 and 16 bytes Integer. I changes the insertion number based on this fact (24 + 24 per insertion to 24 + 16 per insertion, 48 * 100000 = 40 * 120000). However 24 bytes of Tuple2 is not right, It should be the specialized version of Tuple2[Int, Int] which is 32 bytes. It's not SizeEstimator's fault as It can only get the Tuple2 class rather than scala.Tuple2$mcII$sp. I doubt it may be something related to the scala compiler, I will look into this problem or someone more familiar with scala compiler should.
    
    As for this pr, I think it's good to go. Hope we can get it merged before 1.4


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org