You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by advancedxy <gi...@git.apache.org> on 2015/02/26 07:58:41 UTC

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

GitHub user advancedxy opened a pull request:

    https://github.com/apache/spark/pull/4783

    [SPARK-6030][CORE] Don't alignSize the computed shellSize for classes in the getClassInfo method

    SizeEstimator gives wrong result for Integer on 64bit JVM with UseCompressedOops on, this pr fixes that. For more details, please refer [SPARK-6030](https://issues.apache.org/jira/browse/SPARK-6030)
    @sryza, I noticed there is a pr to expose SizeEstimator, maybe that should be waited by this pr get merged if we confirm this problem.
    And @shivaram would you mind to review this pr since you contribute related code. Also cc to @srowen and @mateiz 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/advancedxy/spark SPARK-6030

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4783
    
----
commit 8a127dc55ab1e8d88e940384e3b37839b45fda97
Author: Ye Xianjin <ad...@gmail.com>
Date:   2015-02-26T06:40:08Z

    Don't alignSize for objects' shellSize, alignSize when added to state.size. Add
    some primitive wrapper objects size tests.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76262495
  
    @advancedxy -- Yeah it sounds good to me to use 4-byte shell size alignment without CompressedOops and 8 bytes otherwise (we can wait to see if there are other opinions). If we want to be extra careful we can put this inside a `if(isHotspot)` check.
    
    BTW it would be great if you can check if this is consistent across Java 1.6, 1.7, 1.8 HotSpot versions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969340
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    +    // we can just use an Array.fill(8 + 1) to hold all the size counting, but we
    +    // can also use a HashMap to store size counting; Here we uses Array.
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    --- End diff --
    
    You can delete the two lines 
    ```
    Aleksey Shiplev also leaders the development of JOL (Java Object 
    Layout) tool. The simulated code is also in the JOL. 
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96815195
  
    Thanks @advancedxy for fixing the tests. This change LGTM. @rxin @srowen Could you also take a final look at this ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-83749071
  
    What do you mean in practice Tuple2 is 24 bytes? How did you measure that? If the object layout tool shows it takes 32 bytes, doesn't it just take 32 bytes? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78406725
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28486/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969222
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    --- End diff --
    
    I don't think you need this comment. Its pretty self explanatory that we are counting how many times each fieldSize + pointers occur


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78407837
  
    I just run test-only for SizeEstimatorSuite. It do pass the tests. I thought the ExternalSorter was unrelated. I will test the ExternalSorter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25840206
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    +    // we can just use an Array.fill(8 + 1) to hold all the size counting, but we
    +    // can also use a HashMap to store size counting; Here we uses Array.
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9. Aleksey Shiplev also leaders the development of JOL (Java Object 
    +    // Layout) tool. The simulated code is also in the JOL. The simplified idea of field layout 
    +    // consists of these parts (see more details in the report):
    +    // 1. field alignment: HotSpot lay out the fields aligned by their size.
    +    // 2. object alignment: HotSpot rounds instance size up to 8 bytes
    +    // 3. consistent field layouts throughout the hierarchy: This means we should layout 
    +    // superclass first. And we use superclass's shellSize as a start point to layout the
    +    // other fields in this class.
    +    // 4. class alignment: HotSpot rounds field blocks upto to HeapOopSize not 4 bytes, confirmed
    +    // with Aleksey. see https://bugs.openjdk.java.net/browse/CODETOOLS-7901322
    +    //
    +    // The real world field layout is much more complicated. There are three kinds of fields
    +    // order in Java 8. And we don't consider the @contended annotation introduced by Java 8.
    +    // see the HotSpot classloader code, layout_fields method for more details.
    +    // hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/classfile/classFileParser.cpp
    --- End diff --
    
    It exceeds the 100 characters. So, I remove the first "http://" characters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76139869
  
    Ok, Thanks for the update. I will look into the jol and the example when I get some spare time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

https://github.com/apache/spark/pull/4783#issuecomment-96685117

Ping @shivaram, @rxin. I finally got some time to look at the code.
The ExternalSorterSuite kept failing because there was no spilling when we just insert 100000 elements.
I don't know how 100000 was calculated, maybe you guys can give some related info. Before this pr, The size of an integer instance is 24 bytes on a 64-bit JVM with UseCompressedOops on. Now, It's a correct 16 bytes. The ExternalSorterSuit inserts integers into buffers, so a reduced size of integer class results a smaller total estimated size, hence no spilling.

Another Note: When I was digging the ExternalSorterSuite and ExternalSorter, I noticed something wrong. The ExternalSorter uses the SizeTrackingPairBuffer, SizeTrackingPairBuffer[(Int, Int), Int] to be specifically in the test suite. The SizeTrackingPairBuffer[(Int, Int), Int] increases only 40 bytes per insertion(estimated by SizeEstimator). The 40 bytes consisted of 24 bytes of Tuple2 and 16 bytes Integer. I changes the insertion number based on this fact (24 + 24 per insertion to 24 + 16 per insertion, 48 * 100000 = 40 * 120000). However 24 bytes of Tuple2 is not right, It should be the specialized version of Tuple2[Int, Int] which is 32 bytes. It's not SizeEstimator's fault as It can only get the Tuple2 class rather than scala.Tuple2$mcII$sp. I doubt it may be something related to the scala compiler, I will look into this problem or someone more familiar with scala compiler should.

As for this pr, I think it's good to go. Hope we can get it merged before 1.4

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25840258
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -283,4 +323,7 @@ private[spark] object SizeEstimator extends Logging {
         val rem = size % ALIGN_SIZE
         if (rem == 0) size else (size + ALIGN_SIZE - rem)
       }
    +
    +  private def alignSizeUp(size: Long, alignSize: Int): Long =
    --- End diff --
    
    I think this code is faster than the previous one and it's self-documentary. Do I need to replace the original one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77625383
  
      [Test build #28351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28351/consoleFull) for   PR 4783 at commit [`a55f3e1`](https://github.com/apache/spark/commit/a55f3e1655b3b6ebef1f0d1bc615654a6803f7c8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96769562
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77135811
  
      [Test build #28259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28259/consoleFull) for   PR 4783 at commit [`4b8aa90`](https://github.com/apache/spark/commit/4b8aa90da94d1b416f2d7144222c102f9cef1b3b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78410311
  
    @shivaram ExternalSorterSuite failed on my local machine too. Do you have any thought why ExternalSortedSuite is failing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76190779
  
    @shivaram -- I followed the [JDK-8024912](https://bugs.openjdk.java.net/browse/JDK-8024912), [JDK-8024913](https://bugs.openjdk.java.net/browse/JDK-8024913) and [JDK-7007564](https://bugs.openjdk.java.net/browse/JDK-7007564), It looks like non-static field size is aligned to wordSize(8 bytes on 64bit JVM without UseCompressedOops, 4bytes otherwise, verified by the Sizeof code I mentioned on the jira). And the hotspot JVM wouldn't change that in short time.
    
    This is the hotspot version, and I have no ideas about other version's of JVM. But I think we can use this setting up for now just like we did with the ALIGN_SIZE? 
    
    And this is what I get for now. If we (including others or some JVM experts) can agree on that, I will update the code. So, waiting for others' opinion now 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76140139
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27992/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76135015
  
      [Test build #27992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27992/consoleFull) for   PR 4783 at commit [`8a127dc`](https://github.com/apache/spark/commit/8a127dc55ab1e8d88e940384e3b37839b45fda97).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77308814
  
    @shivaram, I add a bunch of comments in the code. I think it's time for you to review it now. And the previous failures are not related to this code change. Wonder why that would happen.
    Also, I think you should review the wording in the comments as I am not a native speaker.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78407056
  
    @advancedxy Do the tests pass on your local machine ? It looks the ExternalSorter uses a size-tracking hash map which in turn depends on the SizeEstimator. So I am wondering if the test failure is actually related to this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96775081
  
      [Test build #31061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31061/consoleFull) for   PR 4783 at commit [`db1e948`](https://github.com/apache/spark/commit/db1e948097b202573cac16a23a8bf22f1d6e2a5b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76134709
  
    Jenkins, this is ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r26208393
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,52 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9.
    +    // The simplified idea of field layout consists of 4 parts (see more details in the report):
    +    //
    +    // 1. field alignment: HotSpot lays out the fields aligned by their size.
    +    // 2. object alignment: HotSpot rounds instance size up to 8 bytes
    +    // 3. consistent fields layouts throughout the hierarchy: This means we should layout
    +    // superclass first. And we can use superclass's shellSize as a starting point to layout the
    +    // other fields in this class.
    +    // 4. class alignment: HotSpot rounds field blocks up to to HeapOopSize not 4 bytes, confirmed
    +    // with Aleksey. see https://bugs.openjdk.java.net/browse/CODETOOLS-7901322
    +    //
    +    // The real world field layout is much more complicated. There are three kinds of fields
    +    // order in Java 8. And we don't consider the @contended annotation introduced by Java 8.
    +    // see the HotSpot classloader code, layout_fields method for more details.
    +    // hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/classfile/classFileParser.cpp
    +    var alignedSize = shellSize
    +    for (size <- fieldSizes if sizeCount(size) > 0) {
    +      val count = sizeCount(size)
    +      // If there are internal gaps, smaller field can fit in.
    +      alignedSize = math.max(alignedSize, alignSizeUp(shellSize, size) + size * count)
    +      shellSize += size * count
    +    }
    +
    +    // we should choose a larger size and clearly alignedSize >= shellSize.
    +    shellSize = alignedSize
    +
    +    // round up instance filed blocks
    +    shellSize = alignSizeUp(shellSize, pointerSize)
    --- End diff --
    
    Can the first arg just be `alignedSize`? I found it slightly funny to see the variable assigned twice.
    Otherwise looking good to me though I'm not so familiar with this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77312747
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28277/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25408258
  
    --- Diff: core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala ---
    @@ -59,6 +59,13 @@ class SizeEstimatorSuite
         assertResult(24)(SizeEstimator.estimate(new DummyClass4(null)))
         assertResult(48)(SizeEstimator.estimate(new DummyClass4(new DummyClass3)))
       }
    +  
    +  test("primitive wrapper objects") {
    --- End diff --
    
    Should I just use primitive wrapper classes or create dummy classes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r26208786
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,52 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9.
    +    // The simplified idea of field layout consists of 4 parts (see more details in the report):
    +    //
    +    // 1. field alignment: HotSpot lays out the fields aligned by their size.
    +    // 2. object alignment: HotSpot rounds instance size up to 8 bytes
    +    // 3. consistent fields layouts throughout the hierarchy: This means we should layout
    +    // superclass first. And we can use superclass's shellSize as a starting point to layout the
    +    // other fields in this class.
    +    // 4. class alignment: HotSpot rounds field blocks up to to HeapOopSize not 4 bytes, confirmed
    +    // with Aleksey. see https://bugs.openjdk.java.net/browse/CODETOOLS-7901322
    +    //
    +    // The real world field layout is much more complicated. There are three kinds of fields
    +    // order in Java 8. And we don't consider the @contended annotation introduced by Java 8.
    +    // see the HotSpot classloader code, layout_fields method for more details.
    +    // hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/classfile/classFileParser.cpp
    +    var alignedSize = shellSize
    +    for (size <- fieldSizes if sizeCount(size) > 0) {
    +      val count = sizeCount(size)
    +      // If there are internal gaps, smaller field can fit in.
    +      alignedSize = math.max(alignedSize, alignSizeUp(shellSize, size) + size * count)
    +      shellSize += size * count
    +    }
    +
    +    // we should choose a larger size and clearly alignedSize >= shellSize.
    +    shellSize = alignedSize
    +
    +    // round up instance filed blocks
    +    shellSize = alignSizeUp(shellSize, pointerSize)
    --- End diff --
    
    Yes. I thought just using the alignedSize in the first arg before. But like the comment said, It has two steps. The first assignment is to get the internal aligned size. The second assignment is to round up instance field blocks. I thought it wold be more clear to assigned twice. I will change it if you think it's funny.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77312744
  
      [Test build #28277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28277/consoleFull) for   PR 4783 at commit [`a55f3e1`](https://github.com/apache/spark/commit/a55f3e1655b3b6ebef1f0d1bc615654a6803f7c8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969639
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    +    // we can just use an Array.fill(8 + 1) to hold all the size counting, but we
    +    // can also use a HashMap to store size counting; Here we uses Array.
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9. Aleksey Shiplev also leaders the development of JOL (Java Object 
    +    // Layout) tool. The simulated code is also in the JOL. The simplified idea of field layout 
    +    // consists of these parts (see more details in the report):
    +    // 1. field alignment: HotSpot lay out the fields aligned by their size.
    +    // 2. object alignment: HotSpot rounds instance size up to 8 bytes
    +    // 3. consistent field layouts throughout the hierarchy: This means we should layout 
    +    // superclass first. And we use superclass's shellSize as a start point to layout the
    --- End diff --
    
    `as a starting point`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969836
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -283,4 +323,7 @@ private[spark] object SizeEstimator extends Logging {
         val rem = size % ALIGN_SIZE
         if (rem == 0) size else (size + ALIGN_SIZE - rem)
       }
    +
    +  private def alignSizeUp(size: Long, alignSize: Int): Long =
    --- End diff --
    
    Yeah its fine to replace the original alignSize with this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25968891
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -49,6 +49,11 @@ private[spark] object SizeEstimator extends Logging {
       private val FLOAT_SIZE   = 4
       private val DOUBLE_SIZE  = 8
     
    +  // Fields can be primitive types, sizes are: 1, 2, 4, 8. Or fields can be pointers. The size of 
    +  // a point is 4 or 8 depends on the JVM (32-bit or 64-bit) and UseCompressedOops flag.
    --- End diff --
    
    Nits:
    
    size of point -> size of a pointer
    depends on the JVM -> depending on the JVM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77615616
  
      [Test build #28351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28351/consoleFull) for   PR 4783 at commit [`a55f3e1`](https://github.com/apache/spark/commit/a55f3e1655b3b6ebef1f0d1bc615654a6803f7c8).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77615372
  
    Jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96773741
  
    Jenkins, ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96800849
  
      [Test build #31061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31061/consoleFull) for   PR 4783 at commit [`db1e948`](https://github.com/apache/spark/commit/db1e948097b202573cac16a23a8bf22f1d6e2a5b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969952
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    +    // we can just use an Array.fill(8 + 1) to hold all the size counting, but we
    +    // can also use a HashMap to store size counting; Here we uses Array.
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9. Aleksey Shiplev also leaders the development of JOL (Java Object 
    +    // Layout) tool. The simulated code is also in the JOL. The simplified idea of field layout 
    +    // consists of these parts (see more details in the report):
    +    // 1. field alignment: HotSpot lay out the fields aligned by their size.
    +    // 2. object alignment: HotSpot rounds instance size up to 8 bytes
    +    // 3. consistent field layouts throughout the hierarchy: This means we should layout 
    +    // superclass first. And we use superclass's shellSize as a start point to layout the
    +    // other fields in this class.
    +    // 4. class alignment: HotSpot rounds field blocks upto to HeapOopSize not 4 bytes, confirmed
    +    // with Aleksey. see https://bugs.openjdk.java.net/browse/CODETOOLS-7901322
    +    //
    +    // The real world field layout is much more complicated. There are three kinds of fields
    +    // order in Java 8. And we don't consider the @contended annotation introduced by Java 8.
    +    // see the HotSpot classloader code, layout_fields method for more details.
    +    // hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/classfile/classFileParser.cpp
    +    var alignedSize = shellSize
    +    for (size <- fieldSizes if sizeCount(size) > 0) {
    +      val count = sizeCount(size)
    +      // If there are internal gaps, smaller filed can fit in.
    +      alignedSize = (alignSizeUp(shellSize, size) + size * count) max alignedSize
    --- End diff --
    
    This is a code style comment -- In Spark we typically avoid infix operators (https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-InfixMethods)
    
    You can make this `max(alignedSize, ...)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78406716
  
      [Test build #28486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28486/consoleFull) for   PR 4783 at commit [`9f92469`](https://github.com/apache/spark/commit/9f924696b3ad536f474d24657302e01e96eacf47).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25992824
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -49,6 +49,11 @@ private[spark] object SizeEstimator extends Logging {
       private val FLOAT_SIZE   = 4
       private val DOUBLE_SIZE  = 8
     
    +  // Fields can be primitive types, sizes are: 1, 2, 4, 8. Or fields can be pointers. The size of
    +  // a pointer is 4 or 8 depending on the JVM (32-bit or 64-bit) and UseCompressedOops flag.
    +  // The sizes should be in descending order, as we will use that information for fields placement.
    +  private val fieldSizes = List(8, 4, 2, 1)
    +
       // Alignment boundary for objects
       // TODO: Is this arch dependent ?
       private val ALIGN_SIZE = 8
    --- End diff --
    
    It's almost always 8 bytes. But, it can be set to other bytes by -XX:ObjectAlignmentInBytes=  . See http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/test/gc/TestObjectAlignment.java, But I'd rather to deal with that in another pr. I am feeling it's a bit of more work here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

https://github.com/apache/spark/pull/4783#issuecomment-78435831

@shivaram The reason why ExternalSorter failed is that It doesn't spilling files for these two failure tests.
( If we increase the input from `0 until 100000` to `0 until 200000`, It will spilling files to disks, which passes the tests) However the input type for sorter is Iterator[(Int, Int)], and the older SizeEstimator gives 32 bytes and the new SizeEstimator gives the same 32 bytes for (Int, Int) (64 bit JVM with UseCompressedOops on assumed). So, It's very wried to see different results.
Seems @mateiz is busy. @jerryshao cloud you take a look at the failed tests, since you wrote some of the tests.

And, I believe there is another bug in the current SizeEstimator, Scala specialized Int, Long, Float, Double for Tuples, So, The size of (Int, Int) should be 24 bytes, rather than 32bytes. Verified by the method introduced by this article http://www.javaworld.com/article/2077496/testing-debugging/java-tip-130--do-you-know-your-data-size-.html.

I will take a look at the size problem.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78401317
  
      [Test build #28486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28486/consoleFull) for   PR 4783 at commit [`9f92469`](https://github.com/apache/spark/commit/9f924696b3ad536f474d24657302e01e96eacf47).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76140128
  
      [Test build #27992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27992/consoleFull) for   PR 4783 at commit [`8a127dc`](https://github.com/apache/spark/commit/8a127dc55ab1e8d88e940384e3b37839b45fda97).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r26208364
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -279,8 +315,8 @@ private[spark] object SizeEstimator extends Logging {
         newInfo
       }
     
    -  private def alignSize(size: Long): Long = {
    -    val rem = size % ALIGN_SIZE
    -    if (rem == 0) size else (size + ALIGN_SIZE - rem)
    -  }
    +  private def alignSize(size: Long): Long = alignSizeUp(size, ALIGN_SIZE)
    +
    +  private def alignSizeUp(size: Long, alignSize: Int): Long =
    +    (size + alignSize - 1) & ~(alignSize - 1)
    --- End diff --
    
    Can you perhaps explain this computation? I get what it does, but it's not obvious.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76131588
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78223640
  
    ping @shivaram, @srowen and @mateiz 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969626
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    +    // we can just use an Array.fill(8 + 1) to hold all the size counting, but we
    +    // can also use a HashMap to store size counting; Here we uses Array.
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9. Aleksey Shiplev also leaders the development of JOL (Java Object 
    +    // Layout) tool. The simulated code is also in the JOL. The simplified idea of field layout 
    +    // consists of these parts (see more details in the report):
    +    // 1. field alignment: HotSpot lay out the fields aligned by their size.
    --- End diff --
    
    reword to - `HotSpot lays out fields aligned by their size`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77672792
  
      [Test build #28363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28363/consoleFull) for   PR 4783 at commit [`cb0af95`](https://github.com/apache/spark/commit/cb0af956daba1ccd9d2633ccdde20163def1ec68).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77670754
  
      [Test build #28363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28363/consoleFull) for   PR 4783 at commit [`cb0af95`](https://github.com/apache/spark/commit/cb0af956daba1ccd9d2633ccdde20163def1ec68).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77671093
  
    @shivaram I update the comment, please review again. And still unrelated test failure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76138824
  
    @advancedxy -- Thanks for the detailed writeup and the gist. I can see the problem but I am not sure this is the complete fix.  You are right that shellSizes don't need to be aligned to 8 bytes for superclasses and the alignment only needs to happen on object boundaries.  However I am not sure this completely extends all the way -- i.e. when the super class has say a 1 byte field then I don't think we can start counting from 13 (12 for Object + 1 for byte).
    
    While looking around I found this very nice tool called the Java Object Layout [1] which prints out the layout with alignment, gaps given an object name. You can see an example of the alignment gaps I am talking about in [2]  where super-classes with boolean fields end up being aligned to 4 bytes by hotspot (I have no idea what the IBM / Oracle JVMs do).
    
    I'll try to think if there is a more elegant fix and if not we can go with this.
    
    cc @rxin
    
    [1] http://openjdk.java.net/projects/code-tools/jol/
    [2] http://hg.openjdk.java.net/code-tools/jol/file/6df1a9b2e5da/jol-samples/src/main/java/org/openjdk/jol/samples/JOLSample_06_Gaps.java 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-83878165
  
    @rxin as I mentioned in the previous comment, I use the method in this article.
    http://www.javaworld.com/article/2077496/testing-debugging/java-tip-130--do-you-know-your-data-size-.html
    
    The related code is listed below. The scala library should be included in the classpath
    ```
    // ----------------------------------------------------------------------------
    /**
     * A simple class to experiment with your JVM's garbage collector
     * and memory sizes for various data types.
     *
     * @author <a href="mailto:vlad@trilogy.com">Vladimir Roubtsov</a>
     */
    import scala.Tuple2;
    import scala.Tuple2$mcII$sp;
    public class Sizeof
    {
        public static class DummyClass1 {}
        public static class DummyClass2 extends DummyClass1 {
            public boolean x;
        }
        public static class DummyClass3 extends DummyClass2 {
            public boolean y;
        }
    
        public static void main (String [] args) throws Exception
        {
            // "warm up" all classes/methods that we are going to use:
            runGC ();
            usedMemory ();
    
            // array to keep strong references to allocated objects:
            final int count = 10000; // 10000 or so is enough for small ojects
            Object [] objects = new Object [count];
    
    
            long heap1 = 0;
    
            // allocate count+1 objects, discard the first one:
            for (int i = -1; i < count; ++ i)
            {
                Object object;
    
                // INSTANTIATE YOUR DATA HERE AND ASSIGN IT TO 'object':
    
                object = new Tuple2(1, 2); // This gives 24 bytes. 64 bit vm
                // object = new Tuple2$mcII$sp(1, 2); // This gives 32 bytes. 64 bit vm
    
                // This gives 56 bytes per object.
                // 24B(Tuple2) + 16B(Integer)+ 16B(Integer) = 56B
                // object = new Tuple2(2 * i + 1, 2 * i + 2);
    
                if (i >= 0)
                    objects [i] = object;
                else
                {
                    object = null; // discard the "warmup" object
                    runGC ();
                    heap1 = usedMemory (); // take a "before" heap snapshot
                }
            }
    
            runGC ();
            long heap2 = usedMemory (); // take an "after" heap snapshot:
    
            final int size = Math.round (((float)(heap2 - heap1))/count);
            System.out.println ("'before' heap: " + heap1 +
                                ", 'after' heap: " + heap2);
            System.out.println ("heap delta: " + (heap2 - heap1) +
                ", {" + objects [0].getClass () + "} size = " + size + " bytes");
        }
    
        // a helper method for creating Strings of desired length
        // and avoiding getting tricked by String interning:
        public static String createString (final int length)
        {
            final char [] result = new char [length];
            for (int i = 0; i < length; ++ i) result [i] = (char) i;
    
            return new String (result);
        }
    
        // this is our way of requesting garbage collection to be run:
        // [how aggressive it is depends on the JVM to a large degree, but
        // it is almost always better than a single Runtime.gc() call]
        private static void runGC () throws Exception
        {
            // for whatever reason it helps to call Runtime.gc()
            // using several method calls:
            for (int r = 0; r < 4; ++ r) _runGC ();
        }
    
        private static void _runGC () throws Exception
        {
            long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
    
            for (int i = 0; (usedMem1 < usedMem2) && (i < 1000); ++ i)
            {
                s_runtime.runFinalization ();
                s_runtime.gc ();
                Thread.currentThread ().yield ();
    
                usedMem2 = usedMem1;
                usedMem1 = usedMemory ();
            }
        }
    
        private static long usedMemory ()
        {
            return s_runtime.totalMemory () - s_runtime.freeMemory ();
        }
    
        private static final Runtime s_runtime = Runtime.getRuntime ();
    
    } // end of class
    // ----------------------------------------------------------------------------
    ```
    
    I took another look at the result. I think I got fooled by the result of Tuple2(1,2);  It gives 24 bytes as result, but the javac don't pick the specialized version class. The Tuple2$mcII$sp(1, 2) does give 32 bytes as result. So, The (1, 2) in Scala on 64 bit JVM do takes 32 bytes size. We get that right
    
    The last thing left is to figure out why the ExternalSort keeping failing tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-78411597
  
    I don't know much about the ExternalSorter. You could try to walk through the code and try to see how the size estimator changes are affecting it.
    
    @mateiz @rxin might know how it depends on the SizeEstimator


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77672798
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28363/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77625394
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28351/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-77308110
  
      [Test build #28277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28277/consoleFull) for   PR 4783 at commit [`a55f3e1`](https://github.com/apache/spark/commit/a55f3e1655b3b6ebef1f0d1bc615654a6803f7c8).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-96759699
  
    Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r26209062
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -279,8 +315,8 @@ private[spark] object SizeEstimator extends Logging {
         newInfo
       }
     
    -  private def alignSize(size: Long): Long = {
    -    val rem = size % ALIGN_SIZE
    -    if (rem == 0) size else (size + ALIGN_SIZE - rem)
    -  }
    +  private def alignSize(size: Long): Long = alignSizeUp(size, ALIGN_SIZE)
    +
    +  private def alignSizeUp(size: Long, alignSize: Int): Long =
    +    (size + alignSize - 1) & ~(alignSize - 1)
    --- End diff --
    
    I noticed this code when I looked at the HotSpot JVM classloader code. It's a c macro. And I think it's quite elegant. So I ported this computation. I will add some comment if needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-83410489
  
    @shivaram @srowen Tuple2(Int, Int) got specialized to Tuple2$mcII$sp class. But the Tuple2$mcII$sp is a subclass of Tuple2. So in our implementation, the specialized class will get two additional object references. (_1, _2 in superclass Tuple2, in our case). So, for Tuple2(Int, Int), SizeEstimator will give 32 bytes rather than 24 bytes. In theory, the Tuple2(1,2) class filed layout should be something like below.
    ```
    scala.Tuple2$mcII$sp object internals:
    OFFSET SIZE TYPE DESCRIPTION VALUE
    0 4 (object header) 01 00 00 00 (0000 0001 0000 0000 0000 0000 0000 0000)
    4 4 (object header) 00 00 00 00 (0000 0000 0000 0000 0000 0000 0000 0000)
    8 4 (object header) 05 c3 00 f8 (0000 0101 1100 0011 0000 0000 1111 1000)
    12 4 Object Tuple2._1 null
    16 4 Object Tuple2._2 null
    20 4 int Tuple2$mcII$sp._1$mcI$sp 1
    24 4 int Tuple2$mcII$sp._2$mcI$sp 2
    28 4 (loss due to the next object alignment)
    Instance size: 32 bytes (reported by Instrumentation API)
    Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
    ```
    
    But in practice, the size of Tuple2(1, 2) is 24 bytes. So is there any scala expert we can ping? I really want to know why Tuple2(1, 2) can be 24 bytes when the specialized version is involved.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4783#discussion_r25969730
  
    --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
    @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging {
         val parent = getClassInfo(cls.getSuperclass)
         var shellSize = parent.shellSize
         var pointerFields = parent.pointerFields
    +    
    +    // we have only four sizes: 1, 2, 4, 8. pointerSize can be 4 or 8.
    +    // we can just use an Array.fill(8 + 1) to hold all the size counting, but we
    +    // can also use a HashMap to store size counting; Here we uses Array.
    +    val sizeCount = Array.fill(fieldSizes.max + 1)(0)
     
    +    // iterate through the fields of this class and gather information.
         for (field <- cls.getDeclaredFields) {
           if (!Modifier.isStatic(field.getModifiers)) {
             val fieldClass = field.getType
             if (fieldClass.isPrimitive) {
    -          shellSize += primitiveSize(fieldClass)
    +          sizeCount(primitiveSize(fieldClass)) += 1
             } else {
               field.setAccessible(true) // Enable future get()'s on this field
    -          shellSize += pointerSize
    +          sizeCount(pointerSize) += 1
               pointerFields = field :: pointerFields
             }
           }
         }
     
    -    shellSize = alignSize(shellSize)
    +    // Based on the simulated field layout code in Aleksey Shipilev's report:
    +    // http://cr.openjdk.java.net/~shade/papers/2013-shipilev-fieldlayout-latest.pdf
    +    // The code is in Figure 9. Aleksey Shiplev also leaders the development of JOL (Java Object 
    +    // Layout) tool. The simulated code is also in the JOL. The simplified idea of field layout 
    +    // consists of these parts (see more details in the report):
    +    // 1. field alignment: HotSpot lay out the fields aligned by their size.
    +    // 2. object alignment: HotSpot rounds instance size up to 8 bytes
    +    // 3. consistent field layouts throughout the hierarchy: This means we should layout 
    +    // superclass first. And we use superclass's shellSize as a start point to layout the
    +    // other fields in this class.
    +    // 4. class alignment: HotSpot rounds field blocks upto to HeapOopSize not 4 bytes, confirmed
    +    // with Aleksey. see https://bugs.openjdk.java.net/browse/CODETOOLS-7901322
    +    //
    +    // The real world field layout is much more complicated. There are three kinds of fields
    +    // order in Java 8. And we don't consider the @contended annotation introduced by Java 8.
    +    // see the HotSpot classloader code, layout_fields method for more details.
    +    // hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/classfile/classFileParser.cpp
    +    var alignedSize = shellSize
    +    for (size <- fieldSizes if sizeCount(size) > 0) {
    +      val count = sizeCount(size)
    +      // If there are internal gaps, smaller filed can fit in.
    --- End diff --
    
    filed -> field


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

Posted by advancedxy <gi...@git.apache.org>.

Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/4783#issuecomment-76352556
  
    @shivaram I updated the gist, you can look at the result. [gist](https://gist.github.com/advancedxy/2ae7c9cc7629f3aeb679)
    
    Also you can just download the shell script and try it yourself. The script assumes you are on a Mac and installed java 6,7,8. And of course, you should look at first.
    
    ``` shell
    curl -o align-test.sh https://gist.githubusercontent.com/advancedxy/2ae7c9cc7629f3aeb679/raw/4f23d55d0f836c2d49f05378398f3c01d7ecdab2/align-test.sh && sh align-test.sh
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org