You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rednaxelafx <gi...@git.apache.org> on 2018/08/14 10:53:22 UTC

[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

GitHub user rednaxelafx opened a pull request:

    https://github.com/apache/spark/pull/22103

    [SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit

    ## What changes were proposed in this pull request?
    
    Add logging for all generated methods from the `CodeGenerator` whose bytecode size goes above 8000 bytes.
    This is to help with gathering stats on how often Spark is generating methods too big to be JIT'd. It covers all codegen scenarios, include whole-stage codegen and also individual expression codegen, e.g. unsafe projection, mutable projection, etc.
    
    ## How was this patch tested?
    
    Manually tested that logging did happen when generated method was above 8000 bytes.
    Also added a new unit test case to `CodeGenerationSuite` to verify that the logging did happen.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark codegen-8k-logging

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22103
    
----
commit 640c9cd3b99d51f38c9b1a0c3f94bae676d11e51
Author: Kris Mok <kr...@...>
Date:   2018-08-14T10:50:01Z

    SPARK-25113: Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r210152479
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
    --- End diff --
    
    Yeah, I know The current comparison is more strict. Although the previous comparison was only for name, the current comparison is for a pair of class loader and name. 
    
    I worried whether the strictness may change behavior.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    **[Test build #94740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94740/testReport)** for PR 22103 at commit [`640c9cd`](https://github.com/apache/spark/commit/640c9cd3b99d51f38c9b1a0c3f94bae676d11e51).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22103


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r210129972
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
                 val byteCodeSize = codeAttrField.get(a).asInstanceOf[Array[Byte]].length
                 CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
    +
    +            if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
    +              logInfo("Generated method too long to be JIT compiled: " +
    --- End diff --
    
    I'd say either way is fine. They're different tenses and the nuances are slightly different.
    
    "This story is too good to be true"
    vs
    "A story too good to be true"


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r209914174
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
                 val byteCodeSize = codeAttrField.get(a).asInstanceOf[Array[Byte]].length
                 CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
    +
    +            if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
    +              logInfo("Generated method too long to be JIT compiled: " +
    --- End diff --
    
    why info and not debug?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r209995159
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
    --- End diff --
    
    Why do we need change this condition?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r210169785
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
    --- End diff --
    
    > I worried whether the strictness may change behavior.
    
    Right, I can tell. And as I've mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before, because we're reflectively accessing the `code` field immediately after via `codeAttrField.get(a)`, and that won't work unless the classes are matching exactly.
    
    The old code before my change would actually be too permissive -- in the case of class loader mismatch, the old check will allow the it go run to the reflective access site, but it'll then fail because reflection doesn't allow access from the wrong class.
    
    This can be exemplified by the following pseudocode 
    ```
    val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class
    val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but different class loader
    val nameEq = c1.getName == c2.getName // true
    val refEq = c1 eq c2 // false
    val f1 = c1.getClass.getField("a")
    val o1 = c1.newInstance
    val o2 = c2.newInstance
    f1.get(o1) // okay
    f1.get(o2) // fail with exception
    ``` 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94740/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    **[Test build #94740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94740/testReport)** for PR 22103 at commit [`640c9cd`](https://github.com/apache/spark/commit/640c9cd3b99d51f38c9b1a0c3f94bae676d11e51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r209993118
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
                 val byteCodeSize = codeAttrField.get(a).asInstanceOf[Array[Byte]].length
                 CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
    +
    +            if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
    +              logInfo("Generated method too long to be JIT compiled: " +
    --- End diff --
    
    nit: `Generated method is too long ...`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r209936266
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
                 val byteCodeSize = codeAttrField.get(a).asInstanceOf[Array[Byte]].length
                 CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
    +
    +            if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
    +              logInfo("Generated method too long to be JIT compiled: " +
    --- End diff --
    
    when we hit this, the JIT will very likely not work and performance may drop a lot. This even worth a warning...
    
    Since it's just an estimation and Spark SQL can still work, I think info is fine here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22103: [SPARK-25113][SQL] Add logging to CodeGenerator when any...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22103
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2175/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22103#discussion_r210130579
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
             val stats = cf.methodInfos.asScala.flatMap { method =>
    -          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
    +          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
    --- End diff --
    
    The `getName` was accessing the class name unnecessarily and then doing string comparison unnecessarily. Just changing it when touching the code around it.
    The JVM guarantees that a `(defining class loader, full class name)` pair is unique at runtime, in this case the `java.lang.Class` instance is guaranteed to be unique, so a reference equality check is fast and sufficient.
    There's no worry of cross class loader issue here, because if there is, the code that follows won't work anyway.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org