You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/04/18 11:01:34 UTC

[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/12468

    [SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type

    ## What changes were proposed in this pull request?
    
    After https://github.com/apache/spark/pull/12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing `AttributeReference` with right-side buffer attribute, like other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializer with `Invoke`.
    
    However, after https://github.com/apache/spark/pull/12338, we will add loop variable to class members when codegen `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by `MapObjects` expression, we will add the same loop variable to class members twice(by left and right buffer deserializer), which cause the `ClassFormatError`.
    
    This PR fixes this issue by calling `distinct` before declare the class menbers.
    
    ## How was this patch tested?
    
    new regression test in `DatasetAggregatorSuite`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12468
    
----
commit 43e6ae1a46a53f9e618773ce5855e8729d7abef6
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-04-18T08:01:27Z

    ClassFormatError when use Seq as Aggregator buffer type

commit 003833fd3c544131bef189a5cd0a5809b9a8502b
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-04-18T08:11:51Z

    Merge remote-tracking branch 'origin/master' into bug

commit 3426310f1be91a9a7438584def604c76813d9056
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-04-18T08:35:55Z

    another fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12468


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-212039493
  
    LGTM. Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211699133
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211283520
  
    cc @davies @rxin @yhuai @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211541480
  
    **[Test build #56067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56067/consoleFull)** for PR 12468 at commit [`3426310`](https://github.com/apache/spark/commit/3426310f1be91a9a7438584def604c76813d9056).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12468#discussion_r60108108
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -110,13 +110,13 @@ class CodegenContext {
       }
     
       def declareMutableStates(): String = {
    -    mutableStates.map { case (javaType, variableName, _) =>
    +    mutableStates.distinct.map { case (javaType, variableName, _) =>
    --- End diff --
    
    need comment explaining the distinct


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211699134
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56163/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211698718
  
    **[Test build #56163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56163/consoleFull)** for PR 12468 at commit [`a57f8e6`](https://github.com/apache/spark/commit/a57f8e69bb0eb371de69ca1b76409ec2ad483ae7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211541863
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12468#discussion_r60332726
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -110,13 +110,17 @@ class CodegenContext {
       }
     
       def declareMutableStates(): String = {
    -    mutableStates.map { case (javaType, variableName, _) =>
    +    // It's possible that we add same mutable state twice, e.g. the `mergeExpressions` in
    +    // `TypedAggregateExpression`, we should call `distinct` here to remove the duplicated ones.
    --- End diff --
    
    yea, we can transform the left deserializer in `TypedAggregateExpression` and create new `LambdaVariable` with different names. But I'm afraid there will be more similar problems so I go with this approach.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12468#discussion_r60108072
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -69,8 +69,15 @@ class EquivalentExpressions {
        */
       def addExprTree(root: Expression, ignoreLeaf: Boolean = true): Unit = {
         val skip = root.isInstanceOf[LeafExpression] && ignoreLeaf
    -    // the children of CodegenFallback will not be used to generate code (call eval() instead)
    -    if (!skip && !addExpr(root) && !root.isInstanceOf[CodegenFallback]) {
    +    // There are some special expressions that we should not recurse into children.
    +    //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    +    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    +    val shouldRecurse = root match {
    +      case _: CodegenFallback => false
    --- End diff --
    
    a few expressions implement CodegenFallback but only use it in some corner cases


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211487425
  
    **[Test build #56067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56067/consoleFull)** for PR 12468 at commit [`3426310`](https://github.com/apache/spark/commit/3426310f1be91a9a7438584def604c76813d9056).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211541865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56067/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12468#issuecomment-211659699
  
    **[Test build #56163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56163/consoleFull)** for PR 12468 at commit [`a57f8e6`](https://github.com/apache/spark/commit/a57f8e69bb0eb371de69ca1b76409ec2ad483ae7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14675][SQL] ClassFormatError when use S...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12468#discussion_r60277819
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -110,13 +110,17 @@ class CodegenContext {
       }
     
       def declareMutableStates(): String = {
    -    mutableStates.map { case (javaType, variableName, _) =>
    +    // It's possible that we add same mutable state twice, e.g. the `mergeExpressions` in
    +    // `TypedAggregateExpression`, we should call `distinct` here to remove the duplicated ones.
    --- End diff --
    
    Can we avoid of adding the same mutable state twice?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org