Posted to issues@spark.apache.org by "Dmitry Zanozin (JIRA)" <ji...@apache.org> on 2018/09/10 19:48:00 UTC

[jira] [Comment Edited] (SPARK-23986) CompileException when using too many avg aggregation after joining

    [ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609708#comment-16609708 ] 

Dmitry Zanozin edited comment on SPARK-23986 at 9/10/18 7:47 PM:
-----------------------------------------------------------------

Spark 2.3.1 still generates methods with duplicate parameter names. I just got the method below, which failed to compile with the following exception: "{{ERROR CodeGenerator:91 - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 686, Column 28: Redefinition of parameter "agg_expr_21"}}". The parameter {{agg_expr_21}} is declared twice, first as a {{short}} and then as a {{boolean}}:
{code}
/* 686 */
private void agg_doConsume1(byte agg_expr_01, boolean agg_exprIsNull_01,
                            short agg_expr_11, boolean agg_exprIsNull_11,
                            short agg_expr_21, boolean agg_exprIsNull_21,
                            int agg_expr_31, boolean agg_exprIsNull_31,
                            int agg_expr_41, boolean agg_exprIsNull_41,
                            int agg_expr_51, boolean agg_exprIsNull_51,
                            UTF8String agg_expr_61, boolean agg_exprIsNull_61,
                            byte agg_expr_71, boolean agg_exprIsNull_71,
                            long agg_expr_81, boolean agg_exprIsNull_81,
                            double agg_expr_91, boolean agg_exprIsNull_91,
                            long agg_expr_101, boolean agg_exprIsNull_101,
                            double agg_expr_111, boolean agg_exprIsNull_111,
                            long agg_expr_121, boolean agg_exprIsNull_121,
                            int agg_expr_131, boolean agg_exprIsNull_131,
                            long agg_expr_141, boolean agg_exprIsNull_141,
                            int agg_expr_151, boolean agg_exprIsNull_151,
                            boolean agg_expr_161, boolean agg_exprIsNull_161,
                            long agg_expr_171,
                            byte agg_expr_18, boolean agg_exprIsNull_18,
                            boolean agg_expr_19, boolean agg_exprIsNull_19,
                            byte agg_expr_20, boolean agg_exprIsNull_20,
                            boolean agg_expr_21, boolean agg_exprIsNull_21,
                            short agg_expr_22, boolean agg_exprIsNull_22,
                            int agg_expr_23, boolean agg_exprIsNull_23) throws java.io.IOException {
{code}


was (Author: dzanozin):
Spark 2.3.1 still generates methods with duplicate parameter names. I've just got this method (which obviously failed with the following exception: "{{ERROR CodeGenerator:91 - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 686, Column 28: Redefinition of parameter "agg_expr_21"}}"):

{code}
/* 686 */
private void agg_doConsume1(byte agg_expr_01, boolean agg_exprIsNull_01,
  short agg_expr_11, boolean agg_exprIsNull_11,
  short agg_expr_21, boolean agg_exprIsNull_21,
  int agg_expr_31, boolean agg_exprIsNull_31,
  int agg_expr_41, boolean agg_exprIsNull_41,
  int agg_expr_51, boolean agg_exprIsNull_51,
  UTF8String agg_expr_61, boolean agg_exprIsNull_61,
  byte agg_expr_71, boolean agg_exprIsNull_71,
  long agg_expr_81, boolean agg_exprIsNull_81,
  double agg_expr_91, boolean agg_exprIsNull_91,
  long agg_expr_101, boolean agg_exprIsNull_101,
  double agg_expr_111, boolean agg_exprIsNull_111,
  long agg_expr_121, boolean agg_exprIsNull_121,
  int agg_expr_131, boolean agg_exprIsNull_131,
  long agg_expr_141, boolean agg_exprIsNull_141,
  int agg_expr_151, boolean agg_exprIsNull_151,
  boolean agg_expr_161, boolean agg_exprIsNull_161,
  long agg_expr_171,
  byte agg_expr_18, boolean agg_exprIsNull_18,
  boolean agg_expr_19, boolean agg_exprIsNull_19,
  byte agg_expr_20, boolean agg_exprIsNull_20,
  boolean agg_expr_21, boolean agg_exprIsNull_21,
  short agg_expr_22, boolean agg_exprIsNull_22,
  int agg_expr_23, boolean agg_exprIsNull_23) throws java.io.IOException {
{code}

> CompileException when using too many avg aggregation after joining
> ------------------------------------------------------------------
>
>                 Key: SPARK-23986
>                 URL: https://issues.apache.org/jira/browse/SPARK-23986
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Michel Davit
>            Assignee: Marco Gaido
>            Priority: Major
>             Fix For: 2.3.1, 2.4.0
>
>         Attachments: spark-generated.java
>
>
> Considering the following code:
> {code:java}
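>     // Assumes an existing SparkSession named sparkSession.
>     import org.apache.spark.sql.DataFrame
>     import org.apache.spark.sql.functions.avg
>     import sparkSession.implicits._ // enables .toDF on RDDs
>
>     val df1: DataFrame = sparkSession.sparkContext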
>       .makeRDD(Seq((0, 1, 2, 3, 4, 5, 6)))
>       .toDF("key", "col1", "col2", "col3", "col4", "col5", "col6")
>     val df2: DataFrame = sparkSession.sparkContext
>       .makeRDD(Seq((0, "val1", "val2")))
>       .toDF("key", "dummy1", "dummy2")
>     val agg = df1
>       .join(df2, df1("key") === df2("key"), "leftouter")
>       .groupBy(df1("key"))
>       .agg(
>         avg("col2").as("avg2"),
>         avg("col3").as("avg3"),
>         avg("col4").as("avg4"),
>         avg("col1").as("avg1"),
>         avg("col5").as("avg5"),
>         avg("col6").as("avg6")
>       )
>     val head = agg.take(1)
> {code}
> This logs the following exception:
> {code:java}
> ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 467, Column 28: Redefinition of parameter "agg_expr_11"
> {code}
> I am not a Spark expert, but after some investigation I found that the generated {{doConsume}} method is responsible for the exception.
> Indeed, {{avg}} causes {{org.apache.spark.sql.execution.CodegenSupport.constructDoConsumeFunction}} to be called several times: the 1st time for the 'avg' Expr and a second time for the base aggregation Expr (count and sum).
> The problem comes from the way parameter names are generated in CodeGenerator:
> {code:java}
>   /**
>    * Returns a term name that is unique within this instance of a `CodegenContext`.
>    */
>   def freshName(name: String): String = synchronized {
>     val fullName = if (freshNamePrefix == "") {
>       name
>     } else {
>       s"${freshNamePrefix}_$name"
>     }
>     if (freshNameIds.contains(fullName)) {
>       val id = freshNameIds(fullName)
>       freshNameIds(fullName) = id + 1
>       s"$fullName$id"
>     } else {
>       freshNameIds += fullName -> 1
>       fullName
>     }
>   }
> {code}
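> After the 1st call, {{freshNameIds}} already contains {{agg_expr_[1..6]}}.
>  The second call requests {{agg_expr_[1..12]}} and generates the following names:
>  {{agg_expr_[11|21|31|41|51|61|11|12]}}. The name {{agg_expr_1}} is already registered, so it comes back with its id appended, as {{agg_expr_11}}, while the not yet registered name {{agg_expr_11}} comes back unchanged. We then have a parameter name conflict in the generated code: {{agg_expr_11}}.
> Appending the 'id' directly, as in s"$fullName$id", to generate a unique term name is the source of the conflict. Maybe simply using an underscore as a separator can solve this issue: s"${fullName}_$id" (the braces are needed so that Scala does not read {{fullName_}} as the identifier).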
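
The collision described above can be reproduced outside Spark. The sketch below is a simplified Java port of the quoted {{freshName}} logic, not Spark's actual code: the class FreshNameDemo, the NameGenerator helper, and the configurable separator are hypothetical names introduced for illustration, and the prefix handling and synchronization of the original are left out. It registers agg_expr_1..6, then requests agg_expr_1..12 and reports every name handed out twice in that second round.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FreshNameDemo {

    // Hypothetical, simplified stand-in for the quoted freshName logic.
    // The separator is a parameter so both naming schemes can be compared.
    static final class NameGenerator {
        private final Map<String, Integer> freshNameIds = new HashMap<>();
        private final String separator;

        NameGenerator(String separator) {
            this.separator = separator;
        }

        String freshName(String fullName) {
            Integer id = freshNameIds.get(fullName);
            if (id != null) {
                // Name seen before: bump the counter and append the old id.
                freshNameIds.put(fullName, id + 1);
                return fullName + separator + id;
            }
            // First request for this name: register it and return it unchanged.
            freshNameIds.put(fullName, 1);
            return fullName;
        }
    }

    // Registers agg_expr_1..firstCount, then requests agg_expr_1..secondCount
    // and returns the names that were handed out more than once in the second round.
    static List<String> duplicatesInSecondCall(String separator, int firstCount, int secondCount) {
        NameGenerator gen = new NameGenerator(separator);
        for (int i = 1; i <= firstCount; i++) {
            gen.freshName("agg_expr_" + i);
        }
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 1; i <= secondCount; i++) {
            String name = gen.freshName("agg_expr_" + i);
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Direct concatenation, as in s"$fullName$id"
        System.out.println(duplicatesInSecondCall("", 6, 12));  // prints [agg_expr_11]
        // Underscore separator, as suggested in the description
        System.out.println(duplicatesInSecondCall("_", 6, 12)); // prints []
    }
}
```

With direct concatenation, agg_expr_1 plus id 1 and the fresh name agg_expr_11 produce the same string. With the underscore separator the first becomes agg_expr_1_1, so the two no longer clash in this scenario. This only checks the case from the report and is not a proof that an underscore rules out every possible collision.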



