Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/10/01 16:35:00 UTC

[jira] [Assigned] (SPARK-25582) Error in Spark logs when using the org.apache.spark:spark-sql_2.11:2.2.0 Java library

     [ https://issues.apache.org/jira/browse/SPARK-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25582:
------------------------------------

    Assignee: Apache Spark

> Error in Spark logs when using the org.apache.spark:spark-sql_2.11:2.2.0 Java library
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-25582
>                 URL: https://issues.apache.org/jira/browse/SPARK-25582
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.2.0
>            Reporter: Thomas Brugiere
>            Assignee: Apache Spark
>            Priority: Major
>         Attachments: fileA.csv, fileB.csv, fileC.csv
>
>
> I have noticed an error that appears in the Spark logs when using the Spark SQL library from a Java 8 project.
> When I run the code below with the attached files as input, the ERROR shown further down appears in the application logs.
> I am using the *org.apache.spark:spark-sql_2.11:2.2.0* library in my Java project.
> Note that the same logic implemented with the Python API (pyspark) does not raise any exception like this.
> *Code*
> {code:java}
> SparkConf conf = new SparkConf().setAppName("SparkBug").setMaster("local");
> SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
> Dataset<Row> df_a = sparkSession.read().option("header", true).csv("local/fileA.csv").dropDuplicates();
> Dataset<Row> df_b = sparkSession.read().option("header", true).csv("local/fileB.csv").dropDuplicates();
> Dataset<Row> df_c = sparkSession.read().option("header", true).csv("local/fileC.csv").dropDuplicates();
> String[] key_join_1 = new String[]{"colA", "colB", "colC", "colD", "colE", "colF"};
> String[] key_join_2 = new String[]{"colA", "colB", "colC", "colD", "colE"};
> Dataset<Row> df_inventory_1 = df_a.join(df_b, arrayToSeq(key_join_1), "left");
> Dataset<Row> df_inventory_2 = df_inventory_1.join(df_c, arrayToSeq(key_join_2), "left");
> df_inventory_2.show();
> {code}
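> The snippet above calls an {{arrayToSeq}} helper whose definition is not included in the report. A minimal sketch of one way such a helper could be written against Spark 2.2 / Scala 2.11, assuming it only converts a {{String[]}} into the {{scala.collection.Seq<String>}} expected by {{Dataset.join(right, usingColumns, joinType)}} (the {{JoinKeys}} class name is illustrative, not part of the original code):
> {code:java}
> import java.util.Arrays;
>
> import scala.collection.JavaConverters;
> import scala.collection.Seq;
> import scala.collection.mutable.Buffer;
>
> public final class JoinKeys {
>     // Hypothetical helper assumed by the snippet above: converts a Java String[]
>     // into the scala.collection.Seq<String> accepted by Dataset.join.
>     public static Seq<String> arrayToSeq(String[] columns) {
>         Buffer<String> buffer =
>             JavaConverters.asScalaBufferConverter(Arrays.asList(columns)).asScala();
>         return buffer; // mutable.Buffer is a subtype of scala.collection.Seq
>     }
> }
> {code}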
> *Error message*
> {code:java}
> 18/10/01 09:58:07 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 202, Column 18: Expression "agg_isNull_28" is not an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 202, Column 18: Expression "agg_isNull_28" is not an rvalue
>     at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
>     at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7170)
>     at org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:5332)
>     at org.codehaus.janino.UnitCompiler.access$9400(UnitCompiler.java:212)
>     at org.codehaus.janino.UnitCompiler$13$1.visitAmbiguousName(UnitCompiler.java:5287)
>     at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4053)
>     at org.codehaus.janino.UnitCompiler$13.visitLvalue(UnitCompiler.java:5284)
>     at org.codehaus.janino.Java$Lvalue.accept(Java.java:3977)
>     at org.codehaus.janino.UnitCompiler.getConstantValue(UnitCompiler.java:5280)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2391)
>     at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:212)
>     at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1474)
>     at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1466)
>     at org.codehaus.janino.Java$IfStatement.accept(Java.java:2926)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1466)
>     at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1546)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3075)
>     at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336)
>     at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958)
>     at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212)
>     at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393)
>     at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385)
>     at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
>     at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1285)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:825)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:411)
>     at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:212)
>     at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:390)
>     at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:385)
>     at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1405)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
>     at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:357)
>     at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
>     at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
>     at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
>     at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
>     at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
>     at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1417)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
>     at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>     at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>     at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>     at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>     at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:579)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
>     at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
>     at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>     at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
>     at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
>     at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
>     at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
>     at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
>     at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
>     at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
>     at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
>     at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
>     at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
>     at SparkBug.main(SparkBug.java:30)
> 18/10/01 09:58:07 INFO CodeGenerator:
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIteratorForCodegenStage6(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIteratorForCodegenStage6 extends org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg_0;
> /* 009 */   private org.apache.spark.unsafe.KVIterator agg_mapIter_0;
> /* 010 */   private org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap_0;
> /* 011 */   private org.apache.spark.sql.execution.UnsafeKVExternalSorter agg_sorter_0;
> /* 012 */   private scala.collection.Iterator inputadapter_input_0;
> /* 013 */   private org.apache.spark.sql.execution.joins.UnsafeHashedRelation bhj_relation_0;
> /* 014 */   private org.apache.spark.sql.execution.joins.UnsafeHashedRelation bhj_relation_1;
> /* 015 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[] agg_mutableStateArray_1 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[8];
> /* 016 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] agg_mutableStateArray_2 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[8];
> /* 017 */   private UnsafeRow[] agg_mutableStateArray_0 = new UnsafeRow[8];
> /* 018 */
> /* 019 */   public GeneratedIteratorForCodegenStage6(Object[] references) {
> /* 020 */     this.references = references;
> /* 021 */   }
> /* 022 */
> /* 023 */   public void init(int index, scala.collection.Iterator[] inputs) {
> /* 024 */     partitionIndex = index;
> /* 025 */     this.inputs = inputs;
> /* 026 */     wholestagecodegen_init_0_0();
> /* 027 */     wholestagecodegen_init_0_1();
> /* 028 */     wholestagecodegen_init_0_2();
> /* 029 */
> /* 030 */   }
> /* 031 */
> /* 032 */   private void wholestagecodegen_init_0_2() {
> /* 033 */     agg_mutableStateArray_0[5] = new UnsafeRow(5);
> /* 034 */     agg_mutableStateArray_1[5] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[5], 160);
> /* 035 */     agg_mutableStateArray_2[5] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[5], 5);
> /* 036 */     agg_mutableStateArray_0[6] = new UnsafeRow(23);
> /* 037 */     agg_mutableStateArray_1[6] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[6], 736);
> /* 038 */     agg_mutableStateArray_2[6] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[6], 23);
> /* 039 */     agg_mutableStateArray_0[7] = new UnsafeRow(18);
> /* 040 */     agg_mutableStateArray_1[7] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[7], 576);
> /* 041 */     agg_mutableStateArray_2[7] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[7], 18);
> /* 042 */
> /* 043 */   }
> /* 044 */
> /* 045 */   private void agg_doAggregateWithKeysOutput_0(UnsafeRow agg_keyTerm_0, UnsafeRow agg_bufferTerm_0)
> /* 046 */   throws java.io.IOException {
> /* 047 */     ((org.apache.spark.sql.execution.metric.SQLMetric) references[4] /* numOutputRows */).add(1);
> /* 048 */
> /* 049 */     boolean agg_isNull_22 = agg_keyTerm_0.isNullAt(1);
> /* 050 */     UTF8String agg_value_22 = agg_isNull_22 ? null : (agg_keyTerm_0.getUTF8String(1));
> /* 051 */     boolean agg_isNull_23 = agg_keyTerm_0.isNullAt(6);
> /* 052 */     UTF8String agg_value_23 = agg_isNull_23 ? null : (agg_keyTerm_0.getUTF8String(6));
> /* 053 */     boolean agg_isNull_24 = agg_keyTerm_0.isNullAt(3);
> /* 054 */     UTF8String agg_value_24 = agg_isNull_24 ? null : (agg_keyTerm_0.getUTF8String(3));
> /* 055 */     boolean agg_isNull_25 = agg_keyTerm_0.isNullAt(4);
> /* 056 */     UTF8String agg_value_25 = agg_isNull_25 ? null : (agg_keyTerm_0.getUTF8String(4));
> /* 057 */     boolean agg_isNull_26 = agg_keyTerm_0.isNullAt(2);
> /* 058 */     UTF8String agg_value_26 = agg_isNull_26 ? null : (agg_keyTerm_0.getUTF8String(2));
> /* 059 */     boolean agg_isNull_27 = agg_keyTerm_0.isNullAt(0);
> /* 060 */     UTF8String agg_value_27 = agg_isNull_27 ? null : (agg_keyTerm_0.getUTF8String(0));
> /* 061 */
> /* 062 */     // generate join key for stream side
> /* 063 */
> /* 064 */     agg_mutableStateArray_1[2].reset();
> /* 065 */
> /* 066 */     agg_mutableStateArray_2[2].zeroOutNullBytes();
> /* 067 */
> /* 068 */     if (agg_isNull_22) {
> /* 069 */       agg_mutableStateArray_2[2].setNullAt(0);
> /* 070 */     } else {
> /* 071 */       agg_mutableStateArray_2[2].write(0, agg_value_22);
> /* 072 */     }
> /* 073 */
> /* 074 */     if (agg_isNull_23) {
> /* 075 */       agg_mutableStateArray_2[2].setNullAt(1);
> /* 076 */     } else {
> /* 077 */       agg_mutableStateArray_2[2].write(1, agg_value_23);
> /* 078 */     }
> /* 079 */
> /* 080 */     if (agg_isNull_24) {
> /* 081 */       agg_mutableStateArray_2[2].setNullAt(2);
> /* 082 */     } else {
> /* 083 */       agg_mutableStateArray_2[2].write(2, agg_value_24);
> /* 084 */     }
> /* 085 */
> /* 086 */     if (agg_isNull_25) {
> /* 087 */       agg_mutableStateArray_2[2].setNullAt(3);
> /* 088 */     } else {
> /* 089 */       agg_mutableStateArray_2[2].write(3, agg_value_25);
> /* 090 */     }
> /* 091 */
> /* 092 */     if (agg_isNull_26) {
> /* 093 */       agg_mutableStateArray_2[2].setNullAt(4);
> /* 094 */     } else {
> /* 095 */       agg_mutableStateArray_2[2].write(4, agg_value_26);
> /* 096 */     }
> /* 097 */
> /* 098 */     if (agg_isNull_27) {
> /* 099 */       agg_mutableStateArray_2[2].setNullAt(5);
> /* 100 */     } else {
> /* 101 */       agg_mutableStateArray_2[2].write(5, agg_value_27);
> /* 102 */     }
> /* 103 */     agg_mutableStateArray_0[2].setTotalSize(agg_mutableStateArray_1[2].totalSize());
> /* 104 */
> /* 105 */     // find matches from HashedRelation
> /* 106 */     UnsafeRow bhj_matched_0 = agg_mutableStateArray_0[2].anyNull() ? null: (UnsafeRow)bhj_relation_0.getValue(agg_mutableStateArray_0[2]);
> /* 107 */     final boolean bhj_conditionPassed_0 = true;
> /* 108 */     if (!bhj_conditionPassed_0) {
> /* 109 */       bhj_matched_0 = null;
> /* 110 */       // reset the variables those are already evaluated.
> /* 111 */
> /* 112 */     }
> /* 113 */     ((org.apache.spark.sql.execution.metric.SQLMetric) references[7] /* numOutputRows */).add(1);
> /* 114 */
> /* 115 */     // generate join key for stream side
> /* 116 */
> /* 117 */     agg_mutableStateArray_1[5].reset();
> /* 118 */
> /* 119 */     agg_mutableStateArray_2[5].zeroOutNullBytes();
> /* 120 */
> /* 121 */     if (agg_isNull_22) {
> /* 122 */       agg_mutableStateArray_2[5].setNullAt(0);
> /* 123 */     } else {
> /* 124 */       agg_mutableStateArray_2[5].write(0, agg_value_22);
> /* 125 */     }
> /* 126 */
> /* 127 */     if (agg_isNull_23) {
> /* 128 */       agg_mutableStateArray_2[5].setNullAt(1);
> /* 129 */     } else {
> /* 130 */       agg_mutableStateArray_2[5].write(1, agg_value_23);
> /* 131 */     }
> /* 132 */
> /* 133 */     if (agg_isNull_24) {
> /* 134 */       agg_mutableStateArray_2[5].setNullAt(2);
> /* 135 */     } else {
> /* 136 */       agg_mutableStateArray_2[5].write(2, agg_value_24);
> /* 137 */     }
> /* 138 */
> /* 139 */     if (agg_isNull_25) {
> /* 140 */       agg_mutableStateArray_2[5].setNullAt(3);
> /* 141 */     } else {
> /* 142 */       agg_mutableStateArray_2[5].write(3, agg_value_25);
> /* 143 */     }
> /* 144 */
> /* 145 */     if (agg_isNull_26) {
> /* 146 */       agg_mutableStateArray_2[5].setNullAt(4);
> /* 147 */     } else {
> /* 148 */       agg_mutableStateArray_2[5].write(4, agg_value_26);
> /* 149 */     }
> /* 150 */     agg_mutableStateArray_0[5].setTotalSize(agg_mutableStateArray_1[5].totalSize());
> /* 151 */
> /* 152 */     // find matches from HashedRelation
> /* 153 */     UnsafeRow bhj_matched_1 = agg_mutableStateArray_0[5].anyNull() ? null: (UnsafeRow)bhj_relation_1.getValue(agg_mutableStateArray_0[5]);
> /* 154 */     final boolean bhj_conditionPassed_1 = true;
> /* 155 */     if (!bhj_conditionPassed_1) {
> /* 156 */       bhj_matched_1 = null;
> /* 157 */       // reset the variables those are already evaluated.
> /* 158 */
> /* 159 */     }
> /* 160 */     ((org.apache.spark.sql.execution.metric.SQLMetric) references[10] /* numOutputRows */).add(1);
> /* 161 */
> /* 162 */     agg_mutableStateArray_1[7].reset();
> /* 163 */
> /* 164 */     agg_mutableStateArray_2[7].zeroOutNullBytes();
> /* 165 */
> /* 166 */     if (agg_isNull_22) {
> /* 167 */       agg_mutableStateArray_2[7].setNullAt(0);
> /* 168 */     } else {
> /* 169 */       agg_mutableStateArray_2[7].write(0, agg_value_22);
> /* 170 */     }
> /* 171 */
> /* 172 */     if (agg_isNull_23) {
> /* 173 */       agg_mutableStateArray_2[7].setNullAt(1);
> /* 174 */     } else {
> /* 175 */       agg_mutableStateArray_2[7].write(1, agg_value_23);
> /* 176 */     }
> /* 177 */
> /* 178 */     if (agg_isNull_24) {
> /* 179 */       agg_mutableStateArray_2[7].setNullAt(2);
> /* 180 */     } else {
> /* 181 */       agg_mutableStateArray_2[7].write(2, agg_value_24);
> /* 182 */     }
> /* 183 */
> /* 184 */     if (agg_isNull_25) {
> /* 185 */       agg_mutableStateArray_2[7].setNullAt(3);
> /* 186 */     } else {
> /* 187 */       agg_mutableStateArray_2[7].write(3, agg_value_25);
> /* 188 */     }
> /* 189 */
> /* 190 */     if (agg_isNull_26) {
> /* 191 */       agg_mutableStateArray_2[7].setNullAt(4);
> /* 192 */     } else {
> /* 193 */       agg_mutableStateArray_2[7].write(4, agg_value_26);
> /* 194 */     }
> /* 195 */
> /* 196 */     if (agg_isNull_27) {
> /* 197 */       agg_mutableStateArray_2[7].setNullAt(5);
> /* 198 */     } else {
> /* 199 */       agg_mutableStateArray_2[7].write(5, agg_value_27);
> /* 200 */     }
> /* 201 */
> /* 202 */     if (agg_isNull_28) {
> /* 203 */       agg_mutableStateArray_2[7].setNullAt(6);
> /* 204 */     } else {
> /* 205 */       agg_mutableStateArray_2[7].write(6, agg_value_28);
> /* 206 */     }
> /* 207 */
> /* 208 */     if (bhj_isNull_19) {
> /* 209 */       agg_mutableStateArray_2[7].setNullAt(7);
> /* 210 */     } else {
> /* 211 */       agg_mutableStateArray_2[7].write(7, bhj_value_19);
> /* 212 */     }
> /* 213 */
> /* 214 */     if (bhj_isNull_21) {
> /* 215 */       agg_mutableStateArray_2[7].setNullAt(8);
> /* 216 */     } else {
> /* 217 */       agg_mutableStateArray_2[7].write(8, bhj_value_21);
> /* 218 */     }
> /* 219 */
> /* 220 */     if (bhj_isNull_23) {
> /* 221 */       agg_mutableStateArray_2[7].setNullAt(9);
> /* 222 */     } else {
> /* 223 */       agg_mutableStateArray_2[7].write(9, bhj_value_23);
> /* 224 */     }
> /* 225 */
> /* 226 */     if (bhj_isNull_25) {
> /* 227 */       agg_mutableStateArray_2[7].setNullAt(10);
> /* 228 */     } else {
> /* 229 */       agg_mutableStateArray_2[7].write(10, bhj_value_25);
> /* 230 */     }
> /* 231 */
> /* 232 */     if (bhj_isNull_27) {
> /* 233 */       agg_mutableStateArray_2[7].setNullAt(11);
> /* 234 */     } else {
> /* 235 */       agg_mutableStateArray_2[7].write(11, bhj_value_27);
> /* 236 */     }
> /* 237 */
> /* 238 */     if (bhj_isNull_29) {
> /* 239 */       agg_mutableStateArray_2[7].setNullAt(12);
> /* 240 */     } else {
> /* 241 */       agg_mutableStateArray_2[7].write(12, bhj_value_29);
> /* 242 */     }
> /* 243 */
> /* 244 */     if (bhj_isNull_31) {
> /* 245 */       agg_mutableStateArray_2[7].setNullAt(13);
> /* 246 */     } else {
> /* 247 */       agg_mutableStateArray_2[7].write(13, bhj_value_31);
> /* 248 */     }
> /* 249 */
> /* 250 */     if (bhj_isNull_68) {
> /* 251 */       agg_mutableStateArray_2[7].setNullAt(14);
> /* 252 */     } else {
> /* 253 */       agg_mutableStateArray_2[7].write(14, bhj_value_68);
> /* 254 */     }
> /* 255 */
> /* 256 */     if (bhj_isNull_70) {
> /* 257 */       agg_mutableStateArray_2[7].setNullAt(15);
> /* 258 */     } else {
> /* 259 */       agg_mutableStateArray_2[7].write(15, bhj_value_70);
> /* 260 */     }
> /* 261 */
> /* 262 */     if (bhj_isNull_72) {
> /* 263 */       agg_mutableStateArray_2[7].setNullAt(16);
> /* 264 */     } else {
> /* 265 */       agg_mutableStateArray_2[7].write(16, bhj_value_72);
> /* 266 */     }
> /* 267 */
> /* 268 */     if (bhj_isNull_74) {
> /* 269 */       agg_mutableStateArray_2[7].setNullAt(17);
> /* 270 */     } else {
> /* 271 */       agg_mutableStateArray_2[7].write(17, bhj_value_74);
> /* 272 */     }
> /* 273 */     agg_mutableStateArray_0[7].setTotalSize(agg_mutableStateArray_1[7].totalSize());
> /* 274 */     append(agg_mutableStateArray_0[7]);
> /* 275 */
> /* 276 */   }
> /* 277 */
> /* 278 */   private void wholestagecodegen_init_0_1() {
> /* 279 */     agg_mutableStateArray_0[2] = new UnsafeRow(6);
> /* 280 */     agg_mutableStateArray_1[2] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[2], 192);
> /* 281 */     agg_mutableStateArray_2[2] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[2], 6);
> /* 282 */     agg_mutableStateArray_0[3] = new UnsafeRow(20);
> /* 283 */     agg_mutableStateArray_1[3] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[3], 640);
> /* 284 */     agg_mutableStateArray_2[3] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[3], 20);
> /* 285 */     agg_mutableStateArray_0[4] = new UnsafeRow(14);
> /* 286 */     agg_mutableStateArray_1[4] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[4], 448);
> /* 287 */     agg_mutableStateArray_2[4] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[4], 14);
> /* 288 */
> /* 289 */     bhj_relation_1 = ((org.apache.spark.sql.execution.joins.UnsafeHashedRelation) ((org.apache.spark.broadcast.TorrentBroadcast) references[8] /* broadcast */).value()).asReadOnlyCopy();
> /* 290 */     incPeakExecutionMemory(bhj_relation_1.estimatedSize());
> /* 291 */
> /* 292 */     org.apache.spark.TaskContext$.MODULE$.get().addTaskCompletionListener(new org.apache.spark.util.TaskCompletionListener() {
> /* 293 */         @Override
> /* 294 */         public void onTaskCompletion(org.apache.spark.TaskContext context) {
> /* 295 */           ((org.apache.spark.sql.execution.metric.SQLMetric) references[9] /* avgHashProbe */).set(bhj_relation_1.getAverageProbesPerLookup());
> /* 296 */         }
> /* 297 */       });
> /* 298 */
> /* 299 */   }
> /* 300 */
> /* 301 */   private void agg_doConsume_0(InternalRow inputadapter_row_0, UTF8String agg_expr_0_0, boolean agg_exprIsNull_0_0, UTF8String agg_expr_1_0, boolean agg_exprIsNull_1_0, UTF8String agg_expr_2_0, boolean agg_exprIsNull_2_0, UTF8String agg_expr_3_0, boolean agg_exprIsNull_3_0, UTF8String agg_expr_4_0, boolean agg_exprIsNull_4_0, UTF8String agg_expr_5_0, boolean agg_exprIsNull_5_0, UTF8String agg_expr_6_0, boolean agg_exprIsNull_6_0) throws java.io.IOException {
> /* 302 */     UnsafeRow agg_unsafeRowAggBuffer_0 = null;
> /* 303 */
> /* 304 */     // generate grouping key
> /* 305 */     agg_mutableStateArray_1[0].reset();
> /* 306 */
> /* 307 */     agg_mutableStateArray_2[0].zeroOutNullBytes();
> /* 308 */
> /* 309 */     if (agg_exprIsNull_0_0) {
> /* 310 */       agg_mutableStateArray_2[0].setNullAt(0);
> /* 311 */     } else {
> /* 312 */       agg_mutableStateArray_2[0].write(0, agg_expr_0_0);
> /* 313 */     }
> /* 314 */
> /* 315 */     if (agg_exprIsNull_1_0) {
> /* 316 */       agg_mutableStateArray_2[0].setNullAt(1);
> /* 317 */     } else {
> /* 318 */       agg_mutableStateArray_2[0].write(1, agg_expr_1_0);
> /* 319 */     }
> /* 320 */
> /* 321 */     if (agg_exprIsNull_2_0) {
> /* 322 */       agg_mutableStateArray_2[0].setNullAt(2);
> /* 323 */     } else {
> /* 324 */       agg_mutableStateArray_2[0].write(2, agg_expr_2_0);
> /* 325 */     }
> /* 326 */
> /* 327 */     if (agg_exprIsNull_3_0) {
> /* 328 */       agg_mutableStateArray_2[0].setNullAt(3);
> /* 329 */     } else {
> /* 330 */       agg_mutableStateArray_2[0].write(3, agg_expr_3_0);
> /* 331 */     }
> /* 332 */
> /* 333 */     if (agg_exprIsNull_4_0) {
> /* 334 */       agg_mutableStateArray_2[0].setNullAt(4);
> /* 335 */     } else {
> /* 336 */       agg_mutableStateArray_2[0].write(4, agg_expr_4_0);
> /* 337 */     }
> /* 338 */
> /* 339 */     if (agg_exprIsNull_5_0) {
> /* 340 */       agg_mutableStateArray_2[0].setNullAt(5);
> /* 341 */     } else {
> /* 342 */       agg_mutableStateArray_2[0].write(5, agg_expr_5_0);
> /* 343 */     }
> /* 344 */
> /* 345 */     if (agg_exprIsNull_6_0) {
> /* 346 */       agg_mutableStateArray_2[0].setNullAt(6);
> /* 347 */     } else {
> /* 348 */       agg_mutableStateArray_2[0].write(6, agg_expr_6_0);
> /* 349 */     }
> /* 350 */     agg_mutableStateArray_0[0].setTotalSize(agg_mutableStateArray_1[0].totalSize());
> /* 351 */     int agg_value_14 = 42;
> /* 352 */
> /* 353 */     if (!agg_exprIsNull_0_0) {
> /* 354 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_0_0.getBaseObject(), agg_expr_0_0.getBaseOffset(), agg_expr_0_0.numBytes(), agg_value_14);
> /* 355 */     }
> /* 356 */
> /* 357 */     if (!agg_exprIsNull_1_0) {
> /* 358 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_1_0.getBaseObject(), agg_expr_1_0.getBaseOffset(), agg_expr_1_0.numBytes(), agg_value_14);
> /* 359 */     }
> /* 360 */
> /* 361 */     if (!agg_exprIsNull_2_0) {
> /* 362 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_2_0.getBaseObject(), agg_expr_2_0.getBaseOffset(), agg_expr_2_0.numBytes(), agg_value_14);
> /* 363 */     }
> /* 364 */
> /* 365 */     if (!agg_exprIsNull_3_0) {
> /* 366 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_3_0.getBaseObject(), agg_expr_3_0.getBaseOffset(), agg_expr_3_0.numBytes(), agg_value_14);
> /* 367 */     }
> /* 368 */
> /* 369 */     if (!agg_exprIsNull_4_0) {
> /* 370 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_4_0.getBaseObject(), agg_expr_4_0.getBaseOffset(), agg_expr_4_0.numBytes(), agg_value_14);
> /* 371 */     }
> /* 372 */
> /* 373 */     if (!agg_exprIsNull_5_0) {
> /* 374 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_5_0.getBaseObject(), agg_expr_5_0.getBaseOffset(), agg_expr_5_0.numBytes(), agg_value_14);
> /* 375 */     }
> /* 376 */
> /* 377 */     if (!agg_exprIsNull_6_0) {
> /* 378 */       agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_6_0.getBaseObject(), agg_expr_6_0.getBaseOffset(), agg_expr_6_0.numBytes(), agg_value_14);
> /* 379 */     }
> /* 380 */     if (true) {
> /* 381 */       // try to get the buffer from hash map
> /* 382 */       agg_unsafeRowAggBuffer_0 =
> /* 383 */       agg_hashMap_0.getAggregationBufferFromUnsafeRow(agg_mutableStateArray_0[0], agg_value_14);
> /* 384 */     }
> /* 385 */     // Can't allocate buffer from the hash map. Spill the map and fallback to sort-based
> /* 386 */     // aggregation after processing all input rows.
> /* 387 */     if (agg_unsafeRowAggBuffer_0 == null) {
> /* 388 */       if (agg_sorter_0 == null) {
> /* 389 */         agg_sorter_0 = agg_hashMap_0.destructAndCreateExternalSorter();
> /* 390 */       } else {
> /* 391 */         agg_sorter_0.merge(agg_hashMap_0.destructAndCreateExternalSorter());
> /* 392 */       }
> /* 393 */
> /* 394 */       // the hash map had be spilled, it should have enough memory now,
> /* 395 */       // try to allocate buffer again.
> /* 396 */       agg_unsafeRowAggBuffer_0 = agg_hashMap_0.getAggregationBufferFromUnsafeRow(
> /* 397 */         agg_mutableStateArray_0[0], agg_value_14);
> /* 398 */       if (agg_unsafeRowAggBuffer_0 == null) {
> /* 399 */         // failed to allocate the first page
> /* 400 */         throw new OutOfMemoryError("No enough memory for aggregation");
> /* 401 */       }
> /* 402 */     }
> /* 403 */
> /* 404 */     // common sub-expressions
> /* 405 */
> /* 406 */     // evaluate aggregate function
> /* 407 */
> /* 408 */     // update unsafe row buffer
> /* 409 */
> /* 410 */   }
> /* 411 */
> /* 412 */   private void agg_doAggregateWithKeys_0() throws java.io.IOException {
> /* 413 */     while (inputadapter_input_0.hasNext() && !stopEarly()) {
> /* 414 */       InternalRow inputadapter_row_0 = (InternalRow) inputadapter_input_0.next();
> /* 415 */       boolean inputadapter_isNull_0 = inputadapter_row_0.isNullAt(0);
> /* 416 */       UTF8String inputadapter_value_0 = inputadapter_isNull_0 ? null : (inputadapter_row_0.getUTF8String(0));
> /* 417 */       boolean inputadapter_isNull_1 = inputadapter_row_0.isNullAt(1);
> /* 418 */       UTF8String inputadapter_value_1 = inputadapter_isNull_1 ? null : (inputadapter_row_0.getUTF8String(1));
> /* 419 */       boolean inputadapter_isNull_2 = inputadapter_row_0.isNullAt(2);
> /* 420 */       UTF8String inputadapter_value_2 = inputadapter_isNull_2 ? null : (inputadapter_row_0.getUTF8String(2));
> /* 421 */       boolean inputadapter_isNull_3 = inputadapter_row_0.isNullAt(3);
> /* 422 */       UTF8String inputadapter_value_3 = inputadapter_isNull_3 ? null : (inputadapter_row_0.getUTF8String(3));
> /* 423 */       boolean inputadapter_isNull_4 = inputadapter_row_0.isNullAt(4);
> /* 424 */       UTF8String inputadapter_value_4 = inputadapter_isNull_4 ? null : (inputadapter_row_0.getUTF8String(4));
> /* 425 */       boolean inputadapter_isNull_5 = inputadapter_row_0.isNullAt(5);
> /* 426 */       UTF8String inputadapter_value_5 = inputadapter_isNull_5 ? null : (inputadapter_row_0.getUTF8String(5));
> /* 427 */       boolean inputadapter_isNull_6 = inputadapter_row_0.isNullAt(6);
> /* 428 */       UTF8String inputadapter_value_6 = inputadapter_isNull_6 ? null : (inputadapter_row_0.getUTF8String(6));
> /* 429 */
> /* 430 */       agg_doConsume_0(inputadapter_row_0, inputadapter_value_0, inputadapter_isNull_0, inputadapter_value_1, inputadapter_isNull_1, inputadapter_value_2, inputadapter_isNull_2, inputadapter_value_3, inputadapter_isNull_3, inputadapter_value_4, inputadapter_isNull_4, inputadapter_value_5, inputadapter_isNull_5, inputadapter_value_6, inputadapter_isNull_6);
> /* 431 */       if (shouldStop()) return;
> /* 432 */     }
> /* 433 */
> /* 434 */     agg_mapIter_0 = ((org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0] /* plan */).finishAggregate(agg_hashMap_0, agg_sorter_0, ((org.apache.spark.sql.execution.metric.SQLMetric) references[1] /* peakMemory */), ((org.apache.spark.sql.execution.metric.SQLMetric) references[2] /* spillSize */), ((org.apache.spark.sql.execution.metric.SQLMetric) references[3] /* avgHashProbe */));
> /* 435 */   }
> /* 436 */
> /* 437 */   protected void processNext() throws java.io.IOException {
> /* 438 */     if (!agg_initAgg_0) {
> /* 439 */       agg_initAgg_0 = true;
> /* 440 */       long wholestagecodegen_beforeAgg_0 = System.nanoTime();
> /* 441 */       agg_doAggregateWithKeys_0();
> /* 442 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[11] /* aggTime */).add((System.nanoTime() - wholestagecodegen_beforeAgg_0) / 1000000);
> /* 443 */     }
> /* 444 */
> /* 445 */     // output the result
> /* 446 */
> /* 447 */     while (agg_mapIter_0.next()) {
> /* 448 */       UnsafeRow agg_aggKey_0 = (UnsafeRow) agg_mapIter_0.getKey();
> /* 449 */       UnsafeRow agg_aggBuffer_0 = (UnsafeRow) agg_mapIter_0.getValue();
> /* 450 */       agg_doAggregateWithKeysOutput_0(agg_aggKey_0, agg_aggBuffer_0);
> /* 451 */
> /* 452 */       if (shouldStop()) return;
> /* 453 */     }
> /* 454 */
> /* 455 */     agg_mapIter_0.close();
> /* 456 */     if (agg_sorter_0 == null) {
> /* 457 */       agg_hashMap_0.free();
> /* 458 */     }
> /* 459 */   }
> /* 460 */
> /* 461 */   private void wholestagecodegen_init_0_0() {
> /* 462 */     agg_hashMap_0 = ((org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0] /* plan */).createHashMap();
> /* 463 */     inputadapter_input_0 = inputs[0];
> /* 464 */     agg_mutableStateArray_0[0] = new UnsafeRow(7);
> /* 465 */     agg_mutableStateArray_1[0] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[0], 224);
> /* 466 */     agg_mutableStateArray_2[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[0], 7);
> /* 467 */     agg_mutableStateArray_0[1] = new UnsafeRow(7);
> /* 468 */     agg_mutableStateArray_1[1] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[1], 224);
> /* 469 */     agg_mutableStateArray_2[1] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[1], 7);
> /* 470 */
> /* 471 */     bhj_relation_0 = ((org.apache.spark.sql.execution.joins.UnsafeHashedRelation) ((org.apache.spark.broadcast.TorrentBroadcast) references[5] /* broadcast */).value()).asReadOnlyCopy();
> /* 472 */     incPeakExecutionMemory(bhj_relation_0.estimatedSize());
> /* 473 */
> /* 474 */     org.apache.spark.TaskContext$.MODULE$.get().addTaskCompletionListener(new org.apache.spark.util.TaskCompletionListener() {
> /* 475 */         @Override
> /* 476 */         public void onTaskCompletion(org.apache.spark.TaskContext context) {
> /* 477 */           ((org.apache.spark.sql.execution.metric.SQLMetric) references[6] /* avgHashProbe */).set(bhj_relation_0.getAverageProbesPerLookup());
> /* 478 */         }
> /* 479 */       });
> /* 480 */
> /* 481 */   }
> /* 482 */
> /* 483 */ }
> 18/10/01 09:58:07 WARN WholeStageCodegenExec: Whole-stage codegen disabled for plan (id=6):
>  *(6) Project [colA#10, colB#11, colC#12, colD#13, colE#14, colF#15, colG#16, colH#41, ColI#42, colJ#43, colK#44, colL#45, colM#46, colN#47, colP#77, colQ#78, colR#79, colS#80]
> +- *(6) BroadcastHashJoin [colA#10, colB#11, colC#12, colD#13, colE#14], [colA#72, colB#73, colC#74, colD#75, colE#76], LeftOuter, BuildRight
>    :- *(6) Project [colA#10, colB#11, colC#12, colD#13, colE#14, colF#15, colG#16, colH#41, ColI#42, colJ#43, colK#44, colL#45, colM#46, colN#47]
>    :  +- *(6) BroadcastHashJoin [colA#10, colB#11, colC#12, colD#13, colE#14, colF#15], [colA#35, colB#36, colC#37, colD#38, colE#39, colF#40], LeftOuter, BuildRight
>    :     :- *(6) HashAggregate(keys=[colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11], functions=[], output=[colA#10, colB#11, colC#12, colD#13, colE#14, colF#15, colG#16])
>    :     :  +- Exchange hashpartitioning(colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11, 200)
>    :     :     +- *(1) HashAggregate(keys=[colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11], functions=[], output=[colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11])
>    :     :        +- *(1) FileScan csv [colA#10,colB#11,colC#12,colD#13,colE#14,colF#15,colG#16] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/Users/tbrugier/projects/analytics/dev/spark-bug/local/fileA.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<colA:string,colB:string,colC:string,colD:string,colE:string,colF:string,colG:string>
>    :     +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true], input[1, string, true], input[2, string, true], input[3, string, true], input[4, string, true], input[5, string, true]))
>    :        +- *(3) HashAggregate(keys=[colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44], functions=[], output=[colA#35, colB#36, colC#37, colD#38, colE#39, colF#40, colH#41, ColI#42, colJ#43, colK#44, colL#45, colM#46, colN#47])
>    :           +- Exchange hashpartitioning(colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44, 200)
>    :              +- *(2) HashAggregate(keys=[colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44], functions=[], output=[colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44])
>    :                 +- *(2) FileScan csv [colA#35,colB#36,colC#37,colD#38,colE#39,colF#40,colH#41,ColI#42,colJ#43,colK#44,colL#45,colM#46,colN#47] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/Users/tbrugier/projects/analytics/dev/spark-bug/local/fileB.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<colA:string,colB:string,colC:string,colD:string,colE:string,colF:string,colH:string,ColI:s...
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true], input[1, string, true], input[2, string, true], input[3, string, true], input[4, string, true]))
>       +- *(5) HashAggregate(keys=[colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73], functions=[], output=[colA#72, colB#73, colC#74, colD#75, colE#76, colP#77, colQ#78, colR#79, colS#80])
>          +- Exchange hashpartitioning(colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73, 200)
>             +- *(4) HashAggregate(keys=[colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73], functions=[], output=[colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73])
>                +- *(4) FileScan csv [colA#72,colB#73,colC#74,colD#75,colE#76,colP#77,colQ#78,colR#79,colS#80] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/Users/tbrugier/projects/analytics/dev/spark-bug/local/fileC.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<colA:string,colB:string,colC:string,colD:string,colE:string,colP:string,colQ:string,colR:s...
> {code}
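> The WARN at the end of the log shows Spark disabling whole-stage codegen for this plan (id=6) and falling back after the compile failure. As a possible mitigation while the issue is open (a sketch of a workaround on my side, not something stated in the report), whole-stage codegen can also be turned off up front via the {{spark.sql.codegen.wholeStage}} configuration:
> {code:java}
> SparkConf conf = new SparkConf()
>     .setAppName("SparkBug")
>     .setMaster("local")
>     // Disabling whole-stage codegen makes Spark use the interpreted evaluation path,
>     // which avoids the Janino compilation step that fails here.
>     .set("spark.sql.codegen.wholeStage", "false");
> SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
> {code}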


