Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/10/01 16:35:00 UTC
[jira] [Assigned] (SPARK-25582) Error in Spark logs when using the org.apache.spark:spark-sql_2.11:2.2.0 Java library
[ https://issues.apache.org/jira/browse/SPARK-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-25582:
------------------------------------
Assignee: Apache Spark
> Error in Spark logs when using the org.apache.spark:spark-sql_2.11:2.2.0 Java library
> -------------------------------------------------------------------------------------
>
> Key: SPARK-25582
> URL: https://issues.apache.org/jira/browse/SPARK-25582
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.2.0
> Reporter: Thomas Brugiere
> Assignee: Apache Spark
> Priority: Major
> Attachments: fileA.csv, fileB.csv, fileC.csv
>
>
> I have noticed an error that appears in the Spark logs when using the Spark SQL library in a Java 8 project.
> When I run the code below with the attached files as input, the ERROR shown further down appears in the application logs.
> I am using the *org.apache.spark:spark-sql_2.11:2.2.0* library in my Java project.
> Note that the same logic implemented with the Python API (pyspark) does not raise any such exception.
> *Code*
> {code:java}
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
>
> SparkConf conf = new SparkConf().setAppName("SparkBug").setMaster("local");
> SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
> Dataset<Row> df_a = sparkSession.read().option("header", true).csv("local/fileA.csv").dropDuplicates();
> Dataset<Row> df_b = sparkSession.read().option("header", true).csv("local/fileB.csv").dropDuplicates();
> Dataset<Row> df_c = sparkSession.read().option("header", true).csv("local/fileC.csv").dropDuplicates();
> String[] key_join_1 = new String[]{"colA", "colB", "colC", "colD", "colE", "colF"};
> String[] key_join_2 = new String[]{"colA", "colB", "colC", "colD", "colE"};
> // arrayToSeq: a helper (not shown in this report) that converts a String[] into a Scala Seq<String>
> Dataset<Row> df_inventory_1 = df_a.join(df_b, arrayToSeq(key_join_1), "left");
> Dataset<Row> df_inventory_2 = df_inventory_1.join(df_c, arrayToSeq(key_join_2), "left");
> df_inventory_2.show();
> {code}
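> A possible mitigation while the codegen bug is open (a sketch, not verified against this exact repro) is to disable whole-stage code generation so Spark uses the interpreted evaluation path from the start. The {{spark.sql.codegen.wholeStage}} flag is a standard, if internal, Spark SQL configuration key:
> {code:java}
> // Hypothetical workaround sketch: turn off whole-stage codegen before building the session,
> // so the failing generated class is never compiled and the ERROR never appears in the logs.
> SparkConf conf = new SparkConf()
>         .setAppName("SparkBug")
>         .setMaster("local")
>         .set("spark.sql.codegen.wholeStage", "false");
> SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
> {code}
> Note that even without this setting, the query still completes: the WARN at the end of the log below shows Spark catching the CompileException and falling back to interpreted mode for the affected plan, so the ERROR indicates noisy logs rather than wrong results.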
> *Error message*
> {code:java}
> 18/10/01 09:58:07 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 202, Column 18: Expression "agg_isNull_28" is not an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 202, Column 18: Expression "agg_isNull_28" is not an rvalue
> at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
> at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7170)
> at org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:5332)
> at org.codehaus.janino.UnitCompiler.access$9400(UnitCompiler.java:212)
> at org.codehaus.janino.UnitCompiler$13$1.visitAmbiguousName(UnitCompiler.java:5287)
> at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4053)
> at org.codehaus.janino.UnitCompiler$13.visitLvalue(UnitCompiler.java:5284)
> at org.codehaus.janino.Java$Lvalue.accept(Java.java:3977)
> at org.codehaus.janino.UnitCompiler.getConstantValue(UnitCompiler.java:5280)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2391)
> at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:212)
> at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1474)
> at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1466)
> at org.codehaus.janino.Java$IfStatement.accept(Java.java:2926)
> at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1466)
> at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1546)
> at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3075)
> at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336)
> at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958)
> at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212)
> at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393)
> at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385)
> at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286)
> at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
> at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1285)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:825)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:411)
> at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:212)
> at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:390)
> at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:385)
> at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1405)
> at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
> at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:357)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
> at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
> at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
> at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
> at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
> at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1417)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
> at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
> at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
> at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
> at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:579)
> at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
> at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
> at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
> at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
> at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
> at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
> at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
> at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
> at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
> at SparkBug.main(SparkBug.java:30)
> 18/10/01 09:58:07 INFO CodeGenerator:
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */ return new GeneratedIteratorForCodegenStage6(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIteratorForCodegenStage6 extends org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */ private Object[] references;
> /* 007 */ private scala.collection.Iterator[] inputs;
> /* 008 */ private boolean agg_initAgg_0;
> /* 009 */ private org.apache.spark.unsafe.KVIterator agg_mapIter_0;
> /* 010 */ private org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap_0;
> /* 011 */ private org.apache.spark.sql.execution.UnsafeKVExternalSorter agg_sorter_0;
> /* 012 */ private scala.collection.Iterator inputadapter_input_0;
> /* 013 */ private org.apache.spark.sql.execution.joins.UnsafeHashedRelation bhj_relation_0;
> /* 014 */ private org.apache.spark.sql.execution.joins.UnsafeHashedRelation bhj_relation_1;
> /* 015 */ private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[] agg_mutableStateArray_1 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[8];
> /* 016 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] agg_mutableStateArray_2 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[8];
> /* 017 */ private UnsafeRow[] agg_mutableStateArray_0 = new UnsafeRow[8];
> /* 018 */
> /* 019 */ public GeneratedIteratorForCodegenStage6(Object[] references) {
> /* 020 */ this.references = references;
> /* 021 */ }
> /* 022 */
> /* 023 */ public void init(int index, scala.collection.Iterator[] inputs) {
> /* 024 */ partitionIndex = index;
> /* 025 */ this.inputs = inputs;
> /* 026 */ wholestagecodegen_init_0_0();
> /* 027 */ wholestagecodegen_init_0_1();
> /* 028 */ wholestagecodegen_init_0_2();
> /* 029 */
> /* 030 */ }
> /* 031 */
> /* 032 */ private void wholestagecodegen_init_0_2() {
> /* 033 */ agg_mutableStateArray_0[5] = new UnsafeRow(5);
> /* 034 */ agg_mutableStateArray_1[5] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[5], 160);
> /* 035 */ agg_mutableStateArray_2[5] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[5], 5);
> /* 036 */ agg_mutableStateArray_0[6] = new UnsafeRow(23);
> /* 037 */ agg_mutableStateArray_1[6] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[6], 736);
> /* 038 */ agg_mutableStateArray_2[6] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[6], 23);
> /* 039 */ agg_mutableStateArray_0[7] = new UnsafeRow(18);
> /* 040 */ agg_mutableStateArray_1[7] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[7], 576);
> /* 041 */ agg_mutableStateArray_2[7] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[7], 18);
> /* 042 */
> /* 043 */ }
> /* 044 */
> /* 045 */ private void agg_doAggregateWithKeysOutput_0(UnsafeRow agg_keyTerm_0, UnsafeRow agg_bufferTerm_0)
> /* 046 */ throws java.io.IOException {
> /* 047 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[4] /* numOutputRows */).add(1);
> /* 048 */
> /* 049 */ boolean agg_isNull_22 = agg_keyTerm_0.isNullAt(1);
> /* 050 */ UTF8String agg_value_22 = agg_isNull_22 ? null : (agg_keyTerm_0.getUTF8String(1));
> /* 051 */ boolean agg_isNull_23 = agg_keyTerm_0.isNullAt(6);
> /* 052 */ UTF8String agg_value_23 = agg_isNull_23 ? null : (agg_keyTerm_0.getUTF8String(6));
> /* 053 */ boolean agg_isNull_24 = agg_keyTerm_0.isNullAt(3);
> /* 054 */ UTF8String agg_value_24 = agg_isNull_24 ? null : (agg_keyTerm_0.getUTF8String(3));
> /* 055 */ boolean agg_isNull_25 = agg_keyTerm_0.isNullAt(4);
> /* 056 */ UTF8String agg_value_25 = agg_isNull_25 ? null : (agg_keyTerm_0.getUTF8String(4));
> /* 057 */ boolean agg_isNull_26 = agg_keyTerm_0.isNullAt(2);
> /* 058 */ UTF8String agg_value_26 = agg_isNull_26 ? null : (agg_keyTerm_0.getUTF8String(2));
> /* 059 */ boolean agg_isNull_27 = agg_keyTerm_0.isNullAt(0);
> /* 060 */ UTF8String agg_value_27 = agg_isNull_27 ? null : (agg_keyTerm_0.getUTF8String(0));
> /* 061 */
> /* 062 */ // generate join key for stream side
> /* 063 */
> /* 064 */ agg_mutableStateArray_1[2].reset();
> /* 065 */
> /* 066 */ agg_mutableStateArray_2[2].zeroOutNullBytes();
> /* 067 */
> /* 068 */ if (agg_isNull_22) {
> /* 069 */ agg_mutableStateArray_2[2].setNullAt(0);
> /* 070 */ } else {
> /* 071 */ agg_mutableStateArray_2[2].write(0, agg_value_22);
> /* 072 */ }
> /* 073 */
> /* 074 */ if (agg_isNull_23) {
> /* 075 */ agg_mutableStateArray_2[2].setNullAt(1);
> /* 076 */ } else {
> /* 077 */ agg_mutableStateArray_2[2].write(1, agg_value_23);
> /* 078 */ }
> /* 079 */
> /* 080 */ if (agg_isNull_24) {
> /* 081 */ agg_mutableStateArray_2[2].setNullAt(2);
> /* 082 */ } else {
> /* 083 */ agg_mutableStateArray_2[2].write(2, agg_value_24);
> /* 084 */ }
> /* 085 */
> /* 086 */ if (agg_isNull_25) {
> /* 087 */ agg_mutableStateArray_2[2].setNullAt(3);
> /* 088 */ } else {
> /* 089 */ agg_mutableStateArray_2[2].write(3, agg_value_25);
> /* 090 */ }
> /* 091 */
> /* 092 */ if (agg_isNull_26) {
> /* 093 */ agg_mutableStateArray_2[2].setNullAt(4);
> /* 094 */ } else {
> /* 095 */ agg_mutableStateArray_2[2].write(4, agg_value_26);
> /* 096 */ }
> /* 097 */
> /* 098 */ if (agg_isNull_27) {
> /* 099 */ agg_mutableStateArray_2[2].setNullAt(5);
> /* 100 */ } else {
> /* 101 */ agg_mutableStateArray_2[2].write(5, agg_value_27);
> /* 102 */ }
> /* 103 */ agg_mutableStateArray_0[2].setTotalSize(agg_mutableStateArray_1[2].totalSize());
> /* 104 */
> /* 105 */ // find matches from HashedRelation
> /* 106 */ UnsafeRow bhj_matched_0 = agg_mutableStateArray_0[2].anyNull() ? null: (UnsafeRow)bhj_relation_0.getValue(agg_mutableStateArray_0[2]);
> /* 107 */ final boolean bhj_conditionPassed_0 = true;
> /* 108 */ if (!bhj_conditionPassed_0) {
> /* 109 */ bhj_matched_0 = null;
> /* 110 */ // reset the variables those are already evaluated.
> /* 111 */
> /* 112 */ }
> /* 113 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[7] /* numOutputRows */).add(1);
> /* 114 */
> /* 115 */ // generate join key for stream side
> /* 116 */
> /* 117 */ agg_mutableStateArray_1[5].reset();
> /* 118 */
> /* 119 */ agg_mutableStateArray_2[5].zeroOutNullBytes();
> /* 120 */
> /* 121 */ if (agg_isNull_22) {
> /* 122 */ agg_mutableStateArray_2[5].setNullAt(0);
> /* 123 */ } else {
> /* 124 */ agg_mutableStateArray_2[5].write(0, agg_value_22);
> /* 125 */ }
> /* 126 */
> /* 127 */ if (agg_isNull_23) {
> /* 128 */ agg_mutableStateArray_2[5].setNullAt(1);
> /* 129 */ } else {
> /* 130 */ agg_mutableStateArray_2[5].write(1, agg_value_23);
> /* 131 */ }
> /* 132 */
> /* 133 */ if (agg_isNull_24) {
> /* 134 */ agg_mutableStateArray_2[5].setNullAt(2);
> /* 135 */ } else {
> /* 136 */ agg_mutableStateArray_2[5].write(2, agg_value_24);
> /* 137 */ }
> /* 138 */
> /* 139 */ if (agg_isNull_25) {
> /* 140 */ agg_mutableStateArray_2[5].setNullAt(3);
> /* 141 */ } else {
> /* 142 */ agg_mutableStateArray_2[5].write(3, agg_value_25);
> /* 143 */ }
> /* 144 */
> /* 145 */ if (agg_isNull_26) {
> /* 146 */ agg_mutableStateArray_2[5].setNullAt(4);
> /* 147 */ } else {
> /* 148 */ agg_mutableStateArray_2[5].write(4, agg_value_26);
> /* 149 */ }
> /* 150 */ agg_mutableStateArray_0[5].setTotalSize(agg_mutableStateArray_1[5].totalSize());
> /* 151 */
> /* 152 */ // find matches from HashedRelation
> /* 153 */ UnsafeRow bhj_matched_1 = agg_mutableStateArray_0[5].anyNull() ? null: (UnsafeRow)bhj_relation_1.getValue(agg_mutableStateArray_0[5]);
> /* 154 */ final boolean bhj_conditionPassed_1 = true;
> /* 155 */ if (!bhj_conditionPassed_1) {
> /* 156 */ bhj_matched_1 = null;
> /* 157 */ // reset the variables those are already evaluated.
> /* 158 */
> /* 159 */ }
> /* 160 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[10] /* numOutputRows */).add(1);
> /* 161 */
> /* 162 */ agg_mutableStateArray_1[7].reset();
> /* 163 */
> /* 164 */ agg_mutableStateArray_2[7].zeroOutNullBytes();
> /* 165 */
> /* 166 */ if (agg_isNull_22) {
> /* 167 */ agg_mutableStateArray_2[7].setNullAt(0);
> /* 168 */ } else {
> /* 169 */ agg_mutableStateArray_2[7].write(0, agg_value_22);
> /* 170 */ }
> /* 171 */
> /* 172 */ if (agg_isNull_23) {
> /* 173 */ agg_mutableStateArray_2[7].setNullAt(1);
> /* 174 */ } else {
> /* 175 */ agg_mutableStateArray_2[7].write(1, agg_value_23);
> /* 176 */ }
> /* 177 */
> /* 178 */ if (agg_isNull_24) {
> /* 179 */ agg_mutableStateArray_2[7].setNullAt(2);
> /* 180 */ } else {
> /* 181 */ agg_mutableStateArray_2[7].write(2, agg_value_24);
> /* 182 */ }
> /* 183 */
> /* 184 */ if (agg_isNull_25) {
> /* 185 */ agg_mutableStateArray_2[7].setNullAt(3);
> /* 186 */ } else {
> /* 187 */ agg_mutableStateArray_2[7].write(3, agg_value_25);
> /* 188 */ }
> /* 189 */
> /* 190 */ if (agg_isNull_26) {
> /* 191 */ agg_mutableStateArray_2[7].setNullAt(4);
> /* 192 */ } else {
> /* 193 */ agg_mutableStateArray_2[7].write(4, agg_value_26);
> /* 194 */ }
> /* 195 */
> /* 196 */ if (agg_isNull_27) {
> /* 197 */ agg_mutableStateArray_2[7].setNullAt(5);
> /* 198 */ } else {
> /* 199 */ agg_mutableStateArray_2[7].write(5, agg_value_27);
> /* 200 */ }
> /* 201 */
> /* 202 */ if (agg_isNull_28) {
> /* 203 */ agg_mutableStateArray_2[7].setNullAt(6);
> /* 204 */ } else {
> /* 205 */ agg_mutableStateArray_2[7].write(6, agg_value_28);
> /* 206 */ }
> /* 207 */
> /* 208 */ if (bhj_isNull_19) {
> /* 209 */ agg_mutableStateArray_2[7].setNullAt(7);
> /* 210 */ } else {
> /* 211 */ agg_mutableStateArray_2[7].write(7, bhj_value_19);
> /* 212 */ }
> /* 213 */
> /* 214 */ if (bhj_isNull_21) {
> /* 215 */ agg_mutableStateArray_2[7].setNullAt(8);
> /* 216 */ } else {
> /* 217 */ agg_mutableStateArray_2[7].write(8, bhj_value_21);
> /* 218 */ }
> /* 219 */
> /* 220 */ if (bhj_isNull_23) {
> /* 221 */ agg_mutableStateArray_2[7].setNullAt(9);
> /* 222 */ } else {
> /* 223 */ agg_mutableStateArray_2[7].write(9, bhj_value_23);
> /* 224 */ }
> /* 225 */
> /* 226 */ if (bhj_isNull_25) {
> /* 227 */ agg_mutableStateArray_2[7].setNullAt(10);
> /* 228 */ } else {
> /* 229 */ agg_mutableStateArray_2[7].write(10, bhj_value_25);
> /* 230 */ }
> /* 231 */
> /* 232 */ if (bhj_isNull_27) {
> /* 233 */ agg_mutableStateArray_2[7].setNullAt(11);
> /* 234 */ } else {
> /* 235 */ agg_mutableStateArray_2[7].write(11, bhj_value_27);
> /* 236 */ }
> /* 237 */
> /* 238 */ if (bhj_isNull_29) {
> /* 239 */ agg_mutableStateArray_2[7].setNullAt(12);
> /* 240 */ } else {
> /* 241 */ agg_mutableStateArray_2[7].write(12, bhj_value_29);
> /* 242 */ }
> /* 243 */
> /* 244 */ if (bhj_isNull_31) {
> /* 245 */ agg_mutableStateArray_2[7].setNullAt(13);
> /* 246 */ } else {
> /* 247 */ agg_mutableStateArray_2[7].write(13, bhj_value_31);
> /* 248 */ }
> /* 249 */
> /* 250 */ if (bhj_isNull_68) {
> /* 251 */ agg_mutableStateArray_2[7].setNullAt(14);
> /* 252 */ } else {
> /* 253 */ agg_mutableStateArray_2[7].write(14, bhj_value_68);
> /* 254 */ }
> /* 255 */
> /* 256 */ if (bhj_isNull_70) {
> /* 257 */ agg_mutableStateArray_2[7].setNullAt(15);
> /* 258 */ } else {
> /* 259 */ agg_mutableStateArray_2[7].write(15, bhj_value_70);
> /* 260 */ }
> /* 261 */
> /* 262 */ if (bhj_isNull_72) {
> /* 263 */ agg_mutableStateArray_2[7].setNullAt(16);
> /* 264 */ } else {
> /* 265 */ agg_mutableStateArray_2[7].write(16, bhj_value_72);
> /* 266 */ }
> /* 267 */
> /* 268 */ if (bhj_isNull_74) {
> /* 269 */ agg_mutableStateArray_2[7].setNullAt(17);
> /* 270 */ } else {
> /* 271 */ agg_mutableStateArray_2[7].write(17, bhj_value_74);
> /* 272 */ }
> /* 273 */ agg_mutableStateArray_0[7].setTotalSize(agg_mutableStateArray_1[7].totalSize());
> /* 274 */ append(agg_mutableStateArray_0[7]);
> /* 275 */
> /* 276 */ }
> /* 277 */
> /* 278 */ private void wholestagecodegen_init_0_1() {
> /* 279 */ agg_mutableStateArray_0[2] = new UnsafeRow(6);
> /* 280 */ agg_mutableStateArray_1[2] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[2], 192);
> /* 281 */ agg_mutableStateArray_2[2] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[2], 6);
> /* 282 */ agg_mutableStateArray_0[3] = new UnsafeRow(20);
> /* 283 */ agg_mutableStateArray_1[3] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[3], 640);
> /* 284 */ agg_mutableStateArray_2[3] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[3], 20);
> /* 285 */ agg_mutableStateArray_0[4] = new UnsafeRow(14);
> /* 286 */ agg_mutableStateArray_1[4] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[4], 448);
> /* 287 */ agg_mutableStateArray_2[4] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[4], 14);
> /* 288 */
> /* 289 */ bhj_relation_1 = ((org.apache.spark.sql.execution.joins.UnsafeHashedRelation) ((org.apache.spark.broadcast.TorrentBroadcast) references[8] /* broadcast */).value()).asReadOnlyCopy();
> /* 290 */ incPeakExecutionMemory(bhj_relation_1.estimatedSize());
> /* 291 */
> /* 292 */ org.apache.spark.TaskContext$.MODULE$.get().addTaskCompletionListener(new org.apache.spark.util.TaskCompletionListener() {
> /* 293 */ @Override
> /* 294 */ public void onTaskCompletion(org.apache.spark.TaskContext context) {
> /* 295 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[9] /* avgHashProbe */).set(bhj_relation_1.getAverageProbesPerLookup());
> /* 296 */ }
> /* 297 */ });
> /* 298 */
> /* 299 */ }
> /* 300 */
> /* 301 */ private void agg_doConsume_0(InternalRow inputadapter_row_0, UTF8String agg_expr_0_0, boolean agg_exprIsNull_0_0, UTF8String agg_expr_1_0, boolean agg_exprIsNull_1_0, UTF8String agg_expr_2_0, boolean agg_exprIsNull_2_0, UTF8String agg_expr_3_0, boolean agg_exprIsNull_3_0, UTF8String agg_expr_4_0, boolean agg_exprIsNull_4_0, UTF8String agg_expr_5_0, boolean agg_exprIsNull_5_0, UTF8String agg_expr_6_0, boolean agg_exprIsNull_6_0) throws java.io.IOException {
> /* 302 */ UnsafeRow agg_unsafeRowAggBuffer_0 = null;
> /* 303 */
> /* 304 */ // generate grouping key
> /* 305 */ agg_mutableStateArray_1[0].reset();
> /* 306 */
> /* 307 */ agg_mutableStateArray_2[0].zeroOutNullBytes();
> /* 308 */
> /* 309 */ if (agg_exprIsNull_0_0) {
> /* 310 */ agg_mutableStateArray_2[0].setNullAt(0);
> /* 311 */ } else {
> /* 312 */ agg_mutableStateArray_2[0].write(0, agg_expr_0_0);
> /* 313 */ }
> /* 314 */
> /* 315 */ if (agg_exprIsNull_1_0) {
> /* 316 */ agg_mutableStateArray_2[0].setNullAt(1);
> /* 317 */ } else {
> /* 318 */ agg_mutableStateArray_2[0].write(1, agg_expr_1_0);
> /* 319 */ }
> /* 320 */
> /* 321 */ if (agg_exprIsNull_2_0) {
> /* 322 */ agg_mutableStateArray_2[0].setNullAt(2);
> /* 323 */ } else {
> /* 324 */ agg_mutableStateArray_2[0].write(2, agg_expr_2_0);
> /* 325 */ }
> /* 326 */
> /* 327 */ if (agg_exprIsNull_3_0) {
> /* 328 */ agg_mutableStateArray_2[0].setNullAt(3);
> /* 329 */ } else {
> /* 330 */ agg_mutableStateArray_2[0].write(3, agg_expr_3_0);
> /* 331 */ }
> /* 332 */
> /* 333 */ if (agg_exprIsNull_4_0) {
> /* 334 */ agg_mutableStateArray_2[0].setNullAt(4);
> /* 335 */ } else {
> /* 336 */ agg_mutableStateArray_2[0].write(4, agg_expr_4_0);
> /* 337 */ }
> /* 338 */
> /* 339 */ if (agg_exprIsNull_5_0) {
> /* 340 */ agg_mutableStateArray_2[0].setNullAt(5);
> /* 341 */ } else {
> /* 342 */ agg_mutableStateArray_2[0].write(5, agg_expr_5_0);
> /* 343 */ }
> /* 344 */
> /* 345 */ if (agg_exprIsNull_6_0) {
> /* 346 */ agg_mutableStateArray_2[0].setNullAt(6);
> /* 347 */ } else {
> /* 348 */ agg_mutableStateArray_2[0].write(6, agg_expr_6_0);
> /* 349 */ }
> /* 350 */ agg_mutableStateArray_0[0].setTotalSize(agg_mutableStateArray_1[0].totalSize());
> /* 351 */ int agg_value_14 = 42;
> /* 352 */
> /* 353 */ if (!agg_exprIsNull_0_0) {
> /* 354 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_0_0.getBaseObject(), agg_expr_0_0.getBaseOffset(), agg_expr_0_0.numBytes(), agg_value_14);
> /* 355 */ }
> /* 356 */
> /* 357 */ if (!agg_exprIsNull_1_0) {
> /* 358 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_1_0.getBaseObject(), agg_expr_1_0.getBaseOffset(), agg_expr_1_0.numBytes(), agg_value_14);
> /* 359 */ }
> /* 360 */
> /* 361 */ if (!agg_exprIsNull_2_0) {
> /* 362 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_2_0.getBaseObject(), agg_expr_2_0.getBaseOffset(), agg_expr_2_0.numBytes(), agg_value_14);
> /* 363 */ }
> /* 364 */
> /* 365 */ if (!agg_exprIsNull_3_0) {
> /* 366 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_3_0.getBaseObject(), agg_expr_3_0.getBaseOffset(), agg_expr_3_0.numBytes(), agg_value_14);
> /* 367 */ }
> /* 368 */
> /* 369 */ if (!agg_exprIsNull_4_0) {
> /* 370 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_4_0.getBaseObject(), agg_expr_4_0.getBaseOffset(), agg_expr_4_0.numBytes(), agg_value_14);
> /* 371 */ }
> /* 372 */
> /* 373 */ if (!agg_exprIsNull_5_0) {
> /* 374 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_5_0.getBaseObject(), agg_expr_5_0.getBaseOffset(), agg_expr_5_0.numBytes(), agg_value_14);
> /* 375 */ }
> /* 376 */
> /* 377 */ if (!agg_exprIsNull_6_0) {
> /* 378 */ agg_value_14 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(agg_expr_6_0.getBaseObject(), agg_expr_6_0.getBaseOffset(), agg_expr_6_0.numBytes(), agg_value_14);
> /* 379 */ }
> /* 380 */ if (true) {
> /* 381 */ // try to get the buffer from hash map
> /* 382 */ agg_unsafeRowAggBuffer_0 =
> /* 383 */ agg_hashMap_0.getAggregationBufferFromUnsafeRow(agg_mutableStateArray_0[0], agg_value_14);
> /* 384 */ }
> /* 385 */ // Can't allocate buffer from the hash map. Spill the map and fallback to sort-based
> /* 386 */ // aggregation after processing all input rows.
> /* 387 */ if (agg_unsafeRowAggBuffer_0 == null) {
> /* 388 */ if (agg_sorter_0 == null) {
> /* 389 */ agg_sorter_0 = agg_hashMap_0.destructAndCreateExternalSorter();
> /* 390 */ } else {
> /* 391 */ agg_sorter_0.merge(agg_hashMap_0.destructAndCreateExternalSorter());
> /* 392 */ }
> /* 393 */
> /* 394 */ // the hash map had be spilled, it should have enough memory now,
> /* 395 */ // try to allocate buffer again.
> /* 396 */ agg_unsafeRowAggBuffer_0 = agg_hashMap_0.getAggregationBufferFromUnsafeRow(
> /* 397 */ agg_mutableStateArray_0[0], agg_value_14);
> /* 398 */ if (agg_unsafeRowAggBuffer_0 == null) {
> /* 399 */ // failed to allocate the first page
> /* 400 */ throw new OutOfMemoryError("No enough memory for aggregation");
> /* 401 */ }
> /* 402 */ }
> /* 403 */
> /* 404 */ // common sub-expressions
> /* 405 */
> /* 406 */ // evaluate aggregate function
> /* 407 */
> /* 408 */ // update unsafe row buffer
> /* 409 */
> /* 410 */ }
> /* 411 */
> /* 412 */ private void agg_doAggregateWithKeys_0() throws java.io.IOException {
> /* 413 */ while (inputadapter_input_0.hasNext() && !stopEarly()) {
> /* 414 */ InternalRow inputadapter_row_0 = (InternalRow) inputadapter_input_0.next();
> /* 415 */ boolean inputadapter_isNull_0 = inputadapter_row_0.isNullAt(0);
> /* 416 */ UTF8String inputadapter_value_0 = inputadapter_isNull_0 ? null : (inputadapter_row_0.getUTF8String(0));
> /* 417 */ boolean inputadapter_isNull_1 = inputadapter_row_0.isNullAt(1);
> /* 418 */ UTF8String inputadapter_value_1 = inputadapter_isNull_1 ? null : (inputadapter_row_0.getUTF8String(1));
> /* 419 */ boolean inputadapter_isNull_2 = inputadapter_row_0.isNullAt(2);
> /* 420 */ UTF8String inputadapter_value_2 = inputadapter_isNull_2 ? null : (inputadapter_row_0.getUTF8String(2));
> /* 421 */ boolean inputadapter_isNull_3 = inputadapter_row_0.isNullAt(3);
> /* 422 */ UTF8String inputadapter_value_3 = inputadapter_isNull_3 ? null : (inputadapter_row_0.getUTF8String(3));
> /* 423 */ boolean inputadapter_isNull_4 = inputadapter_row_0.isNullAt(4);
> /* 424 */ UTF8String inputadapter_value_4 = inputadapter_isNull_4 ? null : (inputadapter_row_0.getUTF8String(4));
> /* 425 */ boolean inputadapter_isNull_5 = inputadapter_row_0.isNullAt(5);
> /* 426 */ UTF8String inputadapter_value_5 = inputadapter_isNull_5 ? null : (inputadapter_row_0.getUTF8String(5));
> /* 427 */ boolean inputadapter_isNull_6 = inputadapter_row_0.isNullAt(6);
> /* 428 */ UTF8String inputadapter_value_6 = inputadapter_isNull_6 ? null : (inputadapter_row_0.getUTF8String(6));
> /* 429 */
> /* 430 */ agg_doConsume_0(inputadapter_row_0, inputadapter_value_0, inputadapter_isNull_0, inputadapter_value_1, inputadapter_isNull_1, inputadapter_value_2, inputadapter_isNull_2, inputadapter_value_3, inputadapter_isNull_3, inputadapter_value_4, inputadapter_isNull_4, inputadapter_value_5, inputadapter_isNull_5, inputadapter_value_6, inputadapter_isNull_6);
> /* 431 */ if (shouldStop()) return;
> /* 432 */ }
> /* 433 */
> /* 434 */ agg_mapIter_0 = ((org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0] /* plan */).finishAggregate(agg_hashMap_0, agg_sorter_0, ((org.apache.spark.sql.execution.metric.SQLMetric) references[1] /* peakMemory */), ((org.apache.spark.sql.execution.metric.SQLMetric) references[2] /* spillSize */), ((org.apache.spark.sql.execution.metric.SQLMetric) references[3] /* avgHashProbe */));
> /* 435 */ }
> /* 436 */
> /* 437 */ protected void processNext() throws java.io.IOException {
> /* 438 */ if (!agg_initAgg_0) {
> /* 439 */ agg_initAgg_0 = true;
> /* 440 */ long wholestagecodegen_beforeAgg_0 = System.nanoTime();
> /* 441 */ agg_doAggregateWithKeys_0();
> /* 442 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[11] /* aggTime */).add((System.nanoTime() - wholestagecodegen_beforeAgg_0) / 1000000);
> /* 443 */ }
> /* 444 */
> /* 445 */ // output the result
> /* 446 */
> /* 447 */ while (agg_mapIter_0.next()) {
> /* 448 */ UnsafeRow agg_aggKey_0 = (UnsafeRow) agg_mapIter_0.getKey();
> /* 449 */ UnsafeRow agg_aggBuffer_0 = (UnsafeRow) agg_mapIter_0.getValue();
> /* 450 */ agg_doAggregateWithKeysOutput_0(agg_aggKey_0, agg_aggBuffer_0);
> /* 451 */
> /* 452 */ if (shouldStop()) return;
> /* 453 */ }
> /* 454 */
> /* 455 */ agg_mapIter_0.close();
> /* 456 */ if (agg_sorter_0 == null) {
> /* 457 */ agg_hashMap_0.free();
> /* 458 */ }
> /* 459 */ }
> /* 460 */
> /* 461 */ private void wholestagecodegen_init_0_0() {
> /* 462 */ agg_hashMap_0 = ((org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0] /* plan */).createHashMap();
> /* 463 */ inputadapter_input_0 = inputs[0];
> /* 464 */ agg_mutableStateArray_0[0] = new UnsafeRow(7);
> /* 465 */ agg_mutableStateArray_1[0] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[0], 224);
> /* 466 */ agg_mutableStateArray_2[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[0], 7);
> /* 467 */ agg_mutableStateArray_0[1] = new UnsafeRow(7);
> /* 468 */ agg_mutableStateArray_1[1] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_mutableStateArray_0[1], 224);
> /* 469 */ agg_mutableStateArray_2[1] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_mutableStateArray_1[1], 7);
> /* 470 */
> /* 471 */ bhj_relation_0 = ((org.apache.spark.sql.execution.joins.UnsafeHashedRelation) ((org.apache.spark.broadcast.TorrentBroadcast) references[5] /* broadcast */).value()).asReadOnlyCopy();
> /* 472 */ incPeakExecutionMemory(bhj_relation_0.estimatedSize());
> /* 473 */
> /* 474 */ org.apache.spark.TaskContext$.MODULE$.get().addTaskCompletionListener(new org.apache.spark.util.TaskCompletionListener() {
> /* 475 */ @Override
> /* 476 */ public void onTaskCompletion(org.apache.spark.TaskContext context) {
> /* 477 */ ((org.apache.spark.sql.execution.metric.SQLMetric) references[6] /* avgHashProbe */).set(bhj_relation_0.getAverageProbesPerLookup());
> /* 478 */ }
> /* 479 */ });
> /* 480 */
> /* 481 */ }
> /* 482 */
> /* 483 */ }
> 18/10/01 09:58:07 WARN WholeStageCodegenExec: Whole-stage codegen disabled for plan (id=6):
> *(6) Project [colA#10, colB#11, colC#12, colD#13, colE#14, colF#15, colG#16, colH#41, ColI#42, colJ#43, colK#44, colL#45, colM#46, colN#47, colP#77, colQ#78, colR#79, colS#80]
> +- *(6) BroadcastHashJoin [colA#10, colB#11, colC#12, colD#13, colE#14], [colA#72, colB#73, colC#74, colD#75, colE#76], LeftOuter, BuildRight
> :- *(6) Project [colA#10, colB#11, colC#12, colD#13, colE#14, colF#15, colG#16, colH#41, ColI#42, colJ#43, colK#44, colL#45, colM#46, colN#47]
> : +- *(6) BroadcastHashJoin [colA#10, colB#11, colC#12, colD#13, colE#14, colF#15], [colA#35, colB#36, colC#37, colD#38, colE#39, colF#40], LeftOuter, BuildRight
> : :- *(6) HashAggregate(keys=[colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11], functions=[], output=[colA#10, colB#11, colC#12, colD#13, colE#14, colF#15, colG#16])
> : : +- Exchange hashpartitioning(colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11, 200)
> : : +- *(1) HashAggregate(keys=[colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11], functions=[], output=[colF#15, colA#10, colE#14, colC#12, colD#13, colG#16, colB#11])
> : : +- *(1) FileScan csv [colA#10,colB#11,colC#12,colD#13,colE#14,colF#15,colG#16] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/Users/tbrugier/projects/analytics/dev/spark-bug/local/fileA.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<colA:string,colB:string,colC:string,colD:string,colE:string,colF:string,colG:string>
> : +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true], input[1, string, true], input[2, string, true], input[3, string, true], input[4, string, true], input[5, string, true]))
> : +- *(3) HashAggregate(keys=[colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44], functions=[], output=[colA#35, colB#36, colC#37, colD#38, colE#39, colF#40, colH#41, ColI#42, colJ#43, colK#44, colL#45, colM#46, colN#47])
> : +- Exchange hashpartitioning(colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44, 200)
> : +- *(2) HashAggregate(keys=[colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44], functions=[], output=[colF#40, colL#45, colA#35, colH#41, colE#39, colM#46, colC#37, colN#47, colD#38, colJ#43, ColI#42, colB#36, colK#44])
> : +- *(2) FileScan csv [colA#35,colB#36,colC#37,colD#38,colE#39,colF#40,colH#41,ColI#42,colJ#43,colK#44,colL#45,colM#46,colN#47] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/Users/tbrugier/projects/analytics/dev/spark-bug/local/fileB.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<colA:string,colB:string,colC:string,colD:string,colE:string,colF:string,colH:string,ColI:s...
> +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true], input[1, string, true], input[2, string, true], input[3, string, true], input[4, string, true]))
> +- *(5) HashAggregate(keys=[colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73], functions=[], output=[colA#72, colB#73, colC#74, colD#75, colE#76, colP#77, colQ#78, colR#79, colS#80])
> +- Exchange hashpartitioning(colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73, 200)
> +- *(4) HashAggregate(keys=[colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73], functions=[], output=[colR#79, colA#72, colE#76, colP#77, colC#74, colQ#78, colD#75, colS#80, colB#73])
> +- *(4) FileScan csv [colA#72,colB#73,colC#74,colD#75,colE#76,colP#77,colQ#78,colR#79,colS#80] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/Users/tbrugier/projects/analytics/dev/spark-bug/local/fileC.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<colA:string,colB:string,colC:string,colD:string,colE:string,colP:string,colQ:string,colR:s...
> {code}
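> The WARN line above ("Whole-stage codegen disabled for plan (id=6)") shows Spark already falling back to interpreted execution once the generated method grows too large. As a stopgap while the bug is open (not a fix for the underlying codegen issue), whole-stage codegen can be switched off up front via the internal {{spark.sql.codegen.wholeStage}} configuration. The sketch below is a hypothetical variant of the reproduction code from the description, assuming the same SparkConf setup:
> {code:java}
> // Hypothetical workaround sketch: disable whole-stage codegen entirely so the
> // oversized generated method is never compiled. Internal config; behavior may
> // change between Spark versions.
> SparkConf conf = new SparkConf()
>         .setAppName("SparkBug")
>         .setMaster("local")
>         .set("spark.sql.codegen.wholeStage", "false");
> SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
> {code}
> Note this only suppresses the error path by avoiding code generation; query results should be unchanged, but performance may degrade, and the codegen bug itself remains to be fixed.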
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)