Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/26 20:15:10 UTC

[GitHub] [hudi] zhedoubushishi opened a new pull request #2208: [HUDI-1040] Make Hudi support Spark 3

zhedoubushishi opened a new pull request #2208:
URL: https://github.com/apache/hudi/pull/2208


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   This CR aims to make Hudi support Spark 3.
   
   After manually changing scala.version to 2.12.10 and scala.binary.version to 2.12, you can use
   ```mvn clean install -Dscala-2.12 -Dspark3``` to compile Hudi with Spark 3.
   
   After modifying the corresponding Travis file to use Spark 3, all tests pass except hudi-integ-test, which is expected:
   https://travis-ci.org/github/zhedoubushishi/hudi/builds/738681227
   
   
   ## Brief change log
   
     - Modified several Spark API usages to make them compatible with both Spark 2 and Spark 3.
     - Created a new module named ```hudi-spark2``` that contains all the files (all related to bulk insert v2) that need to be compiled with Spark 2.
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517718191



##########
File path: pom.xml
##########
@@ -1318,6 +1325,23 @@
         </plugins>
       </build>
     </profile>
+
+    <profile>
+      <id>spark3</id>
+      <properties>
+        <spark.version>${spark3.version}</spark.version>

Review comment:
       Done







[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r533108221



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -113,9 +113,6 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       hadoopConf = sqlContext.sparkSession.sessionState.newHadoopConf()
     )
 
-    // Follow the implementation of Spark internal HadoopRDD to handle the broadcast configuration.

Review comment:
       ```SparkHadoopUtil``` becomes ```private``` in Spark 3, so I need to see if these lines are necessary.
   
   The reason this is needed in Spark's internal implementation is to guard against the case where a user passes a custom Configuration that doesn't contain the credentials to access secure HDFS (https://github.com/apache/spark/pull/2676). Since the Configuration used here was created as part of the Spark context, the credentials should already be loaded, so we can remove these lines.







[GitHub] [hudi] umehrot2 commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r516418986



##########
File path: packaging/hudi-utilities-bundle/pom.xml
##########
@@ -105,6 +107,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.spark:spark-streaming-kafka-0-10_${scala.binary.version}</include>
+                  <include>org.apache.spark:spark-token-provider-kafka-0-10_${scala.binary.version}</include>

Review comment:
       It seems possible that `scala.binary.version` is `2.11` when compiling by default, which can conflict here because Spark 3 only uses `2.12`. We should probably also override the Scala versions by default in the spark3 Maven profile so that such scenarios do not happen.

##########
File path: pom.xml
##########
@@ -100,6 +104,7 @@
     <prometheus.version>0.8.0</prometheus.version>
     <http.version>4.4.1</http.version>
     <spark.version>2.4.4</spark.version>
+    <spark2.version>2.4.4</spark2.version>

Review comment:
       If a dependency or property is configured in both the parent and child POMs with different values, the child POM value takes priority.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
##########
@@ -173,4 +176,17 @@ public static InternalRow getInternalRowWithError(String partitionPath) {
         .withBulkInsertParallelism(2);
   }
 
+  private static InternalRow serializeRow(ExpressionEncoder encoder, Row row)
+      throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException {
+    // TODO remove reflection if Spark 2.x support is dropped
+    if (package$.MODULE$.SPARK_VERSION().startsWith("2.")) {
+      Method spark2method = encoder.getClass().getMethod("toRow", Object.class);
+      return (InternalRow) spark2method.invoke(encoder, row);

Review comment:
       It might make sense to create `Spark2RowSerializer` and `Spark3RowSerializer` similar to the implementations we have created for deserializers.
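   
   A minimal sketch of what such a split could look like, written without Spark on the classpath so that it stays self-contained. Every name here (`RowSerializers`, `RowSerializer`, `Spark2RowSerializer`, `Spark3RowSerializer`, `forVersion`) is hypothetical, and plain `Function` objects stand in for the version-specific encoder calls that the reflection in the diff above resolves at runtime.
   
   ```java
   import java.util.function.Function;

   // Sketch only: all names are hypothetical, and Function objects stand in for
   // the version-specific Spark encoder calls so the example runs without Spark.
   final class RowSerializers {
     private RowSerializers() {}

     /** Version-neutral contract: R is the external row type, I the internal row type. */
     interface RowSerializer<R, I> {
       I serializeRow(R row);
     }

     /** Would wrap the Spark 2 encoder call and be compiled against Spark 2 only. */
     static final class Spark2RowSerializer<R, I> implements RowSerializer<R, I> {
       private final Function<R, I> spark2Call;

       Spark2RowSerializer(Function<R, I> spark2Call) {
         this.spark2Call = spark2Call;
       }

       @Override
       public I serializeRow(R row) {
         return spark2Call.apply(row);
       }
     }

     /** Would wrap the Spark 3 encoder call and be compiled against Spark 3 only. */
     static final class Spark3RowSerializer<R, I> implements RowSerializer<R, I> {
       private final Function<R, I> spark3Call;

       Spark3RowSerializer(Function<R, I> spark3Call) {
         this.spark3Call = spark3Call;
       }

       @Override
       public I serializeRow(R row) {
         return spark3Call.apply(row);
       }
     }

     /** Picks the implementation once, so callers never branch on the version themselves. */
     static <R, I> RowSerializer<R, I> forVersion(
         String sparkVersion, Function<R, I> spark2Call, Function<R, I> spark3Call) {
       if (sparkVersion.startsWith("2.")) {
         return new Spark2RowSerializer<>(spark2Call);
       }
       return new Spark3RowSerializer<>(spark3Call);
     }

     public static void main(String[] args) {
       RowSerializer<String, String> serializer =
           forVersion("3.0.1", row -> "spark2:" + row, row -> "spark3:" + row);
       System.out.println(serializer.serializeRow("r1")); // prints "spark3:r1"
     }
   }
   ```
   
   With this shape the version check happens in one place, and the test utility would only depend on the `RowSerializer` interface.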

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -96,4 +99,13 @@ object AvroConversionUtils {
     val name = HoodieAvroUtils.sanitizeName(tableName)
     (s"${name}_record", s"hoodie.${name}")
   }
+
+  def createDeserializer(encoder: ExpressionEncoder[Row]): SparkRowDeserializer = {

Review comment:
       I think `HoodieSparkUtils` is a more appropriate place for this function.

##########
File path: hudi-spark2/src/main/scala/org/apache/hudi/DataSourceOptionsForSpark2.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+
+/**
+  * Options supported for writing hoodie tables.
+  * TODO: This file is partially copied from org.apache.hudi.DataSourceWriteOptions.
+  * Should be removed if Spark 2.x support is dropped.
+  */

Review comment:
       The javadoc formatting is off at various places in this class.

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -121,6 +122,9 @@ private[hudi] object HoodieSparkSqlWriter {
       // short-circuit if bulk_insert via row is enabled.
       // scalastyle:off
       if (parameters(ENABLE_ROW_WRITER_OPT_KEY).toBoolean) {
+        if (SPARK_VERSION.startsWith("3.")) {
+          throw new HoodieException("Bulk insert via row is not compatible with Spark 3, it is only compatible with Spark 2!")
+        }

Review comment:
       Yeah. Anyway, I think this message can be changed to: `Bulk insert using row writer is not supported with Spark 3. To use row writer switch to spark 2.`.

##########
File path: pom.xml
##########
@@ -1318,6 +1325,23 @@
         </plugins>
       </build>
     </profile>
+
+    <profile>
+      <id>spark3</id>
+      <properties>
+        <spark.version>${spark3.version}</spark.version>

Review comment:
       Override the Scala versions here?

##########
File path: hudi-spark2/src/main/java/org/apache/hudi/internal/DefaultSource.java
##########
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.internal;
 
-import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceUtilsForSpark2;

Review comment:
       I had misunderstood and thought you moved `DefaultSource.scala`, which is the main datasource implementation. But it seems you moved the internal datasource implementation used for bulk insert v2, so this looks fine to me.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieMergeOnReadTestUtils.java
##########
@@ -85,7 +85,7 @@
         .collect(Collectors.toList()));
 
     return inputPaths.stream().map(path -> {
-      setInputPath(jobConf, path);
+      FileInputFormat.setInputPaths(jobConf, path);

Review comment:
       We discussed it internally and this is not a concern anymore.







[GitHub] [hudi] giaosudau removed a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau removed a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-732523090


   I tried to run deltastreamer with sqltransformer 
   
   Hi everyone,
   I am running Spark 3 (https://github.com/apache/hudi/pull/2208)
   with deltastreamer and sqltransformer on Debezium data:
   ``` 
   spark-submit \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   --driver-memory 2g \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --conf spark.sql.hive.convertMetastoreParquet=false \
   --packages org.apache.spark:spark-avro_2.12:3.0.1 \
   ~/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar \
   --table-type MERGE_ON_READ \
   --source-ordering-field ts_ms \
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
   --target-base-path /Users/users/Downloads/roi/debezium/by_test/ \
   --target-table users \
   --props ./hudi_base.properties \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
   hoodie.upsert.shuffle.parallelism=2
   hoodie.insert.shuffle.parallelism=2
   hoodie.bulkinsert.shuffle.parallelism=2
   # Key fields, for kafka example
   hoodie.datasource.write.storage.type=MERGE_ON_READ
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=ts_ms
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
   # schema provider configs
   hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/dbz1.by_test.users-value/versions/latest
   #Kafka props
   hoodie.deltastreamer.source.kafka.topic=dbz1.by_test.users
   metadata.broker.list=localhost:9092
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://localhost:8081
   hoodie.deltastreamer.transformer.sql=SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')
   ```
   
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x000000010f4cbad0, pid=33960, tid=0x0000000000013e03
   #
   # JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-b01)
   # Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode bsd-amd64 compressed oops)
   # Problematic frame:
   # V  [libjvm.dylib+0xcbad0]
   ```





[GitHub] [hudi] vinothchandar commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-737427777


   @giaosudau No problem at all.  Glad it got resolved
   





[GitHub] [hudi] vinothchandar commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-736838816


   @zhedoubushishi let me take a pass at this again. thanks for updating. 
   
   @giaosudau Need to reproduce this myself to comment further. Does the crash happen deterministically when you try to write? @zhedoubushishi any thoughts? 





[GitHub] [hudi] sbernauer commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
sbernauer commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r524351032



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -121,6 +122,9 @@ private[hudi] object HoodieSparkSqlWriter {
       // short-circuit if bulk_insert via row is enabled.
       // scalastyle:off
       if (parameters(ENABLE_ROW_WRITER_OPT_KEY).toBoolean) {
+        if (SPARK_VERSION.startsWith("3.")) {

Review comment:
       Hi, checking for `if (!SPARK_VERSION.startsWith("2.")) {` would be more robust because of future releases (e.g. Spark 4).
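   
   To make the difference concrete, here is a small self-contained sketch comparing the two checks. `RowWriterGuard` and its method names are hypothetical, and `IllegalStateException` stands in for `HoodieException` so that the example compiles without Hudi on the classpath.
   
   ```java
   // Sketch only: hypothetical names, no Hudi or Spark dependency.
   final class RowWriterGuard {
     private RowWriterGuard() {}

     /** Check from the diff: rejects Spark 3.x only, so any later major version slips through. */
     static boolean blockedByDenyList(String sparkVersion) {
       return sparkVersion.startsWith("3.");
     }

     /** Check suggested in this review: rejects everything that is not Spark 2.x. */
     static boolean blockedByAllowList(String sparkVersion) {
       return !sparkVersion.startsWith("2.");
     }

     /** Fails fast when the row writer is requested on an unsupported version. */
     static void ensureRowWriterSupported(String sparkVersion) {
       if (blockedByAllowList(sparkVersion)) {
         throw new IllegalStateException(
             "Bulk insert using row writer is not supported with Spark " + sparkVersion);
       }
     }

     public static void main(String[] args) {
       for (String version : new String[] {"2.4.4", "3.0.1", "4.0.0"}) {
         System.out.println(version
             + " denyList=" + blockedByDenyList(version)
             + " allowList=" + blockedByAllowList(version));
       }
     }
   }
   ```
   
   For a version string such as `4.0.0`, the first check lets the row writer path run while the second one rejects it, which is the behaviour the comment above is after.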







[GitHub] [hudi] giaosudau commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-733747785


   @vinothchandar 
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGBUS (0xa) at pc=0x00000001090cbad0, pid=32045, tid=0x0000000000013d03
   #
   # JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-b01)
   # Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode bsd-amd64 compressed oops)
   # Problematic frame:
   # V  [libjvm.dylib+0xcbad0]  acl_CopyRight+0x29
   #
   # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
   #
   # If you would like to submit a bug report, please visit:
   #   http://bugreport.java.com/bugreport/crash.jsp
   #
   
   ---------------  T H R E A D  ---------------
   
   Current thread (0x00007ff8d78b4800):  JavaThread "Executor task launch worker for task 1" daemon [_thread_in_vm, id=81155, stack(0x00007000124e0000,0x00007000125e0000)]
   
   siginfo: si_signo: 10 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00000007fb74b220
   
   Registers:
   RAX=0x00000007fb84b218, RBX=0x00000007fb74b220, RCX=0x0000000742648aa8, RDX=0xfffffffffffe0004
   RSP=0x00007000125de5a8, RBP=0x00007000125de5f0, RSI=0x0000000742548ab0, RDI=0x00000007fb74b220
   R8 =0x00007000125de620, R9 =0x0000000000000010, R10=0x000000010e547c67, R11=0x000000010e547c18
   R12=0x00000000653849e8, R13=0x00007000125de620, R14=0x0000000742548aa0, R15=0x00000007963c6838
   RIP=0x00000001090cbad0, EFLAGS=0x0000000000010282, ERR=0x0000000000000004
     TRAPNO=0x000000000000000e
   
   Top of Stack: (sp=0x00007000125de5a8)
   0x00007000125de5a8:   0000000109573643 00007ff8d78b4800
   0x00007000125de5b8:   00007ff8d78b4800 0000000000000010
   0x00007000125de5c8:   000000010a97fdb8 0000000000000000
   0x00007000125de5d8:   00007000125de660 00007000125de740
   0x00007000125de5e8:   00007ff8d78b4800 00007000125de640
   0x00007000125de5f8:   000000010e547ce1 0000000000100000
   0x00007000125de608:   0000000740023378 00000007963c6838
   0x00007000125de618:   0000000000000000 0000000742548aa0
   0x00007000125de628:   0000000000000000 0000000000000000
   0x00007000125de638:   0000000000000000 00007000125de6e8
   0x00007000125de648:   000000010e07dffd 0000000000000009
   0x00007000125de658:   000000010e07dffd 0000000000100000
   0x00007000125de668:   00007ff8d78b4800 0000000000000010
   0x00007000125de678:   000000010e07db10 0000000742548aa0
   0x00007000125de688:   00000000653849e8 0000000124168f98
   0x00007000125de698:   00000007963c6838 0000000740023378
   0x00007000125de6a8:   00007000125de6a8 000000010ddb7fc2
   0x00007000125de6b8:   00007000125de740 000000010dde9028
   0x00007000125de6c8:   0000000125c73fe0 000000010ddb8088
   0x00007000125de6d8:   00007000125de660 00007000125de750
   0x00007000125de6e8:   00007000125de890 000000010f4e2cdc
   0x00007000125de6f8:   0000000000100000 0000000000000000
   0x00007000125de708:   000000004f636670 000000010ddb8088
   0x00007000125de718:   0000000000000010 0000000742548aa0
   0x00007000125de728:   0000000742548aa0 00000000653849e8
   0x00007000125de738:   0000000000000202 00000007963c6838
   0x00007000125de748:   000000010f4e2cdc 00000007c09086c8
   0x00007000125de758:   00000007c09086c8 0000000124168f98
   0x00007000125de768:   00007ff8d78b4800 0000000124168f98
   0x00007000125de778:   00007ff8d78b4800 0000000200000000
   0x00007000125de788:   0000000000000000 0000000000000000
   0x00007000125de798:   0000000742548aa0 00007000125de810 
   
   Instructions: (pc=0x00000001090cbad0)
   0x00000001090cbab0:   45 48 8b 74 d0 08 48 89 74 d1 08 48 83 c2 01 75
   0x00000001090cbac0:   f0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
   0x00000001090cbad0:   48 8b 74 d0 e8 48 89 74 d1 e8 48 8b 74 d0 f0 48
   0x00000001090cbae0:   89 74 d1 f0 48 8b 74 d0 f8 48 89 74 d1 f8 48 8b 
   
   Register to memory mapping:
   
   RAX=0x00000007fb84b218 is an unknown value
   RBX=0x00000007fb74b220 is an unknown value
   RCX=0x0000000742648aa8 is pointing into object: 0x0000000742548aa0
   [B 
    - klass: {type array byte}
    - length: 1331914352
   RDX=0xfffffffffffe0004 is an unknown value
   RSP=0x00007000125de5a8 is pointing into the stack for thread: 0x00007ff8d78b4800
   RBP=0x00007000125de5f0 is pointing into the stack for thread: 0x00007ff8d78b4800
   RSI=0x0000000742548ab0 is pointing into object: 0x0000000742548aa0
   [B 
    - klass: {type array byte}
    - length: 1331914352
   RDI=0x00000007fb74b220 is an unknown value
   R8 =0x00007000125de620 is pointing into the stack for thread: 0x00007ff8d78b4800
   R9 =0x0000000000000010 is an unknown value
   R10=0x000000010e547c67 is at entry_point+103 in (nmethod*)0x000000010e547a90
   R11=0x000000010e547c18 is at entry_point+24 in (nmethod*)0x000000010e547a90
   R12=0x00000000653849e8 is an unknown value
   R13=0x00007000125de620 is pointing into the stack for thread: 0x00007ff8d78b4800
   R14=0x0000000742548aa0 is an oop
   [B 
    - klass: {type array byte}
    - length: 1331914352
   R15=0x00000007963c6838 is an oop
   [B 
    - klass: {type array byte}
    - length: 1088
   
   
   Stack: [0x00007000124e0000,0x00007000125e0000],  sp=0x00007000125de5a8,  free space=1017k
   Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
   V  [libjvm.dylib+0xcbad0]  acl_CopyRight+0x29
   J 4313  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x000000010e547ce1 [0x000000010e547c00+0xe1]
   j  org.apache.spark.unsafe.Platform.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V+34
   J 6963 C1 org.apache.spark.unsafe.types.UTF8String.getBytes()[B (81 bytes) @ 0x000000010f4e2cdc [0x000000010f4e2920+0x3bc]
   J 6962 C1 org.apache.spark.unsafe.types.UTF8String.toString()Ljava/lang/String; (15 bytes) @ 0x000000010f4e2464 [0x000000010f4e2320+0x144]
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.If_3$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)Ljava/lang/String;+205
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow_0_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;[Ljava/lang/Object;)V+87
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow_3_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;[Ljava/lang/Object;)V+57
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Ljava/lang/Object;)Ljava/lang/Object;+14
   j  org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Ljava/lang/Object;+29
   j  org.apache.hudi.Spark3RowDeserializer.deserializeRow(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+5
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   j  org.apache.hudi.HoodieSparkUtils$$$Lambda$2342.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
   J 6454 C1 scala.collection.Iterator$$anon$10.next()Ljava/lang/Object; (19 bytes) @ 0x000000010f3b3424 [0x000000010f3b3260+0x1c4]
   J 6454 C1 scala.collection.Iterator$$anon$10.next()Ljava/lang/Object; (19 bytes) @ 0x000000010f3b3374 [0x000000010f3b3260+0x114]
   J 9244 C1 scala.collection.Iterator$SliceIterator.next()Ljava/lang/Object; (70 bytes) @ 0x000000010fcfad54 [0x000000010fcfa620+0x734]
   J 9438 C2 scala.collection.AbstractIterator.foreach(Lscala/Function1;)V (6 bytes) @ 0x000000010fd98904 [0x000000010fd988a0+0x64]
   J 6854 C2 scala.collection.generic.Growable.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/generic/Growable; (34 bytes) @ 0x000000010f4b01f8 [0x000000010f4b0000+0x1f8]
   J 6505 C1 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/mutable/ArrayBuffer; (74 bytes) @ 0x000000010f3cb754 [0x000000010f3cae80+0x8d4]
   J 6812 C1 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/generic/Growable; (6 bytes) @ 0x000000010f4930f4 [0x000000010f493000+0xf4]
   j  scala.collection.TraversableOnce.to(Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;+14
   j  scala.collection.TraversableOnce.to$(Lscala/collection/TraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;+2
   j  scala.collection.AbstractIterator.to(Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;+2
   j  scala.collection.TraversableOnce.toBuffer()Lscala/collection/mutable/Buffer;+7
   j  scala.collection.TraversableOnce.toBuffer$(Lscala/collection/TraversableOnce;)Lscala/collection/mutable/Buffer;+1
   j  scala.collection.AbstractIterator.toBuffer()Lscala/collection/mutable/Buffer;+1
   j  scala.collection.TraversableOnce.toArray(Lscala/reflect/ClassTag;)Ljava/lang/Object;+33
   j  scala.collection.TraversableOnce.toArray$(Lscala/collection/TraversableOnce;Lscala/reflect/ClassTag;)Ljava/lang/Object;+2
   j  scala.collection.AbstractIterator.toArray(Lscala/reflect/ClassTag;)Ljava/lang/Object;+2
   j  org.apache.spark.rdd.RDD.$anonfun$take$2(Lorg/apache/spark/rdd/RDD;ILscala/collection/Iterator;)Ljava/lang/Object;+11
   j  org.apache.spark.rdd.RDD$$Lambda$1378.apply(Ljava/lang/Object;)Ljava/lang/Object;+12
   j  org.apache.spark.SparkContext.$anonfun$runJob$5(Lscala/Function1;Lorg/apache/spark/TaskContext;Lscala/collection/Iterator;)Ljava/lang/Object;+2
   j  org.apache.spark.SparkContext$$Lambda$1379.apply(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+12
   j  org.apache.spark.scheduler.ResultTask.runTask(Lorg/apache/spark/TaskContext;)Ljava/lang/Object;+206
   j  org.apache.spark.scheduler.Task.run(JILorg/apache/spark/metrics/MetricsSystem;Lscala/collection/immutable/Map;)Ljava/lang/Object;+215
   j  org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Lorg/apache/spark/executor/Executor$TaskRunner;Lscala/runtime/BooleanRef;)Ljava/lang/Object;+32
   j  org.apache.spark.executor.Executor$TaskRunner$$Lambda$1344.apply()Ljava/lang/Object;+8
   j  org.apache.spark.util.Utils$.tryWithSafeFinally(Lscala/Function0;Lscala/Function0;)Ljava/lang/Object;+4
   j  org.apache.spark.executor.Executor$TaskRunner.run()V+421
   j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
   j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
   j  java.lang.Thread.run()V+11
   v  ~StubRoutines::call_stub
   V  [libjvm.dylib+0x2c7e0f]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x691
   V  [libjvm.dylib+0x2c6c5b]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x145
   V  [libjvm.dylib+0x2c6e37]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
   V  [libjvm.dylib+0x33e001]  thread_entry(JavaThread*, Thread*)+0x78
   V  [libjvm.dylib+0x559256]  JavaThread::thread_main_inner()+0x82
   V  [libjvm.dylib+0x55911b]  JavaThread::run()+0x169
   V  [libjvm.dylib+0x487e8f]  java_start(Thread*)+0xfa
   C  [libsystem_pthread.dylib+0x6109]  _pthread_start+0x94
   C  [libsystem_pthread.dylib+0x1b8b]  thread_start+0xf
   C  0x0000000000000000
   
   Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
   J 4313  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x000000010e547c67 [0x000000010e547c00+0x67]
   j  org.apache.spark.unsafe.Platform.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V+34
   J 6963 C1 org.apache.spark.unsafe.types.UTF8String.getBytes()[B (81 bytes) @ 0x000000010f4e2cdc [0x000000010f4e2920+0x3bc]
   J 6962 C1 org.apache.spark.unsafe.types.UTF8String.toString()Ljava/lang/String; (15 bytes) @ 0x000000010f4e2464 [0x000000010f4e2320+0x144]
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.If_3$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)Ljava/lang/String;+205
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow_0_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;[Ljava/lang/Object;)V+87
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow_3_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;[Ljava/lang/Object;)V+57
   j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Ljava/lang/Object;)Ljava/lang/Object;+14
   j  org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Ljava/lang/Object;+29
   j  org.apache.hudi.Spark3RowDeserializer.deserializeRow(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+5
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   j  org.apache.hudi.HoodieSparkUtils$$$Lambda$2342.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
   J 6454 C1 scala.collection.Iterator$$anon$10.next()Ljava/lang/Object; (19 bytes) @ 0x000000010f3b3424 [0x000000010f3b3260+0x1c4]
   J 6454 C1 scala.collection.Iterator$$anon$10.next()Ljava/lang/Object; (19 bytes) @ 0x000000010f3b3374 [0x000000010f3b3260+0x114]
   J 9244 C1 scala.collection.Iterator$SliceIterator.next()Ljava/lang/Object; (70 bytes) @ 0x000000010fcfad54 [0x000000010fcfa620+0x734]
   J 9438 C2 scala.collection.AbstractIterator.foreach(Lscala/Function1;)V (6 bytes) @ 0x000000010fd98904 [0x000000010fd988a0+0x64]
   J 6854 C2 scala.collection.generic.Growable.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/generic/Growable; (34 bytes) @ 0x000000010f4b01f8 [0x000000010f4b0000+0x1f8]
   J 6505 C1 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/mutable/ArrayBuffer; (74 bytes) @ 0x000000010f3cb754 [0x000000010f3cae80+0x8d4]
   J 6812 C1 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/generic/Growable; (6 bytes) @ 0x000000010f4930f4 [0x000000010f493000+0xf4]
   j  scala.collection.TraversableOnce.to(Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;+14
   j  scala.collection.TraversableOnce.to$(Lscala/collection/TraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;+2
   j  scala.collection.AbstractIterator.to(Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;+2
   j  scala.collection.TraversableOnce.toBuffer()Lscala/collection/mutable/Buffer;+7
   j  scala.collection.TraversableOnce.toBuffer$(Lscala/collection/TraversableOnce;)Lscala/collection/mutable/Buffer;+1
   j  scala.collection.AbstractIterator.toBuffer()Lscala/collection/mutable/Buffer;+1
   j  scala.collection.TraversableOnce.toArray(Lscala/reflect/ClassTag;)Ljava/lang/Object;+33
   j  scala.collection.TraversableOnce.toArray$(Lscala/collection/TraversableOnce;Lscala/reflect/ClassTag;)Ljava/lang/Object;+2
   j  scala.collection.AbstractIterator.toArray(Lscala/reflect/ClassTag;)Ljava/lang/Object;+2
   j  org.apache.spark.rdd.RDD.$anonfun$take$2(Lorg/apache/spark/rdd/RDD;ILscala/collection/Iterator;)Ljava/lang/Object;+11
   j  org.apache.spark.rdd.RDD$$Lambda$1378.apply(Ljava/lang/Object;)Ljava/lang/Object;+12
   j  org.apache.spark.SparkContext.$anonfun$runJob$5(Lscala/Function1;Lorg/apache/spark/TaskContext;Lscala/collection/Iterator;)Ljava/lang/Object;+2
   j  org.apache.spark.SparkContext$$Lambda$1379.apply(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+12
   j  org.apache.spark.scheduler.ResultTask.runTask(Lorg/apache/spark/TaskContext;)Ljava/lang/Object;+206
   j  org.apache.spark.scheduler.Task.run(JILorg/apache/spark/metrics/MetricsSystem;Lscala/collection/immutable/Map;)Ljava/lang/Object;+215
   j  org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Lorg/apache/spark/executor/Executor$TaskRunner;Lscala/runtime/BooleanRef;)Ljava/lang/Object;+32
   j  org.apache.spark.executor.Executor$TaskRunner$$Lambda$1344.apply()Ljava/lang/Object;+8
   j  org.apache.spark.util.Utils$.tryWithSafeFinally(Lscala/Function0;Lscala/Function0;)Ljava/lang/Object;+4
   j  org.apache.spark.executor.Executor$TaskRunner.run()V+421
   j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
   j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
   j  java.lang.Thread.run()V+11
   v  ~StubRoutines::call_stub
   
   ---------------  P R O C E S S  ---------------
   
   Java Threads: ( => current thread )
     0x00007ff9276f6800 JavaThread "block-manager-ask-thread-pool-1" daemon [_thread_blocked, id=77827, stack(0x0000700012d23000,0x0000700012e23000)]
     0x00007ff92b7eb000 JavaThread "block-manager-ask-thread-pool-0" daemon [_thread_blocked, id=45827, stack(0x0000700012c20000,0x0000700012d20000)]
     0x00007ff92a5cf800 JavaThread "block-manager-slave-async-thread-pool-2" daemon [_thread_blocked, id=45571, stack(0x0000700012b1d000,0x0000700012c1d000)]
     0x00007ff92a5d7800 JavaThread "block-manager-slave-async-thread-pool-1" daemon [_thread_blocked, id=45315, stack(0x0000700012a1a000,0x0000700012b1a000)]
     0x00007ff9276f2000 JavaThread "block-manager-slave-async-thread-pool-0" daemon [_thread_blocked, id=78851, stack(0x0000700012917000,0x0000700012a17000)]
     0x00007ff9276eb800 JavaThread "task-result-getter-0" daemon [_thread_blocked, id=79375, stack(0x0000700012814000,0x0000700012914000)]
     0x00007ff8c7008800 JavaThread "process reaper" daemon [_thread_blocked, id=79619, stack(0x00007000127e9000,0x0000700012811000)]
     0x00007ff8c7daa800 JavaThread "rpc-server-4-1" daemon [_thread_in_native, id=80131, stack(0x00007000126e6000,0x00007000127e6000)]
     0x00007ff92a19a000 JavaThread "files-client-8-1" daemon [_thread_in_native, id=80643, stack(0x00007000125e3000,0x00007000126e3000)]
   =>0x00007ff8d78b4800 JavaThread "Executor task launch worker for task 1" daemon [_thread_in_vm, id=81155, stack(0x00007000124e0000,0x00007000125e0000)]
     0x00007ff927546800 JavaThread "Keep-Alive-Timer" daemon [_thread_blocked, id=81667, stack(0x00007000123dd000,0x00007000124dd000)]
     0x00007ff928bda000 JavaThread "spark-listener-group-shared" daemon [_thread_blocked, id=82179, stack(0x00007000122da000,0x00007000123da000)]
     0x00007ff92719d000 JavaThread "element-tracking-store-worker" daemon [_thread_blocked, id=82691, stack(0x00007000121d7000,0x00007000122d7000)]
     0x00007ff928aa4000 JavaThread "spark-listener-group-executorManagement" daemon [_thread_blocked, id=83203, stack(0x00007000120d4000,0x00007000121d4000)]
     0x00007ff92a30b000 JavaThread "spark-listener-group-appStatus" daemon [_thread_blocked, id=83715, stack(0x0000700011fd1000,0x00007000120d1000)]
     0x00007ff9271a2800 JavaThread "context-cleaner-periodic-gc" daemon [_thread_blocked, id=84227, stack(0x0000700011ece000,0x0000700011fce000)]
     0x00007ff8c703b800 JavaThread "Spark Context Cleaner" daemon [_thread_blocked, id=84739, stack(0x0000700011dcb000,0x0000700011ecb000)]
     0x00007ff928aa6800 JavaThread "shuffle-boss-6-1" daemon [_thread_in_native, id=44547, stack(0x0000700011cc8000,0x0000700011dc8000)]
     0x00007ff8f0384000 JavaThread "executor-heartbeater" daemon [_thread_blocked, id=44035, stack(0x0000700011bc5000,0x0000700011cc5000)]
     0x00007ff92719b800 JavaThread "driver-heartbeater" daemon [_thread_blocked, id=43779, stack(0x0000700011ac2000,0x0000700011bc2000)]
     0x00007ff92823e000 JavaThread "dag-scheduler-event-loop" daemon [_thread_blocked, id=85763, stack(0x00007000119bf000,0x0000700011abf000)]
     0x00007ff92711d000 JavaThread "Timer-1" daemon [_thread_blocked, id=86275, stack(0x00007000118bc000,0x00007000119bc000)]
     0x00007ff928232000 JavaThread "Timer-0" daemon [_thread_blocked, id=86787, stack(0x00007000117b9000,0x00007000118b9000)]
     0x00007ff927a66800 JavaThread "netty-rpc-env-timeout" daemon [_thread_blocked, id=43523, stack(0x00007000116b6000,0x00007000117b6000)]
     0x00007ff92a1ab800 JavaThread "heartbeat-receiver-event-loop-thread" daemon [_thread_blocked, id=31747, stack(0x00007000115b3000,0x00007000116b3000)]
     0x00007ff927118800 JavaThread "SparkUI-47" daemon [_thread_blocked, id=32003, stack(0x00007000114b0000,0x00007000115b0000)]
     0x00007ff927a5f800 JavaThread "SparkUI-46" daemon [_thread_blocked, id=32259, stack(0x00007000113ad000,0x00007000114ad000)]
     0x00007ff927321000 JavaThread "SparkUI-45-acceptor-1@39fd7bba-Spark@76672d90{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" daemon [_thread_blocked, id=30723, stack(0x00007000112aa000,0x00007000113aa000)]
     0x00007ff927a51800 JavaThread "SparkUI-44-acceptor-0@7258a703-ServerConnector@76672d90{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" daemon [_thread_in_native, id=33027, stack(0x00007000111a7000,0x00007000112a7000)]
     0x00007ff8c7db2000 JavaThread "SparkUI-43" daemon [_thread_in_native, id=33539, stack(0x00007000110a4000,0x00007000111a4000)]
     0x00007ff8c7db1000 JavaThread "SparkUI-42" daemon [_thread_in_native, id=30467, stack(0x0000700010fa1000,0x00007000110a1000)]
     0x00007ff8c7db0800 JavaThread "SparkUI-41" daemon [_thread_in_native, id=30211, stack(0x0000700010e9e000,0x0000700010f9e000)]
     0x00007ff8c7dfb800 JavaThread "SparkUI-40" daemon [_thread_in_native, id=29955, stack(0x0000700010d9b000,0x0000700010e9b000)]
     0x00007ff927a3c800 JavaThread "SparkUI-39" daemon [_thread_in_native, id=34563, stack(0x0000700010c98000,0x0000700010d98000)]
     0x00007ff927320800 JavaThread "SparkUI-38" daemon [_thread_in_native, id=29443, stack(0x0000700010b95000,0x0000700010c95000)]
     0x00007ff92731d000 JavaThread "SparkUI-37" daemon [_thread_in_native, id=35075, stack(0x0000700010a92000,0x0000700010b92000)]
     0x00007ff8f0355800 JavaThread "SparkUI-36" daemon [_thread_in_native, id=35331, stack(0x000070001098f000,0x0000700010a8f000)]
     0x00007ff927316800 JavaThread "RemoteBlock-temp-file-clean-thread" daemon [_thread_blocked, id=35587, stack(0x000070001088c000,0x000070001098c000)]
     0x00007ff928aff800 JavaThread "dispatcher-BlockManagerEndpoint1" daemon [_thread_blocked, id=36099, stack(0x0000700010789000,0x0000700010889000)]
     0x00007ff927310000 JavaThread "dispatcher-BlockManagerMaster" daemon [_thread_blocked, id=36611, stack(0x0000700010686000,0x0000700010786000)]
     0x00007ff928ae3000 JavaThread "map-output-dispatcher-7" daemon [_thread_blocked, id=28419, stack(0x0000700010583000,0x0000700010683000)]
     0x00007ff8c7e1b000 JavaThread "map-output-dispatcher-6" daemon [_thread_blocked, id=28163, stack(0x0000700010480000,0x0000700010580000)]
     0x00007ff8c7e1a000 JavaThread "map-output-dispatcher-5" daemon [_thread_blocked, id=37379, stack(0x000070001037d000,0x000070001047d000)]
     0x00007ff8c7e19000 JavaThread "map-output-dispatcher-4" daemon [_thread_blocked, id=27651, stack(0x000070001027a000,0x000070001037a000)]
     0x00007ff8c7e18000 JavaThread "map-output-dispatcher-3" daemon [_thread_blocked, id=27139, stack(0x0000700010177000,0x0000700010277000)]
     0x00007ff8c7e17800 JavaThread "map-output-dispatcher-2" daemon [_thread_blocked, id=38147, stack(0x0000700010074000,0x0000700010174000)]
     0x00007ff92730f000 JavaThread "map-output-dispatcher-1" daemon [_thread_blocked, id=26883, stack(0x000070000ff71000,0x0000700010071000)]
     0x00007ff927a20800 JavaThread "map-output-dispatcher-0" daemon [_thread_blocked, id=38915, stack(0x000070000fe6e000,0x000070000ff6e000)]
     0x00007ff928adf000 JavaThread "dispatcher-event-loop-1" daemon [_thread_blocked, id=26627, stack(0x000070000fd6b000,0x000070000fe6b000)]
     0x00007ff8c7e16800 JavaThread "dispatcher-event-loop-0" daemon [_thread_blocked, id=39683, stack(0x000070000fc68000,0x000070000fd68000)]
     0x00007ff9272f7000 JavaThread "rpc-boss-3-1" daemon [_thread_in_native, id=40195, stack(0x000070000fb65000,0x000070000fc65000)]
     0x00007ff8c7de4800 JavaThread "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" daemon [_thread_blocked, id=40451, stack(0x000070000fa62000,0x000070000fb62000)]
     0x00007ff92800b800 JavaThread "Service Thread" daemon [_thread_blocked, id=40963, stack(0x000070000f85c000,0x000070000f95c000)]
     0x00007ff907008800 JavaThread "C1 CompilerThread11" daemon [_thread_blocked, id=41475, stack(0x000070000f759000,0x000070000f859000)]
     0x00007ff927018800 JavaThread "C1 CompilerThread10" daemon [_thread_blocked, id=25347, stack(0x000070000f656000,0x000070000f756000)]
     0x00007ff927824800 JavaThread "C1 CompilerThread9" daemon [_thread_blocked, id=24835, stack(0x000070000f553000,0x000070000f653000)]
     0x00007ff92b032000 JavaThread "C1 CompilerThread8" daemon [_thread_blocked, id=24579, stack(0x000070000f450000,0x000070000f550000)]
     0x00007ff927823800 JavaThread "C2 CompilerThread7" daemon [_thread_blocked, id=41987, stack(0x000070000f34d000,0x000070000f44d000)]
     0x00007ff92b029000 JavaThread "C2 CompilerThread6" daemon [_thread_blocked, id=23811, stack(0x000070000f24a000,0x000070000f34a000)]
     0x00007ff929888000 JavaThread "C2 CompilerThread5" daemon [_thread_blocked, id=42243, stack(0x000070000f147000,0x000070000f247000)]
     0x00007ff92b020000 JavaThread "C2 CompilerThread4" daemon [_thread_blocked, id=23043, stack(0x000070000f044000,0x000070000f144000)]
     0x00007ff92b017000 JavaThread "C2 CompilerThread3" daemon [_thread_blocked, id=42755, stack(0x000070000ef41000,0x000070000f041000)]
     0x00007ff92780a800 JavaThread "C2 CompilerThread2" daemon [_thread_blocked, id=22531, stack(0x000070000ee3e000,0x000070000ef3e000)]
     0x00007ff92b016800 JavaThread "C2 CompilerThread1" daemon [_thread_blocked, id=22275, stack(0x000070000ed3b000,0x000070000ee3b000)]
     0x00007ff927809800 JavaThread "C2 CompilerThread0" daemon [_thread_blocked, id=43267, stack(0x000070000ec38000,0x000070000ed38000)]
     0x00007ff929887800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=16387, stack(0x000070000eb35000,0x000070000ec35000)]
     0x00007ff92b012000 JavaThread "Finalizer" daemon [_thread_blocked, id=18435, stack(0x000070000e92c000,0x000070000ea2c000)]
     0x00007ff92b00f000 JavaThread "Reference Handler" daemon [_thread_blocked, id=18691, stack(0x000070000e829000,0x000070000e929000)]
     0x00007ff92a009800 JavaThread "main" [_thread_blocked, id=5635, stack(0x000070000d8fc000,0x000070000d9fc000)]
   
   Other Threads:
     0x00007ff929857000 VMThread [stack: 0x000070000e726000,0x000070000e826000] [id=19203]
     0x00007ff927825000 WatcherThread [stack: 0x000070000f95f000,0x000070000fa5f000] [id=26115]
   
   VM state:synchronizing (normal execution)
   
   VM Mutex/Monitor currently owned by a thread:  ([mutex/lock_event])
   [0x00007ff929005e70] Threads_lock - owner thread: 0x00007ff929857000
   
   heap address: 0x0000000740000000, size: 2048 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
   Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
   Compressed class space size: 1073741824 Address: 0x00000007c0000000
   
   Heap:
    PSYoungGen      total 317440K, used 95490K [0x0000000795580000, 0x00000007adc00000, 0x00000007c0000000)
     eden space 292352K, 24% used [0x0000000795580000,0x0000000799ad4bf0,0x00000007a7300000)
     from space 25088K, 97% used [0x00000007a7300000,0x00000007a8aebf48,0x00000007a8b80000)
     to   space 28672K, 0% used [0x00000007ac000000,0x00000007ac000000,0x00000007adc00000)
    ParOldGen       total 1398272K, used 1338876K [0x0000000740000000, 0x0000000795580000, 0x0000000795580000)
     object space 1398272K, 95% used [0x0000000740000000,0x0000000791b7f120,0x0000000795580000)
    Metaspace       used 107502K, capacity 117335K, committed 117504K, reserved 1150976K
     class space    used 14559K, capacity 15503K, committed 15616K, reserved 1048576K
   
   Card table byte_map: [0x0000000109e2c000,0x000000010a22d000] byte_map_base: 0x000000010642c000
   
   Marking Bits: (ParMarkBitMap*) 0x000000010970edc8
    Begin Bits: [0x000000011d076000, 0x000000011f076000)
    End Bits:   [0x000000011f076000, 0x0000000121076000)
   
   Polling page: 0x0000000106ff6000
   
   CodeCache: size=245760Kb used=32764Kb max_used=32875Kb free=212995Kb
    bounds [0x000000010e076000, 0x0000000110186000, 0x000000011d076000]
    total_blobs=9900 nmethods=9013 adapters=795
    compilation: enabled
   
   Compilation events (10 events):
   Event: 8.298 Thread 0x00007ff929888000 nmethod 10204 0x000000010ed06390 code [0x000000010ed06900, 0x000000010ed0a668]
   Event: 8.298 Thread 0x00007ff929888000 10226       4       org.codehaus.janino.Java$Statement::findLocalVariable (22 bytes)
   Event: 8.304 Thread 0x00007ff929888000 nmethod 10226 0x000000010f34a8d0 code [0x000000010f34aa60, 0x000000010f34af58]
   Event: 8.384 Thread 0x00007ff92780a800 nmethod 9976 0x00000001100d7b10 code [0x00000001100d8520, 0x00000001100de0d0]
   Event: 8.469 Thread 0x00007ff92b017000 nmethod 9952 0x00000001100ef210 code [0x00000001100efa60, 0x00000001100f74b0]
   Event: 8.519 Thread 0x00007ff927823800 nmethod 9843 0x0000000110105b10 code [0x00000001101063a0, 0x000000011010d408]
   Event: 8.528 Thread 0x00007ff92b020000 nmethod 10202 0x0000000110121410 code [0x0000000110121da0, 0x0000000110128950]
   Event: 8.530 Thread 0x00007ff927809800 nmethod 9837 0x0000000110134cd0 code [0x0000000110135680, 0x000000011013e8c8]
   Event: 8.570 Thread 0x00007ff92b016800 nmethod 10212 0x000000011014a990 code [0x000000011014b360, 0x0000000110152638]
   Event: 8.749 Thread 0x00007ff92b029000 nmethod 10164 0x00000001101699d0 code [0x000000011016acc0, 0x0000000110175a20]
   
   GC Heap History (10 events):
   Event: 3.617 GC heap before
   {Heap before GC invocations=9 (full 3):
    PSYoungGen      total 241152K, used 6841K [0x0000000795580000, 0x00000007a6200000, 0x00000007c0000000)
     eden space 224256K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a3080000)
     from space 16896K, 40% used [0x00000007a4100000,0x00000007a47ae480,0x00000007a5180000)
     to   space 16896K, 0% used [0x00000007a3080000,0x00000007a3080000,0x00000007a4100000)
    ParOldGen       total 409600K, used 9447K [0x0000000740000000, 0x0000000759000000, 0x0000000795580000)
     object space 409600K, 2% used [0x0000000740000000,0x0000000740939eb8,0x0000000759000000)
    Metaspace       used 55934K, capacity 59104K, committed 59160K, reserved 1101824K
     class space    used 7429K, capacity 7796K, committed 7808K, reserved 1048576K
   Event: 3.678 GC heap after
   Heap after GC invocations=9 (full 3):
    PSYoungGen      total 241152K, used 0K [0x0000000795580000, 0x00000007a6200000, 0x00000007c0000000)
     eden space 224256K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a3080000)
     from space 16896K, 0% used [0x00000007a4100000,0x00000007a4100000,0x00000007a5180000)
     to   space 16896K, 0% used [0x00000007a3080000,0x00000007a3080000,0x00000007a4100000)
    ParOldGen       total 567296K, used 13199K [0x0000000740000000, 0x0000000762a00000, 0x0000000795580000)
     object space 567296K, 2% used [0x0000000740000000,0x0000000740ce3fb0,0x0000000762a00000)
    Metaspace       used 55932K, capacity 59098K, committed 59160K, reserved 1101824K
     class space    used 7428K, capacity 7795K, committed 7808K, reserved 1048576K
   }
   Event: 4.809 GC heap before
   {Heap before GC invocations=10 (full 3):
    PSYoungGen      total 241152K, used 224256K [0x0000000795580000, 0x00000007a6200000, 0x00000007c0000000)
     eden space 224256K, 100% used [0x0000000795580000,0x00000007a3080000,0x00000007a3080000)
     from space 16896K, 0% used [0x00000007a4100000,0x00000007a4100000,0x00000007a5180000)
     to   space 16896K, 0% used [0x00000007a3080000,0x00000007a3080000,0x00000007a4100000)
    ParOldGen       total 567296K, used 13199K [0x0000000740000000, 0x0000000762a00000, 0x0000000795580000)
     object space 567296K, 2% used [0x0000000740000000,0x0000000740ce3fb0,0x0000000762a00000)
    Metaspace       used 70624K, capacity 77294K, committed 77440K, reserved 1116160K
     class space    used 9576K, capacity 10350K, committed 10368K, reserved 1048576K
   Event: 4.818 GC heap after
   Heap after GC invocations=10 (full 3):
    PSYoungGen      total 241152K, used 16067K [0x0000000795580000, 0x00000007a9e00000, 0x00000007c0000000)
     eden space 224256K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a3080000)
     from space 16896K, 95% used [0x00000007a3080000,0x00000007a4030c68,0x00000007a4100000)
     to   space 18944K, 0% used [0x00000007a8b80000,0x00000007a8b80000,0x00000007a9e00000)
    ParOldGen       total 567296K, used 13207K [0x0000000740000000, 0x0000000762a00000, 0x0000000795580000)
     object space 567296K, 2% used [0x0000000740000000,0x0000000740ce5fb0,0x0000000762a00000)
    Metaspace       used 70624K, capacity 77294K, committed 77440K, reserved 1116160K
     class space    used 9576K, capacity 10350K, committed 10368K, reserved 1048576K
   }
   Event: 5.942 GC heap before
   {Heap before GC invocations=11 (full 3):
    PSYoungGen      total 241152K, used 180997K [0x0000000795580000, 0x00000007a9e00000, 0x00000007c0000000)
     eden space 224256K, 73% used [0x0000000795580000,0x000000079f6907d0,0x00000007a3080000)
     from space 16896K, 95% used [0x00000007a3080000,0x00000007a4030c68,0x00000007a4100000)
     to   space 18944K, 0% used [0x00000007a8b80000,0x00000007a8b80000,0x00000007a9e00000)
    ParOldGen       total 567296K, used 13207K [0x0000000740000000, 0x0000000762a00000, 0x0000000795580000)
     object space 567296K, 2% used [0x0000000740000000,0x0000000740ce5fb0,0x0000000762a00000)
    Metaspace       used 90699K, capacity 98443K, committed 98600K, reserved 1134592K
     class space    used 12245K, capacity 13080K, committed 13184K, reserved 1048576K
   Event: 5.956 GC heap after
   Heap after GC invocations=11 (full 3):
    PSYoungGen      total 311296K, used 18921K [0x0000000795580000, 0x00000007aa880000, 0x00000007c0000000)
     eden space 292352K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a7300000)
     from space 18944K, 99% used [0x00000007a8b80000,0x00000007a9dfa660,0x00000007a9e00000)
     to   space 25088K, 0% used [0x00000007a7300000,0x00000007a7300000,0x00000007a8b80000)
    ParOldGen       total 567296K, used 21464K [0x0000000740000000, 0x0000000762a00000, 0x0000000795580000)
     object space 567296K, 3% used [0x0000000740000000,0x00000007414f60c0,0x0000000762a00000)
    Metaspace       used 90699K, capacity 98443K, committed 98600K, reserved 1134592K
     class space    used 12245K, capacity 13080K, committed 13184K, reserved 1048576K
   }
   Event: 5.956 GC heap before
   {Heap before GC invocations=12 (full 4):
    PSYoungGen      total 311296K, used 18921K [0x0000000795580000, 0x00000007aa880000, 0x00000007c0000000)
     eden space 292352K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a7300000)
     from space 18944K, 99% used [0x00000007a8b80000,0x00000007a9dfa660,0x00000007a9e00000)
     to   space 25088K, 0% used [0x00000007a7300000,0x00000007a7300000,0x00000007a8b80000)
    ParOldGen       total 567296K, used 21464K [0x0000000740000000, 0x0000000762a00000, 0x0000000795580000)
     object space 567296K, 3% used [0x0000000740000000,0x00000007414f60c0,0x0000000762a00000)
    Metaspace       used 90699K, capacity 98443K, committed 98600K, reserved 1134592K
     class space    used 12245K, capacity 13080K, committed 13184K, reserved 1048576K
   Event: 6.040 GC heap after
   Heap after GC invocations=12 (full 4):
    PSYoungGen      total 311296K, used 0K [0x0000000795580000, 0x00000007aa880000, 0x00000007c0000000)
     eden space 292352K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a7300000)
     from space 18944K, 0% used [0x00000007a8b80000,0x00000007a8b80000,0x00000007a9e00000)
     to   space 25088K, 0% used [0x00000007a7300000,0x00000007a7300000,0x00000007a8b80000)
    ParOldGen       total 826368K, used 38162K [0x0000000740000000, 0x0000000772700000, 0x0000000795580000)
     object space 826368K, 4% used [0x0000000740000000,0x0000000742544aa0,0x0000000772700000)
    Metaspace       used 90699K, capacity 98443K, committed 98600K, reserved 1134592K
     class space    used 12245K, capacity 13080K, committed 13184K, reserved 1048576K
   }
   Event: 8.008 GC heap before
   {Heap before GC invocations=13 (full 4):
    PSYoungGen      total 311296K, used 292352K [0x0000000795580000, 0x00000007aa880000, 0x00000007c0000000)
     eden space 292352K, 100% used [0x0000000795580000,0x00000007a7300000,0x00000007a7300000)
     from space 18944K, 0% used [0x00000007a8b80000,0x00000007a8b80000,0x00000007a9e00000)
     to   space 25088K, 0% used [0x00000007a7300000,0x00000007a7300000,0x00000007a8b80000)
    ParOldGen       total 826368K, used 38162K [0x0000000740000000, 0x0000000772700000, 0x0000000795580000)
     object space 826368K, 4% used [0x0000000740000000,0x0000000742544aa0,0x0000000772700000)
    Metaspace       used 106999K, capacity 116745K, committed 116864K, reserved 1150976K
     class space    used 14520K, capacity 15488K, committed 15488K, reserved 1048576K
   Event: 8.023 GC heap after
   Heap after GC invocations=13 (full 4):
    PSYoungGen      total 317440K, used 24495K [0x0000000795580000, 0x00000007adc00000, 0x00000007c0000000)
     eden space 292352K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007a7300000)
     from space 25088K, 97% used [0x00000007a7300000,0x00000007a8aebf48,0x00000007a8b80000)
     to   space 28672K, 0% used [0x00000007ac000000,0x00000007ac000000,0x00000007adc00000)
    ParOldGen       total 826368K, used 38178K [0x0000000740000000, 0x0000000772700000, 0x0000000795580000)
     object space 826368K, 4% used [0x0000000740000000,0x0000000742548aa0,0x0000000772700000)
    Metaspace       used 106999K, capacity 116745K, committed 116864K, reserved 1150976K
     class space    used 14520K, capacity 15488K, committed 15488K, reserved 1048576K
   }
   
   Deoptimization events (10 events):
   Event: 8.138 Thread 0x00007ff8d78b4800 Uncommon trap: reason=bimorphic action=maybe_recompile pc=0x000000010e4e44f4 method=org.codehaus.janino.IClass.getSuperclass()Lorg/codehaus/janino/IClass; @ 13
   Event: 8.138 Thread 0x00007ff8d78b4800 Uncommon trap: reason=bimorphic action=maybe_recompile pc=0x000000010e5f0aa4 method=org.codehaus.janino.IClass.getInterfaces()[Lorg/codehaus/janino/IClass; @ 13
   Event: 8.139 Thread 0x00007ff8d78b4800 Uncommon trap: reason=class_check action=maybe_recompile pc=0x000000010fc53ab0 method=org.codehaus.janino.UnitCompiler.internalCheckAccessible(Lorg/codehaus/janino/IClass;Lorg/codehaus/janino/Access;Lorg/codehaus/janino/Java$Scope;)Ljava/lang/String; @ 34
   Event: 8.139 Thread 0x00007ff8d78b4800 Uncommon trap: reason=class_check action=maybe_recompile pc=0x000000010fc53ab0 method=org.codehaus.janino.UnitCompiler.internalCheckAccessible(Lorg/codehaus/janino/IClass;Lorg/codehaus/janino/Access;Lorg/codehaus/janino/Java$Scope;)Ljava/lang/String; @ 34
   Event: 8.139 Thread 0x00007ff8d78b4800 Uncommon trap: reason=bimorphic action=maybe_recompile pc=0x000000010e4e44f4 method=org.codehaus.janino.IClass.getSuperclass()Lorg/codehaus/janino/IClass; @ 13
   Event: 8.140 Thread 0x00007ff8d78b4800 Uncommon trap: reason=bimorphic action=maybe_recompile pc=0x000000010e5f0aa4 method=org.codehaus.janino.IClass.getInterfaces()[Lorg/codehaus/janino/IClass; @ 13
   Event: 8.143 Thread 0x00007ff8d78b4800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x000000010ff8ab2c method=org.codehaus.janino.IClass.implementsInterface(Lorg/codehaus/janino/IClass;)Z @ 39
   Event: 8.143 Thread 0x00007ff8d78b4800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000110014090 method=org.codehaus.janino.IClass.isAssignableFrom(Lorg/codehaus/janino/IClass;)Z @ 69
   Event: 8.174 Thread 0x00007ff8d78b4800 Uncommon trap: reason=class_check action=maybe_recompile pc=0x00000001100454c4 method=org.codehaus.janino.UnitCompiler.findLocalClassDeclaration(Lorg/codehaus/janino/Java$Scope;Ljava/lang/String;)Lorg/codehaus/janino/Java$LocalClassDeclaration; @ 70
   Event: 8.174 Thread 0x00007ff8d78b4800 Uncommon trap: reason=class_check action=maybe_recompile pc=0x000000010e9bcac8 method=org.codehaus.janino.UnitCompiler.reclassifyName(Lorg/codehaus/commons/compiler/Location;Lorg/codehaus/janino/Java$Scope;[Ljava/lang/String;I)Lorg/codehaus/janino/Java$At
   
   Classes redefined (0 events):
   No events
   
   Internal exceptions (10 events):
   Event: 7.392 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a0187a40) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.393 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a01ac580) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.397 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a01f43a8) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.398 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a021c138) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.400 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a0244530) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.407 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a04f6bb8) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.408 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a0517d30) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.409 Thread 0x00007ff92a009800 Exception <a 'java/lang/NoSuchMethodError': <clinit>> (0x00000007a053b238) thrown at [/Users/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-mac-x64-hotspot/workspace/build/src/hotspot/src/share/vm/prims/jni.cpp, line 1616]
   Event: 7.521 Thread 0x00007ff8d78b4800 Exception <a 'java/lang/NoSuchMethodError': java.lang.Object.$anonfun$prepareArguments$1(Lorg/apache/spark/sql/catalyst/expressions/codegen/CodegenContext;Lorg/apache/spark/sql/catalyst/expressions/Expression;)Ljava/lang/String;> (0x00000007a1e75410) thro
   Event: 7.521 Thread 0x00007ff8d78b4800 Exception <a 'java/lang/NoSuchMethodError': java.lang.Object.$anonfun$prepareArguments$3(Lorg/apache/spark/sql/catalyst/expressions/codegen/CodegenContext;Lscala/collection/Seq;Lscala/Tuple2;)Ljava/lang/String;> (0x00000007a1e7f888) thrown at [/Users/jenk
   
   Events (10 events):
   Event: 8.181 loading class org/apache/spark/sql/catalyst/expressions/MutableLong
   Event: 8.181 loading class org/apache/spark/sql/catalyst/expressions/MutableLong done
   Event: 8.181 loading class org/apache/spark/sql/catalyst/expressions/MutableFloat
   Event: 8.181 loading class org/apache/spark/sql/catalyst/expressions/MutableFloat done
   Event: 8.182 loading class org/apache/spark/sql/catalyst/expressions/MutableDouble
   Event: 8.182 loading class org/apache/spark/sql/catalyst/expressions/MutableDouble done
   Event: 8.182 loading class org/apache/spark/sql/catalyst/expressions/MutableAny
   Event: 8.182 loading class org/apache/spark/sql/catalyst/expressions/MutableAny done
   Event: 8.182 loading class org/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection
   Event: 8.182 loading class org/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificSafeProjection done
   
   
   Dynamic libraries:
   0x00007fff30420000 	/System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
   0x00007fff3dc19000 	/System/Library/Frameworks/Security.framework/Versions/A/Security
   0x00007fff2f385000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/ApplicationServices
   0x00007fff6a73d000 	/usr/lib/libz.1.dylib
   0x00007fff68201000 	/usr/lib/libSystem.B.dylib
   0x00007fff312c2000 	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
   0x00007fff3397f000 	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
   0x00007fff6a061000 	/usr/lib/libobjc.A.dylib
   0x00007fff2e575000 	/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
   0x00007fff30da4000 	/System/Library/Frameworks/CoreData.framework/Versions/A/CoreData
   0x00007fff62216000 	/System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation
   0x00007fff5d2aa000 	/System/Library/PrivateFrameworks/RemoteViewServices.framework/Versions/A/RemoteViewServices
   0x00007fff64d71000 	/System/Library/PrivateFrameworks/XCTTargetBootstrap.framework/Versions/A/XCTTargetBootstrap
   0x00007fff3118f000 	/System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
   0x00007fff3683a000 	/System/Library/Frameworks/Metal.framework/Versions/A/Metal
   0x00007fff4b2ad000 	/System/Library/PrivateFrameworks/DesktopServicesPriv.framework/Versions/A/DesktopServicesPriv
   0x00007fff6933c000 	/usr/lib/liblangid.dylib
   0x00007fff4a542000 	/System/Library/PrivateFrameworks/CoreSVG.framework/Versions/A/CoreSVG
   0x00007fff60406000 	/System/Library/PrivateFrameworks/SkyLight.framework/Versions/A/SkyLight
   0x00007fff31743000 	/System/Library/Frameworks/CoreGraphics.framework/Versions/A/CoreGraphics
   0x00007fff2cf49000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
   0x00007fff6a62e000 	/usr/lib/libxml2.2.dylib
   0x00007fff56235000 	/System/Library/PrivateFrameworks/IconServices.framework/Versions/A/IconServices
   0x00007fff3415e000 	/System/Library/Frameworks/IOSurface.framework/Versions/A/IOSurface
   0x00007fff67d2b000 	/usr/lib/libDiagnosticMessagesClient.dylib
   0x00007fff4b107000 	/System/Library/PrivateFrameworks/DFRFoundation.framework/Versions/A/DFRFoundation
   0x00007fff690cb000 	/usr/lib/libicucore.A.dylib
   0x00007fff2f6b7000 	/System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox
   0x00007fff2f79a000 	/System/Library/Frameworks/AudioUnit.framework/Versions/A/AudioUnit
   0x00007fff68403000 	/usr/lib/libauto.dylib
   0x00007fff4b1a9000 	/System/Library/PrivateFrameworks/DataDetectorsCore.framework/Versions/A/DataDetectorsCore
   0x00007fff2ff41000 	/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
   0x00007fff3ce13000 	/System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore
   0x00007fff30280000 	/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/SpeechRecognition.framework/Versions/A/SpeechRecognition
   0x00007fff4ac1b000 	/System/Library/PrivateFrameworks/CoreUI.framework/Versions/A/CoreUI
   0x00007fff3080f000 	/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio
   0x00007fff3363e000 	/System/Library/Frameworks/DiskArbitration.framework/Versions/A/DiskArbitration
   0x00007fff58285000 	/System/Library/PrivateFrameworks/MultitouchSupport.framework/Versions/A/MultitouchSupport
   0x00007fff68f92000 	/usr/lib/libenergytrace.dylib
   0x00007fff340b8000 	/System/Library/Frameworks/IOKit.framework/Versions/A/IOKit
   0x00007fff326aa000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices
   0x00007fff5a43d000 	/System/Library/PrivateFrameworks/PerformanceAnalysis.framework/Versions/A/PerformanceAnalysis
   0x00007fff3be46000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL
   0x00007fff3042e000 	/System/Library/Frameworks/ColorSync.framework/Versions/A/ColorSync
   0x00007fff31dc6000 	/System/Library/Frameworks/CoreImage.framework/Versions/A/CoreImage
   0x00007fff3311d000 	/System/Library/Frameworks/CoreText.framework/Versions/A/CoreText
   0x00007fff341ee000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO
   0x00007fff684e7000 	/usr/lib/libc++.1.dylib
   0x00007fff68563000 	/usr/lib/libcompression.dylib
   0x00007fff6806f000 	/usr/lib/libMobileGestalt.dylib
   0x00007fff61f80000 	/System/Library/PrivateFrameworks/TextureIO.framework/Versions/A/TextureIO
   0x00007fff68366000 	/usr/lib/libate.dylib
   0x00007fff5643e000 	/System/Library/PrivateFrameworks/InternationalSupport.framework/Versions/A/InternationalSupport
   0x00007fff6affe000 	/usr/lib/system/libcache.dylib
   0x00007fff6b004000 	/usr/lib/system/libcommonCrypto.dylib
   0x00007fff6b010000 	/usr/lib/system/libcompiler_rt.dylib
   0x00007fff6b018000 	/usr/lib/system/libcopyfile.dylib
   0x00007fff6b022000 	/usr/lib/system/libcorecrypto.dylib
   0x00007fff6b1c1000 	/usr/lib/system/libdispatch.dylib
   0x00007fff6b202000 	/usr/lib/system/libdyld.dylib
   0x00007fff6b239000 	/usr/lib/system/libkeymgr.dylib
   0x00007fff6b247000 	/usr/lib/system/liblaunch.dylib
   0x00007fff6b248000 	/usr/lib/system/libmacho.dylib
   0x00007fff6b24e000 	/usr/lib/system/libquarantine.dylib
   0x00007fff6b251000 	/usr/lib/system/libremovefile.dylib
   0x00007fff6b253000 	/usr/lib/system/libsystem_asl.dylib
   0x00007fff6b26b000 	/usr/lib/system/libsystem_blocks.dylib
   0x00007fff6b26c000 	/usr/lib/system/libsystem_c.dylib
   0x00007fff6b2f4000 	/usr/lib/system/libsystem_configuration.dylib
   0x00007fff6b2f8000 	/usr/lib/system/libsystem_coreservices.dylib
   0x00007fff6b2fc000 	/usr/lib/system/libsystem_darwin.dylib
   0x00007fff6b305000 	/usr/lib/system/libsystem_dnssd.dylib
   0x00007fff6b30d000 	/usr/lib/system/libsystem_featureflags.dylib
   0x00007fff6b30f000 	/usr/lib/system/libsystem_info.dylib
   0x00007fff6b38a000 	/usr/lib/system/libsystem_m.dylib
   0x00007fff6b3d2000 	/usr/lib/system/libsystem_malloc.dylib
   0x00007fff6b3fa000 	/usr/lib/system/libsystem_networkextension.dylib
   0x00007fff6b408000 	/usr/lib/system/libsystem_notify.dylib
   0x00007fff6b426000 	/usr/lib/system/libsystem_sandbox.dylib
   0x00007fff6b42b000 	/usr/lib/system/libsystem_secinit.dylib
   0x00007fff6b35d000 	/usr/lib/system/libsystem_kernel.dylib
   0x00007fff6b412000 	/usr/lib/system/libsystem_platform.dylib
   0x00007fff6b41b000 	/usr/lib/system/libsystem_pthread.dylib
   0x00007fff6b42e000 	/usr/lib/system/libsystem_symptoms.dylib
   0x00007fff6b436000 	/usr/lib/system/libsystem_trace.dylib
   0x00007fff6b44e000 	/usr/lib/system/libunwind.dylib
   0x00007fff6b454000 	/usr/lib/system/libxpc.dylib
   0x00007fff6853a000 	/usr/lib/libc++abi.dylib
   0x00007fff6933e000 	/usr/lib/liblzma.5.dylib
   0x00007fff68fb9000 	/usr/lib/libfakelink.dylib
   0x00007fff682f3000 	/usr/lib/libarchive.2.dylib
   0x00007fff3ecca000 	/System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration
   0x00007fff67c5e000 	/usr/lib/libCRFSuite.dylib
   0x00007fff2fb31000 	/System/Library/Frameworks/CFNetwork.framework/Versions/A/CFNetwork
   0x00007fff684c9000 	/usr/lib/libbsm.0.dylib
   0x00007fff6b23a000 	/usr/lib/system/libkxld.dylib
   0x00007fff44a84000 	/System/Library/PrivateFrameworks/AppleFSCompression.framework/Versions/A/AppleFSCompression
   0x00007fff68854000 	/usr/lib/libcoretls.dylib
   0x00007fff6886b000 	/usr/lib/libcoretls_cfhelpers.dylib
   0x00007fff6a0a7000 	/usr/lib/libpam.2.dylib
   0x00007fff6a1dc000 	/usr/lib/libsqlite3.dylib
   0x00007fff6a61b000 	/usr/lib/libxar.1.dylib
   0x00007fff684da000 	/usr/lib/libbz2.1.0.dylib
   0x00007fff68fda000 	/usr/lib/libiconv.2.dylib
   0x00007fff68550000 	/usr/lib/libcharset.1.dylib
   0x00007fff69b44000 	/usr/lib/libnetwork.dylib
   0x00007fff6a0ae000 	/usr/lib/libpcap.A.dylib
   0x00007fff682a8000 	/usr/lib/libapple_nghttp2.dylib
   0x00007fff32a61000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/FSEvents.framework/Versions/A/FSEvents
   0x00007fff32731000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore
   0x00007fff32ca5000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata
   0x00007fff32d3e000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/OSServices.framework/Versions/A/OSServices
   0x00007fff32d6c000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SearchKit.framework/Versions/A/SearchKit
   0x00007fff326ab000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/AE.framework/Versions/A/AE
   0x00007fff32a6a000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/LaunchServices
   0x00007fff32a13000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/DictionaryServices.framework/Versions/A/DictionaryServices
   0x00007fff32dd4000 	/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SharedFileList.framework/Versions/A/SharedFileList
   0x00007fff37c65000 	/System/Library/Frameworks/NetFS.framework/Versions/A/NetFS
   0x00007fff587b5000 	/System/Library/PrivateFrameworks/NetAuth.framework/Versions/A/NetAuth
   0x00007fff6513e000 	/System/Library/PrivateFrameworks/login.framework/Versions/A/Frameworks/loginsupport.framework/Versions/A/loginsupport
   0x00007fff61a4b000 	/System/Library/PrivateFrameworks/TCC.framework/Versions/A/TCC
   0x00007fff496b5000 	/System/Library/PrivateFrameworks/CoreNLP.framework/Versions/A/CoreNLP
   0x00007fff57cb2000 	/System/Library/PrivateFrameworks/MetadataUtilities.framework/Versions/A/MetadataUtilities
   0x00007fff69416000 	/usr/lib/libmecabra.dylib
   0x00007fff6936e000 	/usr/lib/libmecab.dylib
   0x00007fff68fca000 	/usr/lib/libgermantok.dylib
   0x00007fff6828f000 	/usr/lib/libThaiTokenizer.dylib
   0x00007fff67c95000 	/usr/lib/libChineseTokenizer.dylib
   0x00007fff2cf61000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage.framework/Versions/A/vImage
   0x00007fff2e3cb000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/vecLib
   0x00007fff2e308000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvMisc.dylib
   0x00007fff2e130000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib
   0x00007fff2d5b8000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
   0x00007fff2dcf4000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
   0x00007fff2e090000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLinearAlgebra.dylib
   0x00007fff2e11d000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libSparseBLAS.dylib
   0x00007fff2e0a6000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libQuadrature.dylib
   0x00007fff2d820000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBNNS.dylib
   0x00007fff2e0ac000 	/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libSparse.dylib
   0x00007fff56826000 	/System/Library/PrivateFrameworks/LanguageModeling.framework/Versions/A/LanguageModeling
   0x00007fff49065000 	/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/CoreEmoji
   0x00007fff56944000 	/System/Library/PrivateFrameworks/LinguisticData.framework/Versions/A/LinguisticData
   0x00007fff568f5000 	/System/Library/PrivateFrameworks/Lexicon.framework/Versions/A/Lexicon
   0x00007fff68551000 	/usr/lib/libcmph.dylib
   0x00007fff3a854000 	/System/Library/Frameworks/OpenDirectory.framework/Versions/A/Frameworks/CFOpenDirectory.framework/Versions/A/CFOpenDirectory
   0x00007fff3a871000 	/System/Library/Frameworks/OpenDirectory.framework/Versions/A/OpenDirectory
   0x00007fff42cb4000 	/System/Library/PrivateFrameworks/APFS.framework/Versions/A/APFS
   0x00007fff3df63000 	/System/Library/Frameworks/SecurityFoundation.framework/Versions/A/SecurityFoundation
   0x00007fff6a617000 	/usr/lib/libutil.dylib
   0x00007fff4a59a000 	/System/Library/PrivateFrameworks/CoreServicesStore.framework/Versions/A/CoreServicesStore
   0x00007fff3e01a000 	/System/Library/Frameworks/ServiceManagement.framework/Versions/A/ServiceManagement
   0x00007fff46254000 	/System/Library/PrivateFrameworks/BackgroundTaskManagement.framework/Versions/A/BackgroundTaskManagement
   0x00007fff6a714000 	/usr/lib/libxslt.1.dylib
   0x00007fff450f7000 	/System/Library/PrivateFrameworks/AppleSystemInfo.framework/Versions/A/AppleSystemInfo
   0x00007fff34409000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libJPEG.dylib
   0x00007fff346c8000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libTIFF.dylib
   0x00007fff346ab000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libPng.dylib
   0x00007fff3434b000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libGIF.dylib
   0x00007fff3434f000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libJP2.dylib
   0x00007fff346c6000 	/System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libRadiance.dylib
   0x00007fff68f93000 	/usr/lib/libexpat.1.dylib
   0x00007fff44bdf000 	/System/Library/PrivateFrameworks/AppleJPEG.framework/Versions/A/AppleJPEG
   0x00007fff4d89a000 	/System/Library/PrivateFrameworks/FontServices.framework/libFontParser.dylib
   0x00007fff64140000 	/System/Library/PrivateFrameworks/WatchdogClient.framework/Versions/A/WatchdogClient
   0x00007fff55e81000 	/System/Library/PrivateFrameworks/IOAccelerator.framework/Versions/A/IOAccelerator
   0x00007fff36bd5000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Versions/A/MetalPerformanceShaders
   0x00007fff52617000 	/System/Library/PrivateFrameworks/GPUWrangler.framework/Versions/A/GPUWrangler
   0x00007fff55e96000 	/System/Library/PrivateFrameworks/IOPresentment.framework/Versions/A/IOPresentment
   0x00007fff4b11a000 	/System/Library/PrivateFrameworks/DSExternalDisplay.framework/Versions/A/DSExternalDisplay
   0x00007fff3b1e5000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCoreFSCache.dylib
   0x00007fff36921000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Frameworks/MPSCore.framework/Versions/A/MPSCore
   0x00007fff3695f000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Frameworks/MPSImage.framework/Versions/A/MPSImage
   0x00007fff36a26000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Frameworks/MPSNeuralNetwork.framework/Versions/A/MPSNeuralNetwork
   0x00007fff369ea000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Frameworks/MPSMatrix.framework/Versions/A/MPSMatrix
   0x00007fff36b85000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Frameworks/MPSRayIntersector.framework/Versions/A/MPSRayIntersector
   0x00007fff36a10000 	/System/Library/Frameworks/MetalPerformanceShaders.framework/Frameworks/MPSNDArray.framework/Versions/A/MPSNDArray
   0x00007fff57cff000 	/System/Library/PrivateFrameworks/MetalTools.framework/Versions/A/MetalTools
   0x00007fff43e8e000 	/System/Library/PrivateFrameworks/AggregateDictionary.framework/Versions/A/AggregateDictionary
   0x00007fff48aad000 	/System/Library/PrivateFrameworks/CoreAnalytics.framework/Versions/A/CoreAnalytics
   0x00007fff45016000 	/System/Library/PrivateFrameworks/AppleSauce.framework/Versions/A/AppleSauce
   0x00007fff67f51000 	/usr/lib/libIOReport.dylib
   0x00007fff332d5000 	/System/Library/Frameworks/CoreVideo.framework/Versions/A/CoreVideo
   0x00007fff53a90000 	/System/Library/PrivateFrameworks/GraphVisualizer.framework/Versions/A/GraphVisualizer
   0x00007fff4cde0000 	/System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/FaceCore
   0x00007fff3a7fb000 	/System/Library/Frameworks/OpenCL.framework/Versions/A/OpenCL
   0x00007fff67d73000 	/usr/lib/libFosl_dynamic.dylib
   0x00007fff591d5000 	/System/Library/PrivateFrameworks/OTSVG.framework/Versions/A/OTSVG
   0x00007fff2f48a000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/libFontRegistry.dylib
   0x00007fff4da6b000 	/System/Library/PrivateFrameworks/FontServices.framework/libhvf.dylib
   0x00007fff3b1f0000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGFXShared.dylib
   0x00007fff3b3ce000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLU.dylib
   0x00007fff3b1f9000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGL.dylib
   0x00007fff3b204000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLImage.dylib
   0x00007fff3b1e2000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCVMSPluginSupport.dylib
   0x00007fff3b1eb000 	/System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCoreVMClient.dylib
   0x00007fff699e5000 	/usr/lib/libncurses.5.4.dylib
   0x00007fff2f386000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/ATS
   0x00007fff2f553000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ColorSyncLegacy.framework/Versions/A/ColorSyncLegacy
   0x00007fff2f5f1000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/HIServices.framework/Versions/A/HIServices
   0x00007fff2f649000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/LangAnalysis.framework/Versions/A/LangAnalysis
   0x00007fff2f658000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/PrintCore.framework/Versions/A/PrintCore
   0x00007fff2f69e000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/QD.framework/Versions/A/QD
   0x00007fff2f6a9000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/SpeechSynthesis.framework/Versions/A/SpeechSynthesis
   0x00007fff2f523000 	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATSUI.framework/Versions/A/ATSUI
   0x00007fff68e29000 	/usr/lib/libcups.2.dylib
   0x00007fff35c70000 	/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos
   0x00007fff33db1000 	/System/Library/Frameworks/GSS.framework/Versions/A/GSS
   0x00007fff6a168000 	/usr/lib/libresolv.9.dylib
   0x00007fff53c3d000 	/System/Library/PrivateFrameworks/Heimdal.framework/Versions/A/Heimdal
   0x00007fff35c83000 	/System/Library/Frameworks/Kerberos.framework/Versions/A/Libraries/libHeimdalProxy.dylib
   0x00007fff68fd0000 	/usr/lib/libheimdal-asn1.dylib
   0x00007fff47f45000 	/System/Library/PrivateFrameworks/CommonAuth.framework/Versions/A/CommonAuth
   0x00007fff45194000 	/System/Library/PrivateFrameworks/AssertionServices.framework/Versions/A/AssertionServices
   0x00007fff45d37000 	/System/Library/PrivateFrameworks/AudioToolboxCore.framework/Versions/A/AudioToolboxCore
   0x00007fff64dee000 	/System/Library/PrivateFrameworks/caulk.framework/Versions/A/caulk
   0x00007fff46300000 	/System/Library/PrivateFrameworks/BaseBoard.framework/Versions/A/BaseBoard
   0x00007fff5d438000 	/System/Library/PrivateFrameworks/RunningBoardServices.framework/Versions/A/RunningBoardServices
   0x00007fff5a449000 	/System/Library/PrivateFrameworks/PersistentConnection.framework/Versions/A/PersistentConnection
   0x00007fff5ce31000 	/System/Library/PrivateFrameworks/ProtocolBuffer.framework/Versions/A/ProtocolBuffer
   0x00007fff47f69000 	/System/Library/PrivateFrameworks/CommonUtilities.framework/Versions/A/CommonUtilities
   0x00007fff4648e000 	/System/Library/PrivateFrameworks/Bom.framework/Versions/A/Bom
   0x00007fff67c24000 	/usr/lib/libAudioToolboxUtility.dylib
   0x00007fff4625e000 	/System/Library/PrivateFrameworks/Backup.framework/Versions/A/Backup
   0x00007fff4b03b000 	/System/Library/PrivateFrameworks/CrashReporterSupport.framework/Versions/A/CrashReporterSupport
   0x00007fff5eeca000 	/System/Library/PrivateFrameworks/Sharing.framework/Versions/A/Sharing
   0x00007fff447a2000 	/System/Library/PrivateFrameworks/Apple80211.framework/Versions/A/Apple80211
   0x00007fff45f7b000 	/System/Library/PrivateFrameworks/AuthKit.framework/Versions/A/AuthKit
   0x00007fff4ad48000 	/System/Library/PrivateFrameworks/CoreUtils.framework/Versions/A/CoreUtils
   0x00007fff3331a000 	/System/Library/Frameworks/CoreWLAN.framework/Versions/A/CoreWLAN
   0x00007fff33f3e000 	/System/Library/Frameworks/IOBluetooth.framework/Versions/A/IOBluetooth
   0x00007fff58004000 	/System/Library/PrivateFrameworks/MobileKeyBag.framework/Versions/A/MobileKeyBag
   0x00007fff49b4d000 	/System/Library/PrivateFrameworks/CorePhoneNumbers.framework/Versions/A/CorePhoneNumbers
   0x00007fff44b92000 	/System/Library/PrivateFrameworks/AppleIDAuthSupport.framework/Versions/A/AppleIDAuthSupport
   0x00007fff37c72000 	/System/Library/Frameworks/Network.framework/Versions/A/Network
   0x00007fff566d2000 	/System/Library/PrivateFrameworks/KeychainCircle.framework/Versions/A/KeychainCircle
   0x00007fff30d6b000 	/System/Library/Frameworks/CoreBluetooth.framework/Versions/A/CoreBluetooth
   0x00007fff60f49000 	/System/Library/PrivateFrameworks/SpeechRecognitionCore.framework/Versions/A/SpeechRecognitionCore
   0x0000000109000000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/server/libjvm.dylib
   0x00000001099db000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libverify.dylib
   0x00000001099ec000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libjava.dylib
   0x0000000109a5f000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libzip.dylib
   0x000000010d16a000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libnio.dylib
   0x000000010d17c000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libnet.dylib
   0x000000010dedc000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libmanagement.dylib
   0x000000010dee9000 	/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/libjaas_unix.dylib
   0x000000010df18000 	/private/var/folders/ky/3y1zvddj5ns3xks640388thh0000gn/T/liblz4-java-2909128287676442335.dylib
   
   VM Arguments:
   jvm_args: -Xmx2g 
   java_command: org.apache.spark.deploy.SparkSubmit --master local[2] --conf spark.driver.memory=2g --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --packages org.apache.spark:spark-avro_2.12:3.0.1 /Users/username/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar --table-type COPY_ON_WRITE --source-ordering-field ts_ms --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider --source-class org.apache.hudi.utilities.sources.AvroKafkaSource --target-base-path /Users/username/Downloads/roi/debezium/by_test/ --target-table users --props ./hudi_base.properties --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
   java_class_path (initial): /Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/conf/:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/okhttp-3.12.6.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/jakarta.xml.bind-api-2.3.2.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/machinist_2.12-0.6.8.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/jackson-jaxrs-base-2.9.5.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/commons-lang3-3.9.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/jaxb-runtime-2.3.2.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/kubernetes-model-4.9.2.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/minlog-1.3.0.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/arrow-vector-0.15.1.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/xz-1.5.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/hive-shims-scheduler-2.3.7.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/javolution-5.5.1.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/spark-repl_2.12-3.0.1.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/hadoop-yarn-server-web-proxy-3.2.0.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/jta-1.1.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/threeten-extra-1.5.0.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/hive-vector-code-gen-2.3.7.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/JTransforms-3.1.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/hive-shims-0.23-2.3.7.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/spark-launcher_2.12-3.0.1.jar:/Users/username/Downloads/spark-3.0.1-bin-hadoop3.2/jars/logging-interceptor-3.12.
   Launcher Type: SUN_STANDARD
   
   Environment Variables:
   JAVA_HOME=/Users/username/.jenv/versions/1.8
   PATH=/usr/local/opt/mysql-client/bin:/usr/local/opt/python@3.7/bin:/usr/local/opt/ncurses/bin:/Users/username/.jenv/shims:/Users/username/.jenv/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
   SHELL=/bin/zsh
   
   Signal Handlers:
   SIGSEGV: [libjvm.dylib+0x595d31], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_ONSTACK|SA_RESTART|SA_SIGINFO
   SIGBUS: [libjvm.dylib+0x595d31], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGFPE: [libjvm.dylib+0x48a2eb], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGPIPE: [libjvm.dylib+0x48a2eb], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGXFSZ: [libjvm.dylib+0x48a2eb], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGILL: [libjvm.dylib+0x48a2eb], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART
   SIGUSR2: [libjvm.dylib+0x48abe2], sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
   SIGHUP: [libjvm.dylib+0x488de1], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGINT: [libjvm.dylib+0x488de1], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGTERM: [libjvm.dylib+0x488de1], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   SIGQUIT: [libjvm.dylib+0x488de1], sa_mask[0]=11111111011111110111111111111111, sa_flags=SA_RESTART|SA_SIGINFO
   
   
   ---------------  S Y S T E M  ---------------
   
   OS:Bsduname:Darwin 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
   rlimit: STACK 8192k, CORE 0k, NPROC 5568, NOFILE 10240, AS infinity
   load average:2.94 2.90 2.71
   
   CPU:total 16 (initial active 16) (8 cores per cpu, 2 threads per core) family 6 model 158 stepping 13, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx
   
   Memory: 4k page, physical 33554432k(30828k free)
   
   /proc/meminfo:
   
   
   vm_info: OpenJDK 64-Bit Server VM (25.265-b01) for bsd-amd64 JRE (1.8.0_265-b01), built on Jul 28 2020 16:14:07 by "jenkins" with gcc 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)
   
   time: Sun Nov 22 10:03:44 2020
   timezone: +08
   elapsed time: 9 seconds (0d 0h 0m 9s)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhedoubushishi commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-738209420


   @vinothchandar thanks for the review! Just pushed a minor change to the README. Yeah, for the ```hudi-spark-common``` issue, I can create a JIRA for it.





[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (d51d392) into [master](https://codecov.io/gh/apache/hudi/commit/3a91d26d625415b7ccd74f9f32418f8a84aa3e3c?el=desc) (3a91d26) will **decrease** coverage by `43.13%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.49%   10.35%   -43.14%     
   + Complexity     2788       48     -2740     
   =============================================
     Files           355       51      -304     
     Lines         16169     1786    -14383     
     Branches       1650      213     -1437     
   =============================================
   - Hits           8649      185     -8464     
   + Misses         6819     1588     -5231     
   + Partials        701       13      -688     
   ```
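The header line reports a decrease of `43.13%` while the diff table shows `-43.14`. A minimal sketch of one way to reconcile the two, assuming (this is not confirmed by the report itself) that each percentage in the table is hits divided by lines, truncated rather than rounded to two decimals; the helper name `coverage_pct` is hypothetical:

```python
from decimal import Decimal, ROUND_DOWN

def coverage_pct(hits: int, lines: int) -> Decimal:
    """Covered lines as a percentage, truncated (assumption) to two decimals."""
    exact = Decimal(hits) * 100 / Decimal(lines)
    return exact.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

# Hits and Lines taken from the diff table above.
master = coverage_pct(8649, 16169)   # master column
pr = coverage_pct(185, 1786)         # #2208 column

# Sanity check: Hits + Misses + Partials equals Lines in both columns.
assert 8649 + 6819 + 701 == 16169
assert 185 + 1588 + 13 == 1786

print(master, pr, pr - master)  # 53.49 10.35 -43.14
```

Under that assumption the table's `-43.14` is the difference of the two truncated values, whereas the header's `43.13%` matches the difference of the unrounded ratios (about 53.4913 minus 10.3583).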
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.35% <20.00%> (-59.75%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `35.54% <100.00%> (-28.92%)` | `11.00 <1.00> (-21.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [308 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhedoubushishi commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-720614167


   > TestCOWDataSource
   
   Hey, I am able to pass the test with spark3 through maven:
   ```
   mvn test -Dscala-2.12 -Pspark3 -Dtest=TestCOWDataSource -DfailIfNoTests=false
   ```





[GitHub] [hudi] zhedoubushishi edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-738209420


   @vinothchandar thanks for the review! I just pushed a minor change to the README. For the ```hudi-spark-common``` issue, I created a JIRA: https://issues.apache.org/jira/browse/HUDI-1432





[GitHub] [hudi] codecov-io commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/14c4611857ea796b8d3aef96f867db9cd0ae31f7?el=desc) will **decrease** coverage by `0.11%`.
   > The diff coverage is `52.94%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2208      +/-   ##
   ============================================
   - Coverage     53.68%   53.56%   -0.12%     
   + Complexity     2849     2829      -20     
   ============================================
     Files           359      354       -5     
     Lines         16565    16473      -92     
     Branches       1782     1784       +2     
   ============================================
   - Hits           8893     8824      -69     
   + Misses         6915     6893      -22     
   + Partials        757      756       -1     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.69% <0.00%> (-0.02%)` | `1794.00 <0.00> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.58% <53.33%> (-0.37%)` | `284.00 <0.00> (-20.00)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `70.09% <100.00%> (ø)` | `327.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=) | `50.56% <0.00%> (-0.39%)` | `0.00 <0.00> (ø)` | |
   | [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvTWVyZ2VPblJlYWRTbmFwc2hvdFJlbGF0aW9uLnNjYWxh) | `90.58% <ø> (-0.22%)` | `16.00 <0.00> (ø)` | |
   | [...in/scala/org/apache/hudi/AvroConversionUtils.scala](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25VdGlscy5zY2FsYQ==) | `57.14% <42.85%> (-4.93%)` | `0.00 <0.00> (ø)` | |
   | [.../main/scala/org/apache/hudi/HoodieSparkUtils.scala](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtVdGlscy5zY2FsYQ==) | `92.85% <83.33%> (-7.15%)` | `0.00 <0.00> (ø)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.19% <100.00%> (ø)` | `30.00 <1.00> (ø)` | |
   | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `77.67% <0.00%> (-0.90%)` | `22.00% <0.00%> (ø%)` | |
   





[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-735409910


   Hey @vinothchandar 
   I tried writing a custom transformer using the Dataset API but got the same error.
   The problem appears to be in org.apache.hudi.Spark3RowDeserializer.deserializeRow.
   
   ```
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   ```





[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-735409910


   Hey @vinothchandar 
   I tried writing a custom transformer using the Dataset API but got the same error.
   The problem appears to be in the SparkRowDeserializer class.
   
   ```
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   ```





[GitHub] [hudi] umehrot2 edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-726718174


   @vinothchandar I took one final pass and it LGTM. You can go ahead with your review.
   
   I just have one pending comment regarding some refactoring where I have cc'd you and it would be good to have your opinion there too before @zhedoubushishi implements it.





[GitHub] [hudi] rmpifer commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
rmpifer commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517179203



##########
File path: hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -24,35 +24,13 @@ import org.apache.hudi.avro.HoodieAvroUtils
 import org.apache.hudi.common.model.HoodieKey
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.avro.SchemaConverters
-import org.apache.spark.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
 
 import scala.collection.JavaConverters._
 
 object AvroConversionUtils {
 
-  def createRdd(df: DataFrame, structName: String, recordNamespace: String): RDD[GenericRecord] = {
-    val avroSchema = convertStructTypeToAvroSchema(df.schema, structName, recordNamespace)
-    createRdd(df, avroSchema, structName, recordNamespace)
-  }
-
-  def createRdd(df: DataFrame, avroSchema: Schema, structName: String, recordNamespace: String)
-  : RDD[GenericRecord] = {
-    // Use the Avro schema to derive the StructType which has the correct nullability information
-    val dataType = SchemaConverters.toSqlType(avroSchema).dataType.asInstanceOf[StructType]
-    val encoder = RowEncoder.apply(dataType).resolveAndBind()
-    val deserializer = HoodieSparkUtils.createDeserializer(encoder)
-    df.queryExecution.toRdd.map(row => deserializer.deserializeRow(row))
-      .mapPartitions { records =>
-        if (records.isEmpty) Iterator.empty
-        else {
-          val convertor = AvroConversionHelper.createConverterToAvro(dataType, structName, recordNamespace)
-          records.map { x => convertor(x).asInstanceOf[GenericRecord] }
-        }
-      }
-  }
-
   def createRddForDeletes(df: DataFrame, rowField: String, partitionField: String): RDD[HoodieKey] = {

Review comment:
       If we are moving these methods to HoodieSparkUtils, we may also want to move the other similar methods in this class so that they stay grouped together, e.g. `createRddForDeletes` and `createDataFrame`.







[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-735409910


   I tried writing a custom transformer using the Dataset API but got the same error.
   I guess it is something related to spark-sql?
   





[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (1698576) into [master](https://codecov.io/gh/apache/hudi/commit/36ce5bcd92fa6cb2fb88677eae1a7a846f606e3f?el=desc) (36ce5bc) will **decrease** coverage by `1.19%`.
   > The diff coverage is `83.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2208      +/-   ##
   ============================================
   - Coverage     53.55%   52.35%   -1.20%     
   + Complexity     2773     2617     -156     
   ============================================
     Files           348      322      -26     
     Lines         16118    14691    -1427     
     Branches       1642     1476     -166     
   ============================================
   - Hits           8632     7692     -940     
   + Misses         6788     6386     -402     
   + Partials        698      613      -85     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `38.50% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `55.25% <0.00%> (-0.03%)` | `0.00 <0.00> (ø)` | |
   | hudihadoopmr | `32.94% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `65.30% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `70.06% <100.00%> (ø)` | `0.00 <3.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.19% <100.00%> (ø)` | `30.00 <1.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.54% <100.00%> (ø)` | `49.00 <2.00> (ø)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `86.48% <100.00%> (ø)` | `11.00 <0.00> (ø)` | |
   | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `77.11% <0.00%> (-1.70%)` | `23.00% <0.00%> (ø%)` | |
   





[GitHub] [hudi] umehrot2 merged pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 merged pull request #2208:
URL: https://github.com/apache/hudi/pull/2208


   





[GitHub] [hudi] umehrot2 commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-720896607


   > Were you able to run tests from within intellij w/ spark3?
   
   @nsivabalan have you tried activating the `spark3` Maven profile by default when running the tests in IntelliJ?





[GitHub] [hudi] giaosudau commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-732523090


   Hi everyone,
   I tried running the Spark 3 build from https://github.com/apache/hudi/pull/2208
   with DeltaStreamer and the SQL transformer on Debezium data.
   The spark-submit command, followed by the properties I used:
   ``` 
   spark-submit \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   --driver-memory 2g \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --conf spark.sql.hive.convertMetastoreParquet=false \
   --packages org.apache.spark:spark-avro_2.12:3.0.1 \
   ~/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar \
   --table-type MERGE_ON_READ \
   --source-ordering-field ts_ms \
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
   --target-base-path /Users/users/Downloads/roi/debezium/by_test/ \
   --target-table users \
   --props ./hudi_base.properties \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
   hoodie.upsert.shuffle.parallelism=2
   hoodie.insert.shuffle.parallelism=2
   hoodie.bulkinsert.shuffle.parallelism=2
   # Key fields, for kafka example
   hoodie.datasource.write.storage.type=MERGE_ON_READ
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=ts_ms
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
   # schema provider configs
   hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/dbz1.by_test.users-value/versions/latest
   #Kafka props
   hoodie.deltastreamer.source.kafka.topic=dbz1.by_test.users
   metadata.broker.list=localhost:9092
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://localhost:8081
   hoodie.deltastreamer.transformer.sql=SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')
   ```
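
For anyone unfamiliar with the `<SRC>` token in the transformer SQL above: it is a placeholder that is swapped for the name of a temporary view holding the incoming batch before the query runs. The sketch below only illustrates that substitution step on a plain string. The class name, method name, and the view-name prefix are hypothetical, and it makes no claim about how `SqlQueryBasedTransformer` is implemented internally.

```java
import java.util.UUID;

public class SrcPlaceholderDemo {

    // Placeholder token as it appears in hoodie.deltastreamer.transformer.sql.
    static final String SRC_PATTERN = "<SRC>";

    /** Replaces every occurrence of the placeholder with the given view name. */
    static String bindSource(String sqlTemplate, String tempViewName) {
        // String.replace does literal (non-regex) replacement of all occurrences.
        return sqlTemplate.replace(SRC_PATTERN, tempViewName);
    }

    public static void main(String[] args) {
        String template =
            "SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')";
        // Hypothetical unique view name; dashes are not valid in unquoted SQL identifiers.
        String view = "SRC_TMP_VIEW_" + UUID.randomUUID().toString().replace("-", "_");
        System.out.println(bindSource(template, view));
    }
}
```

A literal `replace` is used rather than a regex so that characters in the view name are never interpreted as pattern syntax.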





[GitHub] [hudi] nsivabalan commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-720447577









[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2afb343) into [master](https://codecov.io/gh/apache/hudi/commit/62b392b49c13455199e0372204dedf8a371b452c?el=desc) (62b392b) will **decrease** coverage by `7.40%`.
   > The diff coverage is `16.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2208      +/-   ##
   ============================================
   - Coverage     53.50%   46.09%   -7.41%     
   + Complexity     2788     2107     -681     
   ============================================
     Files           355      289      -66     
     Lines         16169    12673    -3496     
     Branches       1650     1240     -410     
   ============================================
   - Hits           8651     5842    -2809     
   + Misses         6818     6401     -417     
   + Partials        700      430     -270     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `38.50% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `55.10% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.35% <20.00%> (-59.75%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `35.54% <100.00%> (-28.92%)` | `11.00 <1.00> (-21.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | ... and [72 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] nsivabalan commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-782984396


   Commenting here so that all interested folks get notified. We are hitting an issue related to Spark 2.4.4 vs. Spark 3 while trying out the Hudi quick start for a MOR table with the spark_12 bundle. More details are [here](https://issues.apache.org/jira/browse/HUDI-1568). We run into `NoSuchMethodError: 'void org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>` during a read of the MOR table. 
   
   Can someone help me figure out whether I am missing something while trying out the quick start?
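
A `NoSuchMethodError` on `<init>` usually means the constructor signature the code was compiled against differs from the one on the runtime classpath. A minimal diagnostic sketch, assuming only the JDK: it lists the constructors of a class as actually loaded at runtime. `ConstructorLister` is a hypothetical helper, not part of Hudi or Spark, and `java.util.ArrayList` is only a stand-in for the class named in the error.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Hudi or Spark.
final class ConstructorLister {
  private ConstructorLister() {}

  /** Returns the constructor signatures of the named class as loaded at runtime. */
  static List<String> constructorSignatures(String className) {
    try {
      List<String> signatures = new ArrayList<>();
      for (Constructor<?> ctor : Class.forName(className).getDeclaredConstructors()) {
        signatures.add(ctor.toString());
      }
      return signatures;
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Class not on classpath: " + className, e);
    }
  }

  public static void main(String[] args) {
    // Stand-in default; on a Spark classpath, pass the class named in the error instead.
    String target = args.length > 0 ? args[0] : "java.util.ArrayList";
    constructorSignatures(target).forEach(System.out::println);
  }
}
```

Comparing this listing between the Spark 2 and Spark 3 classpaths would show whether the expected constructor exists in the version that is actually loaded.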





[GitHub] [hudi] vinothchandar commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r513736349



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
##########
@@ -173,4 +176,17 @@ public static InternalRow getInternalRowWithError(String partitionPath) {
         .withBulkInsertParallelism(2);
   }
 
+  private static InternalRow serializeRow(ExpressionEncoder encoder, Row row)
+      throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException {
+    // TODO remove reflection if Spark 2.x support is dropped

Review comment:
       Let's file a tracking JIRA for this.

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -96,4 +98,16 @@ object AvroConversionUtils {
     val name = HoodieAvroUtils.sanitizeName(tableName)
     (s"${name}_record", s"hoodie.${name}")
   }
+
+  private def deserializeRow(encoder: ExpressionEncoder[Row], internalRow: InternalRow): Row = {
+    // TODO remove reflection if Spark 2.x support is dropped
+    if (SPARK_VERSION.startsWith("2.")) {

Review comment:
       Wasn't this discussed here before? https://github.com/apache/hudi/pull/1760#issuecomment-685998284
   
   Won't using reflection in the fast path cause perf issues? 
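
One common way to limit the cost raised here is to resolve the reflective handle once and reuse it for every record, instead of looking it up per row. A minimal sketch of that idea, assuming only the JDK: `String.length` stands in for the version-specific Spark encoder method, and `CachedReflectiveCall` and its method names are hypothetical, not taken from this PR.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical illustration; String.length stands in for a version-specific method.
final class CachedReflectiveCall {
  // Resolved once at class-initialization time, then reused for every record.
  private static final Method LENGTH_METHOD = resolve(String.class, "length");

  private CachedReflectiveCall() {}

  private static Method resolve(Class<?> owner, String name) {
    try {
      return owner.getMethod(name);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException("Method not found: " + name, e);
    }
  }

  /** Slow variant: performs the method lookup on every call. */
  static int lengthWithLookupPerCall(String value) {
    return invoke(resolve(String.class, "length"), value);
  }

  /** Cheaper variant: reuses the cached Method handle. */
  static int lengthWithCachedMethod(String value) {
    return invoke(LENGTH_METHOD, value);
  }

  private static int invoke(Method method, String target) {
    try {
      return (Integer) method.invoke(target);
    } catch (IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException("Reflective call failed", e);
    }
  }
}
```

Even with a cached handle, `Method.invoke` still carries some overhead compared with a direct call, so whether this is acceptable on a per-row path would need to be measured.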







[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2d55a99) into [master](https://codecov.io/gh/apache/hudi/commit/36ce5bcd92fa6cb2fb88677eae1a7a846f606e3f?el=desc) (36ce5bc) will **decrease** coverage by `43.14%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.55%   10.41%   -43.15%     
   + Complexity     2773       48     -2725     
   =============================================
     Files           348       50      -298     
     Lines         16118     1777    -14341     
     Branches       1642      211     -1431     
   =============================================
   - Hits           8632      185     -8447     
   + Misses         6788     1579     -5209     
   + Partials        698       13      -685     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <20.00%> (-59.66%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `36.41% <100.00%> (-27.78%)` | `11.00 <1.00> (-19.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [301 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2d55a99) into [master](https://codecov.io/gh/apache/hudi/commit/36ce5bcd92fa6cb2fb88677eae1a7a846f606e3f?el=desc) (36ce5bc) will **decrease** coverage by `43.14%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.55%   10.41%   -43.15%     
   + Complexity     2773       48     -2725     
   =============================================
     Files           348       50      -298     
     Lines         16118     1777    -14341     
     Branches       1642      211     -1431     
   =============================================
   - Hits           8632      185     -8447     
   + Misses         6788     1579     -5209     
   + Partials        698       13      -685     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <20.00%> (-59.66%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `36.41% <100.00%> (-27.78%)` | `11.00 <1.00> (-19.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [301 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] giaosudau commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-735409910


   Tried to write a custom transformer but got the same error. 
   I guess it is something related to spark-sql?
   





[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r533589313



##########
File path: hudi-spark2/src/main/java/org/apache/hudi/DataSourceUtilsForSpark2.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi;
+
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.util.CommitUtils;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieIndexConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.index.HoodieIndex;
+
+import java.util.Map;
+
+/**
+ * Utilities used throughout the data source.
+ * TODO: This file is partially copied from org.apache.hudi.DataSourceUtils.
+ * Should be removed if Spark 2.x support is dropped.

Review comment:
       I created a new module ```hudi-spark-common``` to avoid creating xxxForSpark2 classes. 
   So the current structure of hudi-spark would be:
   
   hudi-spark-datasource
       |
       |---- hudi-spark
       |
       |---- hudi-spark-common
       |
       |---- hudi-spark2
       |
       |---- hudi-spark3
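
The split above can be read as: shared interfaces live in the common module, and each version-specific module supplies its own implementation. A minimal sketch of that pattern, assuming hypothetical names throughout (`RowCodecAdapter`, `Spark2RowCodecAdapter`, `Spark3RowCodecAdapter`, `RowCodecAdapters`); none of these are classes from this PR.

```java
// Hypothetical interface that would sit in the common module.
interface RowCodecAdapter {
  String targetSparkLine();
}

// Hypothetical implementation that would sit in the Spark 2 module.
final class Spark2RowCodecAdapter implements RowCodecAdapter {
  @Override
  public String targetSparkLine() {
    return "2.x";
  }
}

// Hypothetical implementation that would sit in the Spark 3 module.
final class Spark3RowCodecAdapter implements RowCodecAdapter {
  @Override
  public String targetSparkLine() {
    return "3.x";
  }
}

// Hypothetical factory: picks the implementation from the running Spark version string.
final class RowCodecAdapters {
  private RowCodecAdapters() {}

  static RowCodecAdapter forSparkVersion(String sparkVersion) {
    if (sparkVersion.startsWith("2.")) {
      return new Spark2RowCodecAdapter();
    }
    if (sparkVersion.startsWith("3.")) {
      return new Spark3RowCodecAdapter();
    }
    throw new IllegalArgumentException("Unsupported Spark version: " + sparkVersion);
  }
}
```

Callers would depend only on the interface, so no `xxxForSpark2` copies of shared classes are needed.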
   







[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2869692) into [master](https://codecov.io/gh/apache/hudi/commit/36ce5bcd92fa6cb2fb88677eae1a7a846f606e3f?el=desc) (36ce5bc) will **decrease** coverage by `43.14%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.55%   10.41%   -43.15%     
   + Complexity     2773       48     -2725     
   =============================================
     Files           348       50      -298     
     Lines         16118     1777    -14341     
     Branches       1642      211     -1431     
   =============================================
   - Hits           8632      185     -8447     
   + Misses         6788     1579     -5209     
   + Partials        698       13      -685     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <20.00%> (-59.66%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `36.41% <100.00%> (-27.78%)` | `11.00 <1.00> (-19.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [301 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-735409910


   Tried to write a custom transformer using the Dataset API but got the same error:
   ```
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   j  org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$1(Lorg/apache/hudi/client/utils/SparkRowDeserializer;Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/Row;+2
   ```





[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2afb343) into [master](https://codecov.io/gh/apache/hudi/commit/62b392b49c13455199e0372204dedf8a371b452c?el=desc) (62b392b) will **decrease** coverage by `43.14%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.50%   10.35%   -43.15%     
   + Complexity     2788       48     -2740     
   =============================================
     Files           355       51      -304     
     Lines         16169     1786    -14383     
     Branches       1650      213     -1437     
   =============================================
   - Hits           8651      185     -8466     
   + Misses         6818     1588     -5230     
   + Partials        700       13      -687     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.35% <20.00%> (-59.75%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `35.54% <100.00%> (-28.92%)` | `11.00 <1.00> (-21.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [308 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] rmpifer commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
rmpifer commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517172659



##########
File path: hudi-spark2/src/main/scala/org/apache/hudi/DataSourceOptionsForSpark2.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+
+/**
+ * Options supported for writing hoodie tables.
+ * TODO: This file is partially copied from org.apache.hudi.DataSourceWriteOptions.
+ * Should be removed if Spark 2.x support is dropped.
+ */
+object DataSourceWriteOptionsForSpark2 {

Review comment:
       I'm not sure it is best to have copies of `DataSourceWriteOptions` and `DataSourceUtils`. I see that `hudi-client` has a module `hudi-spark-client`. Could we refactor these files to go there, or into some new `hudi-spark-common` module?







[GitHub] [hudi] umehrot2 commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r514596118



##########
File path: packaging/hudi-utilities-bundle/pom.xml
##########
@@ -105,6 +106,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.spark:spark-streaming-kafka-0-10_${scala.binary.version}</include>
+                  <include>org.apache.spark:spark-token-provider-kafka-0-10_${scala.binary.version}</include>

Review comment:
       For my understanding, why is this needed ?

##########
File path: LICENSE
##########
@@ -246,6 +246,8 @@ This product includes code from Apache Spark
 
 * org.apache.hudi.AvroConversionHelper copied from classes in org/apache/spark/sql/avro package
 
+* org.apache.hudi.HoodieSparkUtils.scala copied from org.apache.spark.deploy.SparkHadoopUtil.scala

Review comment:
       Perhaps we can be more specific and say that we `copied some methods`?

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -121,6 +122,9 @@ private[hudi] object HoodieSparkSqlWriter {
       // short-circuit if bulk_insert via row is enabled.
       // scalastyle:off
       if (parameters(ENABLE_ROW_WRITER_OPT_KEY).toBoolean) {
+        if (SPARK_VERSION.startsWith("3.")) {
+          throw new HoodieException("Bulk insert via row is not compatible with Spark 3, it is only compatible with Spark 2!")
+        }

Review comment:
       Is this not possible through DeltaStreamer? It seems not.
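
The guard in the diff above keys off the Spark version string. A minimal sketch of that kind of gate in plain Java follows. The names `SparkVersionGate`, `majorVersion`, and `requireSpark2` are hypothetical and are not part of Hudi; the real check lives in `HoodieSparkSqlWriter` and throws `HoodieException`.

```java
// Hypothetical helper; names are illustrative and are not part of Hudi.
final class SparkVersionGate {
  private SparkVersionGate() {}

  /** Returns the major version parsed from a string such as "3.0.1". */
  static int majorVersion(String sparkVersion) {
    int dot = sparkVersion.indexOf('.');
    String major = dot < 0 ? sparkVersion : sparkVersion.substring(0, dot);
    return Integer.parseInt(major);
  }

  /** Throws if the feature is requested on anything other than Spark 2.x. */
  static void requireSpark2(String feature, String sparkVersion) {
    if (majorVersion(sparkVersion) != 2) {
      throw new IllegalStateException(
          feature + " is only supported on Spark 2.x, found " + sparkVersion);
    }
  }
}
```

Parsing the major version once, rather than repeating `startsWith("3.")` at each call site, would also reject any future major version instead of silently letting it through.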

##########
File path: pom.xml
##########
@@ -100,6 +104,7 @@
     <prometheus.version>0.8.0</prometheus.version>
     <http.version>4.4.1</http.version>
     <spark.version>2.4.4</spark.version>
+    <spark2.version>2.4.4</spark2.version>

Review comment:
       I would suggest just keeping `spark.version` here and overriding it in the `hudi-spark2` and `hudi-spark3` modules respectively.

##########
File path: hudi-spark2/src/main/java/org/apache/hudi/internal/DefaultSource.java
##########
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.internal;
 
-import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceUtilsForSpark2;

Review comment:
       What is the need to move the `hudi datasource` itself to `hudi-spark2`? I think we should leave it under `hudi-spark`; later, if we want separate datasource implementations, we can create them under the `hudi-spark2` and `hudi-spark3` modules. Thoughts?

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieMergeOnReadTestUtils.java
##########
@@ -85,7 +85,7 @@
         .collect(Collectors.toList()));
 
     return inputPaths.stream().map(path -> {
-      setInputPath(jobConf, path);
+      FileInputFormat.setInputPaths(jobConf, path);

Review comment:
       As discussed internally during code review, can you confirm whether this actually converts paths to point to the local file system rather than HDFS? It would also be good to explain in the description why you did this, for reference.

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -96,4 +98,16 @@ object AvroConversionUtils {
     val name = HoodieAvroUtils.sanitizeName(tableName)
     (s"${name}_record", s"hoodie.${name}")
   }
+
+  private def deserializeRow(encoder: ExpressionEncoder[Row], internalRow: InternalRow): Row = {
+    // TODO remove reflection if Spark 2.x support is dropped
+    if (SPARK_VERSION.startsWith("2.")) {

Review comment:
       +1. Let's have two separate implementations of the Row Deserializer for Spark 2 and Spark 3, as was done in https://github.com/apache/hudi/pull/1760/files
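
A rough sketch of what "two separate implementations" could look like, written in plain Java so it runs without Spark on the classpath. Everything here is an assumption for illustration: `RowDeserializer`, `Spark2RowDeserializer`, `Spark3RowDeserializer`, and `RowDeserializers` are hypothetical names, and the type parameters `R` and `T` stand in for Spark's `InternalRow` and `Row`. It is not the code from PR #1760.

```java
// Hypothetical sketch: these names are illustrative and are not Hudi or Spark APIs.
// R stands in for Spark's InternalRow and T for Row, so the example runs without Spark.
interface RowDeserializer<R, T> {
  T deserializeRow(R internalRow);
}

// In a real build this class would wrap the Spark 2.x encoder call.
final class Spark2RowDeserializer<R, T> implements RowDeserializer<R, T> {
  private final java.util.function.Function<R, T> spark2Decode;

  Spark2RowDeserializer(java.util.function.Function<R, T> spark2Decode) {
    this.spark2Decode = spark2Decode;
  }

  @Override
  public T deserializeRow(R internalRow) {
    return spark2Decode.apply(internalRow);
  }
}

// In a real build this class would wrap the Spark 3.x encoder call.
final class Spark3RowDeserializer<R, T> implements RowDeserializer<R, T> {
  private final java.util.function.Function<R, T> spark3Decode;

  Spark3RowDeserializer(java.util.function.Function<R, T> spark3Decode) {
    this.spark3Decode = spark3Decode;
  }

  @Override
  public T deserializeRow(R internalRow) {
    return spark3Decode.apply(internalRow);
  }
}

final class RowDeserializers {
  private RowDeserializers() {}

  /** Picks the implementation once, based on the running Spark version string. */
  static <R, T> RowDeserializer<R, T> forSparkVersion(
      String sparkVersion,
      java.util.function.Function<R, T> spark2Decode,
      java.util.function.Function<R, T> spark3Decode) {
    if (sparkVersion.startsWith("2.")) {
      return new Spark2RowDeserializer<>(spark2Decode);
    }
    if (sparkVersion.startsWith("3.")) {
      return new Spark3RowDeserializer<>(spark3Decode);
    }
    throw new IllegalArgumentException("Unsupported Spark version: " + sparkVersion);
  }
}
```

The point of the split is that the version check happens once, at construction time, and callers only see the common interface.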







[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-732523090


   I tried to run deltastreamer with sqltransformer 
   
   Hi everyone,
   I am running Spark 3 (https://github.com/apache/hudi/pull/2208)
   with deltastreamer and sqltransformer for Debezium data:
   ``` 
   spark-submit \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   --driver-memory 2g \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --conf spark.sql.hive.convertMetastoreParquet=false \
   --packages org.apache.spark:spark-avro_2.12:3.0.1 \
   ~/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar \
   --table-type MERGE_ON_READ \
   --source-ordering-field ts_ms \
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
   --target-base-path /Users/users/Downloads/roi/debezium/by_test/ \
   --target-table users \
   --props ./hudi_base.properties \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
   
   # Contents of ./hudi_base.properties (the file passed via --props above):
   hoodie.upsert.shuffle.parallelism=2
   hoodie.insert.shuffle.parallelism=2
   hoodie.bulkinsert.shuffle.parallelism=2
   # Key fields, for kafka example
   hoodie.datasource.write.storage.type=MERGE_ON_READ
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=ts_ms
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
   # schema provider configs
   hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/dbz1.by_test.users-value/versions/latest
   #Kafka props
   hoodie.deltastreamer.source.kafka.topic=dbz1.by_test.users
   metadata.broker.list=localhost:9092
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://localhost:8081
   hoodie.deltastreamer.transformer.sql=SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')
   ```
   
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x000000010f4cbad0, pid=33960, tid=0x0000000000013e03
   #
   # JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-b01)
   # Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode bsd-amd64 compressed oops)
   # Problematic frame:
   # V  [libjvm.dylib+0xcbad0]
   ```





[GitHub] [hudi] umehrot2 commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-726718174


   @vinothchandar I took one final pass and it LGTM. You can go ahead with your review.
   
   I just have one pending comment https://github.com/apache/hudi/pull/2208/#discussion_r522894340 regarding some refactoring and it would be good to have your opinion there too before @zhedoubushishi implements it.





[GitHub] [hudi] giaosudau commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-736969008


   > @zhedoubushishi let me take a pass at this again. thanks for updating.
   > 
   > @giaosudau Need to reproduce this myself to comment further. Does the crash happen deterministically when you try to write? @zhedoubushishi any thoughts?
   
   Sorry for the noise. It was due to decoding of the Debezium schema.
   I needed to customize it instead of using Schema Registry.
   After I customized it, it is working fine.
   I removed my comment. Please go ahead with other issues.





[GitHub] [hudi] vinothchandar commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-733204015


   @giaosudau that looks like a JVM crash. Not sure what in this PR could cause that.
   Do you have more diagnostic info?





[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517548998



##########
File path: hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -24,35 +24,13 @@ import org.apache.hudi.avro.HoodieAvroUtils
 import org.apache.hudi.common.model.HoodieKey
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.avro.SchemaConverters
-import org.apache.spark.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
 
 import scala.collection.JavaConverters._
 
 object AvroConversionUtils {
 
-  def createRdd(df: DataFrame, structName: String, recordNamespace: String): RDD[GenericRecord] = {
-    val avroSchema = convertStructTypeToAvroSchema(df.schema, structName, recordNamespace)
-    createRdd(df, avroSchema, structName, recordNamespace)
-  }
-
-  def createRdd(df: DataFrame, avroSchema: Schema, structName: String, recordNamespace: String)
-  : RDD[GenericRecord] = {
-    // Use the Avro schema to derive the StructType which has the correct nullability information
-    val dataType = SchemaConverters.toSqlType(avroSchema).dataType.asInstanceOf[StructType]
-    val encoder = RowEncoder.apply(dataType).resolveAndBind()
-    val deserializer = HoodieSparkUtils.createDeserializer(encoder)
-    df.queryExecution.toRdd.map(row => deserializer.deserializeRow(row))
-      .mapPartitions { records =>
-        if (records.isEmpty) Iterator.empty
-        else {
-          val convertor = AvroConversionHelper.createConverterToAvro(dataType, structName, recordNamespace)
-          records.map { x => convertor(x).asInstanceOf[GenericRecord] }
-        }
-      }
-  }
-
   def createRddForDeletes(df: DataFrame, rowField: String, partitionField: String): RDD[HoodieKey] = {

Review comment:
       Good point. It seems ```createRddForDeletes``` is never used in Hudi, so I'll just remove it.
   ```createDataFrame``` is tricky because it is used in ```hudi-spark-client```, so I cannot move it to the ```hudi-spark``` module.







[GitHub] [hudi] nsivabalan edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-720447577


   Hey folks: I wanted to try this out locally. I pulled in this branch and was able to compile successfully with this command:
   ```
   mvn package -DskipTests -Dscala-2.12 -Pspark3
   ```
   But after this, I could not run the Scala tests in hudi-spark from within IntelliJ, although I was able to run the Java tests in hudi-spark.
   I understand that all datasource tests are expected to fail, but there seems to be a problem with running the tests themselves.
   
   For instance, when I run a test from HoodieSparkSqlWriterSuite, I hit the following issue:
   ```
   An exception or error caused a run to abort: org.scalactic.Tolerance.$init$(Lorg/scalactic/Tolerance;)V 
   java.lang.NoSuchMethodError: org.scalactic.Tolerance.$init$(Lorg/scalactic/Tolerance;)V
   	at org.apache.hudi.functional.HoodieSparkSqlWriterSuite.<init>(HoodieSparkSqlWriterSuite.scala:42)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at java.lang.Class.newInstance(Class.java:442)
   	at org.scalatest.tools.Runner$.genSuiteConfig(Runner.scala:1422)
   	at org.scalatest.tools.Runner$$anonfun$31.apply(Runner.scala:1236)
   	at org.scalatest.tools.Runner$$anonfun$31.apply(Runner.scala:1235)
   	at scala.collection.immutable.List.map(List.scala:284)
   	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1235)
   	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
   	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
   	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
   	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
   	at org.scalatest.tools.Runner$.run(Runner.scala:850)
   	at org.scalatest.tools.Runner.run(Runner.scala)
   	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:133)
   	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:27)
   
   ```
   
   And when I try to run tests from TestCOWDataSource, I hit the following exception:
   ```
   java.lang.NoSuchMethodError: scala.collection.JavaConversions$.deprecated$u0020asScalaBuffer(Ljava/util/List;)Lscala/collection/mutable/Buffer;
   
   	at org.apache.hudi.functional.TestCOWDataSource.testShortNameStorage(TestCOWDataSource.scala:65)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
   	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:87)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:53)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:66)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:51)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:66)
   	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:69)
   	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
   	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
   	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
   ```
   Were you able to run tests from within IntelliJ with Spark 3?
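
A `NoSuchMethodError` on a trait `$init$` method or on `JavaConversions` often points to classes compiled against different Scala binary versions sharing one classpath, though that is only a guess for this particular setup. One way to check is to print which jar a suspect class was actually loaded from. The helper below is a hypothetical sketch (`ClassOrigin` is not part of Hudi or any test framework) that uses only standard JDK calls.

```java
// Hypothetical diagnostic helper; not part of Hudi.
final class ClassOrigin {
  private ClassOrigin() {}

  /** Returns the jar or directory a class was loaded from, if the JVM exposes it. */
  static String locate(Class<?> cls) {
    java.security.CodeSource source = cls.getProtectionDomain().getCodeSource();
    // Classes loaded by the bootstrap loader (for example java.lang.String) have no code source.
    return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
  }
}
```

Calling `ClassOrigin.locate(Class.forName("org.scalactic.Tolerance"))` from inside the failing test run would show whether the IDE resolved a `_2.11` or a `_2.12` artifact.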
   





[GitHub] [hudi] umehrot2 commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r522894340



##########
File path: hudi-spark2/src/main/scala/org/apache/hudi/DataSourceOptionsForSpark2.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+
+/**
+ * Options supported for writing hoodie tables.
+ * TODO: This file is partially copied from org.apache.hudi.DataSourceWriteOptions.
+ * Should be removed if Spark 2.x support is dropped.
+ */
+object DataSourceWriteOptionsForSpark2 {

Review comment:
       Yeah, I don't think this is manageable; especially as new properties get added, it will be a pain to keep the Spark 2 and Spark 3 code in sync. A `hudi-spark-common` module under `hudi-spark` seems like the way to go to me. But let's get Vinoth's opinion on this as well before you implement it. cc @vinothchandar







[GitHub] [hudi] nsivabalan edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-782984396


   Commenting here so that all interested folks get notified. We are hitting an issue related to Spark 2.4.4 vs Spark 3 while trying out the Hudi quick start for a MOR table with the spark_12 bundle. More details are [here](https://issues.apache.org/jira/browse/HUDI-1568). We are running into NoSuchMethodError: 'void org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init> during read of the MOR table.
   I saw that a similar error was already reported [here](https://github.com/apache/hudi/pull/1760#issuecomment-713993418) while the Spark 3 work was ongoing, so I thought someone could help out.
   Can someone help me figure out what I am missing while trying out the quick start?





[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517564830



##########
File path: packaging/hudi-utilities-bundle/pom.xml
##########
@@ -105,6 +106,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.spark:spark-streaming-kafka-0-10_${scala.binary.version}</include>
+                  <include>org.apache.spark:spark-token-provider-kafka-0-10_${scala.binary.version}</include>

Review comment:
       If ```<include>org.apache.spark:spark-token-provider-kafka-0-10_${scala.binary.version}</include>``` is not found when building with Spark 2, maven would just ignore it.







[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2af55b3) into [master](https://codecov.io/gh/apache/hudi/commit/430d4b428e7c5b325c7414a187f9cda158c2758a?el=desc) (430d4b4) will **decrease** coverage by `43.14%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.54%   10.39%   -43.15%     
   + Complexity     2770       48     -2722     
   =============================================
     Files           348       50      -298     
     Lines         16109     1779    -14330     
     Branches       1643      211     -1432     
   =============================================
   - Hits           8626      185     -8441     
   + Misses         6785     1581     -5204     
   + Partials        698       13      -685     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.39% <20.00%> (-59.70%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `36.41% <100.00%> (-27.78%)` | `11.00 <1.00> (-19.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [318 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r516985501



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
##########
@@ -173,4 +176,17 @@ public static InternalRow getInternalRowWithError(String partitionPath) {
         .withBulkInsertParallelism(2);
   }
 
+  private static InternalRow serializeRow(ExpressionEncoder encoder, Row row)
+      throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException {
+    // TODO remove reflection if Spark 2.x support is dropped
+    if (package$.MODULE$.SPARK_VERSION().startsWith("2.")) {
+      Method spark2method = encoder.getClass().getMethod("toRow", Object.class);
+      return (InternalRow) spark2method.invoke(encoder, row);

Review comment:
       The problem here is that ```hudi-spark2``` already depends on ```hudi-common``` and ```hudi-client```. If I create Spark2RowSerializer under ```hudi-spark2```, I would also need to make ```hudi-client``` depend on ```hudi-spark2```, which would introduce a circular dependency.
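
One common way to sidestep this kind of cycle is to keep only an interface in the lower-level module and resolve the version-specific implementation by class name at runtime. The sketch below is illustrative only and is not taken from this PR: `RowSerializer`, `Spark3RowSerializer`, and `RowSerializerLoader` are hypothetical names, `Spark2RowSerializer` reuses the name from the comment above as a stand-in, and all of the classes are compiled together here purely so the example is self-contained.

```java
// Stand-in for an interface that would live in the lower-level module (hypothetical name).
interface RowSerializer {
  String serialize(Object row);
}

// Stand-ins for implementations that would live in version-specific modules (hypothetical).
class Spark2RowSerializer implements RowSerializer {
  @Override
  public String serialize(Object row) {
    return "spark2:" + row;
  }
}

class Spark3RowSerializer implements RowSerializer {
  @Override
  public String serialize(Object row) {
    return "spark3:" + row;
  }
}

class RowSerializerLoader {
  // Builds the implementation's class name as a string, so this class never
  // references the implementation types at compile time.
  static String implClassName(String sparkVersion) {
    String simpleName = sparkVersion.startsWith("2.") ? "Spark2RowSerializer" : "Spark3RowSerializer";
    String interfaceName = RowSerializer.class.getName();
    // Reuse the interface's name prefix (empty when compiled in the default package).
    String prefix = interfaceName.substring(0, interfaceName.length() - "RowSerializer".length());
    return prefix + simpleName;
  }

  // Loads and instantiates the implementation that matches the given version string.
  static RowSerializer load(String sparkVersion) throws ReflectiveOperationException {
    Class<?> implClass = Class.forName(implClassName(sparkVersion));
    return (RowSerializer) implClass.getDeclaredConstructor().newInstance();
  }
}
```

With this layout the lower-level module compiles against the interface alone, and only the final bundle needs the implementation classes on its classpath. The trade-off is that a missing implementation shows up as a runtime `ClassNotFoundException` rather than a compile error.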







[GitHub] [hudi] vinothchandar commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r528434950



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -113,9 +113,6 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       hadoopConf = sqlContext.sparkSession.sessionState.newHadoopConf()
     )
 
-    // Follow the implementation of Spark internal HadoopRDD to handle the broadcast configuration.

Review comment:
       Is this change needed for this PR? Could you add some context on why it was needed?

##########
File path: hudi-spark2/src/main/java/org/apache/hudi/DataSourceUtilsForSpark2.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi;
+
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.util.CommitUtils;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieIndexConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.index.HoodieIndex;
+
+import java.util.Map;
+
+/**
+ * Utilities used throughout the data source.
+ * TODO: This file is partially copied from org.apache.hudi.DataSourceUtils.
+ * Should be removed if Spark 2.x support is dropped.

Review comment:
       It would be good to have a JIRA tracking all of these follow-ups for when Spark 2.x support is dropped.

##########
File path: hudi-spark2/src/main/scala/org/apache/hudi/DataSourceOptionsForSpark2.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+
+/**
+ * Options supported for writing hoodie tables.
+ * TODO: This file is partially copied from org.apache.hudi.DataSourceWriteOptions.
+ * Should be removed if Spark 2.x support is dropped.
+ */
+object DataSourceWriteOptionsForSpark2 {

Review comment:
       yes `hudi-spark-common` is the right approach IMO as well.  

##########
File path: hudi-spark3/pom.xml
##########
@@ -0,0 +1,160 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.6.1-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-spark3_2.12</artifactId>
+  <packaging>jar</packaging>
+
+  <properties>
+    <main.basedir>${project.parent.basedir}</main.basedir>
+  </properties>
+
+  <build>
+    <resources>
+      <resource>
+        <directory>src/main/resources</directory>
+      </resource>
+    </resources>
+    <pluginManagement>
+      <plugins>
+        <plugin>
+          <groupId>net.alchim31.maven</groupId>
+          <artifactId>scala-maven-plugin</artifactId>
+          <version>${scala-maven-plugin.version}</version>
+          <configuration>
+            <args>
+              <arg>-nobootcp</arg>
+            </args>
+          </configuration>
+        </plugin>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-compiler-plugin</artifactId>
+        </plugin>
+      </plugins>
+    </pluginManagement>
+
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-dependency-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>copy-dependencies</id>
+            <phase>prepare-package</phase>
+            <goals>
+              <goal>copy-dependencies</goal>
+            </goals>
+            <configuration>
+              <outputDirectory>${project.build.directory}/lib</outputDirectory>
+              <overWriteReleases>true</overWriteReleases>
+              <overWriteSnapshots>true</overWriteSnapshots>
+              <overWriteIfNewer>true</overWriteIfNewer>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>net.alchim31.maven</groupId>
+        <artifactId>scala-maven-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>scala-compile-first</id>
+            <phase>process-resources</phase>
+            <goals>
+              <goal>add-source</goal>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+          <execution>
+            <id>scala-test-compile</id>
+            <phase>process-test-resources</phase>
+            <goals>
+              <goal>testCompile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <executions>
+          <execution>
+            <phase>compile</phase>
+            <goals>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-jar-plugin</artifactId>
+        <executions>
+          <execution>
+            <goals>
+              <goal>test-jar</goal>
+            </goals>
+            <phase>test-compile</phase>
+          </execution>
+        </executions>
+        <configuration>
+          <skip>false</skip>
+        </configuration>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.rat</groupId>
+        <artifactId>apache-rat-plugin</artifactId>
+      </plugin>
+      <plugin>
+        <groupId>org.scalastyle</groupId>
+        <artifactId>scalastyle-maven-plugin</artifactId>
+      </plugin>
+      <plugin>
+        <groupId>org.jacoco</groupId>
+        <artifactId>jacoco-maven-plugin</artifactId>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.scala-lang</groupId>
+      <artifactId>scala-library</artifactId>
+      <version>2.12.10</version>

Review comment:
       should we still pull this into a property above?







[GitHub] [hudi] vinothchandar commented on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-740945829


   @zhedoubushishi looks like we need to rebase this again. 
   
   @umehrot2 please go ahead and merge once you are happy with this. 





[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r516975337



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -96,4 +98,16 @@ object AvroConversionUtils {
     val name = HoodieAvroUtils.sanitizeName(tableName)
     (s"${name}_record", s"hoodie.${name}")
   }
+
+  private def deserializeRow(encoder: ExpressionEncoder[Row], internalRow: InternalRow): Row = {
+    // TODO remove reflection if Spark 2.x support is dropped
+    if (SPARK_VERSION.startsWith("2.")) {

Review comment:
       Done. Added a new module ```hudi-spark3``` to avoid using reflection.
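
For context, the reflection being removed follows the pattern visible in the hunk above: check the version string, then look up a method by name so that the code compiles against either API. A minimal, self-contained sketch of that pattern is below. It uses stand-in classes rather than Spark's `ExpressionEncoder`; `LegacyEncoder`, `ModernEncoder`, `ReflectiveRowSerializer`, and the `createSerializer` branch are assumptions made for illustration, since only the `toRow` branch appears in the diff.

```java
import java.lang.reflect.Method;
import java.util.function.Function;

// Stand-in for an encoder exposing the older single-call API (hypothetical).
class LegacyEncoder {
  public Object toRow(Object row) {
    return "legacy(" + row + ")";
  }
}

// Stand-in for an encoder exposing a serializer-factory API (hypothetical).
class ModernEncoder {
  public Function<Object, Object> createSerializer() {
    return row -> "modern(" + row + ")";
  }
}

class ReflectiveRowSerializer {
  // Picks the method to call from the version string, resolving it by name at runtime
  // so that neither API has to exist at compile time.
  static Object serialize(Object encoder, Object row, String version)
      throws ReflectiveOperationException {
    if (version.startsWith("2.")) {
      Method toRow = encoder.getClass().getMethod("toRow", Object.class);
      return toRow.invoke(encoder, row);
    }
    Method createSerializer = encoder.getClass().getMethod("createSerializer");
    @SuppressWarnings("unchecked")
    Function<Object, Object> serializer =
        (Function<Object, Object>) createSerializer.invoke(encoder);
    return serializer.apply(row);
  }
}
```

The cost of this approach is that a renamed or removed method only fails at runtime, which is the main argument for moving version-specific calls into separately compiled modules.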







[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517558332



##########
File path: hudi-spark2/src/main/scala/org/apache/hudi/DataSourceOptionsForSpark2.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+
+/**
+ * Options supported for writing hoodie tables.
+ * TODO: This file is partially copied from org.apache.hudi.DataSourceWriteOptions.
+ * Should be removed if Spark 2.x support is dropped.
+ */
+object DataSourceWriteOptionsForSpark2 {

Review comment:
       Yes, this would save us from hard-copying ```DataSourceWriteOptions``` and ```DataSourceUtils```. My concern is that Spark datasource-related files should ideally live under ```hudi-spark```, so I'm not sure what the best practice is here.
   
   Maybe create a new module like ```hudi-spark-common``` under ```hudi-spark```?







[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517549982



##########
File path: packaging/hudi-utilities-bundle/pom.xml
##########
@@ -105,6 +107,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.spark:spark-streaming-kafka-0-10_${scala.binary.version}</include>
+                  <include>org.apache.spark:spark-token-provider-kafka-0-10_${scala.binary.version}</include>

Review comment:
       Okay, I'll override the Scala version inside ```hudi-spark3```.







[GitHub] [hudi] rmpifer commented on a change in pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
rmpifer commented on a change in pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#discussion_r517159673



##########
File path: packaging/hudi-utilities-bundle/pom.xml
##########
@@ -105,6 +106,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.spark:spark-streaming-kafka-0-10_${scala.binary.version}</include>
+                  <include>org.apache.spark:spark-token-provider-kafka-0-10_${scala.binary.version}</include>

Review comment:
       This was needed when using Kafka + DeltaStreamer in Spark 3.
   ```
   java.lang.NoClassDefFoundError: org/apache/spark/kafka010/KafkaConfigUpdater
       at org.apache.spark.streaming.kafka010.InternalKafkaConsumer.createConsumer(KafkaDataConsumer.scala:115)
   ```
   
   Can we validate that this doesn't cause issues when building with Spark 2? I believe this artifact exists only in Spark 3.
   
   https://mvnrepository.com/artifact/org.apache.spark/spark-token-provider-kafka-0-10
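
One quick way to check a given bundle is to probe the classpath for the class named in the stack trace above. The sketch below assumes it is run with the bundle jar on the classpath; `ClasspathProbe` and `isClassPresent` are hypothetical names, and the check only reports whether the class resolves, not whether the Kafka source works end to end.

```java
class ClasspathProbe {
  // Returns true if the named class can be resolved by this class's loader.
  static boolean isClassPresent(String className) {
    try {
      // initialize=false: only resolve the class, do not run its static initializers.
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Class name taken from the NoClassDefFoundError in the comment above.
    String name = "org.apache.spark.kafka010.KafkaConfigUpdater";
    System.out.println(name + " present: " + isClassPresent(name));
  }
}
```

Running this against both the Spark 2 and Spark 3 builds of the bundle would show whether the extra `<include>` changes what ends up in each jar.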







[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-735409910


   I tried writing a custom transformer but got the same error.
   I guess it is something related to spark-sql?
   





[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (2afb343) into [master](https://codecov.io/gh/apache/hudi/commit/62b392b49c13455199e0372204dedf8a371b452c?el=desc) (62b392b) will **decrease** coverage by `1.21%`.
   > The diff coverage is `83.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2208      +/-   ##
   ============================================
   - Coverage     53.50%   52.29%   -1.22%     
   + Complexity     2788     2630     -158     
   ============================================
     Files           355      329      -26     
     Lines         16169    14742    -1427     
     Branches       1650     1484     -166     
   ============================================
   - Hits           8651     7709     -942     
   + Misses         6818     6419     -399     
   + Partials        700      614      -86     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `38.50% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `55.10% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | hudihadoopmr | `32.94% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `65.30% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `70.10% <100.00%> (ø)` | `0.00 <3.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.45% <100.00%> (ø)` | `32.00 <1.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.54% <100.00%> (ø)` | `49.00 <2.00> (ø)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `86.48% <100.00%> (ø)` | `11.00 <0.00> (ø)` | |
   





[GitHub] [hudi] codecov-io edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-718240937


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=h1) Report
   > Merging [#2208](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=desc) (1698576) into [master](https://codecov.io/gh/apache/hudi/commit/36ce5bcd92fa6cb2fb88677eae1a7a846f606e3f?el=desc) (36ce5bc) will **decrease** coverage by `43.14%`.
   > The diff coverage is `20.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2208/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2208       +/-   ##
   =============================================
   - Coverage     53.55%   10.41%   -43.15%     
   + Complexity     2773       48     -2725     
   =============================================
     Files           348       50      -298     
     Lines         16118     1777    -14341     
     Branches       1642      211     -1431     
   =============================================
   - Hits           8632      185     -8447     
   + Misses         6788     1579     -5209     
   + Partials        698       13      -685     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.41% <20.00%> (-59.66%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2208?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `0.00% <0.00%> (-70.55%)` | `0.00 <0.00> (-49.00)` | |
   | [...i/utilities/deltastreamer/SourceFormatAdapter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvU291cmNlRm9ybWF0QWRhcHRlci5qYXZh) | `0.00% <0.00%> (-86.49%)` | `0.00 <0.00> (-11.00)` | |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `36.41% <100.00%> (-27.78%)` | `11.00 <1.00> (-19.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [301 more](https://codecov.io/gh/apache/hudi/pull/2208/diff?src=pr&el=tree-more) | |
   

