Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/21 02:39:49 UTC

[GitHub] [hudi] xushiyan commented on a diff in pull request #5378: [HUDI-3934] Fix `Spark32HoodieParquetFileFormat` not being compatible w/ Spark 3.2.0

xushiyan commented on code in PR #5378:
URL: https://github.com/apache/hudi/pull/5378#discussion_r854721036


##########
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_1Adapter.scala:
##########
@@ -23,7 +23,7 @@ import org.apache.spark.SPARK_VERSION
 import org.apache.spark.sql.avro.{HoodieAvroDeserializer, HoodieAvroSerializer, HoodieSpark3_1AvroDeserializer, HoodieSpark3_1AvroSerializer}
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat, Spark312HoodieParquetFileFormat}
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat, Spark31HoodieParquetFileFormat}

Review Comment:
   This renaming seems fine. @xiaorixiaoyao, is there any particular reason it was only meant for Spark 3.1.2?



##########
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java:
##########
@@ -85,11 +87,14 @@ public static <T extends HoodieRecordPayload> T loadPayload(String recordPayload
    * Creates an instance of the given class. Use this version when dealing with interface types as constructor args.
    */
   public static Object loadClass(String clazz, Class<?>[] constructorArgTypes, Object... constructorArgs) {
-    try {
-      return getClass(clazz).getConstructor(constructorArgTypes).newInstance(constructorArgs);
-    } catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
-      throw new HoodieException("Unable to instantiate class " + clazz, e);
-    }
+    return newInstanceUnchecked(getClass(clazz), constructorArgTypes, constructorArgs);
+  }
+
+  /**
+   * Creates an instance of the given class. Constructor arg types are inferred.
+   */
+  public static Object loadClass(String clazz, Object... constructorArgs) {
+    return newInstanceUnchecked(getClass(clazz), constructorArgs);

Review Comment:
   This change affects a lot of code paths. Can't we make a specialized helper just for this patch and leave the original as-is? We should mitigate the risk here.
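
   A minimal sketch of that suggestion (the name `loadClassWithInferredArgTypes` is illustrative, not from the patch): keep the existing typed overload byte-for-byte and route only the new call site through a separately named helper, so a bug in arg-type inference cannot leak into existing callers.

   ```java
   import java.lang.reflect.InvocationTargetException;
   import java.util.Arrays;

   // Slots into the ReflectionUtils shown above; getClass(clazz) and
   // HoodieException are the existing helpers from that file.
   public static Object loadClass(String clazz, Class<?>[] constructorArgTypes, Object... constructorArgs) {
     try {
       // Original body, untouched: explicit arg types keep interface-typed
       // constructor parameters working.
       return getClass(clazz).getConstructor(constructorArgTypes).newInstance(constructorArgs);
     } catch (InstantiationException | IllegalAccessException
         | InvocationTargetException | NoSuchMethodException e) {
       throw new HoodieException("Unable to instantiate class " + clazz, e);
     }
   }

   // Hypothetical, narrowly-scoped entry point used only by the new code path.
   public static Object loadClassWithInferredArgTypes(String clazz, Object... constructorArgs) {
     Class<?>[] inferredTypes =
         Arrays.stream(constructorArgs).map(Object::getClass).toArray(Class<?>[]::new);
     return loadClass(clazz, inferredTypes, constructorArgs);
   }
   ```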



##########
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java:
##########
@@ -85,11 +87,14 @@ public static <T extends HoodieRecordPayload> T loadPayload(String recordPayload
    * Creates an instance of the given class. Use this version when dealing with interface types as constructor args.
    */
   public static Object loadClass(String clazz, Class<?>[] constructorArgTypes, Object... constructorArgs) {
-    try {
-      return getClass(clazz).getConstructor(constructorArgTypes).newInstance(constructorArgs);
-    } catch (InstantiationException | IllegalAccessException | InvocationTargetException | NoSuchMethodException e) {
-      throw new HoodieException("Unable to instantiate class " + clazz, e);
-    }
+    return newInstanceUnchecked(getClass(clazz), constructorArgTypes, constructorArgs);
+  }
+
+  /**
+   * Creates an instance of the given class. Constructor arg types are inferred.
+   */
+  public static Object loadClass(String clazz, Object... constructorArgs) {
+    return newInstanceUnchecked(getClass(clazz), constructorArgs);

Review Comment:
   I don't even see this `newInstanceUnchecked` being used in this patch. Can we revert the change, or create a special one if you need it here?
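
   For context on why the typed overload exists at all (per its javadoc above): inferring constructor arg types from the runtime values breaks whenever a constructor declares an interface or superclass parameter, because `getClass()` yields the concrete type while `getConstructor()` matches declared parameter types exactly. A standalone demo (illustrative classes, not from Hudi):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class InferenceDemo {
     public static class Holder {
       public Holder(List<String> items) {}  // declared param type is the List interface
     }

     public static void main(String[] args) throws Exception {
       Object arg = new ArrayList<String>();  // runtime type: ArrayList
       // getClass() infers ArrayList, but getConstructor matches declared
       // parameter types exactly, so this throws NoSuchMethodException
       // instead of finding Holder(List) -- exactly the interface-arg case
       // the original typed overload is documented to handle.
       Holder.class.getConstructor(arg.getClass());
     }
   }
   ```

   If `newInstanceUnchecked` infers types this way (its "arg types are inferred" javadoc suggests so), that is the risk being flagged here.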



##########
hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32HoodieParquetFileFormat.scala:
##########
@@ -225,6 +239,10 @@ class Spark32HoodieParquetFileFormat(private val shouldAppendPartitionValues: Bo
       if (enableVectorizedReader) {
         val vectorizedReader =
           if (shouldUseInternalSchema) {
+            val int96RebaseSpec =
+              DataSourceUtils.int96RebaseSpec(footerFileMetaData.getKeyValueMetaData.get, int96RebaseModeInRead)
+            val datetimeRebaseSpec =
+              DataSourceUtils.datetimeRebaseSpec(footerFileMetaData.getKeyValueMetaData.get, datetimeRebaseModeInRead)

Review Comment:
   This implies schema evolution only works with Spark 3.2.1? cc @xiaorixiaoyao
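
   If `DataSourceUtils.int96RebaseSpec` / `datetimeRebaseSpec` only exist on Spark 3.2.1 (which the PR title and this comment suggest), one hedged way to keep a single build loadable on 3.2.0 is a runtime capability probe rather than an unconditional link. Only the object and method names below come from the diff above; everything else is illustrative, not the patch's approach:

   ```java
   import java.util.Arrays;

   public final class RebaseSpecProbe {
     // Probe for the 3.2.1-only RebaseSpec API at runtime. DataSourceUtils is
     // a Scala object, hence the trailing '$' on the reflective class name.
     public static boolean rebaseSpecApiAvailable() {
       try {
         Class<?> dsUtils = Class.forName(
             "org.apache.spark.sql.execution.datasources.DataSourceUtils$");
         return Arrays.stream(dsUtils.getMethods())
             .anyMatch(m -> m.getName().equals("int96RebaseSpec"));
       } catch (ClassNotFoundException e) {
         return false;  // class absent entirely: API definitely unavailable
       }
     }
   }
   ```

   A simpler alternative would be gating on the `SPARK_VERSION` string already imported by the adapter, at the cost of hard-coding version boundaries.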



##########
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark31HoodieParquetFileFormat.scala:
##########
@@ -61,7 +61,7 @@ import java.net.URI
  *   <li>Schema on-read</li>
  * </ol>
  */
-class Spark312HoodieParquetFileFormat(private val shouldAppendPartitionValues: Boolean) extends ParquetFileFormat {
+class Spark31HoodieParquetFileFormat(private val shouldAppendPartitionValues: Boolean) extends ParquetFileFormat {

Review Comment:
   The changes to the Spark 3.1 parquet format are purely refactoring, right? Looks OK, but I would rather be super paranoid at this stage about introducing more diffs than needed.


