You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/07 02:42:47 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

xushiyan commented on a change in pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#discussion_r820337449



##########
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##########
@@ -69,6 +71,7 @@
  * Helper class to do common stuff across Avro.
  */
 public class HoodieAvroUtils {
+  private static final Logger LOG = LogManager.getLogger(HoodieAvroUtils.class);

Review comment:
       LOG is not used. can we revert?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/debezium/AbstractDebeziumAvroPayload.java
##########
@@ -76,7 +75,12 @@ public AbstractDebeziumAvroPayload(Option<GenericRecord> record) {
     boolean delete = false;
     if (insertRecord instanceof GenericRecord) {
       GenericRecord record = (GenericRecord) insertRecord;
-      Object value = record.get(DebeziumConstants.FLATTENED_OP_COL_NAME);
+      Object value;
+      if (record.getSchema().getField(DebeziumConstants.FLATTENED_OP_COL_NAME) == null) {
+        value = null;
+      } else {
+        value = record.get(DebeziumConstants.FLATTENED_OP_COL_NAME);
+      }

Review comment:
       the new check does not look clean; it looks like whenever we try to get a column's value, we need to check its nullability? are we able to retrieve the value the without this kind of check

##########
File path: hudi-spark-datasource/hudi-spark/pom.xml
##########
@@ -364,6 +395,12 @@
      <scope>provided</scope>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-storage-api</artifactId>
+      <version>2.7.2</version>
+    </dependency>
+

Review comment:
       this means compile scope for hudi-spark. how does this affect the bundle jar? do we shade it? want to understand the risk of setting a special dependency here

##########
File path: packaging/hudi-presto-bundle/pom.xml
##########
@@ -198,6 +198,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
+      <version>${avro.version}</version>

Review comment:
       same avro version question

##########
File path: hudi-common/pom.xml
##########
@@ -112,6 +112,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
+      <version>${avro.version}</version>

Review comment:
       do we need this? thought it inherits the same version from the root pom

##########
File path: hudi-examples/pom.xml
##########
@@ -194,6 +194,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
+      <version>${avro.version}</version>

Review comment:
       same question: do we need this? thought it inherits the same version from the root pom
   
   

##########
File path: azure-pipelines.yml
##########
@@ -24,9 +24,10 @@ pool:
 variables:
   MAVEN_CACHE_FOLDER: $(Pipeline.Workspace)/.m2/repository
   MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER) -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
-  SPARK_VERSION: '2.4.4'
+  SPARK_VERSION: '3.2.0'

Review comment:
       this needs to upgrade to 3.2.1 to align with master version

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java
##########
@@ -186,7 +186,11 @@ public static Writable avroToArrayWritable(Object value, Schema schema) {
         Writable[] recordValues = new Writable[schema.getFields().size()];
         int recordValueIndex = 0;
         for (Schema.Field field : schema.getFields()) {
-          recordValues[recordValueIndex++] = avroToArrayWritable(record.get(field.name()), field.schema());
+          if (record.getSchema().getField(field.name()) == null) {
+            recordValues[recordValueIndex++] = null;
+          } else {
+            recordValues[recordValueIndex++] = avroToArrayWritable(record.get(field.name()), field.schema());
+          }

Review comment:
       same here: null check makes the code cumbersome. can we try to avoid this? 

##########
File path: hudi-spark-datasource/hudi-spark3/src/test/java/org/apache/hudi/spark3/internal/TestReflectUtil.java
##########
@@ -42,7 +43,7 @@ public void testDataSourceWriterExtraCommitMetadata() throws Exception {
     InsertIntoStatement newStatment = ReflectUtil.createInsertInto(
         statement.table(),
         statement.partitionSpec(),
-        scala.collection.immutable.List.empty(),
+        JavaConverters.collectionAsScalaIterableConverter(new ArrayList<String>()).asScala().toSeq(),

Review comment:
       so you need a seq why not `scala.collection.immutable.List.empty().toSeq()` ?

##########
File path: hudi-utilities/pom.xml
##########
@@ -194,6 +194,13 @@
       <version>${fasterxml.version}</version>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.avro</groupId>
+      <artifactId>avro</artifactId>
+      <version>1.8.2</version>
+      <scope>provided</scope>
+    </dependency>

Review comment:
       concern on the impact here: why we set an old version just for utilities?

##########
File path: hudi-spark-datasource/hudi-spark/pom.xml
##########
@@ -274,6 +280,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
+      <version>${avro.version}</version>

Review comment:
       same question: do we need this? thought it inherits the same version from the root pom
   
   

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java
##########
@@ -131,10 +130,10 @@ public void testDeletedRecord() throws IOException {
   @Test
   public void testNullColumn() throws IOException {
     Schema avroSchema = Schema.createRecord(Arrays.asList(
-            new Schema.Field("id", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE),
-            new Schema.Field("name", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE),
-            new Schema.Field("age", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE),
-            new Schema.Field("job", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE)
+            new Schema.Field("id", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null),
+            new Schema.Field("name", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null),
+            new Schema.Field("age", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null),
+            new Schema.Field("job", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null)

Review comment:
       we should prefer to use `org.apache.avro.Schema.Field#NULL_DEFAULT_VALUE` whenever we want to set default as null

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestHoodieIncrSource.java
##########
@@ -62,6 +63,7 @@ public void tearDown() throws IOException {
     cleanupResources();
   }
 
+  @Disabled

Review comment:
       should be helpful to set a reason

##########
File path: hudi-hadoop-mr/pom.xml
##########
@@ -41,6 +41,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro</artifactId>
+      <version>${avro.version}</version>

Review comment:
       ditto

##########
File path: hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestOrcBootstrap.java
##########
@@ -169,11 +170,13 @@ public Schema generateNewDataSetAndReturnSchema(long timestamp, int numRecords,
     return AvroOrcUtils.createAvroSchemaWithDefaultValue(orcSchema, "test_orc_record", null, true);
   }
 
+  @Disabled
   @Test
   public void testMetadataBootstrapNonpartitionedCOW() throws Exception {
     testBootstrapCommon(false, false, EffectiveMode.METADATA_BOOTSTRAP_MODE);
   }

Review comment:
       you can disable for the whole class instead of annotating each method, and add a note in the annotation like `@Disabled("<describe the reason>")`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org