Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/31 19:36:14 UTC

[GitHub] [hudi] yihua commented on a change in pull request #3893: [HUDI-2656] Generalize HoodieIndex for flexible record data type

yihua commented on a change in pull request #3893:
URL: https://github.com/apache/hudi/pull/3893#discussion_r795967769



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
##########
@@ -218,10 +216,10 @@ public boolean isImplicitWithStorage() {
   /**
    * Tag the <rowKey, filename> back to the original HoodieRecord List.
    */
-  protected HoodieData<HoodieRecord<T>> tagLocationBacktoRecords(
+  protected <R> HoodieData<HoodieRecord<R>> tagLocationBacktoRecords(
       HoodiePairData<HoodieKey, HoodieRecordLocation> keyFilenamePair,
-      HoodieData<HoodieRecord<T>> records) {
-    HoodiePairData<HoodieKey, HoodieRecord<T>> keyRecordPairs =
+      HoodieData<HoodieRecord<R>> records) {
+    HoodiePairData<HoodieKey, HoodieRecord<?>> keyRecordPairs =

Review comment:
       should this be `HoodiePairData<HoodieKey, HoodieRecord<R>>`?
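The generics concern can be illustrated with a minimal standalone sketch. The `Record`/`tagLocation` names below are hypothetical stand-ins, not Hudi's real `HoodieRecord`/`HoodiePairData` API: keeping `<R>` on the intermediate collection lets the tagged records retain their payload type, whereas a wildcard there would force an unchecked cast on the way back out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Hudi's generic record type, for illustration only.
class Record<R> {
    final R data;
    Record(R data) { this.data = data; }
}

public class GenericsSketch {
    // Keeping <R> on the intermediate collection preserves the payload type;
    // declaring it as List<Record<?>> would require an unchecked cast to
    // return List<Record<R>>.
    static <R> List<Record<R>> tagLocation(List<Record<R>> records) {
        List<Record<R>> keyRecordPairs = records.stream().collect(Collectors.toList());
        return keyRecordPairs;
    }

    public static void main(String[] args) {
        List<Record<String>> tagged = tagLocation(List.of(new Record<>("payload")));
        System.out.println(tagged.get(0).data); // prints "payload"
    }
}
```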

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
##########
@@ -85,7 +86,7 @@ public void write(GenericRecord oldRecord) {
       }
 
       // This is a new insert
-      HoodieRecord<T> hoodieRecord = new HoodieRecord<>(keyToNewRecords.get(keyToPreWrite));
+      HoodieRecord<T> hoodieRecord = new HoodieAvroRecord<>(keyToNewRecords.get(keyToPreWrite));

Review comment:
     Similar here: should this use `.newInstance()` as well?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -390,7 +391,7 @@ public boolean canWrite(HoodieRecord record) {
 
   @Override
   public void write(HoodieRecord record, Option<IndexedRecord> insertValue) {
-    Option<Map<String, String>> recordMetadata = record.getData().getMetadata();
+    Option<Map<String, String>> recordMetadata = ((HoodieAvroRecord) record).getData().getMetadata();

Review comment:
     Should this be `((HoodieRecordPayload) record.getData()).getMetadata();` instead?
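A standalone sketch of the suggested cast style; the `Payload`/`RecordHolder` types below are hypothetical stand-ins, not Hudi's classes. Casting the payload returned by `getData()` keeps the call site independent of any particular record subclass, instead of pinning it to an Avro-specific record type:

```java
// Hypothetical stand-ins (not Hudi's real classes) contrasting the two cast styles.
interface Payload {
    String getMetadata();
}

class AvroPayload implements Payload {
    public String getMetadata() { return "meta"; }
}

class RecordHolder {
    private final Object data;
    RecordHolder(Object data) { this.data = data; }
    Object getData() { return data; }
}

public class CastSketch {
    // Casting the payload (record.getData()) keeps the call site engine-agnostic;
    // casting the whole record to an Avro-specific subclass would not.
    static String readMetadata(RecordHolder record) {
        return ((Payload) record.getData()).getMetadata();
    }

    public static void main(String[] args) {
        System.out.println(readMetadata(new RecordHolder(new AvroPayload()))); // prints "meta"
    }
}
```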

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -324,7 +325,7 @@ public void write(GenericRecord oldRecord) {
     if (keyToNewRecords.containsKey(key)) {
       // If we have duplicate records that we are updating, then the hoodie record will be deflated after
       // writing the first record. So make a copy of the record to be merged
-      HoodieRecord<T> hoodieRecord = new HoodieRecord<>(keyToNewRecords.get(key));
+      HoodieRecord<T> hoodieRecord = new HoodieAvroRecord<T>(keyToNewRecords.get(key));

Review comment:
       Can this be replaced by `keyToNewRecords.get(key).newInstance()`?
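The `.newInstance()` suggestion can be sketched standalone; `BaseRecord`/`AvroRecord` below are hypothetical stand-ins, not Hudi's actual classes. A polymorphic copy method lets each record subclass produce a copy of its own concrete type, so the caller never hard-codes one subclass:

```java
// Hypothetical sketch of copying via a polymorphic factory method
// instead of a concrete constructor (not Hudi's actual classes).
abstract class BaseRecord {
    abstract BaseRecord newInstance(); // each subclass copies itself
}

class AvroRecord extends BaseRecord {
    final String key;
    AvroRecord(String key) { this.key = key; }
    @Override
    AvroRecord newInstance() { return new AvroRecord(this.key); }
}

public class CopySketch {
    public static void main(String[] args) {
        BaseRecord original = new AvroRecord("k1");
        // new AvroRecord(...) would pin the copy to one subclass;
        // newInstance() lets the record choose its own concrete type.
        BaseRecord copy = original.newInstance();
        System.out.println(copy != original); // prints "true"
    }
}
```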

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/TestHoodieIndexConfigs.java
##########
@@ -132,47 +115,6 @@ public void testCreateIndexWithException() {
     assertTrue(thrown2.getMessage().contains("Unable to instantiate class"));
   }
 
-  public static class DummyHoodieIndex<T extends HoodieRecordPayload<T>> extends SparkHoodieIndex<T> {

Review comment:
       Maybe rewrite this for new APIs?

##########
File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/FlinkHoodieIndex.java
##########
@@ -48,21 +47,22 @@ protected FlinkHoodieIndex(HoodieWriteConfig config) {
   @PublicAPIMethod(maturity = ApiMaturityLevel.DEPRECATED)
   public abstract List<WriteStatus> updateLocation(List<WriteStatus> writeStatuses,
                                                    HoodieEngineContext context,
-                                                   HoodieTable<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> hoodieTable) throws HoodieIndexException;
+                                                   HoodieTable hoodieTable) throws HoodieIndexException;

Review comment:
     Similarly, the engine-specific HoodieIndex classes could remove the deprecated API methods altogether (in a separate PR).

##########
File path: hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/testutils/HoodieFlinkWriteableTestTable.java
##########
@@ -130,7 +131,7 @@ public HoodieFlinkWriteableTestTable withInserts(String partition, String fileId
       header.put(HeaderMetadataType.SCHEMA, schema.toString());
       logWriter.appendBlock(new HoodieAvroDataBlock(groupedRecords.stream().map(r -> {
         try {
-          GenericRecord val = (GenericRecord) r.getData().getInsertValue(schema).get();
+          GenericRecord val = (GenericRecord) ((HoodieAvroRecord) r).getData().getInsertValue(schema).get();

Review comment:
       Use `HoodieRecordPayload` here?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+public class HoodieAvroRecord<T extends HoodieRecordPayload> extends HoodieRecord<T> {

Review comment:
       As discussed, this is more of an intermediate solution for row writer, before RFC-46 revamps it completely.

##########
File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java
##########
@@ -175,7 +177,7 @@ public HoodieWriteableTestTable withInserts(String partition, String fileId, Lis
       header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, schema.toString());
       logWriter.appendBlock(new HoodieAvroDataBlock(groupedRecords.stream().map(r -> {
         try {
-          GenericRecord val = (GenericRecord) r.getData().getInsertValue(schema).get();
+          GenericRecord val = (GenericRecord) ((HoodieAvroRecord) r).getData().getInsertValue(schema).get();

Review comment:
     Similar here.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java
##########
@@ -124,29 +123,29 @@ public HoodieGlobalSimpleIndex(HoodieWriteConfig config, Option<BaseKeyGenerator
 
     return incomingRecords.leftOuterJoin(existingRecordByRecordKey).values()
         .flatMap(entry -> {
-          HoodieRecord<T> inputRecord = entry.getLeft();
+          HoodieRecord<R> inputRecord = entry.getLeft();
           Option<Pair<String, HoodieRecordLocation>> partitionPathLocationPair = Option.ofNullable(entry.getRight().orElse(null));
-          List<HoodieRecord<T>> taggedRecords;
+          List<HoodieRecord<R>> taggedRecords;
 
           if (partitionPathLocationPair.isPresent()) {
             String partitionPath = partitionPathLocationPair.get().getKey();
             HoodieRecordLocation location = partitionPathLocationPair.get().getRight();
             if (config.getGlobalSimpleIndexUpdatePartitionPath() && !(inputRecord.getPartitionPath().equals(partitionPath))) {
               // Create an empty record to delete the record in the old partition
-              HoodieRecord<T> deleteRecord = new HoodieRecord(new HoodieKey(inputRecord.getRecordKey(), partitionPath), new EmptyHoodieRecordPayload());
+              HoodieRecord<R> deleteRecord = new HoodieAvroRecord(new HoodieKey(inputRecord.getRecordKey(), partitionPath), new EmptyHoodieRecordPayload());

Review comment:
     Similar here; to be revisited later on.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapRecordConsumer.java
##########
@@ -39,7 +40,8 @@ public BootstrapRecordConsumer(HoodieBootstrapHandle bootstrapHandle) {
   @Override
   protected void consumeOneRecord(HoodieRecord record) {
     try {
-      bootstrapHandle.write(record, record.getData().getInsertValue(bootstrapHandle.getWriterSchemaWithMetaFields()));
+      bootstrapHandle.write(record, ((HoodieAvroRecord) record).getData()

Review comment:
       use `((HoodieRecordPayload) record.getData())` as well?

##########
File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java
##########
@@ -141,7 +143,7 @@ public HoodieWriteableTestTable withInserts(String partition, String fileId, Lis
           config, schema, contextSupplier)) {
         int seqId = 1;
         for (HoodieRecord record : records) {
-          GenericRecord avroRecord = (GenericRecord) record.getData().getInsertValue(schema).get();
+          GenericRecord avroRecord = (GenericRecord) ((HoodieAvroRecord) record).getData().getInsertValue(schema).get();

Review comment:
       use `((HoodieRecordPayload) record.getData())`?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieGlobalBloomIndex.java
##########
@@ -109,29 +110,29 @@ public HoodieGlobalBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelp
 
     // Here as the records might have more data than rowKeys (some rowKeys' fileId is null), so we do left outer join.
     return incomingRowKeyRecordPairs.leftOuterJoin(existingRecordKeyToRecordLocationHoodieKeyMap).values().flatMap(record -> {
-      final HoodieRecord<T> hoodieRecord = record.getLeft();
+      final HoodieRecord<R> hoodieRecord = record.getLeft();
       final Option<Pair<HoodieRecordLocation, HoodieKey>> recordLocationHoodieKeyPair = record.getRight();
       if (recordLocationHoodieKeyPair.isPresent()) {
         // Record key matched to file
         if (config.getBloomIndexUpdatePartitionPath()
             && !recordLocationHoodieKeyPair.get().getRight().getPartitionPath().equals(hoodieRecord.getPartitionPath())) {
           // Create an empty record to delete the record in the old partition
-          HoodieRecord<T> deleteRecord = new HoodieRecord(recordLocationHoodieKeyPair.get().getRight(),
+          HoodieRecord<R> deleteRecord = new HoodieAvroRecord(recordLocationHoodieKeyPair.get().getRight(),

Review comment:
       This needs to be revisited to make it Avro agnostic later on.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java
##########
@@ -60,7 +57,7 @@ protected HoodieIndex(HoodieWriteConfig config) {
   @Deprecated
   @PublicAPIMethod(maturity = ApiMaturityLevel.DEPRECATED)
   public I tagLocation(I records, HoodieEngineContext context,
-                       HoodieTable<T, I, K, O> hoodieTable) throws HoodieIndexException {
+                       HoodieTable hoodieTable) throws HoodieIndexException {

Review comment:
     @xushiyan @alexeykudinkin Since this is backward incompatible anyway, should we just remove the deprecated public API methods and get rid of `I` and `O` as well?  The reason to keep these methods and the generics is to accommodate users extending these APIs.  If we want to change the generics, I'd prefer that all such changes to the public APIs land in the 0.11.0 release in one shot.
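For illustration only, one possible shape of an index API without the `I`/`O` collection generics could look like the following sketch. `Data` and `Index` here are hypothetical stand-ins (not `HoodieData`/`HoodieIndex`, and not an actual proposal from this thread): a single engine-neutral wrapper type replaces the per-engine `List<...>`/`JavaRDD<...>` signatures.

```java
import java.util.List;

// Hypothetical sketch of an index API after dropping the I/O collection
// generics; Data<R> is an assumed engine-neutral wrapper, not HoodieData.
interface Data<R> {
    List<R> collect();
}

abstract class Index {
    // One signature replaces per-engine List<...> / JavaRDD<...> variants.
    abstract <R> Data<R> tagLocation(Data<R> records);
}

public class ApiSketch {
    // Trivial index that tags nothing, just to exercise the signature.
    static final Index IDENTITY = new Index() {
        @Override
        <R> Data<R> tagLocation(Data<R> records) { return records; }
    };

    public static void main(String[] args) {
        Data<String> in = () -> List.of("r1", "r2");
        System.out.println(IDENTITY.tagLocation(in).collect().size()); // prints 2
    }
}
```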




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org