Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/25 03:04:26 UTC

[GitHub] [hudi] wzx140 commented on a diff in pull request #7021: [Minor] fix multi deser avro payload

wzx140 commented on code in PR #7021:
URL: https://github.com/apache/hudi/pull/7021#discussion_r1003952529


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java:
##########
@@ -189,14 +193,37 @@ public Option<Map<String, String>> getMetadata() {
 
   @Override
   public Option<HoodieAvroIndexedRecord> toIndexedRecord(Schema recordSchema, Properties props) throws IOException {
-    Option<IndexedRecord> avroData = getData().getInsertValue(recordSchema, props);
+    Option<IndexedRecord> avroData = getCachedDeserializedRecord(recordSchema, props);
     if (avroData.isPresent()) {
       return Option.of(new HoodieAvroIndexedRecord(avroData.get()));
     } else {
       return Option.empty();
     }
   }
 
+  private Option<IndexedRecord> getCachedDeserializedRecord(Schema recordSchema, Properties props) throws IOException {
+    // Check schema identical
+    if (this.cachedDeserializedRecord != null && this.cachedDeserializedRecord.isPresent()
+        && !compareSchema(cachedDeserializedRecord.get().getSchema(), recordSchema)) {
+      this.cachedDeserializedRecord = null;
+    }
+    if (this.cachedDeserializedRecord == null) {
+      this.cachedDeserializedRecord = this.data.getInsertValue(recordSchema, props);
+    }
+    return this.cachedDeserializedRecord;
+  }
+
+  private static Boolean compareSchema(Schema left, Schema right) {
+    if (left == null || right == null) {
+      return false;
+    }
+    Pair<Schema, Schema> schemaPair = Pair.of(left, right);
+    if (!SCHEMA_COMPARE_MAP.containsKey(schemaPair)) {

Review Comment:
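   1. SCHEMA_COMPARE_MAP is a HashMap, so its get/containsKey operations are O(1) on average. Each Pair of schemas is compared only once; later calls reuse the cached result.
   2. As you said before, comparing schemas on every call would wipe out the performance gain, so we need to cache the comparison result.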
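A minimal sketch of the memoization described in point 2, in plain Java with no Avro dependency. `Schema` here is a hypothetical stand-in for `org.apache.avro.Schema`, and `SchemaPair`, `deepEquals`, and `DEEP_COMPARISONS` are illustrative names that do not come from the PR; only `SCHEMA_COMPARE_MAP` and `compareSchema` mirror identifiers in the diff. The expensive comparison runs once per distinct pair, and every later call is a single map lookup.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaCompareCache {

  // Hypothetical stand-in for org.apache.avro.Schema; only its JSON text matters here.
  // It does not override equals/hashCode, so map keys built from it compare by identity.
  static final class Schema {
    final String json;
    Schema(String json) { this.json = json; }
  }

  // Map key holding both schemas; delegates equals/hashCode to the Schema objects.
  static final class SchemaPair {
    final Schema left;
    final Schema right;
    SchemaPair(Schema left, Schema right) { this.left = left; this.right = right; }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SchemaPair)) {
        return false;
      }
      SchemaPair p = (SchemaPair) o;
      return Objects.equals(left, p.left) && Objects.equals(right, p.right);
    }

    @Override
    public int hashCode() {
      return Objects.hash(left, right);
    }
  }

  // Cache of comparison results, keyed by the (left, right) pair.
  static final Map<SchemaPair, Boolean> SCHEMA_COMPARE_MAP = new ConcurrentHashMap<>();
  // Counts how often the expensive comparison actually runs (illustration only).
  static final AtomicInteger DEEP_COMPARISONS = new AtomicInteger();

  // Placeholder for the costly structural comparison we want to avoid repeating.
  static boolean deepEquals(Schema a, Schema b) {
    DEEP_COMPARISONS.incrementAndGet();
    return a.json.equals(b.json);
  }

  static boolean compareSchema(Schema left, Schema right) {
    if (left == null || right == null) {
      return false;
    }
    // computeIfAbsent runs deepEquals only on the first lookup of this pair.
    return SCHEMA_COMPARE_MAP.computeIfAbsent(
        new SchemaPair(left, right), p -> deepEquals(p.left, p.right));
  }

  public static void main(String[] args) {
    Schema a = new Schema("{\"type\":\"string\"}");
    Schema b = new Schema("{\"type\":\"string\"}");
    for (int i = 0; i < 1000; i++) {
      compareSchema(a, b);
    }
    System.out.println(compareSchema(a, b) + " " + DEEP_COMPARISONS.get());
  }
}
```

Running `main` prints `true 1`: 1001 calls, one deep comparison. The sketch uses `ConcurrentHashMap.computeIfAbsent` rather than a `containsKey` check followed by a separate insert, which avoids a check-then-act race if the map is static and shared across threads. It never evicts entries, so the map grows with every distinct schema pair it sees.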



-- 
This is an automated message from the Apache Git Service.