You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/07 00:15:51 UTC

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6883: [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata

nsivabalan commented on code in PR #6883:
URL: https://github.com/apache/hudi/pull/6883#discussion_r989564188


##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieBloomFilterWriteSupport.java:
##########
@@ -0,0 +1,78 @@
+package org.apache.hudi.avro;
+
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.HoodieDynamicBoundedBloomFilter;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_AVRO_BLOOM_FILTER_METADATA_KEY;
+
+/**
+ * This is write-support utility base-class taking up handling of
+ *
+ * <ul>
+ *   <li>Adding record keys to the Bloom Filter</li>
+ *   <li>Keeping track of min/max record key (w/in single file)</li>
+ * </ul>
+ *
+ * @param <T> record-key type being ingested by this clas
+ */
+public abstract class HoodieBloomFilterWriteSupport<T extends Comparable<T>> {
+
+  public static final String HOODIE_MIN_RECORD_KEY_FOOTER = "hoodie_min_record_key";
+  public static final String HOODIE_MAX_RECORD_KEY_FOOTER = "hoodie_max_record_key";
+  public static final String HOODIE_BLOOM_FILTER_TYPE_CODE = "hoodie_bloom_filter_type_code";
+
+  private final BloomFilter bloomFilter;
+
+  private T minRecordKey;
+  private T maxRecordKey;
+
+  public HoodieBloomFilterWriteSupport(BloomFilter bloomFilter) {
+    this.bloomFilter = bloomFilter;
+  }
+
+  public void addKey(T recordKey) {
+    bloomFilter.add(getUTF8Bytes(recordKey));

Review Comment:
   previously, we were adding strings as is into bloom filter (HoodieAvroWriteSupport), but now w/ this patch, we are switching to adding UTF8Bytes? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org