You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/07 00:54:06 UTC

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6883: [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata

alexeykudinkin commented on code in PR #6883:
URL: https://github.com/apache/hudi/pull/6883#discussion_r989585629


##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieBloomFilterWriteSupport.java:
##########
@@ -0,0 +1,78 @@
+package org.apache.hudi.avro;
+
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.HoodieDynamicBoundedBloomFilter;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_AVRO_BLOOM_FILTER_METADATA_KEY;
+
+/**
+ * This is write-support utility base-class taking up handling of
+ *
+ * <ul>
+ *   <li>Adding record keys to the Bloom Filter</li>
+ *   <li>Keeping track of min/max record key (w/in single file)</li>
+ * </ul>
+ *
+ * @param <T> record-key type being ingested by this clas
+ */
+public abstract class HoodieBloomFilterWriteSupport<T extends Comparable<T>> {
+
+  public static final String HOODIE_MIN_RECORD_KEY_FOOTER = "hoodie_min_record_key";
+  public static final String HOODIE_MAX_RECORD_KEY_FOOTER = "hoodie_max_record_key";
+  public static final String HOODIE_BLOOM_FILTER_TYPE_CODE = "hoodie_bloom_filter_type_code";
+
+  private final BloomFilter bloomFilter;
+
+  private T minRecordKey;
+  private T maxRecordKey;
+
+  public HoodieBloomFilterWriteSupport(BloomFilter bloomFilter) {
+    this.bloomFilter = bloomFilter;
+  }
+
+  public void addKey(T recordKey) {
+    bloomFilter.add(getUTF8Bytes(recordKey));

Review Comment:
   Bloom Filter always adds keys as UTF8, though previously we're just passing Strings into it (except in Row-writer)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org