Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/19 00:38:43 UTC

[GitHub] [iceberg] yyanyy commented on a change in pull request #2464: Core: exclude NaN from upper/lower bound of floating columns in Parquet/ORC

yyanyy commented on a change in pull request #2464:
URL: https://github.com/apache/iceberg/pull/2464#discussion_r654724611



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java
##########
@@ -99,6 +104,10 @@ public static Metrics footerMetrics(ParquetMetadata metadata, Stream<FieldMetric
     MessageType parquetTypeWithIds = getParquetTypeWithIds(metadata, nameMapping);
     Schema fileSchema = ParquetSchemaUtil.convertAndPrune(parquetTypeWithIds);
 
+    Map<Integer, FieldMetrics> fieldMetricsMap = Optional.ofNullable(fieldMetrics)
+        .map(stream -> stream.collect(Collectors.toMap(FieldMetrics::id, Function.identity())))
+        .orElseGet(HashMap::new);

Review comment:
       I was trying to be defensive about a null stream here, which is why I had the Optional wrapper and `orElseGet`, but you are right that the current implementation wouldn't produce a null stream. I'll add a precondition check for a null stream and then use the stream directly to create the map.
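
A minimal sketch of what that change could look like, not the actual patch. It assumes a stand-in `FieldMetrics` class that exposes only `id()` and a NaN counter, and it uses `java.util.Objects.requireNonNull` in place of whatever precondition helper the project prefers. The class name `FieldMetricsMapSketch` and the method name `toFieldMetricsMap` are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FieldMetricsMapSketch {

  // Hypothetical stand-in for Iceberg's FieldMetrics; only id() matters for the map.
  static final class FieldMetrics {
    private final int id;
    private final long nanValueCount;

    FieldMetrics(int id, long nanValueCount) {
      this.id = id;
      this.nanValueCount = nanValueCount;
    }

    int id() {
      return id;
    }

    long nanValueCount() {
      return nanValueCount;
    }
  }

  // Fail fast on a null stream instead of silently falling back to an empty map.
  static Map<Integer, FieldMetrics> toFieldMetricsMap(Stream<FieldMetrics> fieldMetrics) {
    Objects.requireNonNull(fieldMetrics, "fieldMetrics should not be null");
    return fieldMetrics.collect(Collectors.toMap(FieldMetrics::id, Function.identity()));
  }

  public static void main(String[] args) {
    Map<Integer, FieldMetrics> byId =
        toFieldMetricsMap(Stream.of(new FieldMetrics(1, 0L), new FieldMetrics(2, 3L)));
    System.out.println(byId.get(2).nanValueCount());
  }
}
```

One side effect worth keeping in mind: `Collectors.toMap` without a merge function throws `IllegalStateException` on duplicate keys, so two metrics with the same field id would surface as an error rather than being silently overwritten.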

##########
File path: core/src/main/java/org/apache/iceberg/FloatFieldMetrics.java
##########
@@ -19,25 +19,17 @@
 
 package org.apache.iceberg;
 
-import java.nio.ByteBuffer;
-
 /**
  * Iceberg internally tracked field level metrics, used by Parquet and ORC writers only.
  * <p>
  * Parquet/ORC keeps track of most metrics in file statistics, and only NaN counter is actually tracked by writers.
  * This wrapper ensures that metrics not being updated by those writers will not be incorrectly used, by throwing
  * exceptions when they are accessed.
  */
-public class FloatFieldMetrics extends FieldMetrics {
-
-  /**
-   * Constructor for creating a FieldMetrics with only NaN counter.
-   * @param id field id being tracked by the writer
-   * @param nanValueCount number of NaN values, will only be non-0 for double or float field.
-   */
-  public FloatFieldMetrics(int id,
-                           long nanValueCount) {
-    super(id, 0L, 0L, nanValueCount, null, null);
+public class FloatFieldMetrics extends FieldMetrics<Number> {

Review comment:
       Sounds good! I was hesitant to do that because I thought it would leave too much duplicated code; I guess I was trying too hard to eliminate duplication...



