Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/06/14 05:27:15 UTC

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #199: ORC metrics

rdsr commented on a change in pull request #199: ORC metrics
URL: https://github.com/apache/incubator-iceberg/pull/199#discussion_r293663179
 
 

 ##########
 File path: orc/src/main/java/org/apache/iceberg/orc/OrcMetrics.java
 ##########
 @@ -45,16 +61,82 @@ public static Metrics fromInputFile(InputFile file, Configuration config) {
     try {
       final Reader orcReader = OrcFile.createReader(new Path(file.location()),
           OrcFile.readerOptions(config));
+      final Schema schema = TypeConversion.fromOrc(orcReader.getSchema());
+
+      ColumnStatistics[] colStats = orcReader.getStatistics();
+      Map<Integer, Long> columSizes = Maps.newHashMapWithExpectedSize(colStats.length);
+      Map<Integer, Long> valueCounts = Maps.newHashMapWithExpectedSize(colStats.length);
+      Map<Integer, ByteBuffer> lowerBounds = Maps.newHashMap();
+      Map<Integer, ByteBuffer> upperBounds = Maps.newHashMap();
+
+      for(Types.NestedField col : schema.columns()) {
+        final int i = col.fieldId();
+        columSizes.put(i, colStats[i].getBytesOnDisk());
+        valueCounts.put(i, colStats[i].getNumberOfValues());
+
+        Optional<ByteBuffer> orcMin = fromOrcMin(col, colStats[i]);
+        orcMin.ifPresent(byteBuffer -> lowerBounds.put(i, byteBuffer));
+        Optional<ByteBuffer> orcMax = fromOrcMax(col, colStats[i]);
+        orcMax.ifPresent(byteBuffer -> upperBounds.put(i, byteBuffer));
+      }
 
-      // TODO: implement rest of the methods for ORC metrics
       return new Metrics(orcReader.getNumberOfRows(),
-          null,
-          null,
+          columSizes,
+          valueCounts,
           Collections.emptyMap(),
-          null,
-          null);
+          lowerBounds,
+          upperBounds);
     } catch (IOException ioe) {
       throw new RuntimeIOException(ioe, "Failed to read footer of file: %s", file);
     }
   }
+
+  private static Optional<ByteBuffer> fromOrcMin(Types.NestedField column,
+                                                 ColumnStatistics columnStats) {
+    ByteBuffer min = null;
+    if (columnStats instanceof IntegerColumnStatistics) {
+      IntegerColumnStatistics intColStats = (IntegerColumnStatistics) columnStats;
+      if (column.type().typeId() == Type.TypeID.INTEGER) {
+        min = toByteBuffer(column.type(), (int) intColStats.getMinimum());
+      } else {
+        min = toByteBuffer(column.type(), intColStats.getMinimum());
+      }
+    } else if (columnStats instanceof DoubleColumnStatistics) {
+      min = toByteBuffer(column.type(), ((DoubleColumnStatistics) columnStats).getMinimum());
+    } else if (columnStats instanceof StringColumnStatistics) {
+      min = toByteBuffer(column.type(), ((StringColumnStatistics) columnStats).getMinimum());
 
 Review comment:
   I just hit this exception:
   `
   java.lang.NullPointerException
     at java.nio.CharBuffer.wrap(CharBuffer.java:487)
     at org.apache.iceberg.types.Conversions.toByteBuffer(Conversions.java:96)
     at org.apache.iceberg.orc.OrcMetrics.fromOrcMin(OrcMetrics.java:107)
     at org.apache.iceberg.orc.OrcMetrics.fromInputFile(OrcMetrics.java:77)
     at org.apache.iceberg.orc.OrcMetrics.fromInputFile(OrcMetrics.java:57)
   `
   It seems `StringColumnStatistics` can return null for min/max. What other non-primitive columns can return null for min/max?
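
   One possible guard, as a minimal sketch rather than the fix for this PR: check the statistic for null before converting it, and return an empty `Optional` so that no bound is recorded for that column. The example uses only the JDK so it runs on its own; `NullSafeBounds` and `stringBound` are hypothetical names, and the UTF-8 encoding here only stands in for the real `Conversions.toByteBuffer` call. It assumes the null comes from a column with no non-null values, which the trace above does not confirm.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Iceberg or ORC.
final class NullSafeBounds {

  private NullSafeBounds() {
  }

  // Converts a nullable string statistic into an optional bound.
  // A null statistic yields Optional.empty() instead of a NullPointerException.
  public static Optional<ByteBuffer> stringBound(String value) {
    if (value == null) {
      return Optional.empty();
    }
    // Stand-in for the real conversion: encode the string as UTF-8 bytes.
    return Optional.of(ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) {
    // Missing statistic: no bound is produced.
    System.out.println(stringBound(null).isPresent());        // prints "false"
    // Present statistic: the bound holds the encoded bytes.
    System.out.println(stringBound("abc").get().remaining()); // prints "3"
  }
}
```

   Because `fromInputFile` already calls `ifPresent` on the result, an empty `Optional` would simply leave the column out of `lowerBounds` and `upperBounds`.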
