Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/07/03 22:25:00 UTC

[jira] [Commented] (PARQUET-1341) Null count is suppressed when columns have no min or max and use unsigned sort order

    [ https://issues.apache.org/jira/browse/PARQUET-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532022#comment-16532022 ] 

ASF GitHub Bot commented on PARQUET-1341:
-----------------------------------------

rdblue closed pull request #499: PARQUET-1341: Fix null count stats in unsigned-sort columns.
URL: https://github.com/apache/parquet-mr/pull/499

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java b/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
index ff3d6cb3d..d2225052d 100644
--- a/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
+++ b/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
@@ -621,9 +621,6 @@ private static boolean isMinMaxStatsSupported(PrimitiveType type) {
           statsBuilder.withMin(min);
           statsBuilder.withMax(max);
         }
-        if (formatStats.isSetNull_count()) {
-          statsBuilder.withNumNulls(formatStats.null_count);
-        }
       } else {
         boolean isSet = formatStats.isSetMax() && formatStats.isSetMin();
         boolean maxEqualsMin = isSet ? Arrays.equals(formatStats.getMin(), formatStats.getMax()) : false;
@@ -639,11 +636,12 @@ private static boolean isMinMaxStatsSupported(PrimitiveType type) {
             statsBuilder.withMin(formatStats.min.array());
             statsBuilder.withMax(formatStats.max.array());
           }
-          if (formatStats.isSetNull_count()) {
-            statsBuilder.withNumNulls(formatStats.null_count);
-          }
         }
       }
+
+      if (formatStats.isSetNull_count()) {
+        statsBuilder.withNumNulls(formatStats.null_count);
+      }
     }
     return statsBuilder.build();
   }
diff --git a/parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java b/parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java
index b3eebd6ae..1474525ba 100644
--- a/parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java
+++ b/parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java
@@ -617,7 +617,9 @@ public void testIgnoreStatsWithSignedSortOrder() {
         StatsHelper.V1.toParquetStatistics(stats),
         binaryType);
 
-    Assert.assertTrue("Stats should be empty: " + convertedStats, convertedStats.isEmpty());
+    Assert.assertFalse("Stats should not include min/max: " + convertedStats, convertedStats.hasNonNullValue());
+    Assert.assertTrue("Stats should have null count: " + convertedStats, convertedStats.isNumNullsSet());
+    Assert.assertEquals("Stats should have 3 nulls: " + convertedStats, 3L, convertedStats.getNumNulls());
   }
 
   @Test


 

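The fix changes only control flow. Before the change, the null-count copy sat inside the branches that decide whether min/max can be trusted. After the change, it runs unconditionally after those branches. The plain-Java sketch below models that difference. `FormatStats`, `ConvertedStats`, `minMaxUsable`, and the `sortOrderSafe` flag are hypothetical stand-ins, not parquet-mr classes. The "usable" condition and the buggy nesting are assumptions inferred from the diff's indentation and its `maxEqualsMin` variable; the exact condition in `ParquetMetadataConverter` is not fully visible here.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the PARQUET-1341 fix. FormatStats and
 * ConvertedStats are illustrative stand-ins, NOT parquet-mr classes.
 */
public class NullCountFixSketch {

  /** Stand-in for the Thrift format stats; a null field means "not set". */
  static final class FormatStats {
    final byte[] min;
    final byte[] max;
    final Long nullCount;

    FormatStats(byte[] min, byte[] max, Long nullCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
    }
  }

  /** Stand-in for converted column stats; a null field means "not set". */
  static final class ConvertedStats {
    byte[] min;
    byte[] max;
    Long numNulls;

    // True only when min/max were kept.
    boolean hasNonNullValue() { return min != null && max != null; }
    boolean isNumNullsSet() { return numNulls != null; }
  }

  /** Assumed condition: min/max are set and either the order is safe or min equals max. */
  static boolean minMaxUsable(FormatStats fs, boolean sortOrderSafe) {
    boolean isSet = fs.min != null && fs.max != null;
    return isSet && (sortOrderSafe || Arrays.equals(fs.min, fs.max));
  }

  /** Buggy shape (assumed): the null count is copied only inside the min/max branch. */
  static ConvertedStats convertBuggy(FormatStats fs, boolean sortOrderSafe) {
    ConvertedStats out = new ConvertedStats();
    if (minMaxUsable(fs, sortOrderSafe)) {
      out.min = fs.min;
      out.max = fs.max;
      if (fs.nullCount != null) {
        out.numNulls = fs.nullCount; // skipped whenever min/max are dropped
      }
    }
    return out;
  }

  /** Fixed shape: the null count is copied after the min/max decision. */
  static ConvertedStats convertFixed(FormatStats fs, boolean sortOrderSafe) {
    ConvertedStats out = new ConvertedStats();
    if (minMaxUsable(fs, sortOrderSafe)) {
      out.min = fs.min;
      out.max = fs.max;
    }
    // The null count does not depend on sort order, so keep it whenever it is set.
    if (fs.nullCount != null) {
      out.numNulls = fs.nullCount;
    }
    return out;
  }

  public static void main(String[] args) {
    // A column with no min/max, an unsafe sort order, and 3 nulls.
    FormatStats fs = new FormatStats(null, null, 3L);
    ConvertedStats buggy = convertBuggy(fs, false);
    ConvertedStats fixed = convertFixed(fs, false);
    // prints: buggy: numNulls set = false
    System.out.println("buggy: numNulls set = " + buggy.isNumNullsSet());
    // prints: fixed: numNulls = 3, hasNonNullValue = false
    System.out.println("fixed: numNulls = " + fixed.numNulls
        + ", hasNonNullValue = " + fixed.hasNonNullValue());
  }
}
```

This mirrors what the updated test asserts: min/max are absent (`hasNonNullValue()` is false), but the null count is still set and equals 3. Hoisting the copy out of the branches is safe because a null count is just a tally of missing values and is correct regardless of how the column's bytes are ordered.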
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Null count is suppressed when columns have no min or max and use unsigned sort order
> ------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1341
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1341
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.10.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.1
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)