Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/02/09 20:36:14 UTC

[GitHub] [arrow] wgtmac commented on pull request #34107: GH-34106: Fix updating page stats for WriteArrowDictionary

wgtmac commented on PR #34107:
URL: https://github.com/apache/arrow/pull/34107#issuecomment-1424783602

   Converted to draft because I hit another issue: https://github.com/apache/arrow/issues/14870. The C++ Parquet reader does not parse column statistics correctly here: https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L214
   ```cpp
   // Extracts encoded statistics from V1 and V2 data page headers
   template <typename H>
   EncodedStatistics ExtractStatsFromHeader(const H& header) {
     EncodedStatistics page_statistics;
     if (!header.__isset.statistics) {
       return page_statistics;
     }
     const format::Statistics& stats = header.statistics;
     if (stats.__isset.max) {
       page_statistics.set_max(stats.max);
     }
     if (stats.__isset.min) {
       page_statistics.set_min(stats.min);
     }
     if (stats.__isset.null_count) {
       page_statistics.set_null_count(stats.null_count);
     }
     if (stats.__isset.distinct_count) {
       page_statistics.set_distinct_count(stats.distinct_count);
     }
     return page_statistics;
   }
   ```
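
For illustration only: the function quoted above reads just the `min` and `max` fields, while the Parquet Thrift `Statistics` struct also defines `min_value` and `max_value`. The sketch below shows one way a "prefer the newer fields, fall back to the older ones" extraction could look. `PageStatsFields`, `ExtractedStats`, and `ExtractWithFallback` are hypothetical stand-ins, not Arrow or Thrift types, and this is not taken from the linked issue or from any eventual fix.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for the Thrift-generated statistics
// struct. std::optional plays the role of the __isset flags.
struct PageStatsFields {
  std::optional<std::string> max;        // older field
  std::optional<std::string> min;        // older field
  std::optional<std::string> max_value;  // newer field
  std::optional<std::string> min_value;  // newer field
  std::optional<int64_t> null_count;
};

// Hypothetical result type holding whatever could be extracted.
struct ExtractedStats {
  std::optional<std::string> max;
  std::optional<std::string> min;
  std::optional<int64_t> null_count;
};

// Prefer max_value/min_value when present; otherwise use max/min.
// Fields that are not set in the input stay unset in the result.
ExtractedStats ExtractWithFallback(const PageStatsFields& stats) {
  ExtractedStats out;
  out.max = stats.max_value.has_value() ? stats.max_value : stats.max;
  out.min = stats.min_value.has_value() ? stats.min_value : stats.min;
  out.null_count = stats.null_count;
  return out;
}
```

Whether falling back to the older fields is appropriate depends on the column type and its sort order, so a real reader would need an additional check before trusting them.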
   
   

