Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/06 14:44:12 UTC

[GitHub] [arrow] wgtmac commented on a diff in pull request #14556: PARQUET-2211: [C++] Print ColumnMetaData.encoding_stats field

wgtmac commented on code in PR #14556:
URL: https://github.com/apache/arrow/pull/14556#discussion_r1014841622


##########
cpp/src/parquet/printer.cc:
##########
@@ -39,6 +39,25 @@ namespace parquet {
 
 class ColumnReader;
 
+namespace {
+
+void PrintPageEncodingStats(std::ostream& stream,
+                            const std::vector<PageEncodingStats>& encoding_stats) {
+  for (size_t i = 0; i < encoding_stats.size(); ++i) {
+    const auto& encoding = encoding_stats.at(i);
+    stream << EncodingToString(encoding.encoding);
+    if (encoding.page_type == parquet::PageType::DICTIONARY_PAGE) {
+      // Explicitly tell if this encoding comes from a dictionary page
+      stream << "(DICT_PAGE)";

Review Comment:
   The main idea is to show that this encoding comes from the dictionary page. IIUC, in Parquet 1.0 both the dictionary page and the data pages use PLAIN_DICTIONARY when dictionary encoding is applied, while in Parquet 2.0 the dictionary page uses PLAIN and the data pages use RLE_DICTIONARY. Without the marker, it is difficult to tell whether a PLAIN_DICTIONARY or PLAIN entry comes from a dictionary page or a data page. Please see this for details: https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8
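
   To illustrate the ambiguity described above, here is a minimal, self-contained sketch. The `Encoding`, `PageType`, and `PageEncodingStats` types and the `EncodingName` and `FormatEncodingStats` helpers are hypothetical stand-ins written for this example; they are not the actual parquet-cpp declarations, and the output format is only an assumption about what the printer might produce.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real parquet types (assumption, for illustration only).
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY };
enum class PageType { DATA_PAGE, DICTIONARY_PAGE };

struct PageEncodingStats {
  PageType page_type;
  Encoding encoding;
  int count;  // number of pages with this (page_type, encoding) pair
};

// Hypothetical helper mirroring the role of EncodingToString in the diff.
std::string EncodingName(Encoding e) {
  switch (e) {
    case Encoding::PLAIN:
      return "PLAIN";
    case Encoding::PLAIN_DICTIONARY:
      return "PLAIN_DICTIONARY";
    case Encoding::RLE_DICTIONARY:
      return "RLE_DICTIONARY";
  }
  return "UNKNOWN";
}

// Formats the stats as a space-separated list, tagging entries that
// describe a dictionary page so they can be told apart from data pages.
std::string FormatEncodingStats(const std::vector<PageEncodingStats>& stats) {
  std::ostringstream out;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (i > 0) out << " ";
    out << EncodingName(stats[i].encoding);
    if (stats[i].page_type == PageType::DICTIONARY_PAGE) {
      out << "(DICT_PAGE)";
    }
  }
  return out.str();
}
```

   With a Parquet 1.0 style column chunk (dictionary page and data pages both PLAIN_DICTIONARY), the two entries would print identically without the tag. With a Parquet 2.0 style chunk, a PLAIN dictionary page would be indistinguishable from a PLAIN fallback data page.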


