Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/05/04 07:50:56 UTC

[GitHub] [arrow] wgtmac commented on a diff in pull request #35428: GH-35423: [C++][Parquet][BugFix] Parquet Encoding Fixing mismatched buffer size

wgtmac commented on code in PR #35428:
URL: https://github.com/apache/arrow/pull/35428#discussion_r1184663311


##########
cpp/src/parquet/column_reader.cc:
##########
@@ -857,7 +857,10 @@ class ColumnReaderImplBase {
   // first page with this encoding.
   void InitializeDataDecoder(const DataPage& page, int64_t levels_byte_size) {
     const uint8_t* buffer = page.data() + levels_byte_size;
-    const int64_t data_size = page.size() - levels_byte_size;
+    // PageReader may reuse the underlying buffer, so data_size
+    // should use `page.uncompressed_size() - levels_byte_size` rather
+    // than `page.size() - levels_byte_size`.
+    const int64_t data_size = page.uncompressed_size() - levels_byte_size;

Review Comment:
   nit: it would be helpful to note in the comment that this is now the precise uncompressed data size, whereas previously it was only an upper bound.
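
   To illustrate the distinction, here is a minimal, self-contained sketch. It does not use the Arrow or Parquet API: `SketchPage`, `SketchPageReader`, `UpperBoundDataSize` and `PreciseDataSize` are hypothetical names, and the grow-only buffer is an assumption about how a reused buffer could end up larger than the current page's payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a data page whose bytes live in a buffer
// that the reader reuses from one page to the next.
struct SketchPage {
  const std::vector<uint8_t>* buffer;  // shared buffer, may be oversized
  int64_t uncompressed_size;           // bytes that belong to this page

  // Size of the underlying buffer, not of this page's payload.
  int64_t size() const { return static_cast<int64_t>(buffer->size()); }
};

// Hypothetical reader: grows its buffer on demand and never shrinks it
// (an assumption made for this sketch).
class SketchPageReader {
 public:
  SketchPage NextPage(int64_t uncompressed_size) {
    assert(uncompressed_size >= 0);
    if (static_cast<int64_t>(buffer_.size()) < uncompressed_size) {
      buffer_.resize(static_cast<std::size_t>(uncompressed_size));
    }
    return SketchPage{&buffer_, uncompressed_size};
  }

 private:
  std::vector<uint8_t> buffer_;
};

// Old computation: only an upper bound once the buffer has been reused.
inline int64_t UpperBoundDataSize(const SketchPage& page, int64_t levels_byte_size) {
  return page.size() - levels_byte_size;
}

// New computation: the exact number of data bytes after the levels.
inline int64_t PreciseDataSize(const SketchPage& page, int64_t levels_byte_size) {
  return page.uncompressed_size - levels_byte_size;
}
```

   In this sketch, reading a 100-byte page and then a 40-byte page leaves the buffer at 100 bytes, so the old computation overstates the second page's data size while the new one does not.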


