You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/12/07 13:31:53 UTC

Re: [PR] Parquet: Ensure page statistics are written only when conifgured from the Arrow Writer [arrow-rs]

tustvold commented on code in PR #5181:
URL: https://github.com/apache/arrow-rs/pull/5181#discussion_r1418970317


##########
parquet/src/column/writer/mod.rs:
##########
@@ -764,19 +764,22 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
 
         self.column_metrics.num_column_nulls += self.page_metrics.num_page_nulls;
 
-        let page_statistics = match (values_data.min_value, values_data.max_value) {
-            (Some(min), Some(max)) => {
-                update_min(&self.descr, &min, &mut self.column_metrics.min_column_value);
-                update_max(&self.descr, &max, &mut self.column_metrics.max_column_value);
-                Some(ValueStatistics::new(
-                    Some(min),
-                    Some(max),
-                    None,
-                    self.page_metrics.num_page_nulls,
-                    false,
-                ))
-            }
-            _ => None,
+        let page_statistics = if let (Some(min), Some(max)) =

Review Comment:
   Looking at the code it would appear the way this was intended to work is for the encoder itself to not compute page statistics unless EnabledStatistics::Page. If `EnabledStatistics::Chunk` the column stats are computed at a higher-level in write_batch_internal. What do you think about pushing this condition into `ByteArrayEncoder` like it is for `ColumnValueEncoderImpl`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org