Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/17 15:24:54 UTC

[GitHub] [arrow-rs] crepererum opened a new issue #306: All-null column get wrong parquet null-counts

crepererum opened a new issue #306:
URL: https://github.com/apache/arrow-rs/issues/306


   **Describe the bug**
   When serializing an all-null Arrow array to Parquet, the null count in the column-chunk statistics is always 0.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   ```rust
   // Assumes this runs inside the parquet crate's Arrow writer test module,
   // which provides the `one_column_roundtrip` helper and the needed imports.
   #[test]
   fn statistics_null_counts_only_nulls() {
       // check that null-count statistics for "only NULL"-columns are correct
       let values = Arc::new(UInt64Array::from(vec![None, None]));
       let file = one_column_roundtrip("null_counts", values, true);
   
       // check statistics are valid
       let reader = SerializedFileReader::new(file).unwrap();
       let metadata = reader.metadata();
       assert_eq!(metadata.num_row_groups(), 1);
       let row_group = metadata.row_group(0);
       assert_eq!(row_group.num_columns(), 1);
       let column = row_group.column(0);
       let stats = column.statistics().unwrap();
       assert_eq!(stats.null_count(), 2);  // <<< this fails, null count is 0
   }
   ```
   
   **Expected behavior**
   For all-null columns, the null count should equal the number of rows.
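   As a rough illustration of the expected behavior: in a flat (non-nested) optional Parquet column the maximum definition level is 1, and a null slot is written with a definition level below that maximum. A correct null count is therefore the number of such levels, so for an all-null column it equals the row count. The sketch below assumes a non-nested column and uses a hypothetical helper, `null_count_from_def_levels`, that is not part of arrow-rs; it only shows the invariant the written statistics should satisfy, not how the actual fix works.

   ```rust
   /// Hypothetical helper for illustration only; this is not an arrow-rs API.
   /// For a flat optional column, a slot is null exactly when its definition
   /// level is below `max_def_level` (which is 1 for such a column).
   pub fn null_count_from_def_levels(def_levels: &[i16], max_def_level: i16) -> u64 {
       def_levels
           .iter()
           .filter(|&&level| level < max_def_level)
           .count() as u64
   }

   #[cfg(test)]
   mod tests {
       use super::*;

       #[test]
       fn all_null_column_counts_every_row() {
           // Two rows, both null: each definition level is 0.
           let def_levels = [0i16, 0];
           assert_eq!(
               null_count_from_def_levels(&def_levels, 1),
               def_levels.len() as u64
           );
       }

       #[test]
       fn mixed_column_counts_only_nulls() {
           // Rows Some, None, Some are encoded as levels 1, 0, 1.
           assert_eq!(null_count_from_def_levels(&[1, 0, 1], 1), 1);
       }
   }
   ```

   For the reproduction above (two `None` values), this invariant gives a null count of 2 rather than the 0 currently written.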
   
   **Additional context**
   Tested on commit `c863a2c44bffa5c092a49e07910d5e9225483193`.
   
   **I am claiming this issue since I have a fix ready.**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao closed issue #306: All-null column get wrong parquet null-counts

Posted by GitBox <gi...@apache.org>.
sunchao closed issue #306:
URL: https://github.com/apache/arrow-rs/issues/306


   

