You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/03 20:53:54 UTC

[GitHub] [arrow-rs] yordan-pavlov commented on a change in pull request #1130: Fix reading of dictionary encoded pages with null values (#1111)

yordan-pavlov commented on a change in pull request #1130:
URL: https://github.com/apache/arrow-rs/pull/1130#discussion_r777699023



##########
File path: parquet/src/arrow/arrow_array_reader.rs
##########
@@ -420,6 +456,24 @@ impl<'a, C: ArrayConverter + 'a> ArrowArrayReader<'a, C> {
         }
     }
 
+    fn count_def_level_values(
+        column_desc: &ColumnDescriptor,
+        level_decoder: crate::encodings::levels::LevelDecoder,
+        num_values: usize,
+    ) -> Result<usize> {
+        let mut def_level_decoder = LevelValueDecoder::new(level_decoder);
+        let def_level_array =

Review comment:
       Yes - def levels are decoded a second time for this fix and an i16 array and a boolean array are created to count the non-null values, but they only live for a very short time and the negative effect on performance is surprisingly small (3% to 8% in my benchmark run) see here: https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003718555 ; even after this change the `ArrowArrayReader` is still often several times faster for decoding strings compared to the old `ArrayReader`, this hasn't changed much.
   
   It's probably possible to make this more efficient, but it would require more thinking and more time for a bigger change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org