Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/07 17:29:22 UTC

[GitHub] [arrow-rs] jorgecarleitao commented on a change in pull request #416: Fix out of bounds read in bit chunk iterator

jorgecarleitao commented on a change in pull request #416:
URL: https://github.com/apache/arrow-rs/pull/416#discussion_r646800597



##########
File path: arrow/src/util/bit_chunk_iterator.rs
##########
@@ -137,14 +137,16 @@ impl Iterator for BitChunkIterator<'_> {
         // so when reading as u64 on a big-endian machine, the bytes need to be swapped
         let current = unsafe { std::ptr::read_unaligned(raw_data.add(index)).to_le() };
 
-        let combined = if self.bit_offset == 0 {
+        let bit_offset = self.bit_offset;
+
+        let combined = if bit_offset == 0 {
             current
         } else {
-            let next =
-                unsafe { std::ptr::read_unaligned(raw_data.add(index + 1)).to_le() };
+            let next = unsafe {
+                std::ptr::read_unaligned(raw_data.add(index + 1) as *const u8) as u64

Review comment:
       Since this is not the remainder chunk, don't we potentially need to read more than 8 bits here? I.e. doesn't the word at `index + 1` contain between 1 and 63 bits that need to be "merged" into `current`?
   
   My feeling is that casting to `*const u8` makes this read ignore bits 8 through 63 of the next word. At least that is what I remember from fixing the same issue in arrow2 [here](https://github.com/jorgecarleitao/arrow2/blob/main/src/bitmap/utils/chunk_iterator/mod.rs#L149).
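
To make the question concrete, here is a minimal sketch of the merge step. It assumes the two words are combined as `(current >> bit_offset) | (next << (64 - bit_offset))`, which the hunk above does not show, and the names `combine` and `one_byte_suffices` are hypothetical helpers, not part of arrow-rs. Under that assumption, exactly `bit_offset` low bits of `next` end up in the result, so whether a single-byte read is enough depends on the range `bit_offset` can take.

```rust
/// Hypothetical helper (not the arrow-rs API): merge two little-endian
/// 64-bit words into the 64-bit chunk that starts `bit_offset` bits
/// into `current`. The formula is an assumption for illustration.
fn combine(current: u64, next: u64, bit_offset: u32) -> u64 {
    debug_assert!(bit_offset < 64);
    if bit_offset == 0 {
        // Aligned case: nothing from `next` is needed.
        current
    } else {
        // The upper `bit_offset` bits of the result come from the
        // lowest `bit_offset` bits of `next`.
        (current >> bit_offset) | (next << (64 - bit_offset))
    }
}

/// Hypothetical check: does truncating `next` to its lowest byte
/// (as the `*const u8` read does) give the same chunk as the full word?
/// `next` is all ones so that any dropped bit changes the result.
fn one_byte_suffices(bit_offset: u32) -> bool {
    let current = 0x0123_4567_89AB_CDEF_u64;
    let next = u64::MAX;
    combine(current, next, bit_offset) == combine(current, next & 0xFF, bit_offset)
}
```

With this sketch, `one_byte_suffices` holds for offsets 1 through 8 and fails from 9 upward. So if `bit_offset` can only be 1 to 7 (for example because whole bytes are already skipped through the pointer offset, which this hunk does not show), the one-byte read would be enough; if it can reach 63, the remaining bits of the next word would be lost.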
   



