You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/03 01:54:38 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #1995: Simplify ColumnReader::read_batch

tustvold opened a new pull request, #1995:
URL: https://github.com/apache/arrow-rs/pull/1995

   # Which issue does this PR close?
   
   Miscellaneous cleanup whilst working on #1792
   
   # Rationale for this change
    
   The existing code had redundant result returns, duplicated logic, and confusing semantics.
   
   # What changes are included in this PR?
   
   Changes the semantics of ColumnReader::read_batch to error if called on data without the necessary levels buffers, and to return the number of levels read even for columns that have max_def_level == 0 && max_rep_level == 0. This is technically a breaking change, although I think it makes the API a lot less confusing.
   
   Cleanups some additional code
   
   # Are there any user-facing changes?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold merged pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold merged PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912419463


##########
parquet/src/util/test_common/page_util.rs:
##########
@@ -100,11 +100,7 @@ impl DataPageBuilder for DataPageBuilderImpl {
     }
 
     fn add_def_levels(&mut self, max_levels: i16, def_levels: &[i16]) {
-        assert!(
-            self.num_values == def_levels.len() as u32,
-            "Must call `add_rep_levels() first!`"
-        );
-
+        self.num_values = def_levels.len() as u32;

Review Comment:
   This logic was ill-formed, as it prevented having def levels without rep levels. Removing this sanity check is harmless, especially as this is just used for testing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

alamb commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r913709039


##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -218,32 +218,32 @@ where
     /// The implementation has side effects. It will create a new buffer to hold those
     /// definition level values that have already been read into memory but not counted
     /// as record values, e.g. those from `self.num_values` to `self.values_written`.
-    pub fn consume_def_levels(&mut self) -> Result<Option<Buffer>> {
-        Ok(match self.def_levels.as_mut() {
+    pub fn consume_def_levels(&mut self) -> Option<Buffer> {

Review Comment:
   The changes in this file are cleanups to make the signature  infallible when it always returns `Ok`, right? (and it is also an API change, and perhaps also fixes clippy)



##########
parquet/src/column/reader.rs:
##########
@@ -1036,7 +1005,7 @@ mod tests {
         } else {
             0
         };
-        let max_rep_level = if def_levels.is_some() {

Review Comment:
   👍 



##########
parquet/src/column/reader.rs:
##########
@@ -163,26 +163,18 @@ where
         }
     }
 
-    /// Reads a batch of values of at most `batch_size`.
+    /// Reads a batch of values of at most `batch_size`, returning a tuple containing the

Review Comment:
   To be honest, I don't fully understand the subtle difference between `corresponding number of levels, i.e, the total number of values including nulls, empty lists, etc..` and `the actual number of levels read.`  but it seems like a good change to me and I would defer to you for what the most appropriate semantics should be



##########
parquet/src/util/test_common/page_util.rs:
##########
@@ -100,11 +100,7 @@ impl DataPageBuilder for DataPageBuilderImpl {
     }
 
     fn add_def_levels(&mut self, max_levels: i16, def_levels: &[i16]) {
-        assert!(
-            self.num_values == def_levels.len() as u32,
-            "Must call `add_rep_levels() first!`"
-        );
-
+        self.num_values = def_levels.len() as u32;

Review Comment:
   Double checked `DataPageBuilder` is used for testing:
   
    https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/apache/arrow-rs%24+DataPageBuilderImpl&patternType=literal
   
   https://docs.rs/arrow/17.0.0/arrow/?search=DataPageBuilder
   
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912420266


##########
parquet/src/column/reader.rs:
##########
@@ -205,94 +197,74 @@ where
 
         // Read exhaustively all pages until we read all batch_size values/levels
         // or there are no more values/levels to read.
-        while max(values_read, levels_read) < batch_size {
+        while levels_read < batch_size {
             if !self.has_next()? {
                 break;
             }
 
             // Batch size for the current iteration
-            let iter_batch_size = {
-                // Compute approximate value based on values decoded so far
-                let mut adjusted_size = min(
-                    batch_size,
-                    (self.num_buffered_values - self.num_decoded_values) as usize,
-                );
-
-                // Adjust batch size by taking into account how much data there
-                // to read. As batch_size is also smaller than value and level
-                // slices (if available), this ensures that available space is not
-                // exceeded.
-                adjusted_size = min(adjusted_size, batch_size - values_read);
-                adjusted_size = min(adjusted_size, batch_size - levels_read);
-
-                adjusted_size
-            };
+            let iter_batch_size = (batch_size - levels_read)
+                .min((self.num_buffered_values - self.num_decoded_values) as usize);
 
             // If the field is required and non-repeated, there are no definition levels
-            let (num_def_levels, null_count) = match def_levels.as_mut() {
-                Some(levels) if self.descr.max_def_level() > 0 => {
+            let null_count = match self.descr.max_def_level() > 0 {
+                true => {
+                    let levels = def_levels
+                        .as_mut()
+                        .ok_or_else(|| general_err!("must specify definition levels"))?;
+
                     let num_def_levels = self
                         .def_level_decoder
                         .as_mut()
                         .expect("def_level_decoder be set")
-                        .read(*levels, levels_read..levels_read + iter_batch_size)?;
+                        .read(levels, levels_read..levels_read + iter_batch_size)?;
+
+                    if num_def_levels != iter_batch_size {
+                        return Err(general_err!("insufficient definition levels read from column - expected {}, got {}", iter_batch_size, num_def_levels));
+                    }
 
-                    let null_count = levels.count_nulls(
+                    levels.count_nulls(
                         levels_read..levels_read + num_def_levels,
                         self.descr.max_def_level(),
-                    );
-                    (num_def_levels, null_count)
+                    )
                 }
-                _ => (0, 0),
+                false => 0,
             };
 
-            let num_rep_levels = match rep_levels.as_mut() {
-                Some(levels) if self.descr.max_rep_level() > 0 => self
+            if self.descr.max_rep_level() > 0 {
+                let levels = rep_levels
+                    .as_mut()
+                    .ok_or_else(|| general_err!("must specify repetition levels"))?;
+
+                let rep_levels = self
                     .rep_level_decoder
                     .as_mut()
                     .expect("rep_level_decoder be set")
-                    .read(levels, levels_read..levels_read + iter_batch_size)?,
-                _ => 0,
-            };
+                    .read(levels, levels_read..levels_read + iter_batch_size)?;
 
-            // At this point we have read values, definition and repetition levels.
-            // If both definition and repetition levels are defined, their counts
-            // should be equal. Values count is always less or equal to definition levels.
-            if num_def_levels != 0
-                && num_rep_levels != 0

Review Comment:
   This is the source of #1997 - the value of 0 is used as a sentinel for no levels present, which prevents detecting the case of no actual values left



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r913749600


##########
parquet/src/column/reader.rs:
##########
@@ -163,26 +163,18 @@ where
         }
     }
 
-    /// Reads a batch of values of at most `batch_size`.
+    /// Reads a batch of values of at most `batch_size`, returning a tuple containing the

Review Comment:
   The difference arises because if a column is repeated / nullable, the corresponding level data is omitted. The result is that whilst there is still the same number of levels, it technically won't read any level data



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

codecov-commenter commented on PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#issuecomment-1172997926

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1995?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1995](https://codecov.io/gh/apache/arrow-rs/pull/1995?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (12c044f) into [master](https://codecov.io/gh/apache/arrow-rs/commit/c47d14ce9bdd91d68661e67a843d60e2c5799061?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c47d14c) will **increase** coverage by `0.04%`.
   > The diff coverage is `92.30%`.
   
   > :exclamation: Current head 12c044f differs from pull request most recent head 8b57c0c. Consider uploading reports for the commit 8b57c0c to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1995      +/-   ##
   ==========================================
   + Coverage   83.54%   83.58%   +0.04%     
   ==========================================
     Files         221      222       +1     
     Lines       57395    57467      +72     
   ==========================================
   + Hits        47950    48036      +86     
   + Misses       9445     9431      -14     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1995?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyb3dfcmVhZGVyLnJz) | `96.87% <ø> (ø)` | |
   | [parquet/src/arrow/async\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXN5bmNfcmVhZGVyLnJz) | `0.00% <ø> (ø)` | |
   | [parquet/src/column/reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvY29sdW1uL3JlYWRlci5ycw==) | `68.31% <81.81%> (-0.30%)` | :arrow_down: |
   | [parquet/src/arrow/array\_reader/byte\_array.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyYXlfcmVhZGVyL2J5dGVfYXJyYXkucnM=) | `85.71% <100.00%> (ø)` | |
   | [...et/src/arrow/array\_reader/byte\_array\_dictionary.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyYXlfcmVhZGVyL2J5dGVfYXJyYXlfZGljdGlvbmFyeS5ycw==) | `83.91% <100.00%> (ø)` | |
   | [parquet/src/arrow/array\_reader/null\_array.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyYXlfcmVhZGVyL251bGxfYXJyYXkucnM=) | `88.00% <100.00%> (ø)` | |
   | [parquet/src/arrow/array\_reader/primitive\_array.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyYXlfcmVhZGVyL3ByaW1pdGl2ZV9hcnJheS5ycw==) | `89.65% <100.00%> (ø)` | |
   | [parquet/src/arrow/record\_reader/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvcmVjb3JkX3JlYWRlci9tb2QucnM=) | `94.73% <100.00%> (+1.16%)` | :arrow_up: |
   | [parquet/src/util/test\_common/page\_util.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvdXRpbC90ZXN0X2NvbW1vbi9wYWdlX3V0aWwucnM=) | `88.54% <100.00%> (-0.12%)` | :arrow_down: |
   | [...row/src/array/builder/string\_dictionary\_builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIvc3RyaW5nX2RpY3Rpb25hcnlfYnVpbGRlci5ycw==) | `91.36% <0.00%> (-0.37%)` | :arrow_down: |
   | ... and [23 more](https://codecov.io/gh/apache/arrow-rs/pull/1995/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1995?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1995?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c47d14c...8b57c0c](https://codecov.io/gh/apache/arrow-rs/pull/1995?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912420202


##########
parquet/src/column/reader.rs:
##########
@@ -205,94 +197,74 @@ where
 
         // Read exhaustively all pages until we read all batch_size values/levels
         // or there are no more values/levels to read.
-        while max(values_read, levels_read) < batch_size {
+        while levels_read < batch_size {
             if !self.has_next()? {
                 break;
             }
 
             // Batch size for the current iteration
-            let iter_batch_size = {
-                // Compute approximate value based on values decoded so far
-                let mut adjusted_size = min(
-                    batch_size,
-                    (self.num_buffered_values - self.num_decoded_values) as usize,
-                );
-
-                // Adjust batch size by taking into account how much data there
-                // to read. As batch_size is also smaller than value and level
-                // slices (if available), this ensures that available space is not
-                // exceeded.
-                adjusted_size = min(adjusted_size, batch_size - values_read);
-                adjusted_size = min(adjusted_size, batch_size - levels_read);
-
-                adjusted_size
-            };
+            let iter_batch_size = (batch_size - levels_read)
+                .min((self.num_buffered_values - self.num_decoded_values) as usize);
 
             // If the field is required and non-repeated, there are no definition levels
-            let (num_def_levels, null_count) = match def_levels.as_mut() {
-                Some(levels) if self.descr.max_def_level() > 0 => {
+            let null_count = match self.descr.max_def_level() > 0 {
+                true => {
+                    let levels = def_levels
+                        .as_mut()
+                        .ok_or_else(|| general_err!("must specify definition levels"))?;
+
                     let num_def_levels = self
                         .def_level_decoder
                         .as_mut()
                         .expect("def_level_decoder be set")
-                        .read(*levels, levels_read..levels_read + iter_batch_size)?;
+                        .read(levels, levels_read..levels_read + iter_batch_size)?;
+
+                    if num_def_levels != iter_batch_size {
+                        return Err(general_err!("insufficient definition levels read from column - expected {}, got {}", iter_batch_size, num_def_levels));

Review Comment:
   This, and similar checks below, will prevent #1997



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912419423


##########
parquet/src/column/reader.rs:
##########
@@ -1300,44 +1272,10 @@ mod tests {
                 );
             }
 
-            if def_levels.is_none() && rep_levels.is_none() {

Review Comment:
   Here we see the breaking change in behaviour



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912419192


##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -365,42 +361,15 @@ mod tests {
     use arrow::buffer::Buffer;
 
     use crate::basic::Encoding;
-    use crate::column::page::Page;
-    use crate::column::page::PageReader;
     use crate::data_type::Int32Type;
-    use crate::errors::Result;
     use crate::schema::parser::parse_message_type;
     use crate::schema::types::SchemaDescriptor;
-    use crate::util::test_common::page_util::{DataPageBuilder, DataPageBuilderImpl};
+    use crate::util::test_common::page_util::{
+        DataPageBuilder, DataPageBuilderImpl, InMemoryPageReader,
+    };
 
     use super::RecordReader;
 
-    struct TestPageReader {

Review Comment:
   This just duplicated InMemoryPageReader



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912419382


##########
parquet/src/column/reader.rs:
##########
@@ -163,26 +163,18 @@ where
         }
     }
 
-    /// Reads a batch of values of at most `batch_size`.
+    /// Reads a batch of values of at most `batch_size`, returning a tuple containing the

Review Comment:
   This is the breaking change, we could make this non-breaking by always returning 0 if max_def_level == 0 && max_rep_level == 0, but I personally think this is a little bit confusing. This make the behaviour consistent, instead of potentially varying based on the column definition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1995: Simplify ColumnReader::read_batch

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #1995:
URL: https://github.com/apache/arrow-rs/pull/1995#discussion_r912419399


##########
parquet/src/column/reader.rs:
##########
@@ -1300,44 +1272,10 @@ mod tests {
                 );
             }
 
-            if def_levels.is_none() && rep_levels.is_none() {
-                assert!(
-                    curr_levels_read == 0,
-                    "expected to read 0 levels, found {}",
-                    curr_levels_read
-                );
-            } else if def_levels.is_some() && max_def_level > 0 {
-                assert!(
-                    curr_levels_read >= curr_values_read,
-                    "expected levels read to be greater than values read"
-                );
-            }
-        }
-    }
-
-    struct TestPageReader {

Review Comment:
   Another duplicate of InMemoryPageReader



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org