Posted to commits@parquet.apache.org by uw...@apache.org on 2018/04/18 11:04:32 UTC

[parquet-cpp] branch master updated: PARQUET-1272: Return correct row count for nested columns in ScanFileContents

This is an automated email from the ASF dual-hosted git repository.

uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-cpp.git


The following commit(s) were added to refs/heads/master by this push:
     new aa7a5e5  PARQUET-1272: Return correct row count for nested columns in ScanFileContents
aa7a5e5 is described below

commit aa7a5e5f34f2eada56e5d2ae896d85fe2a139747
Author: Korn, Uwe <Uw...@blue-yonder.com>
AuthorDate: Wed Apr 18 13:04:26 2018 +0200

    PARQUET-1272: Return correct row count for nested columns in ScanFileContents
    
    Stumbled over this while adding lists to the `alltypes_sample` in `test_parquet.py` in Arrow.
    
    Author: Korn, Uwe <Uw...@blue-yonder.com>
    
    Closes #457 from xhochy/PARQUET-1272 and squashes the following commits:
    
    45efe1c [Korn, Uwe] PARQUET-1272: Return correct row count for nested columns in ScanFileContents
---
 src/parquet/file_reader.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/parquet/file_reader.cc b/src/parquet/file_reader.cc
index 983d2d0..0632872 100644
--- a/src/parquet/file_reader.cc
+++ b/src/parquet/file_reader.cc
@@ -347,9 +347,18 @@ int64_t ScanFileContents(std::vector<int> columns, const int32_t column_batch_si
 
       int64_t values_read = 0;
       while (col_reader->HasNext()) {
-        total_rows[col] +=
+        int64_t levels_read =
             ScanAllValues(column_batch_size, def_levels.data(), rep_levels.data(),
                           values.data(), &values_read, col_reader.get());
+        if (col_reader->descr()->max_repetition_level() > 0) {
+          for (int64_t i = 0; i < levels_read; i++) {
+            if (rep_levels[i] == 0) {
+              total_rows[col]++;
+            }
+          }
+        } else {
+          total_rows[col] += levels_read;
+        }
       }
       col++;
     }
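
The counting rule in the hunk above can be sketched in isolation. In Parquet's nested encoding, a repetition level of 0 marks the start of a new record, so a column with `max_repetition_level() > 0` yields more levels than rows. The helper below is a minimal sketch of that rule, not parquet-cpp code: the name `CountRowsInBatch` and its signature are hypothetical, and it assumes the repetition levels for one batch have already been read into a vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of parquet-cpp): returns the number of rows
// represented by the first `levels_read` entries of one batch of levels.
int64_t CountRowsInBatch(const std::vector<int16_t>& rep_levels,
                         int64_t levels_read, int16_t max_repetition_level) {
  // Flat column: every level corresponds to exactly one row.
  if (max_repetition_level == 0) {
    return levels_read;
  }
  // Nested column: only entries with repetition level 0 start a new row.
  int64_t rows = 0;
  for (int64_t i = 0; i < levels_read; i++) {
    if (rep_levels[i] == 0) {
      rows++;
    }
  }
  return rows;
}
```

For example, a list column holding the three rows `[1, 2, 3]`, `[4]`, `[5, 6]` has repetition levels `0 1 1 0 0 1`: six levels, but only three of them are 0, so the sketch reports three rows where the old code would have reported six.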
