Posted to commits@parquet.apache.org by uw...@apache.org on 2018/04/18 11:04:32 UTC
[parquet-cpp] branch master updated: PARQUET-1272: Return correct row count for nested columns in ScanFileContents
This is an automated email from the ASF dual-hosted git repository.
uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-cpp.git
The following commit(s) were added to refs/heads/master by this push:
new aa7a5e5 PARQUET-1272: Return correct row count for nested columns in ScanFileContents
aa7a5e5 is described below
commit aa7a5e5f34f2eada56e5d2ae896d85fe2a139747
Author: Korn, Uwe <Uw...@blue-yonder.com>
AuthorDate: Wed Apr 18 13:04:26 2018 +0200
PARQUET-1272: Return correct row count for nested columns in ScanFileContents
Stumbled over this while adding lists to the `alltypes_sample` in `test_parquet.py` in Arrow.
Author: Korn, Uwe <Uw...@blue-yonder.com>
Closes #457 from xhochy/PARQUET-1272 and squashes the following commits:
45efe1c [Korn, Uwe] PARQUET-1272: Return correct row count for nested columns in ScanFileContents
---
src/parquet/file_reader.cc | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/parquet/file_reader.cc b/src/parquet/file_reader.cc
index 983d2d0..0632872 100644
--- a/src/parquet/file_reader.cc
+++ b/src/parquet/file_reader.cc
@@ -347,9 +347,18 @@ int64_t ScanFileContents(std::vector<int> columns, const int32_t column_batch_si
       int64_t values_read = 0;
       while (col_reader->HasNext()) {
-        total_rows[col] +=
+        int64_t levels_read =
             ScanAllValues(column_batch_size, def_levels.data(), rep_levels.data(),
                           values.data(), &values_read, col_reader.get());
+        if (col_reader->descr()->max_repetition_level() > 0) {
+          for (int64_t i = 0; i < levels_read; i++) {
+            if (rep_levels[i] == 0) {
+              total_rows[col]++;
+            }
+          }
+        } else {
+          total_rows[col] += levels_read;
+        }
       }
       col++;
     }
--