Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/06 10:29:52 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #7108: PARQUET-1857: [C++] Do not fail to read unencrypted files with over 32767 row groups. Change some DCHECKs causing segfaults to throw exceptions

pitrou commented on a change in pull request #7108:
URL: https://github.com/apache/arrow/pull/7108#discussion_r420688407



##########
File path: cpp/src/parquet/arrow/reader_internal.h
##########
@@ -76,7 +76,8 @@ class FileColumnIterator {
       return nullptr;
     }
 
-    auto row_group_reader = reader_->RowGroup(row_groups_.front());
+    int row_group_index = row_groups_.front();
+    auto row_group_reader = reader_->RowGroup(row_group_index);

Review comment:
       Does this change make any difference? I was expecting a type issue, but it seems like `row_groups_.front()` should be an `int` anyway.

##########
File path: python/pyarrow/tests/test_parquet.py
##########
@@ -284,6 +284,12 @@ def test_special_chars_filename(tempdir, use_legacy_dataset):
     assert table_read.equals(table)
 
 
+@pytest.mark.slow
+def test_file_with_over_int16_max_row_groups():

Review comment:
       Is this test supposed to succeed? Or is the 16-bit limitation only for encrypted files?
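
For context on the figure in the PR title: 32767 is the largest value a signed 16-bit integer can hold, so a row group index of 32768 or more cannot be stored in such a field. The sketch below only illustrates that arithmetic using Python's standard `struct` module. The helper names `fits_in_int16` and `wrap_to_int16` are hypothetical and are not part of Arrow or Parquet, and the sketch makes no claim about where the library itself uses a 16-bit value.

```python
import struct

INT16_MAX = 2**15 - 1  # 32767, the limit named in the PR title


def fits_in_int16(row_group_index: int) -> bool:
    """Return True if the index can be stored in a signed 16-bit field."""
    try:
        struct.pack("<h", row_group_index)  # "<h" is a little-endian int16
        return True
    except struct.error:
        return False


def wrap_to_int16(value: int) -> int:
    """Reinterpret the low 16 bits of value as a signed two's-complement int."""
    return struct.unpack("<h", struct.pack("<H", value & 0xFFFF))[0]


print(fits_in_int16(INT16_MAX))      # True
print(fits_in_int16(INT16_MAX + 1))  # False
print(wrap_to_int16(INT16_MAX + 1))  # -32768
```

An unchecked narrowing of the index would therefore turn row group 32768 into a negative number, which is the kind of failure a test with more than 32767 row groups is meant to expose.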



