Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/06 17:26:09 UTC

[GitHub] [arrow] tachyonwill commented on a change in pull request #11984: PARQUET-2109: [C++] Check if Parquet page has too few values

tachyonwill commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r779717505



##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -970,6 +970,9 @@ int64_t TypedColumnReaderImpl<DType>::ReadBatchWithDictionary(
   // Read dictionary indices.
   *indices_read = ReadDictionaryIndices(indices_to_read, indices);
   int64_t total_indices = std::max(num_def_levels, *indices_read);
+  if (total_indices == 0 && batch_size != 0) {
+    ParquetException::EofException("Read 0 values");

Review comment:
       The PR doesn't change the behavior on length-0 pages (assuming the page is correctly formed). At the start of the ReadBatch* methods, HasNext() is called, and we gracefully bail out if it returns false. A size-0 page causes HasNext() to return false, so we stop there. Is this the right thing to do? I don't know. It can cause weird behavior, and judging from some parquet-mr JIRAs, size-0 pages might not be entirely legal.
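
       To make the bail-out concrete, here is a minimal toy model of the control flow described above. It is a sketch only: `ToyColumnReader`, its members, and the page-size queue are hypothetical names invented for illustration and are not the real parquet::ColumnReader code. It assumes that loading a page with zero values makes HasNext() return false.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical stand-in for a column reader. Each queue entry is the number
// of values in one page. This is NOT the Arrow/Parquet implementation.
class ToyColumnReader {
 public:
  explicit ToyColumnReader(std::deque<int64_t> page_sizes)
      : page_sizes_(std::move(page_sizes)) {}

  // Assumption taken from the comment above: a size-0 page makes HasNext()
  // return false, even if more pages follow it.
  bool HasNext() {
    if (remaining_in_page_ > 0) return true;
    if (page_sizes_.empty()) return false;
    remaining_in_page_ = page_sizes_.front();  // "read" the next page
    page_sizes_.pop_front();
    return remaining_in_page_ > 0;
  }

  // Returns the number of values read; 0 means the reader bailed out.
  int64_t ReadBatch(int64_t batch_size) {
    assert(batch_size >= 0);
    if (!HasNext()) return 0;  // graceful bail-out at the start of the method
    const int64_t n = std::min(batch_size, remaining_in_page_);
    remaining_in_page_ -= n;
    return n;
  }

 private:
  std::deque<int64_t> page_sizes_;
  int64_t remaining_in_page_ = 0;
};
```

       In this toy model, with page sizes {3, 0, 5}, successive ReadBatch(10) calls return 3, then 0 at the empty page, then 5. A caller that treats the first 0 as end of data would never see the last page, which is one form the weird behavior could take. Whether the real reader behaves the same way on the follow-up call is not something this sketch establishes.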



