You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/25 06:24:45 UTC

[GitHub] [arrow] maqister commented on a change in pull request #11984: PARQUET-2109: Check if Parquet page has too few values

maqister commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r775112597



##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -940,7 +940,7 @@ int64_t TypedColumnReaderImpl<DType>::ReadBatchWithDictionary(
     int64_t* indices_read, const T** dict, int32_t* dict_len) {
   bool has_dict_output = dict != nullptr && dict_len != nullptr;
   // Similar logic as ReadValues to get pages.
-  if (!HasNext()) {
+  if (batch_size == 0 || !HasNext()) {

Review comment:
       this change breaks use-case in my company where we use this API with batch_size = 0 explicitly, just to obtain dictionary alone. we use it for our loading .parquet files into properitary in-memory column store flow.
   
   it is ok from our perspective to change it as we can just add dedicated API to obtain dictionary in our fork of the code.
   
   i am just bringing this up in case there are other devs impacted by the public API change.
   
   https://www.hyrumslaw.com/




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org