Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/02/16 21:59:09 UTC

[GitHub] [arrow] wjones127 commented on pull request #34112: GH-34138: [C++][Parquet] Fix parsing stats from min_value/max_value

wjones127 commented on PR #34112:
URL: https://github.com/apache/arrow/pull/34112#issuecomment-1433773456

   @westonpace that makes sense.
   
   > When only one of min and max exists, it usually happens when a binary value has an extreme length or a floating value has NaN. In this case, the stats provide little value and make it trickier to use.
   
   It seems like we do have handling for these two cases. See Weston's message for NaN handling and `max_statistics_size` on [WriterProperties](https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterPropertiesE). Based on that, I'd actually prefer we keep the ability to parse just the min or max if only one is available.
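
To illustrate why keeping a lone bound can still be useful, here is a minimal sketch of parsing the two bounds independently. It is not the actual Arrow/Parquet code: `EncodedStats`, `ParsedStats`, `ParseStats`, and `CanSkipForGreaterThan` are hypothetical names invented for this example, and the values are plain `int64_t` rather than encoded bytes.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the statistics stored in column chunk metadata.
// Either bound may be absent (e.g. dropped for size, or unset because of NaN).
struct EncodedStats {
  std::optional<int64_t> min_value;
  std::optional<int64_t> max_value;
};

// Hypothetical parsed form that tracks each bound separately.
struct ParsedStats {
  std::optional<int64_t> min;
  std::optional<int64_t> max;
  bool has_min() const { return min.has_value(); }
  bool has_max() const { return max.has_value(); }
};

// Parse whichever bounds are present instead of requiring both.
ParsedStats ParseStats(const EncodedStats& encoded) {
  ParsedStats parsed;
  if (encoded.min_value) parsed.min = *encoded.min_value;
  if (encoded.max_value) parsed.max = *encoded.max_value;
  return parsed;
}

// A predicate `x > threshold` only needs the max: if max <= threshold,
// no value in the chunk can match, so the chunk can be skipped.
bool CanSkipForGreaterThan(const ParsedStats& stats, int64_t threshold) {
  return stats.has_max() && *stats.max <= threshold;
}
```

In this sketch a reader that only has `max` can still prune on `x > threshold`, while any check that needs `min` simply sees `has_min()` return false and falls back to reading the data.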

