Posted to issues@impala.apache.org by "Zoltán Borók-Nagy (JIRA)" <ji...@apache.org> on 2018/02/20 16:10:00 UTC
[jira] [Created] (IMPALA-6542) Fix inconsistent write path of Parquet min/max statistics
Zoltán Borók-Nagy created IMPALA-6542:
-----------------------------------------
Summary: Fix inconsistent write path of Parquet min/max statistics
Key: IMPALA-6542
URL: https://issues.apache.org/jira/browse/IMPALA-6542
Project: IMPALA
Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy
If the first value of a column chunk is NaN, then min_value = max_value = NaN.
If the first value of a column chunk is not NaN, i.e. it is an ordinary number or +/-infinity, then in the end neither min_value nor max_value is NaN.
Until the Parquet community agrees on the ordering of floating-point numbers, we can at least make our write path consistent.
A quick fix is to ignore NaNs when calculating min/max statistics, except for the case when all the values are NaN.
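The proposed rule can be sketched as follows. This is only an illustration in Python, not Impala's writer code; the function name min_max_ignoring_nan and the choice to return None for an empty chunk are assumptions made for the example.

```python
import math

def min_max_ignoring_nan(values):
    """Hypothetical helper: compute (min, max) while skipping NaNs.

    Returns (nan, nan) only when every value is NaN, and None for an
    empty input (an assumption; the issue does not cover empty chunks).
    """
    lo = hi = None
    saw_nan = False
    for v in values:
        if math.isnan(v):
            # NaNs never update the running min/max.
            saw_nan = True
            continue
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    if lo is None:
        # No ordinary value was seen: either all NaN or empty.
        return (math.nan, math.nan) if saw_nan else None
    return (lo, hi)
```

With this rule the result no longer depends on whether a NaN happens to come first: [NaN, 1.0, 5.0] and [1.0, NaN, 5.0] both yield (1.0, 5.0).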
With NaNs ignored in this way, we can use min/max statistics and the results still remain correct, because only binary predicates that contain constants are tested against min/max statistics. In other words, if we want to get NaNs back with a predicate (e.g. 'NOT x < 3', 'x != x'), min/max statistics won't be used, i.e. we will get the NaNs as well.
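To illustrate why skipping on such statistics stays safe, here is a rough Python sketch of a statistics check for binary predicates of the form 'x OP constant'. The function can_skip_row_group and its skip rules are assumptions for illustration and do not reproduce Impala's scanner logic.

```python
def can_skip_row_group(min_value, max_value, op, constant):
    """Hypothetical check: True only if the statistics prove no row matches.

    Any predicate shape not listed here falls through to False, meaning
    the statistics are not used and the rows are read and filtered normally.
    """
    if op == "<":
        return min_value >= constant
    if op == "<=":
        return min_value > constant
    if op == ">":
        return max_value <= constant
    if op == ">=":
        return max_value < constant
    if op == "=":
        return constant < min_value or constant > max_value
    return False

# A chunk holding [1.0, NaN, 5.0] has statistics (1.0, 5.0) once NaNs are ignored.
# 'x > 10' can skip it: the NaN row would not satisfy 'x > 10' either.
skip_gt_10 = can_skip_row_group(1.0, 5.0, ">", 10.0)
# 'x < 3' cannot skip it, because the row 1.0 matches.
skip_lt_3 = can_skip_row_group(1.0, 5.0, "<", 3.0)
```

In this sketch an all-NaN chunk (min_value = max_value = NaN) is never skipped, because every comparison against NaN evaluates to false.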
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)