Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/24 13:25:00 UTC

[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

    [ https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449841#comment-16449841 ] 

ASF GitHub Bot commented on PARQUET-1246:
-----------------------------------------

gszadovszky opened a new pull request #468: PARQUET-1246: Ignore float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/468
 
 
   Because the sort order of float/double values is ambiguous, the following changes were made to the read path of the related statistics:
   - Statistics are ignored if they contain a NaN value.
   - When a stored min or max is zero, -0.0 is used as the min value and +0.0 as the max value, regardless of which 0.0 value was saved in the statistics.
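
The two read-path rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class `FloatStatsSketch`, the holder `Range`, and the method `sanitize` are hypothetical names and are not part of parquet-mr. The sketch handles double; the float case would be analogous.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names exist in parquet-mr.
public final class FloatStatsSketch {

    /** Hypothetical holder for a min/max pair read from column statistics. */
    public static final class Range {
        public final double min;
        public final double max;

        Range(double min, double max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns a range that is safe to use for filtering, or an empty
     * Optional when the stored statistics should be ignored.
     */
    public static Optional<Range> sanitize(double storedMin, double storedMax) {
        // Rule 1: a NaN in either bound makes the statistics unusable.
        if (Double.isNaN(storedMin) || Double.isNaN(storedMax)) {
            return Optional.empty();
        }
        // Rule 2: "== 0.0" is true for both -0.0 and +0.0, so either zero
        // is widened to -0.0 for the min and to +0.0 for the max.
        double safeMin = (storedMin == 0.0) ? -0.0 : storedMin;
        double safeMax = (storedMax == 0.0) ? 0.0 : storedMax;
        return Optional.of(new Range(safeMin, safeMax));
    }

    public static void main(String[] args) {
        // Statistics containing NaN are dropped.
        System.out.println(sanitize(Double.NaN, 1.0).isPresent());
        // Zeros are normalized regardless of the sign that was stored.
        Range r = sanitize(0.0, -0.0).get();
        System.out.println(r.min + " .. " + r.max);
    }
}
```

Widening a zero bound in this way means that a filter comparing against either -0.0 or +0.0 cannot wrongly exclude a row group, whichever zero the writer happened to record.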
   
   Author: Gabor Szadovszky <ga...@cloudera.com>
   
   Closes #461 from gszadovszky/PARQUET-1246 and squashes the following commits:
   
   20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
   3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN
   
   This change is based on 0a86429939075984edce5e3b8195dfb7f9e3ab6b but is not a clean cherry-pick.



> Ignore float/double statistics in case of NaN
> ---------------------------------------------
>
>                 Key: PARQUET-1246
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1246
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.8.1
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>             Fix For: 1.10.0
>
>
> The sort order of floating point values is not properly specified, so NaN values can cause valid values to be skipped when filtering. See PARQUET-1222 for more info.
> This issue is about ignoring float/double statistics that contain NaN, to prevent data loss on the read path when filtering.


