Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/04 09:22:00 UTC

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

    [ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684133#comment-17684133 ] 

ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------

yabola opened a new pull request, #1023:
URL: https://github.com/apache/parquet-mr/pull/1023

   Loading a Bloom filter from the filesystem costs time and space. If another filter can determine exactly whether a value exists, we can skip the Bloom filter and improve performance.

   When the min and max values in StatisticsFilter are the same, we can determine exactly whether the value exists.
   When we have page dictionaries, we can also determine exactly whether the value exists.
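
   The idea can be sketched in Java as below. This is only an illustration and not the code in the pull request: the class `ExactMatchSketch`, the `Verdict` enum, and all method names are hypothetical, the predicate is assumed to be an equality check on a `long` column, the statistics are assumed to cover at least one non-null value, and the dictionary check is assumed to be valid only when every page in the column chunk is dictionary-encoded.

```java
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from parquet-mr.
final class ExactMatchSketch {

    /** Three-state outcome of a row-group-level check for `column == value`. */
    enum Verdict { DEFINITELY_ABSENT, DEFINITELY_PRESENT, UNKNOWN }

    /** Min/max check. Assumes the statistics describe at least one non-null value. */
    static Verdict checkMinMax(long min, long max, long value) {
        if (value < min || value > max) {
            return Verdict.DEFINITELY_ABSENT;   // outside the range: cannot exist
        }
        if (min == max) {
            return Verdict.DEFINITELY_PRESENT;  // every non-null value equals min == max == value
        }
        return Verdict.UNKNOWN;                 // inside a wider range: cannot tell
    }

    /** Dictionary check. Only exact when all pages are dictionary-encoded (assumption). */
    static Verdict checkDictionary(Set<Long> dictionary, boolean allPagesDictEncoded, long value) {
        if (!allPagesDictEncoded) {
            return Verdict.UNKNOWN;
        }
        return dictionary.contains(value) ? Verdict.DEFINITELY_PRESENT : Verdict.DEFINITELY_ABSENT;
    }

    /**
     * Returns true if the row group can be dropped. The Bloom filter is consulted
     * (and therefore loaded) only when the cheaper checks are inconclusive.
     */
    static boolean canDrop(long min, long max, Set<Long> dictionary, boolean allPagesDictEncoded,
                           long value, BooleanSupplier bloomMightContain) {
        Verdict verdict = checkMinMax(min, max, value);
        if (verdict == Verdict.UNKNOWN) {
            verdict = checkDictionary(dictionary, allPagesDictEncoded, value);
        }
        if (verdict != Verdict.UNKNOWN) {
            return verdict == Verdict.DEFINITELY_ABSENT;  // exact answer, Bloom filter skipped
        }
        return !bloomMightContain.getAsBoolean();         // fall back to the Bloom filter
    }

    public static void main(String[] args) {
        // min == max == 5 and we look for 7: dropped without touching the Bloom filter.
        System.out.println(canDrop(5, 5, Set.of(5L), true, 7, () -> true)); // prints "true"
    }
}
```

   Passing the Bloom filter lookup as a `BooleanSupplier` keeps the expensive load lazy, so it runs only when neither the statistics nor the dictionary gives an exact answer.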




> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2237
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2237
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major
>
> Loading a Bloom filter from the filesystem costs time and space. If another filter can determine exactly whether a value exists, we can skip the Bloom filter and improve performance.
>
> When the min and max values in StatisticsFilter are the same, we can determine exactly whether the value exists.
> When we have page dictionaries, we can also determine exactly whether the value exists.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)