
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/18 11:32:34 UTC

[GitHub] [arrow-rs] alamb commented on issue #1191: Parquet Scan Filter

alamb commented on issue #1191:
URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015324441


   > This would allow IOx, or potentially DataFusion depending on where the logic for this eventually sits, to do the following for pushing down predicates, in addition to the current row group filtering:
   
   I think it would be best to implement this in DataFusion if at all possible -- the logic and the need are not at all specific to IOx, so the community would benefit (and would also likely help maintain it) if it were in DataFusion.
   
   One thing to consider for the strategy described is that it may actually *slow down* parquet scanning for non-selective predicates (e.g. predicates that filter out none of the rows).
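
   To make the slowdown concrete, here is a rough sketch using a toy cost model. The model is an assumption for illustration only (one unit of work per decoded value and per predicate evaluation), and `ScanShape`, `plain_scan_cost` and `filtered_scan_cost` are hypothetical names, not arrow-rs or DataFusion APIs.

```rust
/// Toy cost model (an assumption for illustration, not a measurement):
/// one unit of work per decoded value and per predicate evaluation.
#[derive(Debug, Clone, Copy)]
struct ScanShape {
    rows: u64,
    /// Total number of projected columns, including the filter columns.
    projected_columns: u64,
    /// Number of projected columns that the predicate reads.
    filter_columns: u64,
}

/// Cost of decoding every projected column for every row, with no pushdown.
fn plain_scan_cost(shape: ScanShape) -> u64 {
    shape.rows * shape.projected_columns
}

/// Cost of decoding the filter columns, evaluating the predicate on every
/// row, then decoding the remaining columns only for the rows that passed.
fn filtered_scan_cost(shape: ScanShape, rows_passing: u64) -> u64 {
    let decode_filter = shape.rows * shape.filter_columns;
    let evaluate = shape.rows;
    let decode_rest = rows_passing * (shape.projected_columns - shape.filter_columns);
    decode_filter + evaluate + decode_rest
}
```

   Under this model, with 1000 rows, 4 projected columns and 1 filter column, a predicate that passes every row costs more than the plain scan (the evaluation work is pure overhead), while a predicate that passes only 10 rows costs roughly half as much.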
   
   Another thing is that the order in which the predicates are evaluated may make a difference (e.g. if there is one predicate that filters out all but a few rows and another that doesn't filter out any, applying the more selective predicate first is likely to be faster).
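
   A minimal sketch of the ordering effect, in plain Rust with no parquet involved. `Row`, `Predicate`, `apply_in_order` and the two example predicates are hypothetical names for illustration. Each predicate is evaluated only on the rows that survived the earlier ones, and the function counts how many evaluations that takes.

```rust
/// A row with two integer columns (hypothetical layout for illustration).
type Row = (i64, i64);
type Predicate = fn(&Row) -> bool;

/// Applies the predicates in the given order. Each predicate is evaluated
/// only on rows that passed all earlier ones. Returns the surviving row
/// indices and the total number of predicate evaluations performed.
fn apply_in_order(rows: &[Row], predicates: &[Predicate]) -> (Vec<usize>, usize) {
    let mut selected: Vec<usize> = (0..rows.len()).collect();
    let mut evaluations = 0;
    for pred in predicates {
        evaluations += selected.len();
        selected.retain(|&i| pred(&rows[i]));
    }
    (selected, evaluations)
}

/// Filters out all but a few rows of the sample data.
fn selective(row: &Row) -> bool {
    row.0 < 3
}

/// Filters out none of the rows of the sample data.
fn non_selective(row: &Row) -> bool {
    row.1 >= 0
}

/// 1000 rows: (0, 0), (1, 2), (2, 4), ...
fn sample_rows() -> Vec<Row> {
    (0..1000).map(|i| (i, i * 2)).collect()
}
```

   On `sample_rows()`, both orders select the same three rows, but running `selective` first needs 1003 evaluations while running `non_selective` first needs 2000.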
   
   I think the challenge of non-selective predicates and predicate ordering should not be handled by the actual parquet reader (that is, the query engine should specify which predicates to apply and in what order).
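
   One way the engine side of that split could look, as a sketch only: the engine holds selectivity estimates, decides which predicates are worth pushing down, and hands the reader an already ordered list. `Candidate`, `plan_pushdown` and the `max_pass_fraction` threshold are assumptions made up for this example and do not exist in DataFusion or arrow-rs.

```rust
/// A candidate predicate together with the engine's estimate of the fraction
/// of rows that pass it (1.0 means it filters nothing). Hypothetical type.
#[derive(Debug, Clone)]
struct Candidate {
    name: String,
    estimated_pass_fraction: f64,
}

/// Engine-side planning: drop predicates that are not selective enough to be
/// worth pushing down, then order the rest most-selective first. The reader
/// would apply the returned predicates exactly in this order.
fn plan_pushdown(mut candidates: Vec<Candidate>, max_pass_fraction: f64) -> Vec<String> {
    candidates.retain(|c| c.estimated_pass_fraction <= max_pass_fraction);
    candidates.sort_by(|a, b| a.estimated_pass_fraction.total_cmp(&b.estimated_pass_fraction));
    candidates.into_iter().map(|c| c.name).collect()
}
```

   Keeping this decision in the engine means the reader needs no statistics or heuristics of its own; it only has to honor the list it is given.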

