Posted to github@arrow.apache.org by "simonvandel (via GitHub)" <gi...@apache.org> on 2023/06/15 20:01:28 UTC

[GitHub] [arrow-datafusion] simonvandel commented on issue #6672: Optimization: Avoid sort for already sorted Parquet files that do not overlap values on condition

simonvandel commented on issue #6672:
URL: https://github.com/apache/arrow-datafusion/issues/6672#issuecomment-1593655087

   > I think this is likely the solution that would be the fastest for querying because then time predicates could be used to prune out entire row groups and you would have lower file opening overhead
   
   Thanks, I'll try this.
   
   
   
   > I am marking this as a question as I am not sure it is really a bug -- though please let me know if you disagree
   
   My bad, it was a question. 
   
   Although one could argue it is also a feature request for a built-in optimization that removes a sort when it can detect that the inputs do not overlap, either from hints or by looking directly at the min/max statistics of the inputs.
   Do you think that is reasonable, or is it too specific to my use case?
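
To make the idea concrete, here is a minimal sketch of the overlap check I have in mind. It is not DataFusion code: `FileRange` and `concat_order` are hypothetical names, and it assumes each file is already sorted ascending on the column, that its min/max statistics for that column are available and accurate, and that `min <= max` holds for every file.

```rust
/// Min/max statistics of the sort column for one file.
/// Hypothetical type for illustration, not part of DataFusion's API.
#[derive(Debug, Clone, Copy)]
pub struct FileRange {
    pub min: i64,
    pub max: i64,
}

/// Returns the order in which already-sorted files could be concatenated to
/// produce a globally sorted (ascending) stream, or `None` if any two value
/// ranges overlap and a sort or merge would still be needed.
pub fn concat_order(files: &[FileRange]) -> Option<Vec<usize>> {
    let mut order: Vec<usize> = (0..files.len()).collect();

    // Arrange the file indices by the smallest value each file contains.
    order.sort_by_key(|&i| (files[i].min, files[i].max));

    // Every file must end no later than the next one starts. A shared
    // boundary value is allowed because the ordering is non-strict.
    for pair in order.windows(2) {
        if files[pair[0]].max > files[pair[1]].min {
            return None;
        }
    }

    Some(order)
}
```

If the function returns `Some(order)`, the files could simply be read in that order and the sort dropped; if it returns `None`, the plan would keep the sort as it does today.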

