Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/04/27 14:36:20 UTC

[GitHub] [iceberg] RussellSpitzer commented on issue #2527: Spark Dynamic Partition Pruning

RussellSpitzer commented on issue #2527:
URL: https://github.com/apache/iceberg/issues/2527#issuecomment-827656513


   I don't think that it's incompatible with Iceberg, judging from the source.
   
   If I'm reading this correctly:
   
   https://github.com/apache/spark/blob/19c7d2f3d8cda8d9bc5dfc1a0bf5d46845b1bc2f/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L130-L132
   
   There are basically two ways of estimating the filtering effectiveness:
      * a stats-based one, which you are correct we would not trigger, as we don't keep "distinct count" stats
      * a fallback method, which just uses a user-defined constant
        * conf.dynamicPartitionPruningFallbackFilterRatio
        
   In either case it then multiplies the effectiveness by the size the plan reports (which we do report).
   
   https://github.com/apache/spark/blob/19c7d2f3d8cda8d9bc5dfc1a0bf5d46845b1bc2f/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L154
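
   A rough sketch of that estimate in Python, to make the arithmetic concrete. This is not Spark's code: the function names and parameters are hypothetical, the exact stats-based formula is an assumption about how distinct counts would be used, and the final comparison against the size of the other side of the join is likewise an assumption about what the linked line does with the product.

```python
from typing import Optional


def estimate_filter_ratio(
    part_distinct: Optional[int],
    other_distinct: Optional[int],
    fallback_ratio: float,
) -> float:
    """Estimate the fraction of the partitioned side that pruning would skip.

    Hypothetical helper: falls back to the user-defined constant whenever
    distinct-count stats are missing or look implausible.
    """
    # No usable "distinct count" stats (the Iceberg case described above).
    if part_distinct is None or other_distinct is None or part_distinct <= 0:
        return fallback_ratio
    # Stats that suggest no reduction are treated as unreliable (assumption).
    if part_distinct <= other_distinct:
        return fallback_ratio
    return 1.0 - other_distinct / part_distinct


def pruning_has_benefit(
    part_plan_size_bytes: int,
    other_side_size_bytes: int,
    filter_ratio: float,
) -> bool:
    """Multiply the effectiveness by the reported plan size and compare it
    with the cost of scanning the other side (assumed overhead measure)."""
    return filter_ratio * part_plan_size_bytes > other_side_size_bytes


if __name__ == "__main__":
    # Without distinct-count stats only the fallback constant is used,
    # so the reported plan size is what decides the outcome.
    ratio = estimate_filter_ratio(None, None, fallback_ratio=0.5)
    print(pruning_has_benefit(10_000_000, 1_000_000, ratio))
```

   Under these assumptions, a source that reports a plan size but no distinct counts still goes through the fallback path, which is the point made above.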
   
     
      
      


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


