Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/01 18:36:13 UTC

[GitHub] [iceberg] cccs-jc commented on issue #2527: Spark Dynamic Partition Pruning

cccs-jc commented on issue #2527:
URL: https://github.com/apache/iceberg/issues/2527#issuecomment-830675833


   I created a mock fact table and a mock dimension table using a traditional Hive catalog and was able to activate the dynamic partition pruning (DPP) optimization. It's quite easy to identify in the Spark UI. The query runs very fast when DPP is used.
   
   I then used the same mock data generator functions to create the tables using Iceberg, partitioning the fact table in the same way as with traditional Hive. I ran the exact same join; however, Spark uses a sort-merge join instead of the dynamic partition pruning optimization. It does not even use a broadcast join, which surprised me.
   
   I can reproduce the issue quite easily. What information would be useful to put in this issue?
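
One piece of information that may help is a side-by-side summary of the physical plans from the Hive run and the Iceberg run. The following is a minimal sketch, not part of the original report: `summarize_plan` is a hypothetical helper, and the marker strings (`dynamicpruning`, `SortMergeJoin`, `BroadcastHashJoin`, `ShuffledHashJoin`) are assumed to match what Spark 3.x prints in its plan text; other versions may format plans differently.

```python
import re

# Operator names as printed in Spark 3.x physical plans (an assumption about
# the Spark version in use; other versions may format plans differently).
JOIN_OPERATORS = ("BroadcastHashJoin", "SortMergeJoin", "ShuffledHashJoin")

# Substring that appears in plan text when a dynamic pruning filter is present
# (again assumed from Spark 3.x output, e.g. "dynamicpruningexpression(...)").
DPP_MARKER = "dynamicpruning"


def summarize_plan(plan_text):
    """Report which join operators and DPP markers appear in a plan string."""
    joins = [
        op for op in JOIN_OPERATORS
        if re.search(r"\b" + op + r"\b", plan_text)
    ]
    return {
        "join_operators": joins,
        "uses_dpp": DPP_MARKER in plan_text.lower(),
    }


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated plan text for illustration only.
    sample = """
    SortMergeJoin [dim_id#1], [id#7], Inner
    :- Sort [dim_id#1 ASC NULLS FIRST]
    +- Sort [id#7 ASC NULLS FIRST]
    """
    print(summarize_plan(sample))
```

In PySpark, the plan text can typically be captured with `df._jdf.queryExecution().executedPlan().toString()` (this relies on the internal `_jdf` attribute) or copied from the output of `df.explain()`. Alongside the two summaries, the values of `spark.sql.optimizer.dynamicPartitionPruning.enabled` and `spark.sql.autoBroadcastJoinThreshold`, plus the Spark and Iceberg versions, would probably be worth including.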
   
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org