Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/10 08:44:53 UTC

[GitHub] [iceberg] SusurHe opened a new issue, #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

SusurHe opened a new issue, #5487:
URL: https://github.com/apache/iceberg/issues/5487

   ### Apache Iceberg version
   
   0.14.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   When I test `MERGE INTO` on a table with Spark 3.2 and Spark 3.3, I find that dynamic partition pruning is not enabled.
   For example, when I set `spark.sql.adaptive.coalescePartitions.initialPartitionNum=1024`, it will generate 1024 small files after executing `MERGE INTO`.
   
   ![image](https://user-images.githubusercontent.com/51081799/183857149-250bcafd-58ea-4eda-9272-a4f5f6e0c3ef.png)
   ![image](https://user-images.githubusercontent.com/51081799/183854824-4f577598-66eb-43a7-8465-3025ad4b9f75.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

Posted by GitBox <gi...@apache.org>.

kbendick commented on issue #5487:
URL: https://github.com/apache/iceberg/issues/5487#issuecomment-1212476159

   Can you please provide a sample `MERGE INTO` query @SusurHe?
   
   When I think of dynamic partition pruning, I think of the ability to filter out specific partitions from the input table during a query based on some kind of `WHERE` clause, not necessarily the results of adaptive query execution.
   
   The dynamic partition pruning I'm thinking of is controlled by Spark's `SupportsRuntimeFiltering` interface, support for which was added in Iceberg 0.13.0.
   
   If you could provide the `MERGE INTO` query, the results of `EXPLAIN` or the full explain output from the SQL UI, as well as generally the number of partition values that were touched (or at least the create table DDL of the table being merged into), that would help a lot to debug why you're not getting fewer partition files.
   
   However, from what's been provided, it's hard to tell whether dynamic partition pruning (i.e. pruning partitions from the input before the join / processing) is taking place. But I get the impression that's not what you mean and that you're more interested in adaptive query execution.
   
   In either case, you might need to set `spark.sql.adaptive.coalescePartitions.parallelismFirst` to `false`.
   
   From the docs for `spark.sql.adaptive.coalescePartitions.parallelismFirst` found here: https://spark.apache.org/docs/latest/sql-performance-tuning.html
   
   `When true, Spark ignores the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 1MB), to maximize the parallelism. This is to avoid performance regression when enabling adaptive query execution. It's recommended to set this config to false and respect the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes.`
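
   As a rough sketch of that suggestion, the session-level settings could look like the following. The property names are the ones documented on the Spark tuning page linked above; the `64MB` value is simply the documented default, shown as an assumption to tune rather than a recommendation for this particular table.

   ```sql
   -- Adaptive query execution and partition coalescing must be on for the rest to apply.
   SET spark.sql.adaptive.enabled=true;
   SET spark.sql.adaptive.coalescePartitions.enabled=true;

   -- Respect the advisory target size instead of maximizing parallelism.
   SET spark.sql.adaptive.coalescePartitions.parallelismFirst=false;

   -- Target size for coalesced shuffle partitions (assumed value; this is the documented default).
   SET spark.sql.adaptive.advisoryPartitionSizeInBytes=64MB;
   ```

   These would need to be set in the same session before running `MERGE INTO`. Whether they actually reduce the number of output files depends on how the write is distributed, so treat this as something to try rather than a confirmed fix.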
   
   



[GitHub] [iceberg] github-actions[bot] closed issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled
URL: https://github.com/apache/iceberg/issues/5487



[GitHub] [iceberg] SusurHe commented on issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

Posted by GitBox <gi...@apache.org>.

SusurHe commented on issue #5487:
URL: https://github.com/apache/iceberg/issues/5487#issuecomment-1214560889

   @kbendick Just a basic SQL statement, like:
   ``` sql
   MERGE INTO t1 a 
   USING (select * from t2) b
   ON a.id = b.id 
   WHEN     MATCHED THEN UPDATE SET *  
   WHEN NOT MATCHED THEN INSERT * ;
   ```
   I think this happens because the following code in `ExtendedDistributionAndOrderingUtils` falls back to `initialPartitionNum` for partitioning, so we should cap `finalNumPartitions` here:
   
   ``` scala
   val finalNumPartitions = if (numPartitions > 0) {
     numPartitions
   } else {
     // This fallback may pick up initialPartitionNum directly, resulting in too many partitions
     conf.numShufflePartitions
   }
   ```
   



[GitHub] [iceberg] github-actions[bot] commented on issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on issue #5487:
URL: https://github.com/apache/iceberg/issues/5487#issuecomment-1454289366

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'



[GitHub] [iceberg] github-actions[bot] commented on issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on issue #5487:
URL: https://github.com/apache/iceberg/issues/5487#issuecomment-1426904736

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

