Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/22 16:54:42 UTC

[GitHub] [spark] Swinky commented on pull request #34062: [SPARK-36819][SQL] Don't insert redundant filters in case static partition pruning can be done

Swinky commented on pull request #34062:
URL: https://github.com/apache/spark/pull/34062#issuecomment-925108409


   > do you mean `InferFiltersFromConstraints` can generate static partition predicates and we don't need to trigger DPP in that case?
   
   @cloud-fan correct, examples below:
   
   dimTable `d` has columns (d1, d2...)
   factTable `f` has columns (f1, f2, f3...) partitioned on f1, f2.
   
   Example 1:
   ```
   
            join(d1=f1)
           /           \
   Filter(d1=100)   FactTable(f)
         |
    dimTable(d)
   ```
   
   	 PartitionFilters for FactTable now: [f1=100, f1 in (d1 values from dpp-subquery)] // "f1=100" here is inferred in `InferFiltersFromConstraints`
   	 After Proposed change: [f1=100]
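
To make the inference step concrete, here is a minimal Python sketch of how an equality filter on a dimension-side join key carries over to the fact side. It is a toy model, not the actual `InferFiltersFromConstraints` rule: the function name `infer_fact_predicates` and the dict/list representation are assumptions for illustration, and it only handles conjunctive `column = literal` filters.

```python
def infer_fact_predicates(dim_equalities, join_keys):
    """Toy model of constraint inference across an equi-join.

    dim_equalities: dict mapping a dim column to the literal it is
                    filtered on (conjunctive equality filters only).
    join_keys:      list of (dim_col, fact_col) equi-join pairs.
    Returns a dict mapping fact columns to the inferred literal.
    """
    return {
        fact_col: dim_equalities[dim_col]
        for dim_col, fact_col in join_keys
        if dim_col in dim_equalities
    }


# Example 1: Filter(d1=100) joined on d1=f1 yields f1=100.
print(infer_fact_predicates({"d1": 100}, [("d1", "f1")]))  # {'f1': 100}
```

Since `f1=100` already restricts the scan to a single value of the partition column, the DPP subquery on the same column cannot prune anything further.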
   
   
   Example 2:
   ```
         join(d1=f1, d2=f2)
           /              \
   Filter(d1=100)   FactTable(f)
         |
    dimTable(d)
   ```
   
   	 PartitionFilters for FactTable now: [f1=100, f1 in (d1 values from dpp-subquery1), f2 in (d2 values from dpp-subquery1)] // "f1=100" here is inferred in `InferFiltersFromConstraints`
   	 After Proposed change: [f1=100, f2 in (d2 values from dpp-subquery1)]
   
   
   Example 3:
   ```
                join(d1=f1, d2=f2)
                 /               \
   Filter(d1=100 || d3=200)   FactTable(f)
              |
         dimTable(d)
   ```
   
   	 PartitionFilters for FactTable now: [f1 in (d1 values from dpp-subquery1), f2 in (d2 values from dpp-subquery1)]
   	 After Proposed change: No change in this case. The filter `d1=100 || d3=200` is a disjunction that references both d1 and d3, so no static predicate on f1 can be inferred from it and both DPP filters are kept.
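
The three examples can be summarised with a small Python model of the proposed behaviour. This is only a sketch under simplifying assumptions, not the code in this PR: `partition_filters` and its arguments are hypothetical names, the static predicates are assumed to have been inferred already, and filters are represented as plain strings.

```python
def partition_filters(join_keys, static_preds, skip_redundant_dpp):
    """Toy model of the partition filters pushed to the fact table.

    join_keys:          list of (dim_col, fact_col) equi-join pairs.
    static_preds:       dict fact_col -> literal, already inferred
                        from the dim-side filter (may be empty).
    skip_redundant_dpp: True models the proposed change.
    """
    filters = [f"{col}={val}" for col, val in static_preds.items()]
    for dim_col, fact_col in join_keys:
        if skip_redundant_dpp and fact_col in static_preds:
            # A static predicate already prunes this partition column.
            continue
        filters.append(f"{fact_col} in (dpp-subquery on {dim_col})")
    return filters


keys = [("d1", "f1"), ("d2", "f2")]

# Example 2: f1=100 is inferred, so only the DPP filter on f2 remains.
print(partition_filters(keys, {"f1": 100}, skip_redundant_dpp=True))
# ['f1=100', 'f2 in (dpp-subquery on d2)']

# Example 3: nothing is inferred from the disjunction, so nothing changes.
print(partition_filters(keys, {}, skip_redundant_dpp=True))
# ['f1 in (dpp-subquery on d1)', 'f2 in (dpp-subquery on d2)']
```

Passing `skip_redundant_dpp=False` reproduces the "now" lists above, which makes it easy to compare the two behaviours side by side.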
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


