Posted to issues@spark.apache.org by "Utkarsh Agarwal (Jira)" <ji...@apache.org> on 2021/10/12 18:01:00 UTC

[jira] [Updated] (SPARK-36978) InferConstraints rule should create IsNotNull constraints on the nested field instead of the root nested type

     [ https://issues.apache.org/jira/browse/SPARK-36978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Agarwal updated SPARK-36978:
------------------------------------
    Description: 
The [InferFiltersFromConstraints|https://github.com/apache/spark/blob/05c0fa573881b49d8ead9a5e16071190e5841e1b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1206] optimization rule generates {{IsNotNull}} constraints for null-intolerant predicates. Each {{IsNotNull}} constraint is generated on the attribute referenced by the corresponding predicate.
For example, a predicate {{a > 0}} on an integer column {{a}} results in the constraint {{IsNotNull(a)}}. By contrast, a predicate on a nested int column {{structCol.b}}, where {{structCol}} is a struct column, results in the constraint {{IsNotNull(structCol)}}.

Generating the constraint on the root-level nested type is extremely conservative, because it could lead to materialization of the entire struct. The constraint should instead be generated on the nested field referenced by the predicate. In the example above, the constraint should be {{IsNotNull(structCol.b)}} instead of {{IsNotNull(structCol)}}.

The new constraints also create opportunities for nested pruning. Currently, the {{IsNotNull(structCol)}} constraint precludes pruning of {{structCol}}, whereas the constraint {{IsNotNull(structCol.b)}} could create opportunities to prune {{structCol}}.
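
The difference between the two behaviors can be illustrated with a small, self-contained toy model. This is only a sketch and is not Catalyst code: the classes {{Attr}}, {{GetField}}, {{Lit}} and {{GreaterThan}} and the functions {{infer_root_level}} and {{infer_nested}} are hypothetical names made up for this illustration. It assumes a single null-intolerant binary predicate and compares a constraint placed on the root attribute with a constraint placed on the nested field that the predicate actually references.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Attr:
    """A top-level column reference, e.g. `a` or `structCol` (hypothetical)."""
    name: str


@dataclass(frozen=True)
class GetField:
    """Field extraction from a struct-valued operand, e.g. `structCol.b`."""
    child: Operand
    field: str


@dataclass(frozen=True)
class Lit:
    """An integer literal."""
    value: int


@dataclass(frozen=True)
class GreaterThan:
    """A null-intolerant predicate: it cannot be true if an operand is null."""
    left: Operand
    right: Operand


Operand = Union[Attr, GetField, Lit]


def render(e: Operand) -> str:
    """Render an operand as text, e.g. `structCol.b`."""
    if isinstance(e, Attr):
        return e.name
    if isinstance(e, GetField):
        return f"{render(e.child)}.{e.field}"
    return str(e.value)


def root_attribute(e: Operand) -> Optional[Attr]:
    """Strip field extractions until the underlying top-level attribute."""
    while isinstance(e, GetField):
        e = e.child
    return e if isinstance(e, Attr) else None


def infer_root_level(pred: GreaterThan) -> set[str]:
    """Behavior described as current: constrain the root attribute."""
    roots = (root_attribute(side) for side in (pred.left, pred.right))
    return {f"IsNotNull({r.name})" for r in roots if r is not None}


def infer_nested(pred: GreaterThan) -> set[str]:
    """Behavior described as proposed: constrain the referenced field itself."""
    sides = (pred.left, pred.right)
    return {f"IsNotNull({render(s)})" for s in sides if not isinstance(s, Lit)}


flat = GreaterThan(Attr("a"), Lit(0))
nested = GreaterThan(GetField(Attr("structCol"), "b"), Lit(0))

print(infer_root_level(flat))    # {'IsNotNull(a)'}
print(infer_root_level(nested))  # {'IsNotNull(structCol)'}
print(infer_nested(nested))      # {'IsNotNull(structCol.b)'}
```

For a flat column both functions agree. They differ only for the nested predicate, where the root-level constraint refers to the whole of {{structCol}} while the nested one refers only to field {{b}}.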

  was:
The [InferFiltersFromConstraints|https://github.com/apache/spark/blob/05c0fa573881b49d8ead9a5e16071190e5841e1b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1206] optimization rule generates {{IsNotNull}} constraints for null-intolerant predicates. Each {{IsNotNull}} constraint is generated on the attribute referenced by the corresponding predicate.
For example, a predicate {{a > 0}} on an integer column {{a}} results in the constraint {{IsNotNull(a)}}. By contrast, a predicate on a nested int column {{structCol.b}}, where {{structCol}} is a struct column, results in the constraint {{IsNotNull(structCol)}}.

Generating the constraint on the root-level nested type is extremely conservative, because it could lead to materialization of the entire struct. The constraint should instead be generated on the nested field referenced by the predicate. In the example above, the constraint should be {{IsNotNull(structCol.b)}} instead of {{IsNotNull(structCol)}}.



> InferConstraints rule should create IsNotNull constraints on the nested field instead of the root nested type 
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-36978
>                 URL: https://issues.apache.org/jira/browse/SPARK-36978
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0
>            Reporter: Utkarsh Agarwal
>            Priority: Major


