Posted to issues@spark.apache.org by "caican (Jira)" <ji...@apache.org> on 2022/08/11 06:19:00 UTC

[jira] [Created] (SPARK-40045) The order of filtering predicates is not reasonable

caican created SPARK-40045:
------------------------------

             Summary: The order of filtering predicates is not reasonable
                 Key: SPARK-40045
                 URL: https://issues.apache.org/jira/browse/SPARK-40045
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0, 3.2.0, 3.1.2
            Reporter: caican


{code:sql}
SELECT id, data FROM testcat.ns1.ns2.table
WHERE id = 2
AND md5(data) = '8cde774d6f7333752ed72cacddb05126'
AND trim(data) = 'a'
{code}
Based on the SQL, we currently get the filters in the following order:

 
{code:java}
// code placeholder{code}
 

With this predicate order, every row has to be evaluated against the earlier predicates even if it would fail a later one, which may cause Spark tasks to execute slowly.

 

So I think that predicates that require expression evaluation (such as md5(data) and trim(data) in the query above) should automatically be placed to the far right, so that rows which already fail a cheaper predicate are not evaluated against them.

 

As shown below:
{noformat}
 {noformat}
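
The effect of predicate order can be illustrated outside Spark with a minimal sketch. It is not Spark's implementation: the cost ranks, the helper names (make_predicates, reorder_by_cost, filter_rows) and the sample rows are all hypothetical and exist only for this illustration. It assumes a conjunction is evaluated left to right with short-circuiting, and counts how often the md5 predicate runs before and after the predicates are sorted by an assumed cost.

```python
import hashlib

TARGET_MD5 = "8cde774d6f7333752ed72cacddb05126"  # literal taken from the query


def make_predicates(counter):
    """Build the query's three predicates, counting every evaluation."""
    def counted(name, fn):
        def wrapper(row):
            counter[name] = counter.get(name, 0) + 1
            return fn(row)
        return wrapper

    # Each entry is (name, assumed cost rank, predicate).
    # The cost ranks are illustrative assumptions, not values used by Spark.
    return [
        ("md5", 2, counted(
            "md5",
            lambda r: hashlib.md5(r["data"].encode("utf-8")).hexdigest() == TARGET_MD5)),
        ("trim", 1, counted("trim", lambda r: r["data"].strip(" ") == "a")),
        ("id", 0, counted("id", lambda r: r["id"] == 2)),
    ]


def reorder_by_cost(predicates):
    # Stable sort: cheapest predicates first, expensive ones to the far right.
    return sorted(predicates, key=lambda p: p[1])


def filter_rows(rows, predicates):
    # all() short-circuits, so later predicates are skipped once one fails.
    return [r for r in rows if all(pred(r) for _, _, pred in predicates)]


# Hypothetical sample data: five rows, only one of which has id = 2.
rows = [{"id": i, "data": " a "} for i in range(1, 6)]

naive_counts, ordered_counts = {}, {}
naive_result = filter_rows(rows, make_predicates(naive_counts))
ordered_result = filter_rows(rows, reorder_by_cost(make_predicates(ordered_counts)))

print(naive_counts["md5"])    # 5: md5 runs for every row when it comes first
print(ordered_counts["md5"])  # 1: md5 runs only for the row with id = 2
```

Both orders return the same rows, because the predicates are deterministic and combined with AND. Only the number of expensive evaluations changes.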



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org