Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/07/06 15:08:00 UTC

[jira] [Assigned] (SPARK-36027) Push down Filter in case of filter having child as TypedFilter

     [ https://issues.apache.org/jira/browse/SPARK-36027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36027:
------------------------------------

    Assignee: Apache Spark

> Push down Filter in case of filter having child as TypedFilter
> --------------------------------------------------------------
>
>                 Key: SPARK-36027
>                 URL: https://issues.apache.org/jira/browse/SPARK-36027
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Saurabh Chawla
>            Assignee: Apache Spark
>            Priority: Major
>
> When a Filter has a TypedFilter as its child, filter pushdown to the data source does not take place.
>  
> {code:java}
> scala> def testUdfFunction(r: String): Boolean = {
>  | r.equals("hello")
>  | }
> testUdfFunction: (r: String)Boolean
> scala> val df = spark.read.parquet("/testDir/testParquetSize/Parquetgzip/")
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: string ... 1 more field]
> {code}
>  
> {code:java}
> df.filter(x => testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.executedPlan
> {code}
>  
> {code:java}
> Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
> +- *(1) Filter $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3184/1455948476@5ce4af92.apply
>  +- *(1) ColumnarToRow
>  +- FileScan parquet [_1#0,_2#1,_3#2] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/saurabh.chawla/testDir/testParquetSize/Parquetgzip], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_1:string,_2:string,_3:string>
>  
> {code}
> {code:java}
> df.filter(x => testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.optimizedPlan
> {code}
> {code:java}
> Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
> +- TypedFilter $line22.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3191/320569017@37a2806c, interface org.apache.spark.sql.Row, [StructField(_1,StringType,true), StructField(_2,StringType,true), StructField(_3,StringType,true)], createexternalrow(_1#0.toString, _2#1.toString, _3#2.toString, StructField(_1,StringType,true), StructField(_2,StringType,true), StructField(_3,StringType,true))
>  +- Relation[_1#0,_2#1,_3#2] parquet{code}
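> For comparison, a sketch of the same query with the two filters reordered (assuming they are independent, so swapping them does not change the result): the column predicate then sits directly above the relation and is pushed down to the Parquet scan.
> {code:java}
> // Hypothetical reordering of the same query: the Catalyst filter now sits
> // directly above the relation and can be pushed into the FileScan.
> val reordered = spark.read.parquet("/testDir/testParquetSize/Parquetgzip/")
>   .filter("_2<='id103855'")
>   .filter(x => testUdfFunction(x.getAs[String]("_1")))
> reordered.queryExecution.executedPlan
> // The FileScan should now report non-empty PushedFilters, e.g.
> // [IsNotNull(_2), LessThanOrEqual(_2,id103855)]
> {code}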
>  
> A change is needed so that the filter is also pushed down in this scenario.
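> A minimal sketch of the kind of rewrite involved (not the actual patch for this ticket; the rule name and its placement are assumptions): a deterministic Filter sitting directly above a TypedFilter can be swapped with it, since both nodes only drop rows, the Filter's condition references only attributes that the TypedFilter passes through unchanged, and the typed predicate is assumed side-effect free.
> {code:java}
> import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, TypedFilter}
> import org.apache.spark.sql.catalyst.rules.Rule
>
> // Hypothetical rule: push a deterministic Filter below an adjacent TypedFilter
> // so it ends up next to the relation and can be pushed into the data source scan.
> object PushFilterThroughTypedFilter extends Rule[LogicalPlan] {
>   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>     case Filter(condition, tf: TypedFilter) if condition.deterministic =>
>       tf.copy(child = Filter(condition, tf.child))
>   }
> }
> {code}
> For experimentation, such a rule could be registered through spark.experimental.extraOptimizations, though the real fix would belong with the built-in predicate pushdown rules.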
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org