Posted to user@flink.apache.org by Chai Kelun <ch...@hotmail.com> on 2023/05/08 08:24:58 UTC

Question about RexNodeExtractor formatting UDF names

Hi Flink Team:

I have a question about RexNodeExtractor in Flink 1.16.0. I am trying to push down UDFs whose names follow the ST_XXX format, underscore included (e.g. ST_Contains), into TableSourceScan. I have implemented applyFilters and the handling of the pushed-down functions based on the SupportsFilterPushDown API. However, line 529 of RexNodeExtractor removes the underscore from the function name, which prevents the filter from being pushed down.
I tried patching this logic and compared the query plans:
[Before modifying RexNodeExtractor#replace()]

== Abstract Syntax Tree ==
LogicalProject(id=[$0], name=[$1])
+- LogicalFilter(condition=[ST_Contains($2, ST_MakePoint(6, 2))])
   +- LogicalTableScan(table=[[default_catalog, default_database, extTable]])

== Optimized Physical Plan ==
Calc(select=[id, name], where=[ST_Contains(fence, ST_MakePoint(6, 2))])
+- TableSourceScan(table=[[default_catalog, default_database, extTable]], fields=[id, name, fence])

== Optimized Execution Plan ==
Calc(select=[id, name], where=[ST_Contains(fence, ST_MakePoint(6, 2))])
+- TableSourceScan(table=[[default_catalog, default_database, extTable]], fields=[id, name, fence])


[After modifying RexNodeExtractor#replace()]
== Abstract Syntax Tree ==
LogicalProject(id=[$0], name=[$1])
+- LogicalFilter(condition=[ST_Contains($2, ST_MakePoint(6, 2))])
   +- LogicalTableScan(table=[[default_catalog, default_database, extTable]])

== Optimized Physical Plan ==
Calc(select=[id, name])
+- TableSourceScan(table=[[default_catalog, default_database, extTable, filter=[ST_Contains(fence, ST_MakePoint(6, 2))]]], fields=[id, name, fence])

== Optimized Execution Plan ==
Calc(select=[id, name])
+- TableSourceScan(table=[[default_catalog, default_database, extTable, filter=[ST_Contains(fence, ST_MakePoint(6, 2))]]], fields=[id, name, fence])

We can see that with the original code the filter stays in the Calc node and is not pushed down, because the underscores are stripped from the function name; once replace() is modified, the filter appears in TableSourceScan. However, according to the spatial query standard, geospatial function names use the ST_xxx format. What is the idea behind this design?
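
The effect described above can be illustrated with a minimal, self-contained sketch. It is not the actual Flink code: the class UdfNameLookupSketch, the CATALOG map, and the registered class names are hypothetical stand-ins, and the normalization rule (strip whitespace and underscores) is an assumption about what RexNodeExtractor#replace() does in 1.16.0. It only shows why a lookup by the normalized name misses a function registered as ST_Contains.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class UdfNameLookupSketch {

    // Hypothetical stand-in for a function catalog, keyed by the registered UDF name.
    private static final Map<String, String> CATALOG = new HashMap<>();

    static {
        CATALOG.put("ST_Contains", "geo.StContainsFunction");   // hypothetical class name
        CATALOG.put("ST_MakePoint", "geo.StMakePointFunction"); // hypothetical class name
    }

    // Assumed normalization: remove whitespace and underscores from the name.
    public static String normalize(String name) {
        return name.replaceAll("\\s|_", "");
    }

    // Exact-match lookup in the stand-in catalog.
    public static Optional<String> lookup(String name) {
        return Optional.ofNullable(CATALOG.get(name));
    }

    public static void main(String[] args) {
        String name = "ST_Contains";
        // The normalized name no longer matches the registered one.
        System.out.println(normalize(name));                     // prints "STContains"
        System.out.println(lookup(normalize(name)).isPresent()); // prints "false"
        // Looking up the unmodified name succeeds.
        System.out.println(lookup(name).isPresent());            // prints "true"
    }
}
```

If the lookup fails in this way, the expression cannot be converted into a resolved filter, which would match the "before" plan where the predicate remains in the Calc node.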

Best regards,
Kelun