Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/13 02:56:48 UTC
[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #2725: Filter push down need consider alias columns
Ted-Jiang opened a new issue, #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725
**Describe the bug**
Filter push down does not consider alias columns.
**To Reproduce**
Logical plan
```
Received plan for execution: Limit: 50000
Projection: #LINEORDER.LO_SHIPMODE, #TEST1
Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #testBITMAPCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_) AS TEST1
Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testBITMAPCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_)]]
Projection: #LO_SHIPMODE, #dummy_LINEORDER_LO_SUPPKEY, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Filter: #LO_SHIPMODE IN ([CAST(Int32(1) AS Utf8), CAST(Int32(2) AS Utf8), CAST(Int32(3) AS Utf8), CAST(Int32(4) AS Utf8), CAST(Int32(5) AS Utf8)])
Projection: #LO_ORDERKEY, #LO_LINENUMBER, #LO_CUSTKEY, #LO_PARTKEY, #dummy_LINEORDER_LO_SUPPKEY, #LO_ORDERDATE, #LO_ORDERPRIORITY, #LO_SHIPPRIORITY, #LO_QUANTITY, #LO_EXTENDEDPRICE, #LO_ORDTOTALPRICE, #LO_DISCOUNT, #LO_TAX, #LO_COMMITDATE, #LO_SHIPMODE, #_KY_COUNT__, #_KY_COUNT_DISTINCT_LINEORDER_LO_ORDERKEY_, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_, #_KY_PERCENTILE_APPROX_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_TOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_BITOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_SUM_TOPN_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_SUM_BITOPN_LINEORDER_LO_EXTENDEDPRICE_, #dummy_LINEORDER__KY_SUM_LINEORDER_LO_EXTENDEDPRICE_, #dummy_LINEORDER__KY_MAX_LINEORDER_LO_EXTENDEDPRICE_
Projection: #ssb@test_udaf_cube_update@17179869183.32 AS LO_ORDERKEY, #ssb@test_udaf_cube_update@17179869183.31 AS LO_LINENUMBER, #ssb@test_udaf_cube_update@17179869183.30 AS LO_CUSTKEY, #ssb@test_udaf_cube_update@17179869183.29 AS LO_PARTKEY, Utf8("1") AS dummy_LINEORDER_LO_SUPPKEY, #ssb@test_udaf_cube_update@17179869183.33 AS LO_ORDERDATE, #ssb@test_udaf_cube_update@17179869183.28 AS LO_ORDERPRIORITY, #ssb@test_udaf_cube_update@17179869183.27 AS LO_SHIPPRIORITY, #ssb@test_udaf_cube_update@17179869183.26 AS LO_QUANTITY, #ssb@test_udaf_cube_update@17179869183.44 AS LO_EXTENDEDPRICE, #ssb@test_udaf_cube_update@17179869183.25 AS LO_ORDTOTALPRICE, #ssb@test_udaf_cube_update@17179869183.24 AS LO_DISCOUNT, #ssb@test_udaf_cube_update@17179869183.23 AS LO_TAX, #ssb@test_udaf_cube_update@17179869183.22 AS LO_COMMITDATE, #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.39 AS _KY_COUNT__, #ssb@test_udaf_cube_update@17179869183.40 AS _KY_COUNT_DISTINCT_LINEORDER_LO_ORDERKEY_, #ssb@test_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_, #ssb@test_udaf_cube_update@17179869183.43 AS _KY_PERCENTILE_APPROX_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.49 AS _KY_APPROX_TOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.46 AS _KY_APPROX_BITOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.47 AS _KY_APPROX_SUM_TOPN_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.48 AS _KY_APPROX_SUM_BITOPN_LINEORDER_LO_EXTENDEDPRICE_, Utf8("1") AS dummy_LINEORDER__KY_SUM_LINEORDER_LO_EXTENDEDPRICE_, Utf8("1") AS dummy_LINEORDER__KY_MAX_LINEORDER_LO_EXTENDEDPRICE_
TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])
```
Optimized logical plan
```
Calculated optimized plan: Limit: 50000
Projection: #LINEORDER.LO_SHIPMODE, #TEST1
Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #testBITMAPCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_) AS TEST1
Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testBITMAPCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_)]]
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Filter: #LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])
TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 37]), partial_filters=[#LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])]
```
Optimized plan passed to physical planning
```
create_physical_plan optimized plan: Limit: 50000
Projection: #LINEORDER.LO_SHIPMODE, #TEST1
Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #testBITMAPCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_) AS TEST1
Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testBITMAPCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_)]]
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Filter: #LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])
TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 37]), partial_filters=[#LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])]
```
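In the original logical plan, `Filter: #LO_SHIPMODE IN ...` is applied after the aliasing `Projection` (the one that produces `LO_ORDERKEY`, `LO_SHIPMODE`, and so on), so the alias `LO_SHIPMODE` is in scope.
After optimization, the `Filter` is applied before that `Projection` but still refers to the alias `LO_SHIPMODE`.
This causes: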
```
ERROR Could not create physical plan: Error during planning: No field named '<unqualified>.LO_SHIPMODE'. Valid fields are 'ssb@test_udaf_cube_update@17179869183.21', 'ssb@test_udaf_cube_update@17179869183.42'.
```
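The failure can be reproduced in miniature. The sketch below is a toy model, not DataFusion code: `resolve` is a hypothetical helper that looks a column name up among the field names a plan node's input exposes. Below the aliasing projection, the input exposes only the scan's field names, so a lookup by alias fails in the same way as the error above.

```rust
/// Toy stand-in for a schema lookup. `resolve` is a hypothetical helper
/// written for this illustration; it is not a DataFusion API.
fn resolve(input_fields: &[&str], column: &str) -> Result<usize, String> {
    input_fields
        .iter()
        .position(|f| *f == column)
        .ok_or_else(|| {
            // List the fields that were available, quoted, for the error text.
            let valid = input_fields
                .iter()
                .map(|f| format!("'{}'", f))
                .collect::<Vec<_>>()
                .join(", ");
            format!("No field named '{}'. Valid fields are {}.", column, valid)
        })
}
```

Called with the two scan fields from the plan above, `resolve` returns an error for `LO_SHIPMODE` and an index for `ssb@test_udaf_cube_update@17179869183.21`.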
**Expected behavior**
The pushed-down filter should be able to resolve the aliased column.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #2725: Filter push down need consider alias columns
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1154384486
Hi @Ted-Jiang
If I read your conclusion correctly, I think you are saying that the filter added by the filter pushdown rule is remapping the names somehow. Specifically, you are proposing:
```
Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
Filter: #LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")]) <-- **** This filter should be `#ssb@test_udaf_cube_update@17179869183.21 IN ....`
TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 37]), partial_filters=[#LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])]
```
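The proposed remapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual DataFusion fix: the `Expr` enum is a cut-down stand-in for DataFusion's expression type, and `rewrite_through_projection` is a hypothetical helper name. When a filter moves below a projection, each column that names one of the projection's aliases is replaced by the expression the projection computes for that alias.

```rust
use std::collections::HashMap;

/// Cut-down expression type for illustration only; the real DataFusion
/// `Expr` has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    InList { expr: Box<Expr>, list: Vec<Expr> },
}

/// Hypothetical helper: rewrite a filter so it is valid below a projection.
/// `aliases` maps each output name of the projection to the expression that
/// produces it. Columns that are not aliases are left unchanged.
fn rewrite_through_projection(filter: &Expr, aliases: &HashMap<String, Expr>) -> Expr {
    match filter {
        Expr::Column(name) => aliases
            .get(name)
            .cloned()
            .unwrap_or_else(|| filter.clone()),
        Expr::Literal(_) => filter.clone(),
        Expr::InList { expr, list } => Expr::InList {
            expr: Box::new(rewrite_through_projection(expr, aliases)),
            list: list
                .iter()
                .map(|e| rewrite_through_projection(e, aliases))
                .collect(),
        },
    }
}
```

With `aliases` mapping `LO_SHIPMODE` to `Column("ssb@test_udaf_cube_update@17179869183.21")`, an `InList` over `Column("LO_SHIPMODE")` is rewritten to an `InList` over the scan column, which matches the filter marked in the plan above.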
[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2725: Filter push down need consider alias columns
Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1153601688
Debug log
```
After apply projection_push_down rule:
Optimized logical plan:
Limit: 50000
Projection: #LINEORDER.LO_SHIPMODE, #ASS
Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #APPROXCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_,UInt8(10)) AS ASS
Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testAPPROXCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, UInt8(10))]]
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
Filter: #LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 36])
After apply filter_push_down rule:
Optimized logical plan:
Limit: 50000
Projection: #LINEORDER.LO_SHIPMODE, #ASS
Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #APPROXCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_,UInt8(10)) AS ASS
Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testAPPROXCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, UInt8(10))]]
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
Filter: #LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])
TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 36]), partial_filters=[#LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])]
After apply limit_push_down rule:
```
[GitHub] [arrow-datafusion] alamb closed issue #2725: Filter push down need consider alias columns
Posted by GitBox <gi...@apache.org>.
alamb closed issue #2725: Filter push down need consider alias columns
URL: https://github.com/apache/arrow-datafusion/issues/2725
[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2725: Filter push down need consider alias columns
Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1153408881
@alamb PTAL. I am considering rewriting the aliased column in the `in_list` expression back to its original name during optimization. Is this solution OK? I would like your opinion 😊
[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2725: Filter push down need consider alias columns
Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1154631256
> If I read your conclusion correctly, I think you are saying that the filter added by the filter pushdown rule is remapping the names somehow. Specifically, you are proposing:
You are right! I have the same idea and am working on it 😊