You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/13 02:56:48 UTC

[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #2725: Filter push down need consider alias columns

Ted-Jiang opened a new issue, #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725

   **Describe the bug**
   Filter push down not consider alias columns
   
   **To Reproduce**
   
   Logical plan
   ```
   Received plan for execution: Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #TEST1
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #testBITMAPCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_) AS TEST1
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testBITMAPCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_)]]
           Projection: #LO_SHIPMODE, #dummy_LINEORDER_LO_SUPPKEY, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
             Filter: #LO_SHIPMODE IN ([CAST(Int32(1) AS Utf8), CAST(Int32(2) AS Utf8), CAST(Int32(3) AS Utf8), CAST(Int32(4) AS Utf8), CAST(Int32(5) AS Utf8)])
               Projection: #LO_ORDERKEY, #LO_LINENUMBER, #LO_CUSTKEY, #LO_PARTKEY, #dummy_LINEORDER_LO_SUPPKEY, #LO_ORDERDATE, #LO_ORDERPRIORITY, #LO_SHIPPRIORITY, #LO_QUANTITY, #LO_EXTENDEDPRICE, #LO_ORDTOTALPRICE, #LO_DISCOUNT, #LO_TAX, #LO_COMMITDATE, #LO_SHIPMODE, #_KY_COUNT__, #_KY_COUNT_DISTINCT_LINEORDER_LO_ORDERKEY_, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_, #_KY_PERCENTILE_APPROX_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_TOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_BITOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_SUM_TOPN_LINEORDER_LO_EXTENDEDPRICE_, #_KY_APPROX_SUM_BITOPN_LINEORDER_LO_EXTENDEDPRICE_, #dummy_LINEORDER__KY_SUM_LINEORDER_LO_EXTENDEDPRICE_, #dummy_LINEORDER__KY_MAX_LINEORDER_LO_EXTENDEDPRICE_
                 Projection: #ssb@test_udaf_cube_update@17179869183.32 AS LO_ORDERKEY, #ssb@test_udaf_cube_update@17179869183.31 AS LO_LINENUMBER, #ssb@test_udaf_cube_update@17179869183.30 AS LO_CUSTKEY, #ssb@test_udaf_cube_update@17179869183.29 AS LO_PARTKEY, Utf8("1") AS dummy_LINEORDER_LO_SUPPKEY, #ssb@test_udaf_cube_update@17179869183.33 AS LO_ORDERDATE, #ssb@test_udaf_cube_update@17179869183.28 AS LO_ORDERPRIORITY, #ssb@test_udaf_cube_update@17179869183.27 AS LO_SHIPPRIORITY, #ssb@test_udaf_cube_update@17179869183.26 AS LO_QUANTITY, #ssb@test_udaf_cube_update@17179869183.44 AS LO_EXTENDEDPRICE, #ssb@test_udaf_cube_update@17179869183.25 AS LO_ORDTOTALPRICE, #ssb@test_udaf_cube_update@17179869183.24 AS LO_DISCOUNT, #ssb@test_udaf_cube_update@17179869183.23 AS LO_TAX, #ssb@test_udaf_cube_update@17179869183.22 AS LO_COMMITDATE, #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.39 AS _KY_COUNT__, #ssb@test_udaf_cube_update@17179869183.40
  AS _KY_COUNT_DISTINCT_LINEORDER_LO_ORDERKEY_, #ssb@test_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_, #ssb@test_udaf_cube_update@17179869183.43 AS _KY_PERCENTILE_APPROX_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.49 AS _KY_APPROX_TOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.46 AS _KY_APPROX_BITOPN_SUM_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.47 AS _KY_APPROX_SUM_TOPN_LINEORDER_LO_EXTENDEDPRICE_, #ssb@test_udaf_cube_update@17179869183.48 AS _KY_APPROX_SUM_BITOPN_LINEORDER_LO_EXTENDEDPRICE_, Utf8("1") AS dummy_LINEORDER__KY_SUM_LINEORDER_LO_EXTENDEDPRICE_, Utf8("1") AS dummy_LINEORDER__KY_MAX_LINEORDER_LO_EXTENDEDPRICE_
                   TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])
   ```
   
   logical plan optimize
   
   ```
   Calculated optimized plan: Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #TEST1
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #testBITMAPCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_) AS TEST1
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testBITMAPCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_)]]
           Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
             Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
               Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
                 Filter: #LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])
                   TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 37]), partial_filters=[#LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])]
   
   ```
   
   physical plan optimize
   
   ```
                   
   create_physical_plan optimized plan: Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #TEST1
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #testBITMAPCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_) AS TEST1
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testBITMAPCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_)]]
           Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
             Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
               Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
                 Filter: #LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])
                   TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 37]), partial_filters=[#LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])]
   
   ```
   
   in logical plan `Filter: #LO_SHIPMODE IN ` is after alias `Projection: #LO_ORDERKEY`
   After optimize `Filter` is before `Projection`.
   cause :
   ```
   ERROR  Could not create physical plan: Error during planning: No field named '<unqualified>.LO_SHIPMODE'. Valid fields are 'ssb@test_udaf_cube_update@17179869183.21', 'ssb@test_udaf_cube_update@17179869183.42'.
   ```
   
   **Expected behavior**
   could find alias col
   
   **Additional context**
   Add any other context about the problem here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #2725: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1154384486

   Hi @Ted-Jiang 
   
   If I read your conclusion corrctly, I think you are saying that the filter added by the filter pushdown rule is remapping the names somehow. Specifically, you are proposing:
   
   ```
               Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.42 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SUPPKEY_
                 Filter: #LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])  <-- **** This filter should be `#ssb@test_udaf_cube_update@17179869183.21 IN ....`
                   TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 37]), partial_filters=[#LO_SHIPMODE IN ([Utf8("1"), Utf8("2"), Utf8("3"), Utf8("4"), Utf8("5")])]
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2725: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1153601688

   Debug log
   ```
   After apply projection_push_down rule:
   
   Optimized logical plan:
   Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #ASS
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #APPROXCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_,UInt8(10)) AS ASS
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testAPPROXCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, UInt8(10))]]
           Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
             Filter: #LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])
               Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
                 Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
                   TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 36])
   
   After apply filter_push_down rule:
   
   Optimized logical plan:
   Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #ASS
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #APPROXCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_,UInt8(10)) AS ASS
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[testAPPROXCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, UInt8(10))]]
           Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
             Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
               Projection: #ssb@test_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@test_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
                 Filter: #LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])
                   TableScan: ssb@test_udaf_cube_update@17179869183 projection=Some([12, 36]), partial_filters=[#LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])]
   
   After apply limit_push_down rule:
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #2725: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
alamb closed issue #2725: Filter push down need consider alias columns
URL: https://github.com/apache/arrow-datafusion/issues/2725


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2725: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1153408881

   @alamb  PTAL  I consider change `in_list` col  alias name to origin name in  optimize? Is this solution ok ? I need your opinion 😊


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2725: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2725:
URL: https://github.com/apache/arrow-datafusion/issues/2725#issuecomment-1154631256

   > If I read your conclusion corrctly, I think you are saying that the filter added by the filter pushdown rule is remapping the names somehow. Specifically, you are proposing:
   
   You are right! Same idea as you, i am working on it 😊


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org