Posted to github@arrow.apache.org by "yukkit (via GitHub)" <gi...@apache.org> on 2023/04/14 03:33:14 UTC

[GitHub] [arrow-datafusion] yukkit opened a new issue, #6001: Incorrect column pruning in sql with window operations

yukkit opened a new issue, #6001:
URL: https://github.com/apache/arrow-datafusion/issues/6001

   ### Describe the bug
   
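   As the title says, column pruning is incorrect for SQL queries with window operations. In the plan below, the `TableScan` projects all 16 listed columns of `readings`, although the query references only `latitude` and `name`.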
   
   ### To Reproduce
   
   ```sql
   ❯ explain  select sum(case when latitude < 50.0 then latitude else 0 end) over (partition by name) from readings;
   +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
   +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Int64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING                                                                                                                                                                                                                                                                                                                                                                                                                     |
   |               |   WindowAggr: windowExpr=[[SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Float64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Int64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]                                                                                                                                                                                                                 |
   |               |     TableScan: readings projection=[time, name, fleet, driver, model, device_version, latitude, longitude, elevation, velocity, heading, grade, fuel_consumption, load_capacity, fuel_capacity, nominal_fuel_consumption]                                                                                                                                                                                                                                                                                                                                                                                     |
   | physical_plan | ProjectionExec: expr=[SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Int64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@16 as SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Int64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]                                                                                                                                                                                                                      |
   |               |   WindowAggExec: wdw=[SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Int64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Ok(Field { name: "SUM(CASE WHEN readings.latitude < Float64(50) THEN readings.latitude ELSE Int64(0) END) PARTITION BY [readings.name] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)) }] |
   |               |     SortExec: expr=[name@1 ASC NULLS LAST]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
   |               |       CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
   |               |         RepartitionExec: partitioning=Hash([Column { name: "name", index: 1 }], 8), input_partitions=1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
   |               |           ParquetExec: limit=None, partitions={1 group: [[Users/yukkit/Documents/tmp/data/parquet/part-297.parquet]]}, projection=[time, name, fleet, driver, model, device_version, latitude, longitude, elevation, velocity, heading, grade, fuel_consumption, load_capacity, fuel_capacity, nominal_fuel_consumption]                                                                                                                                                                                                                                                                                      |
   |               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
   +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.026 seconds.
   ```
   
   ### Expected behavior
   
   Only `latitude` and `name` should be pushed down to the `TableScan`.
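
The expected projection can be illustrated with a minimal sketch. This is not DataFusion's optimizer code: `required_columns` is a hypothetical helper, the schema list is copied from the `TableScan` projection in the plan above, and the naive tokenizer assumes no column shares a name with a SQL keyword or function used in the expressions.

```python
import re

# Column list copied from the TableScan projection in the reported plan.
SCHEMA = [
    "time", "name", "fleet", "driver", "model", "device_version",
    "latitude", "longitude", "elevation", "velocity", "heading", "grade",
    "fuel_consumption", "load_capacity", "fuel_capacity",
    "nominal_fuel_consumption",
]


def required_columns(expressions, schema):
    """Return the schema columns referenced by any expression, in schema order.

    Hypothetical helper: it treats every identifier-like token as a candidate
    column name, which is only safe when no column is named like a keyword.
    """
    referenced = set()
    for expr in expressions:
        referenced.update(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr))
    return [col for col in schema if col in referenced]


# The window function argument and the PARTITION BY key from the query above.
window_expr = "sum(case when latitude < 50.0 then latitude else 0 end)"
partition_by = "name"

print(required_columns([window_expr, partition_by], SCHEMA))
# ['name', 'latitude']
```

A pruning rule would need to collect columns from both the window function's arguments and its `PARTITION BY` (and any `ORDER BY`) keys; here that leaves two of the sixteen columns.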
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener closed issue #6001: Incorrect column pruning in sql with window operations

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener closed issue #6001: Incorrect column pruning in sql with window operations
URL: https://github.com/apache/arrow-datafusion/issues/6001

