Posted to issues@arrow.apache.org by "Jorge (Jira)" <ji...@apache.org> on 2020/08/09 16:46:00 UTC
[jira] [Created] (ARROW-9678) [Rust] [DataFusion] Improve projection push down to remove unused columns
Jorge created ARROW-9678:
----------------------------
Summary: [Rust] [DataFusion] Improve projection push down to remove unused columns
Key: ARROW-9678
URL: https://issues.apache.org/jira/browse/ARROW-9678
Project: Apache Arrow
Issue Type: New Feature
Components: Rust, Rust - DataFusion
Reporter: Jorge
Assignee: Jorge
Currently, the projection push-down only removes columns that are never referenced anywhere in the plan. However, a projection sometimes declares columns that are themselves never used by the rest of the plan.
This issue is about improving the projection push-down to remove any column that is not logically required by the plan.
A failing unit test that illustrates the idea:
{code:java}
#[test]
fn table_unused_column() -> Result<()> {
    let table_scan = test_table_scan()?;
    assert_eq!(3, table_scan.schema().fields().len());
    assert_fields_eq(&table_scan, vec!["a", "b", "c"]);
    // we never use "b" in the first projection => remove it
    let plan = LogicalPlanBuilder::from(&table_scan)
        .project(vec![col("c"), col("a"), col("b")])?
        .filter(col("c").gt(&lit(1)))?
        .project(vec![col("c"), col("a")])?
        .build()?;
    assert_fields_eq(&plan, vec!["c", "a"]);
    let expected = "\
    Projection: #c, #a\
    \n Selection: #c Gt Int32(1)\
    \n Projection: #c, #a\
    \n TableScan: test projection=Some([0, 2])";
    assert_optimized_plan_eq(&plan, expected);
    Ok(())
}
{code}
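To make the intended behaviour concrete, here is a minimal, self-contained sketch of the idea on a toy plan type. It is not DataFusion's implementation: `Plan`, `cols`, `output_columns`, `prune`, `optimize` and `example_plan` are hypothetical names invented for this illustration, and projections and filters are simplified to plain column names rather than expressions. The sketch walks the plan top-down, carrying the set of columns the parent nodes require, and drops everything else from inner projections and from the scan.

```rust
use std::collections::BTreeSet;

// Toy logical plan: a hypothetical stand-in, NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    // Table scan; `projection` lists the columns that are actually read.
    Scan { columns: Vec<String>, projection: Vec<String> },
    // Keeps only `columns` from its input.
    Projection { columns: Vec<String>, input: Box<Plan> },
    // Simplified filter whose predicate references a single column.
    Filter { column: String, input: Box<Plan> },
}

// Build owned column names from string literals.
fn cols(names: &[&str]) -> Vec<String> {
    names.iter().map(|n| n.to_string()).collect()
}

// Columns produced by a plan node.
fn output_columns(plan: &Plan) -> Vec<String> {
    match plan {
        Plan::Scan { projection, .. } => projection.clone(),
        Plan::Projection { columns, .. } => columns.clone(),
        Plan::Filter { input, .. } => output_columns(input),
    }
}

// Rewrite `plan` so that every node only produces columns in `required`.
fn prune(plan: &Plan, required: &BTreeSet<String>) -> Plan {
    match plan {
        Plan::Projection { columns, input } => {
            // Drop declared columns that no parent needs.
            let kept: Vec<String> =
                columns.iter().filter(|c| required.contains(*c)).cloned().collect();
            let child_required: BTreeSet<String> = kept.iter().cloned().collect();
            Plan::Projection { columns: kept, input: Box::new(prune(input, &child_required)) }
        }
        Plan::Filter { column, input } => {
            // The filter needs its own column in addition to what the parents need.
            let mut child_required = required.clone();
            child_required.insert(column.clone());
            Plan::Filter { column: column.clone(), input: Box::new(prune(input, &child_required)) }
        }
        Plan::Scan { columns, .. } => Plan::Scan {
            columns: columns.clone(),
            // Read only the required columns, in table order.
            projection: columns.iter().filter(|c| required.contains(*c)).cloned().collect(),
        },
    }
}

// The root's output columns are, by definition, all required.
fn optimize(plan: &Plan) -> Plan {
    let required: BTreeSet<String> = output_columns(plan).into_iter().collect();
    prune(plan, &required)
}

// Same shape as the unit test above: project(c, a, b) -> filter(c) -> project(c, a).
fn example_plan() -> Plan {
    let scan = Plan::Scan { columns: cols(&["a", "b", "c"]), projection: cols(&["a", "b", "c"]) };
    let first = Plan::Projection { columns: cols(&["c", "a", "b"]), input: Box::new(scan) };
    let filter = Plan::Filter { column: "c".to_string(), input: Box::new(first) };
    Plan::Projection { columns: cols(&["c", "a"]), input: Box::new(filter) }
}
```

Calling `optimize(&example_plan())` on this toy plan removes "b" from the inner projection and leaves the scan reading only "a" and "c", which corresponds to `projection=Some([0, 2])` in the expected plan of the unit test.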
This issue was first identified by [~andygrove] [here|https://github.com/ballista-compute/ballista/issues/320].
--
This message was sent by Atlassian Jira
(v8.3.4#803005)