Posted to issues@arrow.apache.org by "Jorge (Jira)" <ji...@apache.org> on 2020/08/09 16:46:00 UTC

[jira] [Created] (ARROW-9678) [Rust] [DataFusion] Improve projection push down to remove unused columns

Jorge created ARROW-9678:
----------------------------

             Summary: [Rust] [DataFusion] Improve projection push down to remove unused columns
                 Key: ARROW-9678
                 URL: https://issues.apache.org/jira/browse/ARROW-9678
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Rust, Rust - DataFusion
            Reporter: Jorge
            Assignee: Jorge


Currently, the projection push-down only removes columns that are never referenced anywhere in the plan. However, a projection sometimes declares columns that are never used by any later step of the plan.

This issue is about improving the projection push-down to remove any column that is not logically required by the plan.

A failing unit test that illustrates the idea:

{code:java}
    #[test]
    fn table_unused_column() -> Result<()> {
        let table_scan = test_table_scan()?;
        assert_eq!(3, table_scan.schema().fields().len());
        assert_fields_eq(&table_scan, vec!["a", "b", "c"]);

        // "b" is declared in the first projection but never used afterwards => remove it
        let plan = LogicalPlanBuilder::from(&table_scan)
            .project(vec![col("c"), col("a"), col("b")])?
            .filter(col("c").gt(&lit(1)))?
            .project(vec![col("c"), col("a")])?
            .build()?;

        assert_fields_eq(&plan, vec!["c", "a"]);

        let expected = "\
        Projection: #c, #a\
        \n  Selection: #c Gt Int32(1)\
        \n    Projection: #c, #a\
        \n      TableScan: test projection=Some([0, 2])";

        assert_optimized_plan_eq(&plan, expected);

        Ok(())
    }
{code}
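
To make the intended behaviour concrete, here is a minimal, self-contained sketch of one possible approach: walk the plan from the root down, carry the set of columns the parent actually requires, and trim every projection and the table scan to that set. It does not use DataFusion's types or API. The toy {{Plan}} enum and the {{output_columns}}, {{prune}}, {{optimize}} and {{example_plan}} functions are hypothetical names introduced only for illustration, and expressions are assumed to be bare column names.

```rust
use std::collections::HashSet;

// Hypothetical toy plan, NOT DataFusion's LogicalPlan.
// Assumption: every expression is just a column name.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { columns: Vec<String>, projection: Option<Vec<usize>> },
    Projection { exprs: Vec<String>, input: Box<Plan> },
    Filter { column: String, input: Box<Plan> },
}

// Columns produced by a plan node.
fn output_columns(plan: &Plan) -> HashSet<String> {
    match plan {
        Plan::Projection { exprs, .. } => exprs.iter().cloned().collect(),
        Plan::Filter { input, .. } => output_columns(input),
        Plan::Scan { columns, projection } => match projection {
            Some(idx) => idx.iter().map(|i| columns[*i].clone()).collect(),
            None => columns.iter().cloned().collect(),
        },
    }
}

// Rewrite `plan` so that it only produces columns listed in `required`.
fn prune(plan: &Plan, required: &HashSet<String>) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => {
            // Drop expressions the parent never uses, preserving order.
            let kept: Vec<String> = exprs
                .iter()
                .filter(|e| required.contains(*e))
                .cloned()
                .collect();
            // The child only has to provide what the kept expressions read.
            let child_required: HashSet<String> = kept.iter().cloned().collect();
            Plan::Projection {
                exprs: kept,
                input: Box::new(prune(input, &child_required)),
            }
        }
        Plan::Filter { column, input } => {
            // A filter passes its parent's requirements through and also
            // needs the column used by its own predicate.
            let mut child_required = required.clone();
            child_required.insert(column.clone());
            Plan::Filter {
                column: column.clone(),
                input: Box::new(prune(input, &child_required)),
            }
        }
        Plan::Scan { columns, .. } => {
            // Read only the required columns, identified by index.
            let projection: Vec<usize> = columns
                .iter()
                .enumerate()
                .filter(|(_, c)| required.contains(*c))
                .map(|(i, _)| i)
                .collect();
            Plan::Scan { columns: columns.clone(), projection: Some(projection) }
        }
    }
}

// Everything the root outputs is required by definition.
fn optimize(plan: &Plan) -> Plan {
    prune(plan, &output_columns(plan))
}

// The plan from the unit test above, expressed in the toy representation.
fn example_plan() -> Plan {
    let scan = Plan::Scan {
        columns: vec!["a".into(), "b".into(), "c".into()],
        projection: None,
    };
    let first = Plan::Projection {
        exprs: vec!["c".into(), "a".into(), "b".into()],
        input: Box::new(scan),
    };
    let filter = Plan::Filter { column: "c".into(), input: Box::new(first) };
    Plan::Projection {
        exprs: vec!["c".into(), "a".into()],
        input: Box::new(filter),
    }
}
```

Calling {{optimize(&example_plan())}} on this toy model trims the inner projection to "c" and "a" and restricts the scan to column indices 0 and 2, which mirrors the expected plan in the test above. A real implementation would also have to handle expressions that reference several columns, aliases, and the remaining plan nodes.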

This issue was first identified by [~andygrove] [here|https://github.com/ballista-compute/ballista/issues/320].



