Posted to jira@arrow.apache.org by "Jörn Horstmann (Jira)" <ji...@apache.org> on 2020/10/10 07:47:00 UTC

[jira] [Assigned] (ARROW-10243) [Rust] [Datafusion] Optimize literal expression evaluation

     [ https://issues.apache.org/jira/browse/ARROW-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Horstmann reassigned ARROW-10243:
--------------------------------------

    Assignee: Jörn Horstmann

> [Rust] [Datafusion] Optimize literal expression evaluation
> ----------------------------------------------------------
>
>                 Key: ARROW-10243
>                 URL: https://issues.apache.org/jira/browse/ARROW-10243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Jörn Horstmann
>            Assignee: Jörn Horstmann
>            Priority: Major
>         Attachments: flamegraph.svg
>
>
> While benchmarking the tpch query I noticed that the physical literal expression takes up a sizable amount of time. I think the creation of the corresponding array for numeric literals can be sped up by creating the Buffer and ArrayData directly without going through a builder. That also allows skipping the construction of a null bitmap for non-null literals.
> I'm also wondering whether it might be possible to cache the created array. For queries without a WHERE clause, I'd expect all batches except the last to have the same length. I'm not sure, though, where to store the cached value.
> Another possible optimization could be to cast literals on the logical plan side already. In the tpch query the literal `1` is of type `u64` in the logical plan and then needs to be processed by a cast kernel to convert it to `f64` for use in an arithmetic expression.
> The attached flamegraph is of 10 runs of tpch, with the data being loaded into memory before running the queries (see ARROW-10240).
> {code}
> flamegraph ./target/release/tpch --iterations 10 --path ../tpch-dbgen --format tbl --query 1 --batch-size 4096 -c1 --load
> {code}
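
The first two ideas in the description (building the literal array directly without a builder, and caching it by batch length) could look roughly like the sketch below. It deliberately does not use the arrow crate: `Float64Column`, `literal_via_builder`, `literal_direct` and `LiteralCache` are hypothetical stand-ins for Buffer/ArrayData and the literal expression, made up for illustration, and where the cache would actually live in DataFusion is left open, as in the description.

```rust
use std::rc::Rc;

// Hypothetical stand-in for a primitive Arrow array (not the arrow crate's API).
#[derive(Debug, Clone, PartialEq)]
struct Float64Column {
    values: Vec<f64>,
    // One bit per value, set when the value is valid. None means "no nulls".
    validity: Option<Vec<u8>>,
}

// Builder-style path: appends one value at a time and maintains a validity bitmap.
fn literal_via_builder(value: f64, len: usize) -> Float64Column {
    let mut values = Vec::new();
    let mut validity = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        values.push(value);
        validity[i / 8] |= 1u8 << (i % 8);
    }
    Float64Column { values, validity: Some(validity) }
}

// Direct path: fill the value buffer in one step and skip the bitmap entirely,
// which is valid because a non-null literal never produces a null slot.
fn literal_direct(value: f64, len: usize) -> Float64Column {
    Float64Column { values: vec![value; len], validity: None }
}

// Hypothetical cache for a single literal: reuses the last array as long as
// the requested batch length stays the same.
struct LiteralCache {
    value: f64,
    cached: Option<Rc<Float64Column>>,
}

impl LiteralCache {
    fn new(value: f64) -> Self {
        LiteralCache { value, cached: None }
    }

    fn get(&mut self, len: usize) -> Rc<Float64Column> {
        if let Some(col) = &self.cached {
            if col.values.len() == len {
                // Same batch length as last time: hand out the shared array.
                return Rc::clone(col);
            }
        }
        // Different length (for example the last, shorter batch): rebuild.
        let col = Rc::new(literal_direct(self.value, len));
        self.cached = Some(Rc::clone(&col));
        col
    }
}
```

With a fixed batch size, such a cache would only rebuild the array when the length changes, which under the assumption stated in the description happens only for the final batch. A real implementation would also need the cached array to be shareable across threads (for example `Arc` instead of `Rc`).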



--
This message was sent by Atlassian Jira
(v8.3.4#803005)