Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/12/12 04:54:00 UTC
[jira] [Resolved] (ARROW-10453) [Rust] [DataFusion] Performance degradation after removing specialization
[ https://issues.apache.org/jira/browse/ARROW-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Leitão resolved ARROW-10453.
----------------------------------
Assignee: Jorge Leitão
Resolution: Fixed
> [Rust] [DataFusion] Performance degradation after removing specialization
> -------------------------------------------------------------------------
>
> Key: ARROW-10453
> URL: https://issues.apache.org/jira/browse/ARROW-10453
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Affects Versions: 3.0.0
> Reporter: Andy Grove
> Assignee: Jorge Leitão
> Priority: Major
> Fix For: 3.0.0
>
>
> The following commit caused a pretty large drop in performance for the TPC-H benchmark running against a SF=100 data set.
> {code:java}
> 29e9d13481ea6acc3f74cda108ed34ef8a411ba2 is the first bad commit
> commit 29e9d13481ea6acc3f74cda108ed34ef8a411ba2
> Author: Jorge C. Leitao <jo...@gmail.com>
> Date: Sun Oct 18 21:05:48 2020 +0200
>
> ARROW-10002: [Rust] Remove trait specialization from arrow crate
>
> This PR removes trait specialization by leveraging the compiler to remove trivial `if` statements.
>
> I verified that the assembly code was the same in a [simple example](https://rust.godbolt.org/z/qrcW8W). I do not know if this generalizes to our use-case, but I suspect so as LLVM is (hopefully) removing trivial branches like `if a != a`.
>
> The change `get_data_type()` to `DATA_TYPE` is not necessary. I did it before realizing this. IMO it makes it more explicit that this is not a function, but a constant, but we can revert it.
>
> Closes #8485 from jorgecarleitao/simp_types
>
> Authored-by: Jorge C. Leitao <jo...@gmail.com>
> Signed-off-by: Neville Dipale <ne...@gmail.com>
>
> :040000 040000 cbdaf3c9e924ec0e51d178df73169956b2bf723f 87c79e17378196b61dce9c5373e008ee94620d58 M rust
> {code}
> Benchmark command:
> {code:java}
> cargo run --release --bin tpch -- --iterations 3 --path /mnt/tpch/parquet-100GB --format parquet --query 1 --batch-size 4096 --concurrency 24
> {code}
> Before this commit:
> {code:java}
> Query 1 iteration 0 took 13629 ms
> Query 1 iteration 1 took 13450 ms
> Query 1 iteration 2 took 13465 ms
> {code}
> After this commit:
> {code:java}
> Query 1 iteration 0 took 18586 ms
> Query 1 iteration 1 took 18297 ms
> Query 1 iteration 2 took 18253 ms
> {code}
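
For readers unfamiliar with the technique named in the quoted commit message, the following is a minimal sketch of replacing trait specialization with a comparison against an associated constant. The enum, trait, marker types, and the `bits_needed` function are simplified, hypothetical stand-ins, not the arrow crate's actual definitions. The idea is that `T::DATA_TYPE` is known at compile time for each monomorphized instance, so the optimizer can in principle fold the `if` away and keep only one branch.

```rust
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the arrow crate's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Boolean,
    Int32,
}

trait ArrowPrimitiveType {
    type Native: Copy;
    // An associated constant instead of a `get_data_type()` function.
    const DATA_TYPE: DataType;
}

struct BooleanType;
struct Int32Type;

impl ArrowPrimitiveType for BooleanType {
    type Native = bool;
    const DATA_TYPE: DataType = DataType::Boolean;
}

impl ArrowPrimitiveType for Int32Type {
    type Native = i32;
    const DATA_TYPE: DataType = DataType::Int32;
}

/// Number of bits needed to store `len` values of type `T`
/// (hypothetical helper). Without specialization, the boolean
/// special case is an ordinary `if` on a compile-time constant.
fn bits_needed<T: ArrowPrimitiveType>(len: usize) -> usize {
    if T::DATA_TYPE == DataType::Boolean {
        // Booleans are bit-packed: one bit per value.
        len
    } else {
        // Other types use their full native width.
        len * std::mem::size_of::<T::Native>() * 8
    }
}

fn main() {
    println!("{}", bits_needed::<BooleanType>(10)); // prints 10
    println!("{}", bits_needed::<Int32Type>(10)); // prints 320
}
```

Whether the branch is actually eliminated depends on inlining and optimization decisions in the real code paths, which is something the timings reported above suggest should be measured rather than assumed.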