Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/12/07 03:35:00 UTC
[jira] [Updated] (ARROW-14479) [C++][Compute] Hash Join microbenchmarks
[ https://issues.apache.org/jira/browse/ARROW-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-14479:
-----------------------------------
Labels: pull-request-available (was: )
> [C++][Compute] Hash Join microbenchmarks
> ----------------------------------------
>
> Key: ARROW-14479
> URL: https://issues.apache.org/jira/browse/ARROW-14479
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 7.0.0
> Reporter: Michal Nowakiewicz
> Assignee: Sasha Krassovsky
> Priority: Major
> Labels: pull-request-available
> Fix For: 7.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Implement a series of microbenchmarks that give a good picture of the performance of the hash join implemented in Arrow across a range of dimensions.
> Compare the performance against some other product(s).
> Add scripts for generating useful visual reports giving a good picture of the costs of hash join.
> Examples of dimensions to explore in microbenchmarks:
> * number of duplicate keys on build side
> * relative size of build side to probe side
> * selectivity of the join
> * number of key columns
> * number of payload columns
> * filtering performance for semi- and anti- joins
> * dense integer key vs sparse integer key vs string key
> * build size
> * scaling of build, filtering, probe
> * inner vs left outer, inner vs right outer
> * left semi vs right semi, left anti vs right anti, left outer vs right outer
> * non-uniform key distribution
> * monotonic key values in input, partitioned key values in input (with and without per batch min-max metadata)
> * chain of multiple hash joins
> * overhead of the Bloom filter when it is non-selective
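
To make a few of the dimensions listed above concrete, here is a minimal sketch of a parameter sweep over build size, duplicate keys on the build side, and join selectivity. It is a pure-Python stand-in written for illustration, not Arrow's hash join and not the benchmark harness proposed in this issue. The names make_inputs, hash_join_inner and run_grid are hypothetical, and the grid values are arbitrary assumptions.

```python
import itertools
import random
import time
from collections import defaultdict


def make_inputs(build_keys, dups_per_key, probe_rows, selectivity, seed=0):
    """Generate build and probe key lists for one configuration (hypothetical helper)."""
    rng = random.Random(seed)
    # Build side: keys 0..build_keys-1, each repeated dups_per_key times.
    build = [k for k in range(build_keys) for _ in range(dups_per_key)]
    rng.shuffle(build)
    # Probe side: a fixed fraction of rows matches a build key, the rest miss.
    hits = int(probe_rows * selectivity)
    probe = [rng.randrange(build_keys) for _ in range(hits)]
    probe += [build_keys + i for i in range(probe_rows - hits)]
    rng.shuffle(probe)
    return build, probe


def hash_join_inner(build, probe):
    """Toy inner join on integer keys.

    Returns (build_seconds, probe_seconds, output_rows) so that the build
    and probe phases can be compared separately.
    """
    t0 = time.perf_counter()
    table = defaultdict(list)
    for row_id, key in enumerate(build):
        table[key].append(row_id)
    t1 = time.perf_counter()
    output_rows = 0
    for key in probe:
        # .get() avoids inserting empty lists for keys that miss.
        output_rows += len(table.get(key, ()))
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, output_rows


def run_grid(probe_rows=20_000):
    """Sweep a small, arbitrary grid of (build keys, duplicates, selectivity)."""
    grid = itertools.product([1_000, 10_000], [1, 4], [0.1, 1.0])
    results = []
    for build_keys, dups, selectivity in grid:
        build, probe = make_inputs(build_keys, dups, probe_rows, selectivity)
        build_s, probe_s, rows = hash_join_inner(build, probe)
        results.append((build_keys, dups, selectivity, build_s, probe_s, rows))
    return results


if __name__ == "__main__":
    for row in run_grid():
        print("build_keys=%d dups=%d selectivity=%.1f "
              "build=%.4fs probe=%.4fs rows=%d" % row)
```

Each result row keeps the build and probe timings separate, which is the kind of split a visual report of join costs would likely need. Absolute timings from this sketch say nothing about Arrow's performance; only the shape of the sweep carries over.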
--
This message was sent by Atlassian Jira
(v8.20.1#820001)