Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2021/04/25 13:55:00 UTC

[jira] [Closed] (ARROW-12253) [Rust] [Ballista] Implement scalable joins

     [ https://issues.apache.org/jira/browse/ARROW-12253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove closed ARROW-12253.
------------------------------
    Resolution: Won't Fix

Moved to https://github.com/apache/arrow-datafusion/issues/63

> [Rust] [Ballista] Implement scalable joins
> ------------------------------------------
>
>                 Key: ARROW-12253
>                 URL: https://issues.apache.org/jira/browse/ARROW-12253
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust - Ballista
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>             Fix For: 5.0.0
>
>
> The main issue limiting scalability in Ballista today is that joins are implemented as hash joins in which each partition of the probe side causes the entire left (build) side to be loaded into memory.
> To make this scalable, we need to hash-partition the left and right inputs on the join keys so that matching left and right partitions can be joined in parallel.
> Work is already underway in DataFusion to implement this, which we can leverage.
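
For illustration only, here is a minimal sketch of the partitioned hash join idea described in the issue, written against the Rust standard library. It is not the Ballista or DataFusion implementation. The `Row` type and the functions `partition_of`, `hash_partition`, `join_partition`, and `partitioned_join` are hypothetical names, and the sketch assumes in-memory rows with a single `u64` join key and an inner join.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical row type for this sketch: (join key, payload).
type Row = (u64, String);

/// Map a join key to one of `n` partitions by hashing it.
fn partition_of(key: u64, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Split rows into `n` partitions. Rows with equal keys always land in
/// the same partition, so each partition can be joined independently.
fn hash_partition(rows: Vec<Row>, n: usize) -> Vec<Vec<Row>> {
    let mut parts: Vec<Vec<Row>> = (0..n).map(|_| Vec::new()).collect();
    for row in rows {
        let p = partition_of(row.0, n);
        parts[p].push(row);
    }
    parts
}

/// Inner hash join of one left partition with its matching right partition.
/// Only this one left partition is held in the build-side hash table.
fn join_partition(left: &[Row], right: &[Row]) -> Vec<(u64, String, String)> {
    let mut build: HashMap<u64, Vec<&String>> = HashMap::new();
    for (key, value) in left {
        build.entry(*key).or_default().push(value);
    }
    let mut out = Vec::new();
    for (key, right_value) in right {
        if let Some(left_values) = build.get(key) {
            for left_value in left_values {
                out.push((*key, (*left_value).clone(), right_value.clone()));
            }
        }
    }
    out
}

/// Hash-partition both inputs on the join key, then join partition i of the
/// left input with partition i of the right input, one thread per pair.
fn partitioned_join(left: Vec<Row>, right: Vec<Row>, n: usize) -> Vec<(u64, String, String)> {
    let left_parts = hash_partition(left, n);
    let right_parts = hash_partition(right, n);
    std::thread::scope(|scope| {
        let handles: Vec<_> = left_parts
            .iter()
            .zip(right_parts.iter())
            .map(|(l, r)| scope.spawn(move || join_partition(l, r)))
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect::<Vec<_>>()
    })
}
```

Calling `partitioned_join(left, right, 4)` returns the same rows as a single hash join over the full inputs, in no particular order. The difference is that each build-side hash table holds only about one n-th of the left input, assuming the keys hash evenly. In a distributed engine the partitions would be exchanged between executors rather than threads, which this sketch does not model.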


