Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/12/08 19:22:00 UTC
[jira] [Reopened] (ARROW-10844) [Rust] [DataFusion] join of two DataFrames is not possible
[ https://issues.apache.org/jira/browse/ARROW-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Leitão reopened ARROW-10844:
----------------------------------
> [Rust] [DataFusion] join of two DataFrames is not possible
> ----------------------------------------------------------
>
> Key: ARROW-10844
> URL: https://issues.apache.org/jira/browse/ARROW-10844
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Affects Versions: 3.0.0
> Reporter: Jorge Leitão
> Priority: Blocker
> Fix For: 3.0.0
>
>
> The pseudo code
>
> {code:java}
> df = context.createDataFrame(...)
> df1 = context.createDataFrame(...)
> df.join(df1, ...)
> {code}
>
> currently does not work because we clone the {{ExecutionContextState}} from the context into each DataFrame, so the left and right sides hold separate copies of the context state. In particular, `left` will not have the table that was registered for the right side, which means that its `collect` will fail.
> We may need an {{Arc<Mutex<ExecutionContextState>>}} to share a common mutable state across multiple DataFrames. Alternatively, we could stop requiring tables to be registered in the context before DataFrames can use them.
> Note that the current example in the `DataFrame::join` docs works only because both sides share the same table. This will not be the case if, e.g., we use in-memory tables.
>
>
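To make the difference between cloning the state and sharing it concrete, here is a minimal, std-only sketch. It is not DataFusion code: `ContextState`, `Context`, `SnapshotDataFrame`, `SharedDataFrame` and all of their methods are hypothetical stand-ins, and the state is reduced to a set of registered table names. It only illustrates why a cloned copy misses tables registered later, while an `Arc<Mutex<...>>` handle sees them.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for ExecutionContextState: it only tracks
// the names of the tables registered so far.
#[derive(Clone, Default)]
struct ContextState {
    tables: HashSet<String>,
}

// Hypothetical context that owns the state behind Arc<Mutex<..>>.
#[derive(Default)]
struct Context {
    state: Arc<Mutex<ContextState>>,
}

// DataFrame that owns a *copy* of the state, taken at creation time.
struct SnapshotDataFrame {
    state: ContextState,
}

// DataFrame that holds a handle to the *same* state as the context.
struct SharedDataFrame {
    state: Arc<Mutex<ContextState>>,
}

impl Context {
    fn register_table(&self, name: &str) {
        self.state.lock().unwrap().tables.insert(name.to_string());
    }

    // Clones the current state; later registrations are not visible.
    fn snapshot_frame(&self) -> SnapshotDataFrame {
        SnapshotDataFrame {
            state: self.state.lock().unwrap().clone(),
        }
    }

    // Clones only the Arc; later registrations stay visible.
    fn shared_frame(&self) -> SharedDataFrame {
        SharedDataFrame {
            state: Arc::clone(&self.state),
        }
    }
}

impl SnapshotDataFrame {
    fn can_resolve(&self, table: &str) -> bool {
        self.state.tables.contains(table)
    }
}

impl SharedDataFrame {
    fn can_resolve(&self, table: &str) -> bool {
        self.state.lock().unwrap().tables.contains(table)
    }
}
```

With these hypothetical types, a frame created via `snapshot_frame()` before `register_table("right")` is called cannot resolve `"right"`, whereas a frame created via `shared_frame()` at the same point can. The cost of the shared variant is a lock on every lookup, which is one reason the alternative of not requiring registration at all may still be worth considering.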
--
This message was sent by Atlassian Jira
(v8.3.4#803005)