Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/12/08 19:22:00 UTC

[jira] [Reopened] (ARROW-10844) [Rust] [DataFusion] join of two DataFrames is not possible

     [ https://issues.apache.org/jira/browse/ARROW-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Leitão reopened ARROW-10844:
----------------------------------

> [Rust] [DataFusion] join of two DataFrames is not possible
> ----------------------------------------------------------
>
>                 Key: ARROW-10844
>                 URL: https://issues.apache.org/jira/browse/ARROW-10844
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>    Affects Versions: 3.0.0
>            Reporter: Jorge Leitão
>            Priority: Blocker
>             Fix For: 3.0.0
>
>
> The pseudo code
>  
> {code:java}
> df = context.createDataFrame(...)
> df1 = context.createDataFrame(...)
> df.join(df1, ...)
> {code}
>  
> currently does not work because we clone the {{ExecutionContextState}} from the context into each DataFrame, so the left and right sides hold separate copies of the context state rather than sharing one. In particular, the state held by `left` will not include the table registered for the right side, which means that its `collect` will fail.
> We may need an {{Arc<Mutex<ExecutionContextState>>}} to share a common mutable state across multiple DataFrames. Alternatively, we could stop requiring tables to be registered in the context before DataFrames can use them.
> Note that the current example in the `DataFrame::join` docs works because both sides use the same table. That will not be the case if, e.g., we use in-memory tables.
>  
>  
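The difference between cloning the state and sharing it can be sketched with a toy model. This is a minimal illustration under stated assumptions, not DataFusion code: `State`, `ClonedContext`, `ClonedDataFrame`, `SharedContext`, `SharedDataFrame`, `create_data_frame` and `can_join` are all hypothetical names, and the "state" is reduced to a map from table name to row count.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for ExecutionContextState: table name -> row count.
#[derive(Clone, Default)]
struct State {
    tables: HashMap<String, usize>,
}

// Variant 1: each DataFrame receives a clone (a snapshot) of the state.
struct ClonedContext {
    state: State,
}

struct ClonedDataFrame {
    state: State,
    table: String,
}

impl ClonedContext {
    fn create_data_frame(&mut self, name: &str, rows: usize) -> ClonedDataFrame {
        self.state.tables.insert(name.to_string(), rows);
        // The snapshot is taken here; tables registered later are not visible.
        ClonedDataFrame { state: self.state.clone(), table: name.to_string() }
    }
}

impl ClonedDataFrame {
    // The join is resolvable only if both tables are visible from `self`.
    fn can_join(&self, right: &ClonedDataFrame) -> bool {
        self.state.tables.contains_key(&self.table)
            && self.state.tables.contains_key(&right.table)
    }
}

// Variant 2: every DataFrame holds a handle to one shared, mutable state.
struct SharedContext {
    state: Arc<Mutex<State>>,
}

struct SharedDataFrame {
    state: Arc<Mutex<State>>,
    table: String,
}

impl SharedContext {
    fn create_data_frame(&self, name: &str, rows: usize) -> SharedDataFrame {
        self.state.lock().unwrap().tables.insert(name.to_string(), rows);
        SharedDataFrame { state: Arc::clone(&self.state), table: name.to_string() }
    }
}

impl SharedDataFrame {
    fn can_join(&self, right: &SharedDataFrame) -> bool {
        let state = self.state.lock().unwrap();
        state.tables.contains_key(&self.table) && state.tables.contains_key(&right.table)
    }
}

fn main() {
    // Cloned state: `left` was created before "right" was registered.
    let mut ctx = ClonedContext { state: State::default() };
    let left = ctx.create_data_frame("left", 3);
    let right = ctx.create_data_frame("right", 5);
    println!("cloned: {}", left.can_join(&right)); // prints "cloned: false"

    // Shared state: both DataFrames observe both registrations.
    let ctx = SharedContext { state: Arc::new(Mutex::new(State::default())) };
    let left = ctx.create_data_frame("left", 3);
    let right = ctx.create_data_frame("right", 5);
    println!("shared: {}", left.can_join(&right)); // prints "shared: true"
}
```

In the cloned variant the failure is one-sided: `right` was created after both registrations, so it can see `left`, but not the other way around. The shared variant trades that for a lock around every access to the state.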



--
This message was sent by Atlassian Jira
(v8.3.4#803005)