Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/11/18 08:03:00 UTC

[jira] [Comment Edited] (ARROW-14740) [Python] duckdb helper functions

    [ https://issues.apache.org/jira/browse/ARROW-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445716#comment-17445716 ] 

Joris Van den Bossche edited comment on ARROW-14740 at 11/18/21, 8:02 AM:
--------------------------------------------------------------------------

Looking at the R code you linked to now, I see that also for R you are indeed calling the register method of the R duckdb package itself: https://github.com/apache/arrow/blob/641554b0bcce587549bfcfd0cde3cb4bc23054aa/r/R/duckdb.R#L55-L75

But in R you have some additional logic to, e.g., add the group variables, which is something that doesn't exist in Python. So in Python it might be simpler and boil down to:

{code:python}
def to_duckdb(table, con, table_name):
    # Register the Arrow table with the DuckDB connection under table_name
    con.register_arrow(table_name, table)
{code}

EDIT: I see the R version also provides defaults for {{con}} and {{table_name}}.
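
For illustration, a Python helper with similar defaults could look roughly like the sketch below. This is only a sketch under assumptions, not a proposed API: the default in-memory connection, the {{arrow_NNN}} naming scheme, and the returned tuple are all hypothetical. It also assumes a DuckDB release in which {{con.register}} accepts Arrow tables, whereas the snippet above uses {{register_arrow}}.

```python
import itertools

# Hypothetical counter used to generate default table names.
_name_counter = itertools.count(1)


def to_duckdb(table, con=None, table_name=None):
    """Register an Arrow table with DuckDB and return (con, table_name).

    Sketch only: the defaults below are assumptions made for illustration.
    """
    if con is None:
        # Imported lazily so the helper only needs duckdb when no
        # connection is passed in.
        import duckdb

        con = duckdb.connect()  # new in-memory database
    if table_name is None:
        # Hypothetical naming scheme: arrow_001, arrow_002, ...
        table_name = f"arrow_{next(_name_counter):03d}"
    # Assumes a DuckDB version where register() accepts Arrow tables.
    con.register(table_name, table)
    return con, table_name
```

Returning both the connection and the generated name lets the caller query the registered table afterwards, e.g. with {{con.execute}} and a {{SELECT}} on that name.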

The {{auto_disconnect}} logic you have in R, is that something relevant for Python as well?




> [Python] duckdb helper functions
> --------------------------------
>
>                 Key: ARROW-14740
>                 URL: https://issues.apache.org/jira/browse/ARROW-14740
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Jonathan Keane
>            Priority: Major
>
> In the R package, [we have an integration with DuckDB|https://github.com/apache/arrow/blob/master/r/R/duckdb.R] that uses the C Data and C Stream interfaces. We include a handful of helper functions that handle the conversion for end users (including setting up the DuckDB connection, registering the Arrow data, etc.).
> Should we also have some helper functions in pyarrow?


