
Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2022/09/21 14:51:00 UTC

[jira] [Commented] (ARROW-17802) [R] Merging multi file datasets on particular columns that are present in all the datasets.

    [ https://issues.apache.org/jira/browse/ARROW-17802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607798#comment-17607798 ] 

Nicola Crane commented on ARROW-17802:
--------------------------------------

[~nanimesh] Can you give an example using small toy/example datasets showing the kind of thing you mean?  Some example inputs and an example output that you'd like to achieve, perhaps.  The code example in the description doesn't help me see what you're trying to achieve here.

> [R] Merging multi file datasets on particular columns that are present in all the datasets.
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17802
>                 URL: https://issues.apache.org/jira/browse/ARROW-17802
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: N Gautam Animesh
>            Priority: Major
>
> While working with multi-file datasets, I wanted to merge specific columns from all the datasets and work on them, but I was not able to do so.
> Is there any workaround for merging multi-file datasets on specific columns that they all share?
> Please look into it and let me know if there is anything that supports this.
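> {code}
> library(arrow)
> library(dplyr)
>
> system.time({
>   # Open every Arrow IPC file in the directory as one dataset
>   ds <- open_dataset('C:/Test/Files/test', format = "arrow")
>   # Select only the specified column(s) before collect(); col_a and col_b are placeholder names
>   df <- ds %>% select(col_a, col_b) %>% collect()
>   # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
> })
> {code}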
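A minimal toy sketch of one possible reading of the request, assuming "merging on particular columns" means joining datasets on a key column they all share. The data, the temporary paths, and the column names (id, x, y, extra) are hypothetical; it uses arrow's dplyr interface (open_dataset(), write_dataset(), select(), inner_join()), and joins on Arrow Datasets require a reasonably recent arrow version.

```r
library(arrow)
library(dplyr)

# Hypothetical toy inputs: two datasets written as Arrow IPC files
tmp <- tempfile()
dir.create(tmp)
write_dataset(data.frame(id = 1:3, x = c(10, 20, 30), extra = "a"),
              file.path(tmp, "ds1"), format = "arrow")
write_dataset(data.frame(id = 2:4, y = c("b", "c", "d")),
              file.path(tmp, "ds2"), format = "arrow")

# Each directory is opened lazily; a directory holding many files works the same way
ds1 <- open_dataset(file.path(tmp, "ds1"), format = "arrow")
ds2 <- open_dataset(file.path(tmp, "ds2"), format = "arrow")

# Columns present in both datasets ("id" in this toy example) serve as the join key
key_cols <- intersect(ds1$schema$names, ds2$schema$names)

result <- ds1 %>%
  select(id, x) %>%                  # keep only the columns of interest
  inner_join(ds2, by = key_cols) %>% # join on the shared column(s)
  collect() %>%                      # pull the joined rows into R
  arrange(id)                        # sort in R for a stable row order

print(result)
```

If the goal is only to keep the shared columns from each dataset rather than combine rows across datasets, the select() step on its own is enough.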



--
This message was sent by Atlassian Jira
(v8.20.10#820010)