Posted to issues@arrow.apache.org by "N Gautam Animesh (Jira)" <ji...@apache.org> on 2022/09/21 14:38:00 UTC

[jira] [Created] (ARROW-17802) Merging multi file datasets on particular columns that are present in all the datasets.

N Gautam Animesh created ARROW-17802:
----------------------------------------

             Summary: Merging multi file datasets on particular columns that are present in all the datasets.
                 Key: ARROW-17802
                 URL: https://issues.apache.org/jira/browse/ARROW-17802
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: N Gautam Animesh


While working with multi-file datasets, I ran into a case where I wanted to merge specific columns from all of the datasets and work on them.
I was not able to do so. Is there a workaround for merging multi-file datasets on specific columns?
Please look into this and let me know if anything is available for it.
{code:java}
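library(arrow)
library(dplyr)

system.time({
  # Open the multi-file dataset lazily; nothing is read into memory yet
  ds <- open_dataset('C:/Test/Files/test', format = "arrow")
  # Merging logic that selects only the specified column(s) would go here, before collect()
  df <- ds %>% collect()
  # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
})
{code}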
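
One possible workaround, sketched under assumptions: if "merging on particular columns" means keeping only the columns that every file shares, a select() placed before collect() may be enough. The temporary directory, the file names, and the columns id, value, a and b below are all hypothetical and exist only to make the example self-contained; they are not taken from the reporter's data.
{code}
library(arrow)
library(dplyr)

# Hypothetical throwaway dataset: two files that share `id` and `value`
# but each carry one extra column the other lacks.
demo_dir <- file.path(tempdir(), "arrow_17802_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_feather(data.frame(id = 1:2, value = c(10, 20), a = c("x", "y")),
              file.path(demo_dir, "part-1.arrow"))
write_feather(data.frame(id = 3:4, value = c(30, 40), b = c(TRUE, FALSE)),
              file.path(demo_dir, "part-2.arrow"))

# unify_schemas = TRUE asks Arrow to inspect every file and build a combined schema
ds <- open_dataset(demo_dir, format = "arrow", unify_schemas = TRUE)

# Keep only the shared columns, then materialise the result as a data frame
df <- ds %>%
  select(id, value) %>%
  collect()
{code}
Because select() runs before collect(), only the chosen columns should need to be read into memory. Row order across files is not guaranteed, so add an arrange() if order matters.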



--
This message was sent by Atlassian Jira
(v8.20.10#820010)