Posted to issues@arrow.apache.org by "N Gautam Animesh (Jira)" <ji...@apache.org> on 2022/09/21 11:54:00 UTC

[jira] [Created] (ARROW-17796) Using cbind when merging multi datasets using open_dataset on a directory.

N Gautam Animesh created ARROW-17796:
----------------------------------------

             Summary: Using cbind when merging multi datasets using open_dataset on a directory.
                 Key: ARROW-17796
                 URL: https://issues.apache.org/jira/browse/ARROW-17796
             Project: Apache Arrow
          Issue Type: Task
            Reporter: N Gautam Animesh


I was wondering whether cbind can be used with specific column names when merging multiple datasets read with open_dataset(), so that only those columns are bound.

I was using open_dataset() to read multiple datasets from a single directory and wanted to merge them on particular columns that are common to all of the datasets.

Is it possible to merge these datasets column-wise? By default, open_dataset() combines all the datasets row-wise, one after the other.

Please let me know if something like this is supported, or if there is a workaround.


