Posted to jira@arrow.apache.org by "Martin du Toit (Jira)" <ji...@apache.org> on 2022/04/06 11:23:00 UTC
[jira] [Created] (ARROW-16133) [R][Python] Convert python dataset to R dataset
Martin du Toit created ARROW-16133:
--------------------------------------
Summary: [R][Python] Convert python dataset to R dataset
Key: ARROW-16133
URL: https://issues.apache.org/jira/browse/ARROW-16133
Project: Apache Arrow
Issue Type: Wish
Components: Python, R
Reporter: Martin du Toit
Hi.
I can open a pyarrow dataset from R using reticulate, but I need to keep working with that dataset in R. How can I convert the Python Arrow dataset to an R Arrow dataset for further processing?
{code:r}
reticulate::py_discover_config()
reticulate::py_available(initialize = TRUE)
pd <- reticulate::import("pandas", convert = FALSE)
adlfs <- reticulate::import("adlfs", convert = FALSE)
pa <- reticulate::import("pyarrow", convert = FALSE)
pyds <- reticulate::import("pyarrow.dataset", convert = FALSE)
pafs <- reticulate::import("pyarrow.filesystem", convert = FALSE)
dl_path <- "investmentaccountingdata/rawdata/transactions/transactions-xxx/v1.1"
format_name <- "transactions_transactions-xxx_v1.1"
config <- get_config()
datalake_secret <- config$get_datalake_secret()
account_name <- datalake_secret$storname
account_key <- datalake_secret$storkey
dm_file_type <- dmfile_create_from_name(format_name = format_name)
format_all <- dpl_arrow_format_get(dm_file_type)
fs <- adlfs$AzureBlobFileSystem(account_name = account_name, account_key = account_key)
# Works as expected
fs$ls("/")
schema_file <- dpl_arrow_schema_get_dm(dm_file_type, all_char = TRUE, pyarrow = pa)
ds <- pyds$dataset(source = dl_path, filesystem=fs, partitioning="hive", format="csv", schema = schema_file)
# This works as expected
files <- ds$files
files <- reticulate::py_to_r(files)
{code}
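One possible workaround, sketched here under assumptions rather than as a confirmed answer: the R arrow package registers reticulate converters for pyarrow Table objects, so the Python dataset could be materialized with to_table() and that Table handed to R. The helper name py_dataset_to_r_table is hypothetical (it is not part of arrow or reticulate), and this reads the whole dataset into memory, so it yields an R Arrow Table rather than a lazy R Dataset.
{code:r}
# Loading arrow registers its reticulate converters for pyarrow objects.
library(arrow)

# Hypothetical helper, not part of arrow or reticulate.
py_dataset_to_r_table <- function(py_ds) {
  # Materialize the dataset as a pyarrow Table on the Python side.
  # Assumption: the data fits in memory.
  py_tbl <- py_ds$to_table()
  # Convert the pyarrow Table to an R arrow Table.
  reticulate::py_to_r(py_tbl)
}
{code}
With the ds object opened above, this would be called as r_tbl <- py_dataset_to_r_table(ds); the resulting Table can then be used with dplyr verbs in R.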