Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/12/15 22:25:00 UTC

[jira] [Updated] (ARROW-9235) [R] Support for `connection` class when reading and writing files

     [ https://issues.apache.org/jira/browse/ARROW-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-9235:
-----------------------------------
    Fix Version/s:     (was: 3.0.0)
                   4.0.0

> [R] Support for `connection` class when reading and writing files
> -----------------------------------------------------------------
>
>                 Key: ARROW-9235
>                 URL: https://issues.apache.org/jira/browse/ARROW-9235
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Michael Quinn
>            Priority: Major
>             Fix For: 4.0.0
>
>
> We have an internal filesystem that we interact with through objects that inherit from the connection class. These files aren't necessarily local, which makes it slightly more complicated to read and write Parquet files, for example.
> For now, we're generating raw vectors and using them to create the file. For example, to read files:
> {noformat}
> ReadParquet <- function(filename, ...) {
>    file <- file(filename, "rb")
>    on.exit(close(file))
>    raw <- readBin(file, "raw", FileInfo(filename)$size)
>    return(arrow::read_parquet(raw, ...))
> }
> {noformat}
> And to write:
> {noformat}
> WriteParquet <- function(df, filepath, ...) {
>    stream <- BufferOutputStream$create()
>    write_parquet(df, stream, ...)
>    raw <- stream$finish()$data()
>    file <- file(filepath, "wb")
>    on.exit(close(file))
>    writeBin(raw, file)
>    return(invisible())
> }
> {noformat}
> At the C++ level, we are interacting with `R_new_custom_connection`, defined here:
>  [https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h]
> I've been very impressed with how feature-rich arrow is. It would be nice to see this API supported as well.
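
The ReadParquet workaround quoted above needs the file size up front (via FileInfo), which a custom connection may not be able to report. As a rough illustration only, here is a minimal sketch of the same idea that takes an already-open connection and reads until end of input. ReadAllBytes and ReadParquetFromConnection are hypothetical names, not part of arrow, and the sketch assumes a blocking connection opened in binary mode, where a zero-length read means the input is exhausted.

```r
# Hypothetical helper (not part of arrow): drain an open binary connection
# into a single raw vector, reading fixed-size chunks until end of input.
# Assumes a blocking connection, where a zero-length read means EOF.
ReadAllBytes <- function(con, chunk_size = 1024L * 1024L) {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  if (length(chunks) == 0L) return(raw(0))
  do.call(c, chunks)  # concatenate the raw chunks in order
}

# Hypothetical wrapper: like ReadParquet in the issue, but it takes a
# connection instead of a filename and leaves closing it to the caller.
ReadParquetFromConnection <- function(con, ...) {
  arrow::read_parquet(ReadAllBytes(con), ...)
}
```

This still buffers the whole file in memory, so it is the same workaround rather than the native connection support the issue asks for.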



--
This message was sent by Atlassian Jira
(v8.3.4#803005)