Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/04/23 20:36:00 UTC

[jira] [Comment Edited] (ARROW-9235) [R] Support for `connection` class when reading and writing files

    [ https://issues.apache.org/jira/browse/ARROW-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331019#comment-17331019 ] 

Ian Cook edited comment on ARROW-9235 at 4/23/21, 8:35 PM:
-----------------------------------------------------------

Since Arrow does not yet support HTTP URIs in file reading functions (ARROW-7594), if we do implement support for connections in the R package, we should consider detecting HTTP URI strings and using connections to handle them.


was (Author: icook):
Since Arrow does not yet support HTTP URIs in file reading/writing functions (ARROW-7594), if we do implement support for connections in the R package, we should consider detecting HTTP URI strings and using connections to handle them.
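Below is a minimal sketch of the HTTP idea in base R. It assumes that buffering the whole response in memory is acceptable, which is the same trade-off as the raw-vector workaround in the issue description. The helper names `is_http_uri`, `read_all_raw` and `read_parquet_maybe_http` are hypothetical and not part of the arrow package. The only arrow call is `read_parquet()`, which the description already uses with a raw vector. The connection is read in chunks until end of stream because the size of an HTTP response isn't necessarily known in advance.

```r
# Hypothetical helper: TRUE only for a single "http://" or "https://" URI string.
is_http_uri <- function(x) {
  is.character(x) && length(x) == 1L && grepl("^https?://", x, ignore.case = TRUE)
}

# Hypothetical helper: read every remaining byte from an open binary connection
# in fixed-size chunks, since the total size may not be known up front.
read_all_raw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break  # zero bytes returned: end of stream
    chunks[[length(chunks) + 1L]] <- chunk
  }
  # unlist() keeps the raw type; return raw(0) rather than NULL for empty input
  if (length(chunks) == 0L) raw(0) else unlist(chunks, use.names = FALSE)
}

# Hypothetical wrapper: route HTTP(S) URIs through a base R url() connection
# and pass the buffered bytes to arrow::read_parquet(). Any other input
# (local path, raw vector, Arrow stream) is passed through unchanged.
read_parquet_maybe_http <- function(file, ...) {
  if (is_http_uri(file)) {
    con <- url(file, open = "rb")
    on.exit(close(con), add = TRUE)
    file <- read_all_raw(con)
  }
  arrow::read_parquet(file, ...)
}
```

Usage would look like `read_parquet_maybe_http("https://example.com/data.parquet")` (a placeholder URL), while local paths behave exactly as with `arrow::read_parquet()`. Native HTTP support (ARROW-7594) would avoid holding the whole file in memory.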

> [R] Support for `connection` class when reading and writing files
> -----------------------------------------------------------------
>
>                 Key: ARROW-9235
>                 URL: https://issues.apache.org/jira/browse/ARROW-9235
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Michael Quinn
>            Priority: Major
>             Fix For: 5.0.0
>
>
> We have an internal filesystem that we interact with through objects that inherit from the `connection` class. These files aren't necessarily local, which makes it slightly more complicated to read and write Parquet files, for example.
> For now, we're going through raw vectors: reading a file's bytes into a raw vector before parsing it, or serializing to a raw vector and then writing that out. For example, to read a file:
> {noformat}
> ReadParquet <- function(filename, ...) {
>    con <- file(filename, "rb")
>    on.exit(close(con))
>    # Read the whole file into memory; file.info() gives the size of a local file
>    bytes <- readBin(con, "raw", file.info(filename)$size)
>    return(arrow::read_parquet(bytes, ...))
> }
> {noformat}
> And to write:
> {noformat}
> WriteParquet <- function(df, filepath, ...) {
>    # Serialize to an in-memory Parquet buffer first
>    stream <- arrow::BufferOutputStream$create()
>    arrow::write_parquet(df, stream, ...)
>    bytes <- stream$finish()$data()
>    con <- file(filepath, "wb")
>    on.exit(close(con))
>    writeBin(bytes, con)
>    return(invisible())
> }
> {noformat}
> At the C++ level, we create these connections with `R_new_custom_connection`, which is declared here:
>  [https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h]
> I've been very impressed with how feature-rich Arrow is. It would be nice to see connections supported as well.
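The write workaround in the description can be generalized so that the caller passes an already-open connection instead of a file path, which is closer to what custom connections need. A minimal sketch follows. `write_parquet_connection` is a hypothetical name, not an arrow API, and it still serializes the whole file in memory before writing. It uses only the arrow calls that appear in the description (`BufferOutputStream$create()`, `write_parquet()`, `$finish()$data()`) plus base R `writeBin()`.

```r
# Hypothetical helper (not part of arrow): serialize a data frame to Parquet
# in memory, then write the bytes to an already-open binary connection.
# The caller opens and closes the connection, so any writable binary
# connection (file(), gzfile(), a custom connection) can be used.
write_parquet_connection <- function(df, con, ...) {
  stream <- arrow::BufferOutputStream$create()
  arrow::write_parquet(df, stream, ...)
  bytes <- stream$finish()$data()  # raw vector holding the complete Parquet file
  writeBin(bytes, con)
  invisible(NULL)
}
```

Usage: open the connection (for example `con <- file(path, "wb")`), call `write_parquet_connection(df, con)`, then `close(con)`. Keeping connection ownership with the caller avoids the helper having to know how a custom connection is created.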



--
This message was sent by Atlassian Jira
(v8.3.4#803005)