Posted to issues@arrow.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/02/08 07:30:00 UTC

[jira] [Created] (ARROW-4512) [R] Stream reader/writer API that takes socket stream

Hyukjin Kwon created ARROW-4512:
-----------------------------------

             Summary: [R] Stream reader/writer API that takes socket stream
                 Key: ARROW-4512
                 URL: https://issues.apache.org/jira/browse/ARROW-4512
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
    Affects Versions: 0.12.0
            Reporter: Hyukjin Kwon


I have been working on Spark integration with Arrow.

I realised that there is no way to use a socket as the input or output for the Arrow stream format. For instance,
I want to do something like:

{code}
connStream <- socketConnection(port = 9999, blocking = TRUE, open = "wb")

rdf_slices <- list()  # a list of data frames.

stream_writer <- NULL
tryCatch({
  for (rdf_slice in rdf_slices) {
    batch <- record_batch(rdf_slice)
    if (is.null(stream_writer)) {
      stream_writer <- RecordBatchStreamWriter(connStream, batch$schema)  # Here, it looks like there's no way to use a socket.
    }

    stream_writer$write_batch(batch)
  }
},
finally = {
  if (!is.null(stream_writer)) {
    stream_writer$close()
  }
})
{code}


Likewise, I cannot find a way to iterate over the stream batch by batch:

{code}
RecordBatchStreamReader(connStream)$batches()  # Here, it looks like there's no way to use a socket.
{code}

This is easily possible on the Python side but appears to be missing from the R APIs.
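For reference, a minimal Python sketch of the equivalent flow. pyarrow's stream writer/reader accept any file-like object, so wrapping a socket with {{makefile()}} is enough; here a local socketpair stands in for the real network connection (port 9999 above), and the sample data is hypothetical:

{code}
import socket
import pyarrow as pa

# A connected pair of sockets standing in for a real client/server connection.
left, right = socket.socketpair()

batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ["x"])

# Writer side: stream record batches over the socket via a file-like wrapper.
wfile = left.makefile("wb")
writer = pa.RecordBatchStreamWriter(wfile, batch.schema)
writer.write_batch(batch)
writer.close()
wfile.close()
left.close()

# Reader side: iterate the stream batch by batch.
rfile = right.makefile("rb")
reader = pa.RecordBatchStreamReader(rfile)
batches = [b for b in reader]
{code}

Exposing the same pattern in R would only require RecordBatchStreamWriter/RecordBatchStreamReader to accept an R connection such as the one returned by socketConnection.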



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)