Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/17 17:48:20 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue, #2935: Streaming CSV/JSON Read

tustvold opened a new issue, #2935:
URL: https://github.com/apache/arrow-datafusion/issues/2935

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Currently `CsvOpener` and `JsonOpener` call [GetResult::bytes](https://docs.rs/object_store/latest/object_store/enum.GetResult.html#method.bytes), which downloads the entire file before feeding it to the appropriate arrow reader.
   
   This is not ideal:
   
   * Adds decode latency, as the full payload must be buffered before decoding can begin
   * May read more data than necessary (#2930)
   
   Following on from #2677, we now support streaming responses from object storage.
   
   **Describe the solution you'd like**
   
   The underlying challenge is to take an arbitrary `Stream<Bytes>` and convert it into a `Stream<Bytes>` where each stream element contains only complete rows, as delimited by a newline character. Once we have this `DelimitedStream`, it is trivial to feed each of these byte chunks individually into the corresponding decoder.
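
As a rough illustration of the re-chunking idea above, here is a minimal, std-only sketch. It is not DataFusion's implementation: `LineDelimiter` and `delimit` are hypothetical names, plain `Vec<u8>` chunks and an iterator stand in for `Bytes` and an async `Stream`, and it assumes every `\n` ends a row (so newlines inside quoted CSV fields are not handled).

```rust
/// Hypothetical helper (not a DataFusion API): re-chunks arbitrary byte
/// chunks so that every emitted chunk ends on a newline boundary.
/// Assumption: a b'\n' always terminates a row (no quoted newlines).
#[derive(Default)]
struct LineDelimiter {
    /// Bytes of a trailing partial row carried over from earlier chunks.
    remainder: Vec<u8>,
}

impl LineDelimiter {
    /// Feeds one input chunk. Returns the bytes that now form complete
    /// rows, or `None` if the chunk contained no newline at all.
    fn push(&mut self, chunk: &[u8]) -> Option<Vec<u8>> {
        match chunk.iter().rposition(|&b| b == b'\n') {
            Some(pos) => {
                // Carried-over bytes plus everything up to and including
                // the last newline form complete rows.
                let mut out = std::mem::take(&mut self.remainder);
                out.extend_from_slice(&chunk[..=pos]);
                // Whatever follows the last newline is a partial row.
                self.remainder.extend_from_slice(&chunk[pos + 1..]);
                Some(out)
            }
            None => {
                // No row boundary yet: keep buffering.
                self.remainder.extend_from_slice(chunk);
                None
            }
        }
    }

    /// Call once at end of input. Returns a final row that was not
    /// terminated by a newline, if there is one.
    fn finish(&mut self) -> Option<Vec<u8>> {
        if self.remainder.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.remainder))
        }
    }
}

/// Drives the delimiter over a sequence of chunks; the iterator is a
/// synchronous stand-in for the byte stream returned by object storage.
fn delimit<I>(chunks: I) -> Vec<Vec<u8>>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut delimiter = LineDelimiter::default();
    let mut out = Vec::new();
    for chunk in chunks {
        if let Some(rows) = delimiter.push(&chunk) {
            out.push(rows);
        }
    }
    out.extend(delimiter.finish());
    out
}
```

Each chunk returned by `push` or `finish` could then be handed to a decoder on its own, since it never splits a row. Only the trailing partial row is buffered between chunks, which is what keeps memory use bounded by row size rather than file size.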
   
   **Describe alternatives you've considered**
   
   We could choose not to do this.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold closed issue #2935: Streaming CSV/JSON Object Store Read

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #2935: Streaming CSV/JSON Object Store Read
URL: https://github.com/apache/arrow-datafusion/issues/2935

