Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/05/12 16:28:00 UTC

[jira] [Commented] (ARROW-10425) [Python] Support reading (compressed) CSV file from remote file / binary blob

    [ https://issues.apache.org/jira/browse/ARROW-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343347#comment-17343347 ] 

Antoine Pitrou commented on ARROW-10425:
----------------------------------------

I'm not sure what this issue is about. Currently the following works:
{code:python}
>>> import io
>>> import pyarrow as pa
>>> from pyarrow import csv
>>> data = b"""a,b,c\n1,2,3\n"""
>>> csv.read_csv(pa.BufferReader(data)).to_pandas()
   a  b  c
0  1  2  3
>>> csv.read_csv(io.BytesIO(data)).to_pandas()
   a  b  c
0  1  2  3
{code}

[~jorisvandenbossche] Could you elaborate a bit?

> [Python] Support reading (compressed) CSV file from remote file / binary blob
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-10425
>                 URL: https://issues.apache.org/jira/browse/ARROW-10425
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: csv
>
> From https://stackoverflow.com/questions/64588076/how-can-i-read-a-csv-gz-file-with-pyarrow-from-a-file-object
> Currently {{pyarrow.csv.read_csv}} happily takes a path to a compressed file and automatically decompresses it, but AFAIK this only works for local paths.
> In general, it would be nice to support reading CSV from remote files (with a URI / by specifying a filesystem), and in that case also support compression.
> In addition, we could also read a compressed file from a BytesIO / file-like object, but I'm not sure we want that (as it would require a keyword to indicate the compression used).


