Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/05/12 16:28:00 UTC
[jira] [Commented] (ARROW-10425) [Python] Support reading (compressed) CSV file from remote file / binary blob
[ https://issues.apache.org/jira/browse/ARROW-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343347#comment-17343347 ]
Antoine Pitrou commented on ARROW-10425:
----------------------------------------
I'm not sure what this issue is about. Currently the following works:
{code:python}
>>> import io
>>> import pyarrow as pa
>>> from pyarrow import csv
>>> data = b"""a,b,c\n1,2,3\n"""
>>> csv.read_csv(pa.BufferReader(data)).to_pandas()
   a  b  c
0  1  2  3
>>> csv.read_csv(io.BytesIO(data)).to_pandas()
   a  b  c
0  1  2  3
{code}
[~jorisvandenbossche] Could you elaborate a bit?
> [Python] Support reading (compressed) CSV file from remote file / binary blob
> -----------------------------------------------------------------------------
>
> Key: ARROW-10425
> URL: https://issues.apache.org/jira/browse/ARROW-10425
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Labels: csv
>
> From https://stackoverflow.com/questions/64588076/how-can-i-read-a-csv-gz-file-with-pyarrow-from-a-file-object
> Currently {{pyarrow.csv.read_csv}} happily takes a path to a compressed file and automatically decompresses it, but AFAIK this only works for local paths.
> It would be nice to support reading CSV from remote files in general (with a URI / by specifying a filesystem), and in that case also support compression.
> In addition, we could also read a compressed file from a BytesIO / file-like object, but I'm not sure we want that (as it would require a keyword to indicate the compression used).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)