Posted to dev@arrow.apache.org by "Jordan Samuels (JIRA)" <ji...@apache.org> on 2019/07/18 03:13:00 UTC
[jira] [Created] (ARROW-5974) read_csv returns truncated read for some valid gzip files
Jordan Samuels created ARROW-5974:
-------------------------------------
Summary: read_csv returns truncated read for some valid gzip files
Key: ARROW-5974
URL: https://issues.apache.org/jira/browse/ARROW-5974
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.0
Reporter: Jordan Samuels
If two gzipped files are concatenated, the result is itself a valid gzip file. However, pyarrow.csv.read_csv appears to read only the portion that came from the first file and silently drops the rest.
If the repro script [here|https://gist.github.com/jordansamuels/d69f1c22c58418f5dfa0785b9ecd211e] is run, the output is:
{{$ python repro.py}}
{{pyarrow.csv only reads one row:}}
{{   x}}
{{0  1}}
{{pandas reads two rows:}}
{{   x}}
{{0  1}}
{{1  2}}
{{pyarrow version: 0.14.0}}
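For readers without access to the gist, the following is a minimal sketch of how such a file could be built and checked. It is not the linked repro script: the CSV contents, the file name concat.csv.gz, and the helper names are assumptions chosen to match the output above. The standard-library gzip module is used as the reference decoder, since it is documented to handle multi-member gzip data; the pyarrow comparison only runs if pyarrow is installed.

```python
import gzip
import os
import tempfile


def make_concatenated_gzip() -> bytes:
    """Gzip two chunks separately and join the compressed bytes.

    The chunk contents are assumed for illustration: a header plus one
    row in the first member, and one more row in the second member.
    """
    first = gzip.compress(b"x\n1\n")
    second = gzip.compress(b"2\n")
    return first + second


def count_rows_with_pyarrow(data: bytes):
    """Return the row count pyarrow reads, or None if pyarrow is missing."""
    try:
        import pyarrow.csv as pacsv
    except ImportError:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        # The .gz suffix lets read_csv pick the decompressor from the name.
        path = os.path.join(tmp, "concat.csv.gz")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(data)
        return pacsv.read_csv(path).num_rows


data = make_concatenated_gzip()

# Reference behaviour: gzip.decompress reads every member in the stream.
decoded = gzip.decompress(data)
expected_rows = len(decoded.splitlines()) - 1  # subtract the header line

if __name__ == "__main__":
    print("rows in file (stdlib gzip):", expected_rows)
    print("rows read by pyarrow:", count_rows_with_pyarrow(data))
```

If the two printed counts differ, the reader is stopping after the first gzip member, which is the behaviour reported here for pyarrow 0.14.0.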
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)