Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/03/31 08:54:38 UTC

[GitHub] [arrow-datafusion] Jefffrey commented on issue #5657: Request for documentation for compressed CSV/JSON support

Jefffrey commented on issue #5657:
URL: https://github.com/apache/arrow-datafusion/issues/5657#issuecomment-1491561571

   Specific issue seems to be in this function:
   
   https://github.com/apache/arrow-datafusion/blob/667f19ebad216b7592af5a91b70a24fb21c3bb64/datafusion/core/src/datasource/listing/table.rs#L431-L444
   
   Because the file extension is `.csv.bz2` rather than just `.csv`, the file isn't listed, so the schema is inferred from an empty list of files, which results in an empty schema.
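   
   To illustrate the suspected behaviour, here is a minimal sketch of suffix-based file filtering. It is not the DataFusion code linked above; `filter_by_extension` and `extension_with_compression` are hypothetical helpers, and the sketch assumes the listing step simply checks whether each path ends with the configured extension:
   
   ```rust
   /// Hypothetical helper (not DataFusion's API): keep only the paths
   /// whose names end with the expected file extension.
   fn filter_by_extension<'a>(paths: &[&'a str], file_extension: &str) -> Vec<&'a str> {
       paths
           .iter()
           .copied()
           .filter(|p| p.ends_with(file_extension))
           .collect()
   }

   /// Hypothetical helper: build the extension to match when the files
   /// are compressed, e.g. ".csv" + ".bz2" gives ".csv.bz2".
   fn extension_with_compression(base: &str, compression_suffix: &str) -> String {
       format!("{}{}", base, compression_suffix)
   }
   ```
   
   Under that assumption, `filter_by_extension(&["summary.csv.bz2"], ".csv")` returns an empty list, which matches the empty schema seen here, whereas matching against `extension_with_compression(".csv", ".bz2")` would pick the file up.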
   
   As a temporary workaround, I renamed the file from `summary.csv.bz2` to `summary.csv`, and it was then picked up properly. However, it ran into another issue:
   
   `Error: ArrowError(CsvError("decompression not finished but EOF reached"))`
   
   This specifically stems from here:
   
   https://github.com/apache/arrow-datafusion/blob/667f19ebad216b7592af5a91b70a24fb21c3bb64/datafusion/core/src/datasource/file_format/csv.rs#L208-L215
   
   I haven't looked into it too much, but it seems similar to these issues:
   
   - https://github.com/apache/arrow-datafusion/issues/1736
   - https://github.com/apache/arrow-datafusion/issues/5041

