Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/11/14 09:10:21 UTC

Re: [I] [Python] Reading empty CSV file in parallel hangs [arrow]

jorisvandenbossche commented on issue #38676:
URL: https://github.com/apache/arrow/issues/38676#issuecomment-1809811295

   > First: "cannot infer number of columns". This is because the file has no newline at all. If you add a newline at the end, the error disappears. I wonder if such files exist in the wild, but would be good to add support for them.
   
   I don't know how often that occurs in the wild. I think the pandas test suite used this to test empty CSV files (no actual data, only a header), which I assume will occur from time to time. It seems that the pandas CSV reader handles both cases, with or without a trailing newline (`data = "x,y,z"` or `data = "x,y,z\n"`), and returns an empty DataFrame with three columns in both (the test just happened to use the one without a newline).
   Our CSV reader, by contrast, gives an empty table for the second case (with newline) but not for the first. I'm not familiar with the CSV code, but aside from fixing the deadlock, would it make sense to also support both cases here?
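
Until the reader supports both cases, one possible user-side workaround is to normalize the input so that it always ends in a newline. The sketch below is only an illustration: `ensure_trailing_newline` and `read_csv_bytes` are hypothetical helper names, the data is assumed to fit in memory, and passing `use_threads=False` is an assumption made to stay clear of the parallel-read hang this issue is about.

```python
import io


def ensure_trailing_newline(data: bytes) -> bytes:
    """Return `data` with a final newline appended if it lacks one.

    Empty input is returned unchanged, since there is no header line
    to terminate.
    """
    if data and not data.endswith((b"\n", b"\r")):
        return data + b"\n"
    return data


def read_csv_bytes(data: bytes):
    """Hypothetical wrapper: normalize the bytes, then hand them to pyarrow."""
    # Imported lazily so the helper above stays usable without pyarrow.
    import pyarrow.csv as pacsv

    # Assumption: single-threaded reading sidesteps the parallel-read hang.
    options = pacsv.ReadOptions(use_threads=False)
    source = io.BytesIO(ensure_trailing_newline(data))
    return pacsv.read_csv(source, read_options=options)
```

With this, `b"x,y,z"` and `b"x,y,z\n"` both reach the reader as `b"x,y,z\n"`, which is the header-only form reported above to yield an empty table.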

