Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/02/25 14:39:44 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #5383: The output of write_csv and write_json methods is confusing.

alamb commented on issue #5383:
URL: https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1445133312

   Thank you @Jefffrey for the analysis
   
   > Not sure if in writing logic its possible to check if a partition is empty before attempting to write to disk?
   
   I think it would be best to defer creating the files until there is actually some data (i.e. don't create the writer until we have at least one record batch to write).
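
As a rough illustration of the deferral idea, here is a minimal sketch using only the Rust standard library. It is not DataFusion's actual writer code: the function name `write_partition_lazily` is hypothetical, and a "batch" is simplified to a `Vec<String>` of pre-formatted CSV lines standing in for a record batch.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes one partition to `path`, creating the file only when the first
/// non-empty batch arrives. Returns `true` if a file was created.
///
/// Hypothetical simplification: each batch is a list of already-formatted
/// CSV lines rather than a real record batch.
pub fn write_partition_lazily<I>(path: &Path, batches: I) -> io::Result<bool>
where
    I: IntoIterator<Item = Vec<String>>,
{
    // No writer (and therefore no file) exists until there is data to write.
    let mut writer: Option<BufWriter<File>> = None;

    for batch in batches {
        if batch.is_empty() {
            // An empty batch must not trigger file creation.
            continue;
        }
        if writer.is_none() {
            writer = Some(BufWriter::new(File::create(path)?));
        }
        let w = writer.as_mut().expect("writer was created above");
        for line in &batch {
            writeln!(w, "{}", line)?;
        }
    }

    match writer {
        Some(mut w) => {
            // Flush explicitly so that write errors are reported to the caller.
            w.flush()?;
            Ok(true)
        }
        None => Ok(false),
    }
}
```

With this shape, a partition that yields no rows leaves nothing on disk, and the caller can tell from the returned `bool` whether a file was produced.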
   
   The other thing we could do would be to add some way for the DataFrame / `write_csv` API to say "I want the results in a single partition/file" -- perhaps by adding `DataFrame::repartition` or something similar, so the user can control whether they want multiple files (potentially faster to write) or a single file (slower to write, but easier to use)
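
To make the single-file versus multi-file choice concrete, here is a small sketch of how such an option might map partitions to output files. Everything in it is an assumption for illustration: `OutputLayout`, `plan_output_files`, and the `part-N.csv` naming are hypothetical and not part of the DataFusion API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical user-facing choice of output layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputLayout {
    /// All partitions go into one file (slower to write, easier to use).
    SingleFile,
    /// Each partition gets its own file (potentially faster to write).
    FilePerPartition,
}

/// Returns, for each input partition, the file it should be written to.
/// The `part-N.csv` naming scheme is an assumption made for this sketch.
pub fn plan_output_files(dir: &Path, partitions: usize, layout: OutputLayout) -> Vec<PathBuf> {
    (0..partitions)
        .map(|i| match layout {
            // Every partition targets the same file.
            OutputLayout::SingleFile => dir.join("part-0.csv"),
            // Partition `i` targets its own file.
            OutputLayout::FilePerPartition => dir.join(format!("part-{}.csv", i)),
        })
        .collect()
}
```

In the single-file case the partitions would have to be written one after another (or merged first), which is where the slower write comes from.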


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org