You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mustafasrepo (via GitHub)" <gi...@apache.org> on 2023/06/06 08:35:02 UTC

[GitHub] [arrow-datafusion] mustafasrepo commented on pull request #6526: Add support for appending data to external tables - CSV

mustafasrepo commented on PR #6526:
URL: https://github.com/apache/arrow-datafusion/pull/6526#issuecomment-1578189210

   > Had a quick review, I think this is probably fine, but type-erasing the writer mode seems a little peculiar to me.
   > 
   > This is because each of the methods has rather different characteristics, and imo warrants writing in a different manner.
   > 
   > ## Put
   > The write is completely synchronous (it is writing to memory) and is then atomically flushed, with no need for abort behaviour or async write. All file formats can support this mode
   > 
   > ## Put Multipart
   > The write is async with a final atomic close. Requires custom abort logic. All file formats can support this mode
   > 
   > ## Append
   > Abort is fatal (not even entirely sure how to surface this), only supported by row oriented file formats. Even then requires custom handling for things like CSV headers, etc...
   > 
   > ## Proposal
   > I guess my proposal would be to simply add a match block within `DataSink` for each of the various `FileWriterMode`. Over time I expect we will be able to extract common logic for each of the variants, e.g. a generic `Put` version using a `RecordBatchWriter`, etc... but I'm not sure that trying to unify all the writer modes is a good abstraction and certainly at this stage where we only have one impl seems a touch premature perhaps?
   
   You proposal makes sense to me. I have removed `FileWriterFactory`. Now writer is created in the `CsvSink`. I in the future need for `trait` arises we can do so. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org