Posted to github@arrow.apache.org by "metesynnada (via GitHub)" <gi...@apache.org> on 2023/06/06 10:25:26 UTC

[GitHub] [arrow-datafusion] metesynnada commented on a diff in pull request #6526: Add support for appending data to external tables - CSV

metesynnada commented on code in PR #6526:
URL: https://github.com/apache/arrow-datafusion/pull/6526#discussion_r1219373481


##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -87,6 +98,277 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
         conf: FileScanConfig,
         filters: Option<&Arc<dyn PhysicalExpr>>,
     ) -> Result<Arc<dyn ExecutionPlan>>;
+
+    /// Take an input execution plan and a file sink configuration, and create
+    /// the appropriate writer execution plan for this file format.
+    async fn create_writer_physical_plan(
+        &self,
+        _input: Arc<dyn ExecutionPlan>,
+        _state: &SessionState,
+        _conf: FileSinkConfig,
+    ) -> Result<Arc<dyn ExecutionPlan>> {
+        let msg = "Writer not implemented for this format".to_owned();
+        Err(DataFusionError::NotImplemented(msg))
+    }
+}
+
+/// `AsyncPutWriter` facilitates asynchronous writing to object stores.
+/// It is designed specifically for the `object_store` crate's `put` method and
+/// sends the entire buffer in a single request when flushed.
+pub struct AsyncPutWriter {

Review Comment:
   We suspect that always using `put_multipart`, even for small writes, may be inefficient on the cloud side, because multipart uploads are designed for objects above a certain size (for example, AWS S3 requires every part except the last to be at least 5 MB). To avoid this, we've written a wrapper around `put` that exposes it through the same `AsyncWrite` interface, so both upload paths share one write API.
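   To illustrate the idea, here is a minimal, synchronous sketch of a "buffer everything, then upload once with `put`" writer. It is not the PR's implementation: it swaps tokio's `AsyncWrite` for `std::io::Write`, and replaces the `object_store` crate with a hypothetical `PutStore` trait and an in-memory `InMemoryStore`, so it runs with the standard library only. `BufferedPutWriter` is likewise a hypothetical name.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical stand-in for an object store that only supports a
/// whole-object `put` (this is NOT the real `object_store::ObjectStore` trait).
trait PutStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>) -> io::Result<()>;
}

/// In-memory store used only for demonstration; counts `put` requests.
#[derive(Default)]
struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
    put_calls: usize,
}

impl PutStore for InMemoryStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>) -> io::Result<()> {
        self.put_calls += 1;
        // A `put` replaces the whole object at `location`.
        self.objects.insert(location.to_string(), bytes);
        Ok(())
    }
}

/// Hypothetical writer: buffers all writes locally and uploads them
/// with a single `put` when flushed, instead of a multipart upload.
struct BufferedPutWriter<'a, S: PutStore> {
    store: &'a mut S,
    location: String,
    buffer: Vec<u8>,
}

impl<'a, S: PutStore> BufferedPutWriter<'a, S> {
    fn new(store: &'a mut S, location: &str) -> Self {
        Self { store, location: location.to_string(), buffer: Vec::new() }
    }
}

impl<'a, S: PutStore> Write for BufferedPutWriter<'a, S> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Only append to the local buffer; nothing is sent yet.
        self.buffer.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Because `put` overwrites the object, always upload the full
        // buffer, not just the bytes written since the previous flush.
        self.store.put(&self.location, self.buffer.clone())
    }
}

fn main() -> io::Result<()> {
    let mut store = InMemoryStore::default();
    {
        let mut writer = BufferedPutWriter::new(&mut store, "data/part-0.csv");
        writer.write_all(b"a,b\n")?;
        writer.write_all(b"1,2\n")?;
        writer.flush()?; // one request for the whole file
    }
    assert_eq!(store.put_calls, 1);
    assert_eq!(store.objects["data/part-0.csv"], b"a,b\n1,2\n".to_vec());
    println!("put calls: {}", store.put_calls);
    Ok(())
}
```

   In an async version built on tokio's `AsyncWrite`, the upload would presumably be driven from `poll_flush` or `poll_shutdown`, returning `Poll::Pending` while the `put` future is in flight. The trade-off is that the whole object is held in memory, which is why this path suits small files, while `put_multipart` remains the better fit for large ones.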



-- 
This is an automated message from the Apache Git Service.