You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/19 10:47:40 UTC

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2246: Add write and delete operations to ObjectStore

tustvold commented on code in PR #2246:
URL: https://github.com/apache/arrow-datafusion/pull/2246#discussion_r852853304


##########
ballista/rust/core/src/serde/logical_plan/mod.rs:
##########
@@ -1016,6 +1016,55 @@ mod roundtrip_tests {
                 "this is only a test object store".to_string(),
             ))
         }
+
+        fn file_writer(
+            &self,
+            _path: &str,
+        ) -> datafusion_data_access::Result<Arc<dyn ObjectWriter>> {
+            unimplemented!();
+        }
+
+        async fn create_dir(
+            &self,
+            _path: &str,
+            _recursive: bool,
+        ) -> datafusion_data_access::Result<()> {
+            unimplemented!();
+        }
+
+        async fn remove_dir_all(
+            &self,
+            _path: &str,
+        ) -> datafusion_data_access::Result<()> {
+            unimplemented!();
+        }
+
+        async fn remove_dir_contents(
+            &self,
+            _path: &str,
+        ) -> datafusion_data_access::Result<()> {
+            unimplemented!();
+        }
+
+        async fn remove_file(&self, _path: &str) -> datafusion_data_access::Result<()> {
+            unimplemented!();
+        }
+
+        async fn rename(

Review Comment:
   I'm not aware of any object stores that support native rename, I think this will need to be implemented as a copy and remove. Perhaps we could provide this as the default implementation?



##########
data-access/src/object_store/mod.rs:
##########
@@ -101,4 +110,57 @@ pub trait ObjectStore: Sync + Send + Debug {
 
     /// Get object reader for one file
     fn file_reader(&self, file: SizedFile) -> Result<Arc<dyn ObjectReader>>;
+

Review Comment:
   I think it would be good to define somewhere the atomicity requirements of implementations. For example, S3 does not provide create if not exists APIs, and therefore cannot guarantee atomic writes.
   
   Similarly if rename, copy, etc... is called concurrently with a writer, the behaviour would likely be implementation defined. Renaming a local file would rename the writer, deleting a local file will vary if Windows or POSIX, etc... 
   
   I think it is fine to just say this is not supported, but if people use this trait and expect it to behave like a filesystem, they're likely to be surprised :sweat_smile: 



##########
ballista/rust/core/src/serde/logical_plan/mod.rs:
##########
@@ -1016,6 +1016,55 @@ mod roundtrip_tests {
                 "this is only a test object store".to_string(),
             ))
         }
+
+        fn file_writer(
+            &self,
+            _path: &str,
+        ) -> datafusion_data_access::Result<Arc<dyn ObjectWriter>> {
+            unimplemented!();
+        }
+
+        async fn create_dir(

Review Comment:
   I was curious what this corresponded to given that object stores don't have a concept of a directory. It would appear that at least in the case of the S3 implementation, it creates empty objects as pretend directories - https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc
   
   I'm not sure how I feel about this



##########
data-access/src/object_store/local.rs:
##########
@@ -185,7 +297,30 @@ pub fn local_unpartitioned_file(file: String) -> FileMeta {
             size: metadata.len(),
             path: file,
         },
-        last_modified: metadata.modified().map(chrono::DateTime::from).ok(),
+        last_modified: metadata.modified().map(DateTime::<Utc>::from).ok(),
+    }
+}
+
+struct LocalFileWriter {
+    path: String,
+}
+
+impl LocalFileWriter {
+    fn new(path: String) -> Result<Self> {
+        Ok(Self { path })
+    }
+}
+
+#[async_trait]
+impl ObjectWriter for LocalFileWriter {
+    async fn writer(&self) -> Result<Pin<Box<dyn AsyncWrite>>> {
+        let file = AsyncFile::open(&self.path).await?;
+        Ok(Box::pin(file))
+    }
+
+    fn sync_writer(&self) -> Result<Box<dyn Write + Send + Sync>> {

Review Comment:
   FYI currently writing parquet requires Seek - but there is a suggestion here on how that could be removed https://github.com/apache/arrow-rs/issues/937



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org