You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "RobinLin666 (via GitHub)" <gi...@apache.org> on 2023/11/18 07:07:20 UTC

[PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

RobinLin666 opened a new pull request, #5094:
URL: https://github.com/apache/arrow-rs/pull/5094

   # Which issue does this PR close?
   
   When we try to write to mounted path with blobfuse, we'll need to update object_store to explicitly close (drop) the file before calls to std::fs::rename, otherwise the metadata is not flushed in time for the rename. 
   Blobfuse may be for the sake of perf, so the upload file is only available when close file. Refer: https://github.com/Azure/azure-storage-fuse/blob/master/blobfuse/fileapis.cpp#L346
   
   Closes [#.](https://github.com/delta-io/delta-rs/issues/1765)
   
   # Rationale for this change
    
   we'll need to update object_store to explicitly close (drop) the file before calls to std::fs::rename, otherwise the metadata is not flushed in time for the rename. 
   
   # What changes are included in this PR?
   
   sync the file before renaming.
   
   # Are there any user-facing changes?
   
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398208982


##########
object_store/src/lib.rs:
##########
@@ -1064,6 +1064,8 @@ pub enum PutMode {
     /// Perform an atomic write operation if the current version of the object matches the
     /// provided [`UpdateVersion`], returning [`Error::Precondition`] otherwise
     Update(UpdateVersion),
+    /// Only for Mounted path, drop the file before renaming so that the file will upload.

Review Comment:
   maybe we can directly modify the option of `Override`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398470207


##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {
+                        source: e.into(),
+                        path: path.to_string_lossy().to_string(),
+                    })?;
+                    e_tag = Some(get_etag(&metadata));
+                    match opts.mode {
+                        PutMode::Overwrite => {
+                            std::mem::drop(file);

Review Comment:
   Could we get a comment explaining the purpose of this to future readers, otherwise it is likely to be accidentally removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1817509527

   > What is the motivation for using blobfuse over the first-class support Azure Storage, AFAICT fuse is just adding overhead (and race conditions)
   
   Mount is a very common usage scenario in Microsoft Fabric notebook. And it's use blobfuse to mount ADLS/OneLake storage. The benefits of using mount can be imagined. For example, users can use Azure Storage like local files. Uniform Access, Simplified Data Movement, Ease of Management, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398470413


##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {

Review Comment:
   Does this work with blobfuse, the original description suggested the file had to be closed before the metadata could be retrieved?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398182679


##########
object_store/src/lib.rs:
##########
@@ -1064,6 +1064,8 @@ pub enum PutMode {
     /// Perform an atomic write operation if the current version of the object matches the
     /// provided [`UpdateVersion`], returning [`Error::Precondition`] otherwise
     Update(UpdateVersion),
+    /// Only for Mounted path, drop the file before renaming so that the file will upload.

Review Comment:
   This is a breaking API change is there some way we can avoid this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398209658


##########
object_store/src/lib.rs:
##########
@@ -1064,6 +1064,8 @@ pub enum PutMode {
     /// Perform an atomic write operation if the current version of the object matches the
     /// provided [`UpdateVersion`], returning [`Error::Precondition`] otherwise
     Update(UpdateVersion),
+    /// Only for Mounted path, drop the file before renaming so that the file will upload.

Review Comment:
   You are adding a variant to public enumeration, this is a breaking change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398372598


##########
object_store/src/local.rs:
##########
@@ -341,10 +341,13 @@ impl ObjectStore for LocalFileSystem {
 
             let err = match file.write_all(&bytes) {
                 Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
+                    PutMode::Overwrite => {
+                        std::mem::drop(file);
+                        match std::fs::rename(&staging_path, &path) {
+                            Ok(_) => None,
+                            Err(source) => Some(Error::UnableToRenameFile { source }),
+                        }
+                    }
                     PutMode::Create => match std::fs::hard_link(&staging_path, &path) {

Review Comment:
   Yes, blobfuse,s3fs and goofys all don't support hard link. But they can choose `override` mode.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1826888246

   Marking as draft as waiting on feedback, please feel free to mark ready for review when you would like me to take another look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1827617565

   Review comments, listing as "pending", aren't submitted until you complete your review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1817511214

   The problem is fuse does not behave like a filesystem (or an object store), as was commented on the linked issue, there isn't actually a way to support this as hard link is not supported.
   
   I think you will need to use the Azure abstraction, the performance will be orders of magnitude better and you won't run into these sorts of issues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1401094599


##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {

Review Comment:
   Can you confirm if this works



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1827310305

   > [#5094 (comment)](https://github.com/apache/arrow-rs/pull/5094#discussion_r1398470413)
   
   Thanks, I have answered. Please check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398330549


##########
object_store/src/local.rs:
##########
@@ -341,10 +341,13 @@ impl ObjectStore for LocalFileSystem {
 
             let err = match file.write_all(&bytes) {
                 Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
+                    PutMode::Overwrite => {
+                        std::mem::drop(file);
+                        match std::fs::rename(&staging_path, &path) {
+                            Ok(_) => None,
+                            Err(source) => Some(Error::UnableToRenameFile { source }),
+                        }
+                    }
                     PutMode::Create => match std::fs::hard_link(&staging_path, &path) {

Review Comment:
   Does this behave correctly on blobfuse, I'm not sure it does



##########
object_store/src/local.rs:
##########
@@ -368,7 +371,7 @@ impl ObjectStore for LocalFileSystem {
                 return Err(err.into());
             }
 
-            let metadata = file.metadata().map_err(|e| Error::Metadata {
+            let metadata = metadata(&path).map_err(|e| Error::Metadata {

Review Comment:
   This is now racy in the event of a concurrent put



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1817760178

   If there is a way to provide this without breaking existing workloads I suppose we could accomodate this. Although to be completely unambiguous I consider this an anti-pattern and would strongly discourage people from trying to pretend object stores are filesystems, it is a deeply problematic approach


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398372373


##########
object_store/src/local.rs:
##########
@@ -368,7 +371,7 @@ impl ObjectStore for LocalFileSystem {
                 return Err(err.into());
             }
 
-            let metadata = file.metadata().map_err(|e| Error::Metadata {
+            let metadata = metadata(&path).map_err(|e| Error::Metadata {

Review Comment:
   Sure, fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1817813253

   > If there is a way to provide this without breaking existing workloads I suppose we could accomodate this. Although to be completely unambiguous I consider this an anti-pattern and would strongly discourage people from trying to pretend object stores are filesystems, it is a deeply problematic approach
   
   Thanks, I actually agree with what you said, putting a Filesystem on Top of an Object Store is a Bad Idea. 
   But sometimes users may not have high requirements for performance, or they may only be able to use the traditional POSIX protocol. This is also one of the meanings of mount. So I think we should not block this usage. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398208982


##########
object_store/src/lib.rs:
##########
@@ -1064,6 +1064,8 @@ pub enum PutMode {
     /// Perform an atomic write operation if the current version of the object matches the
     /// provided [`UpdateVersion`], returning [`Error::Precondition`] otherwise
     Update(UpdateVersion),
+    /// Only for Mounted path, drop the file before renaming so that the file will upload.

Review Comment:
   It is not clear why this is breaking change, if so, maybe we can directly modify the option of `Override`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1827615790

   really weird
   
   ![image](https://github.com/apache/arrow-rs/assets/128118209/43e3e37e-b176-4aee-962e-b29d412d8d78)
   
   ![image](https://github.com/apache/arrow-rs/assets/128118209/a790eee5-2174-4d1e-9484-c81b966b8832)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1827207514

   https://github.com/apache/arrow-rs/pull/5094#discussion_r1398470413


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1826989433

   > Marking as draft as waiting on feedback, please feel free to mark ready for review when you would like me to take another look
   
   Hi @tustvold , any question needs I answer/feedback? Please take a look again, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on code in PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#discussion_r1398585138


##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {

Review Comment:
   Yes, it works with blobfuse. Because when it sends `stat` system call, blobfuse will look for local file cache firstly and it will success.



##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {
+                        source: e.into(),
+                        path: path.to_string_lossy().to_string(),
+                    })?;
+                    e_tag = Some(get_etag(&metadata));
+                    match opts.mode {
+                        PutMode::Overwrite => {
+                            std::mem::drop(file);

Review Comment:
   Sure, thanks



##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {

Review Comment:
   Hi @tustvold, you can see https://github.com/Azure/azure-storage-fuse/blob/master/blobfuse/utilities.cpp#L136
   <img width="1156" alt="image" src="https://github.com/apache/arrow-rs/assets/128118209/4dfbb452-6bd0-4d97-984c-38e7a2b94e85">
   
   For `metadate` request, blobfuse will check local cache firstly and skip remote request. It's different with rename requests.



##########
object_store/src/local.rs:
##########
@@ -338,28 +338,39 @@ impl ObjectStore for LocalFileSystem {
         maybe_spawn_blocking(move || {
             let (mut file, suffix) = new_staged_upload(&path)?;
             let staging_path = staged_upload_path(&path, &suffix);
+            let mut e_tag = None;
 
             let err = match file.write_all(&bytes) {
-                Ok(_) => match opts.mode {
-                    PutMode::Overwrite => match std::fs::rename(&staging_path, &path) {
-                        Ok(_) => None,
-                        Err(source) => Some(Error::UnableToRenameFile { source }),
-                    },
-                    PutMode::Create => match std::fs::hard_link(&staging_path, &path) {
-                        Ok(_) => {
-                            let _ = std::fs::remove_file(&staging_path); // Attempt to cleanup
-                            None
+                Ok(_) => {
+                    let metadata = file.metadata().map_err(|e| Error::Metadata {

Review Comment:
   Yes, I have tested it in local and it worked.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1827528512

   I am not seeing anything 😅


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1817467096

   What is the motivation for using blobfuse over the first-class support Azure Storage, AFAICT fuse is just adding overhead (and race conditions)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Fix ObjectStore.LocalFileSystem.put_opts for blobfuse [arrow-rs]

Posted by "RobinLin666 (via GitHub)" <gi...@apache.org>.
RobinLin666 commented on PR #5094:
URL: https://github.com/apache/arrow-rs/pull/5094#issuecomment-1817721569

   Hi @tustvold , thank you very much for your suggestion.
   I think this line **has no impact on the atomicity of the whole function**, but it may affect the performance and error handling. 
   ```rust
   std::mem::drop(file);
   ```
   This line releases the ownership of the file variable, but does not delete or modify the file itself. The file system is only affected when std::fs::rename or std::fs::hard_link are called. These operations are atomic, meaning they either succeed or fail without leaving any intermediate state. Therefore, the code is safe and does not cause file corruption or partial write.
   
   However, this line may have some impact on the performance and error handling. If this line is not added, the file variable will be automatically dropped at the end of the function, which may delay some time. If this line is added, the file variable will be dropped immediately, which may improve some performance. But this also means that the file variable cannot be used for any further operations, such as checking for write errors, or closing the file.
   
   BTW, the put method of object_store Azure abstraction also sends an http request, which is the same behavior as blobfuse, and blobfuse also sends some http requests. However, in this scenario, due to the implementation of local.rs, blobfuse will send some more requests, such as rename, etc. But this does not affect the atomicity of the whole put behavior. Yes, that’s right, in terms of performance, it may indeed be worse than using azure abstraction directly, but for some reason, users may indeed want to use mount to operate (some of the advantages of mount have been mentioned above) and do not care about this point of performance impact. I think we should support this user operation. What do you think?
   
   If I say something wrong, please correct me, because I just started to learn arrow. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org