You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@opendal.apache.org by "cuichenli (via GitHub)" <gi...@apache.org> on 2023/03/30 05:16:31 UTC

[GitHub] [incubator-opendal] cuichenli opened a new pull request, #1801: feat(gcs): add support for gcs append

cuichenli opened a new pull request, #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801

   Fixes #1452
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] cuichenli commented on a diff in pull request #1801: feat(gcs): add support for gcs append

Posted by "cuichenli (via GitHub)" <gi...@apache.org>.
cuichenli commented on code in PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#discussion_r1152822268


##########
core/src/services/gcs/backend.rs:
##########
@@ -625,6 +641,85 @@ impl GcsBackend {
 
         self.client.send_async(req).await
     }
+
+    async fn gcs_initiate_multipart_upload(
+        &self,
+        path: &str,
+    ) -> Result<Response<IncomingAsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!("{}/{}/{}?uploads", self.endpoint, self.bucket, p);
+        let mut req = Request::post(&url)
+            .body(AsyncBody::Empty)
+            .map_err(new_request_build_error)?;
+        self.signer.sign(&mut req).map_err(new_request_sign_error)?;
+        self.client.send_async(req).await
+    }
+
+    pub fn gcs_upload_part_request(
+        &self,
+        path: &str,
+        upload_id: &str,
+        part_number: usize,
+        size: Option<u64>,
+        body: AsyncBody,
+    ) -> Result<Request<AsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!(
+            "{}/{}/{}?partNumber={}&uploadId={}",

Review Comment:
   Just quickly checked the resumable upload document, and noticed that to use it we need to either:
   1. know the file size in advance
   2. or indicate the last part by the user (with streaming upload
   >Since you don't know the total file size until you get to the final chunk, use a * for the total file size in the Content-Range header of intermediate chunks.
   >
   > For example, if the first chunk you upload has a size of 512 KiB, the Content-Range header for the chunk is bytes 0-524287/*. If your upload has 64000 bytes remaining after the first chunk, you then send a final chunk that contains the remaining bytes and has a Content-Range header with the value bytes 524288-588287/588288.
   
   I don't think the method 1 is feasible, for method 2, we then need to add one extra parameter for the `append` method  to indicate the last part.
   
   Alternatively, we can try in the `close` method upload one empty chunk - I am not sure if that is feasible yet though.
   
   Anyway, I will convert this PR to draft for now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo commented on a diff in pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo commented on code in PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#discussion_r1167673694


##########
core/src/services/gcs/backend.rs:
##########
@@ -529,6 +546,23 @@ struct GetObjectJsonResponse {
     content_type: String,
 }
 
+#[derive(Clone, Default, Debug, Serialize)]
+#[serde(default, rename_all = "PascalCase")]
+pub struct CompleteMultipartUploadRequestPart {

Review Comment:
   DO we still need this?



##########
core/src/services/gcs/writer.rs:
##########
@@ -32,11 +32,28 @@ pub struct GcsWriter {
 
     op: OpWrite,
     path: String,
+    upload_location: Option<String>,
+    already_uploaded_chunk: u64,

Review Comment:
   `chunk` is not a native idea in gcs. How about use `written` to represents bytes that already written into gcs?



##########
core/src/services/gcs/writer.rs:
##########
@@ -32,11 +32,28 @@ pub struct GcsWriter {
 
     op: OpWrite,
     path: String,
+    upload_location: Option<String>,

Review Comment:
   How about use `location`?



##########
core/src/services/gcs/writer.rs:
##########
@@ -32,11 +32,28 @@ pub struct GcsWriter {
 
     op: OpWrite,
     path: String,
+    upload_location: Option<String>,
+    already_uploaded_chunk: u64,
+    last_chunk_uploaded: bool,

Review Comment:
   How about use `last: Option<Bytes>` instead?
   
   Everytime users called `append`, we do the following thing:
   
   - If last is none, store `bs` into last.
   - If last is some, upload last to gcs, and replace with `bs`
   
   While users call `close`:
   
   - If last is none, users not call `append`, we can simply ignore this upload (I'm not sure about this).
   - If last is some, upload this as last chunk.
   
   This way, no extra data need to be resent.



##########
core/src/services/gcs/backend.rs:
##########
@@ -410,16 +411,32 @@ impl Accessor for GcsBackend {
     }
 
     async fn write(&self, path: &str, args: OpWrite) -> Result<(RpWrite, Self::Writer)> {
-        if args.append() {
-            return Err(Error::new(
-                ErrorKind::Unsupported,
-                "append write is not supported",
-            ));
-        }
+        let upload_location = if args.append() {
+            let resp = self.core.gcs_initiate_resumable_upload(path).await?;
+            let status = resp.status();
+
+            match status {
+                StatusCode::OK => {
+                    let bs = resp

Review Comment:
   We can use [parse_location](https://github.com/apache/incubator-opendal/blob/main/core/src/raw/http_util/header.rs#L45)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo commented on pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo commented on PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#issuecomment-1492805082

   Hello @cuichenli, thank you for submitting this PR!
   
   I will temporarily close the PR to give you more space to improve it. By closing on-going (instead of on-reviewing) PR allow both of us foucing our own work. Please note that closing the PR does not mean rejection; feel free to reopen it when you are ready for further review or feedback.
   
   Thank you again and have a great day!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo commented on pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo commented on PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#issuecomment-1508164257

   I apologize for closing this pull request. Our policy has been updated, and we will no longer close pull requests so hastily. Additionally, I have recently learned that once a maintainer closes a pull request, it cannot be reopened.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo commented on a diff in pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo commented on code in PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#discussion_r1152753889


##########
core/src/services/gcs/backend.rs:
##########
@@ -625,6 +641,85 @@ impl GcsBackend {
 
         self.client.send_async(req).await
     }
+
+    async fn gcs_initiate_multipart_upload(
+        &self,
+        path: &str,
+    ) -> Result<Response<IncomingAsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!("{}/{}/{}?uploads", self.endpoint, self.bucket, p);
+        let mut req = Request::post(&url)
+            .body(AsyncBody::Empty)
+            .map_err(new_request_build_error)?;
+        self.signer.sign(&mut req).map_err(new_request_sign_error)?;
+        self.client.send_async(req).await
+    }
+
+    pub fn gcs_upload_part_request(
+        &self,
+        path: &str,
+        upload_id: &str,
+        part_number: usize,
+        size: Option<u64>,
+        body: AsyncBody,
+    ) -> Result<Request<AsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!(
+            "{}/{}/{}?partNumber={}&uploadId={}",

Review Comment:
   This API is part of gcs's XML API is can't be used here. We should use gcs's JSON API. For example: https://cloud.google.com/storage/docs/resumable-uploads



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo commented on a diff in pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo commented on code in PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#discussion_r1152802816


##########
core/src/services/gcs/backend.rs:
##########
@@ -625,6 +641,85 @@ impl GcsBackend {
 
         self.client.send_async(req).await
     }
+
+    async fn gcs_initiate_multipart_upload(
+        &self,
+        path: &str,
+    ) -> Result<Response<IncomingAsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!("{}/{}/{}?uploads", self.endpoint, self.bucket, p);
+        let mut req = Request::post(&url)
+            .body(AsyncBody::Empty)
+            .map_err(new_request_build_error)?;
+        self.signer.sign(&mut req).map_err(new_request_sign_error)?;
+        self.client.send_async(req).await
+    }
+
+    pub fn gcs_upload_part_request(
+        &self,
+        path: &str,
+        upload_id: &str,
+        part_number: usize,
+        size: Option<u64>,
+        body: AsyncBody,
+    ) -> Result<Request<AsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!(
+            "{}/{}/{}?partNumber={}&uploadId={}",

Review Comment:
   The `gcs` services are designed to utilize the native JSON API instead of the S3 compatible API. While it is possible to use S3's multi-platform API directly, it should be noted that users can use the `s3` service already.



##########
core/src/services/gcs/backend.rs:
##########
@@ -625,6 +641,85 @@ impl GcsBackend {
 
         self.client.send_async(req).await
     }
+
+    async fn gcs_initiate_multipart_upload(
+        &self,
+        path: &str,
+    ) -> Result<Response<IncomingAsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!("{}/{}/{}?uploads", self.endpoint, self.bucket, p);
+        let mut req = Request::post(&url)
+            .body(AsyncBody::Empty)
+            .map_err(new_request_build_error)?;
+        self.signer.sign(&mut req).map_err(new_request_sign_error)?;
+        self.client.send_async(req).await
+    }
+
+    pub fn gcs_upload_part_request(
+        &self,
+        path: &str,
+        upload_id: &str,
+        part_number: usize,
+        size: Option<u64>,
+        body: AsyncBody,
+    ) -> Result<Request<AsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!(
+            "{}/{}/{}?partNumber={}&uploadId={}",

Review Comment:
   The `gcs` services are designed to utilize the native JSON API instead of the S3 compatible API. While it is possible to use S3's multipart API directly, it should be noted that users can use the `s3` service already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo closed pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo closed pull request #1801: feat(gcs): add support for gcs append
URL: https://github.com/apache/incubator-opendal/pull/1801


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] cuichenli commented on pull request #1801: feat(gcs): add support for gcs append

Posted by "cuichenli (via GitHub)" <gi...@apache.org>.
cuichenli commented on PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#issuecomment-1509882067

   > I apologize for closing this pull request. Our policy has been updated, and we will no longer close pull requests so hastily. Additionally, I have recently learned that once a maintainer closes a pull request, it cannot be reopened.
   
   No problem at all! I apologize for the delay in updating the PR as I have been busy with other tasks.
   
   I have made some changes to use the streaming API in the latest commit. However, I must admit that the implementation is not optimal. The main issue arises during the final chunk, where we need to specify the final chunk size. Since this size is unknown until the user invokes the close function, the implementation became a bit tricky. I attempted to send a POST request with an empty body, but unfortunately, it did not work. As an alternative, I recorded the last copy sent and when the user invokes the close function, I resend it with the correct headers. I am uncertain whether this approach is acceptable, but I cannot think of a better solution at this time. 
   
   Please take a look. Thanks
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] cuichenli commented on pull request #1801: feat(gcs): add support for gcs append

Posted by "cuichenli (via GitHub)" <gi...@apache.org>.
cuichenli commented on PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#issuecomment-1513150934

   I have updated based on the comments, please review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] Xuanwo merged pull request #1801: feat(gcs): add support for gcs append

Posted by "Xuanwo (via GitHub)" <gi...@apache.org>.
Xuanwo merged PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-opendal] cuichenli commented on a diff in pull request #1801: feat(gcs): add support for gcs append

Posted by "cuichenli (via GitHub)" <gi...@apache.org>.
cuichenli commented on code in PR #1801:
URL: https://github.com/apache/incubator-opendal/pull/1801#discussion_r1152790233


##########
core/src/services/gcs/backend.rs:
##########
@@ -625,6 +641,85 @@ impl GcsBackend {
 
         self.client.send_async(req).await
     }
+
+    async fn gcs_initiate_multipart_upload(
+        &self,
+        path: &str,
+    ) -> Result<Response<IncomingAsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!("{}/{}/{}?uploads", self.endpoint, self.bucket, p);
+        let mut req = Request::post(&url)
+            .body(AsyncBody::Empty)
+            .map_err(new_request_build_error)?;
+        self.signer.sign(&mut req).map_err(new_request_sign_error)?;
+        self.client.send_async(req).await
+    }
+
+    pub fn gcs_upload_part_request(
+        &self,
+        path: &str,
+        upload_id: &str,
+        part_number: usize,
+        size: Option<u64>,
+        body: AsyncBody,
+    ) -> Result<Request<AsyncBody>> {
+        let p = build_abs_path(&self.root, path);
+        let url = format!(
+            "{}/{}/{}?partNumber={}&uploadId={}",

Review Comment:
   Sorry can you elaborate about why XML API can not be used here? I have tested locally and looks like it is working correctly. Thanks.
   
   By testing locally I mean invoking the newly added methods to finish one multipart upload.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@opendal.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org