Posted to github@arrow.apache.org by "jychen7 (via GitHub)" <gi...@apache.org> on 2023/02/02 03:32:54 UTC

[GitHub] [arrow-rs] jychen7 opened a new issue, #3651: object_store: support encoded path

jychen7 opened a new issue, #3651:
URL: https://github.com/apache/arrow-rs/issues/3651

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   When a percent-encoded path is passed to `object_store`, the path is encoded a second time before the object is fetched.
   e.g.
   
   S3
   https://github.com/apache/arrow-rs/blob/2b9bbce44abbd93048c674f49c4eb0db72a0a1c8/object_store/src/aws/client.rs#L459
   
   GCS
   https://github.com/apache/arrow-rs/blob/2b9bbce44abbd93048c674f49c4eb0db72a0a1c8/object_store/src/gcp/mod.rs#L270
   
   A workaround is to pass the decoded path from the upstream application, even though the upstream application already receives an encoded path.
   
   e.g. https://github.com/roapi/roapi/issues/98#issuecomment-1409869735
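
To make the double encoding concrete, here is a minimal std-only sketch. `percent_encode` is a hypothetical helper written for this illustration, and its encode set (everything except RFC 3986 unreserved characters and `/`) is an assumption, not necessarily the set `object_store` uses when it builds request URLs.

```rust
/// Percent-encode every byte except RFC 3986 unreserved characters and `/`.
/// Hypothetical helper for illustration only; not the `object_store` implementation.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        match byte {
            // Unreserved characters and the path delimiter pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' | b'/' => {
                out.push(byte as char)
            }
            // Everything else, including `%` itself, becomes `%XX`.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}
```

Under these assumptions, `percent_encode("blogs space.parquet")` yields `blogs%20space.parquet`, while feeding that result back in yields `blogs%2520space.parquet`, which names a different object.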
   
   **Describe the solution you'd like**
   Ideally, a path should only need to be encoded once. One idea is to extend `Path` with a new attribute called `percent_encoded`.
   
   https://github.com/apache/arrow-rs/blob/2b9bbce44abbd93048c674f49c4eb0db72a0a1c8/object_store/src/path/mod.rs#L135-L138
   
   **Describe alternatives you've considered**
   Auto-detect whether the path is already percent-encoded?
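
A rough sketch of what such detection could look like, and why it is only a heuristic. `looks_percent_encoded` is a hypothetical name used here for illustration; nothing like it is claimed to exist in `object_store`.

```rust
/// Returns true if the input contains at least one `%XX` sequence,
/// where `XX` are two ASCII hex digits.
/// Hypothetical heuristic for illustration only.
fn looks_percent_encoded(input: &str) -> bool {
    input
        .as_bytes()
        .windows(3)
        .any(|w| w[0] == b'%' && w[1].is_ascii_hexdigit() && w[2].is_ascii_hexdigit())
}
```

The check cannot tell an encoded space apart from an object whose stored key literally contains the characters `%20`, so any auto-detection along these lines would guess wrong for some keys.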
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] jychen7 commented on issue #3651: object_store: support encoded path as input

Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1414505284

   > if you call Path::parse("blogs space.parquet") everything should work?
   
   Yes, it works.
   
   > What is percent encoding this input
   > Is it possible something is receiving a URL and not decoding it properly, and just handing off the raw path?
   
   This is from the ROAPI config, where the user input may already be percent-encoded. ROAPI currently decodes it before passing it to `object_store`. I am wondering whether `object_store` can support receiving an encoded path, so that end to end we don't need an additional `decode -> encode` round trip.
   More detail here: https://github.com/roapi/roapi/issues/98#issuecomment-1409869735




[GitHub] [arrow-rs] tustvold commented on issue #3651: object_store: support encoded path as input

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1414512964

   > can support receiving encoded path
   
   Adding a `Path::from_url_path` that accepts a `Url` argument makes sense to me, I would be happy to review a PR that adds this :+1: 
   
   This would still need to decode the path, in order to ensure [path safety](https://docs.rs/object_store/latest/object_store/path/struct.Path.html#path-safety), but would save users that complexity.
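
As a rough illustration of the "decode once, then validate" step described above, here is a std-only sketch. `percent_decode` and `hex_value` are hypothetical helpers for this example, not the code from `object_store`, and the sketch assumes malformed escapes and non-UTF-8 results should be rejected.

```rust
/// Convert one ASCII hex digit to its numeric value.
fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode `%XX` escapes exactly once. Returns `None` for a truncated or
/// non-hex escape, or if the decoded bytes are not valid UTF-8.
/// Hypothetical helper for illustration only.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // An escape needs two hex digits after the `%`.
            let hi = hex_value(*bytes.get(i + 1)?)?;
            let lo = hex_value(*bytes.get(i + 2)?)?;
            out.push((hi << 4) | lo);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

The decoded string is what would then be checked against the path-safety rules, so the store's own encoding on the request path is applied only once.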
   
   FWIW overheads of encoding and [signing](https://docs.aws.amazon.com/general/latest/gr/create-signed-request.html#create-canonical-request) the requests will dominate any additional latency resulting from this encoding dance, and both will be completely dominated by the latency of the store itself (typically in the 10s or 100s of milliseconds).




[GitHub] [arrow-rs] tustvold commented on issue #3651: object_store: support encoded path

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1414496548

   > However, when client GET object, it will re-encode at L246 -> L211 -> L459 below
   
   Yes, this is so that the path of the created object matches exactly the `Path` provided. So if you provide `blogs%20space.parquet` that is what will be created in object storage.
   
   > Path::parse("blogs%20space.parquet")
   
   What is percent-encoding this input? If you call `Path::parse("blogs space.parquet")`, everything should work?
   
   




[GitHub] [arrow-rs] jychen7 commented on issue #3651: object_store: support encoded path as input

Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1417980768

   > I would be happy to review a PR that adds this
   
   Thank you, I have drafted the PR here: https://github.com/apache/arrow-rs/pull/3663, and I am waiting for CI to start.




[GitHub] [arrow-rs] jychen7 commented on issue #3651: object_store: support encoded path

Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1414483081

   @tustvold yes, I have tried.
   
   I understand that `Path::parse("blogs%20space.parquet")` will NOT re-encode; the result is `Path { raw: "blogs%20space.parquet" }`.
   
   However, when the client GETs the object, it re-encodes at L246 -> L211 -> L459 below:
   https://github.com/apache/arrow-rs/blob/2b9bbce44abbd93048c674f49c4eb0db72a0a1c8/object_store/src/aws/client.rs#L237-L246
   https://github.com/apache/arrow-rs/blob/2b9bbce44abbd93048c674f49c4eb0db72a0a1c8/object_store/src/aws/client.rs#L209-L213
   https://github.com/apache/arrow-rs/blob/2b9bbce44abbd93048c674f49c4eb0db72a0a1c8/object_store/src/aws/client.rs#L458-L460
   
   I imagine L211 needs to know whether the input `Path` is already encoded, either via a new attribute or auto-detection.




[GitHub] [arrow-rs] tustvold commented on issue #3651: object_store: support encoded path

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1413552479

   Have you tried using `Path::parse`? This shouldn't re-encode.




[GitHub] [arrow-rs] tustvold closed issue #3651: object_store: support encoded path as input

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #3651: object_store: support encoded path as input
URL: https://github.com/apache/arrow-rs/issues/3651




[GitHub] [arrow-rs] tustvold commented on issue #3651: object_store: support encoded path as input

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3651:
URL: https://github.com/apache/arrow-rs/issues/3651#issuecomment-1426076811

   `label_issue.py` automatically added labels {'object-store'} from #3663

