Posted to github@arrow.apache.org by "dimaryaz (via GitHub)" <gi...@apache.org> on 2024/03/01 03:04:25 UTC

[I] object-store: Path::from should not percent-encode its input [arrow-rs]

dimaryaz opened a new issue, #5446:
URL: https://github.com/apache/arrow-rs/issues/5446

   **Describe the bug**
   Suppose I have a file whose name contains a special character, such as `%.txt`. Percent-encoded, that name becomes `%25.txt`. What is the correct way to represent this file with a `Path` object?
   
   The following three calls all keep `%` as a literal character, with the `Path::raw` value containing the unencoded `%.txt` (for `from_filesystem_path`, as the final segment of an absolute path). This seems to indicate that this is the expected behavior:
   ```
   use object_store::path::Path;

   fn main() {
       let p1 = Path::from_url_path("%25.txt").unwrap(); // decodes `%25` to `%`
       let p2 = Path::parse("%.txt").unwrap(); // validates the input, no encoding
       let p3 = Path::from_filesystem_path("%.txt").unwrap(); // file must exist
   }
   ```
   The third one in particular implies that no encoding should be applied (otherwise it would fail to find the file).
   
   However, `Path::from("%.txt")` _will_ percent-encode its input (this is even documented), so the "raw" value becomes `%25.txt`. In fact, no value can be passed to `Path::from` to create a `Path` for the `%.txt` file.
   
   What this means is:
   - `Path::from` is inconsistent with the other APIs
   - Because of the `From` trait, it is easy to call it by accident, e.g. via `"%.txt".into()`
   - There is really no use case for it, since the other APIs treat the "raw" value as the original, not percent-encoded, value
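
   The encoding behaviour and the `.into()` pitfall described above can be reproduced with a small, std-only sketch. `SketchPath`, `parse_literal` and `stored_name` are hypothetical names, not `object_store` APIs, and escaping only `%` is a simplifying assumption; the point is that a function taking `impl Into<SketchPath>` silently runs the encoding `From<&str>` impl.

   ```rust
   /// Hypothetical path type; `raw` holds the stored (possibly encoded) form.
   #[derive(Debug, PartialEq)]
   struct SketchPath {
       raw: String,
   }

   impl SketchPath {
       /// Explicit constructor that stores its input verbatim (no encoding).
       fn parse_literal(s: &str) -> SketchPath {
           SketchPath { raw: s.to_string() }
       }
   }

   impl From<&str> for SketchPath {
       /// Implicit conversion that percent-encodes its input. Assumption: only
       /// `%` is escaped here; the real crate escapes a larger character set.
       fn from(s: &str) -> SketchPath {
           SketchPath { raw: s.replace('%', "%25") }
       }
   }

   /// Hypothetical API that accepts anything convertible into a path, so a
   /// caller can pass a plain `&str` without noticing the conversion.
   fn stored_name(location: impl Into<SketchPath>) -> String {
       location.into().raw
   }
   ```

   Here `stored_name("%.txt")` returns `"%25.txt"` and `stored_name("%25.txt")` returns `"%2525.txt"`: because every `%` is rewritten, no string passed through the implicit conversion is stored as `%.txt`. Only the explicit `SketchPath::parse_literal("%.txt")` keeps the literal name.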
   
   Therefore, I believe this is a bug in the API.
   
   **To Reproduce**
   Call `Path::from("%.txt")`.
   
   **Expected behavior**
   Should produce the "raw" value of `%.txt`, not `%25.txt`.
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] object-store: Path::from should not percent-encode its input [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #5446:
URL: https://github.com/apache/arrow-rs/issues/5446#issuecomment-1972533899

   As you state in your issue, this behaviour is documented and explained [here](https://docs.rs/object_store/latest/object_store/path/struct.Path.html#encode); it is intentional, not a bug. The rationale for performing percent-encoding is to obtain a safe path that will work across a majority of storage backends. This is a common pattern when storing arbitrary data in object stores, and is widely used by systems like Hive.
   
   > Path::from is inconsistent with the other APIs
   
   We encode by default because it is guaranteed to infallibly yield a valid `Path` that will work with all stores. We do, however, provide fallible conversions from various other path representations that may be more or less opinionated.
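
   The trade-off between an infallible, encoding conversion and fallible conversions that validate their input instead of rewriting it can be sketched as follows. This is a hypothetical, std-only model rather than the crate's code: `encode_path`, `parse_path` and `InvalidSegment` are illustrative names, and the escaped characters and rejected segments (empty, `.`, `..`, or containing an ASCII control character) are assumptions.

   ```rust
   /// Hypothetical error returned by the fallible constructor.
   #[derive(Debug, PartialEq)]
   struct InvalidSegment(String);

   /// Escape one segment. `.` and `..` are escaped so they cannot act as
   /// relative-path components; `%` and ASCII control characters become `%XX`
   /// (an assumed, deliberately small character set).
   fn encode_segment(segment: &str) -> String {
       match segment {
           "." => "%2E".to_string(),
           ".." => "%2E%2E".to_string(),
           other => {
               let mut out = String::with_capacity(other.len());
               for c in other.chars() {
                   if c == '%' || c.is_ascii_control() {
                       out.push_str(&format!("%{:02X}", c as u32));
                   } else {
                       out.push(c);
                   }
               }
               out
           }
       }
   }

   /// Infallible: empty segments are dropped and every remaining segment is
   /// escaped, so any input yields a usable path and no error is possible.
   fn encode_path(input: &str) -> String {
       input
           .split('/')
           .filter(|segment| !segment.is_empty())
           .map(encode_segment)
           .collect::<Vec<_>>()
           .join("/")
   }

   /// Fallible: the input is stored verbatim, so segments that would be
   /// unsafe to keep as-is must be rejected rather than rewritten.
   fn parse_path(input: &str) -> Result<String, InvalidSegment> {
       for segment in input.split('/') {
           let invalid = segment.is_empty()
               || segment == "."
               || segment == ".."
               || segment.chars().any(|c| c.is_ascii_control());
           if invalid {
               return Err(InvalidSegment(segment.to_string()));
           }
       }
       Ok(input.to_string())
   }
   ```

   For example, `encode_path("data//%.txt")` returns `"data/%25.txt"`, whereas `parse_path("data/%.txt")` keeps the literal `%` and `parse_path("../secret")` fails. Encoding is what lets the conversion stay infallible; the price is that the stored name can differ from the input.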
   
   Whilst I recognise opinions may differ on the validity of this approach, ultimately this is the approach that this crate adopted, and it would be extremely disruptive to revisit it now. On a related note, I might suggest that filing an issue asserting that clearly documented behaviour is a bug is not a fantastic way to engender support for your proposal.




Re: [I] object-store: Path::from should not percent-encode its input [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #5446: object-store: Path::from should not percent-encode its input
URL: https://github.com/apache/arrow-rs/issues/5446




Re: [I] object-store: Path::from should not percent-encode its input [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #5446:
URL: https://github.com/apache/arrow-rs/issues/5446#issuecomment-2002159363

   Closing this as I believe the question has been answered.

