Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/06 15:26:06 UTC

[GitHub] [arrow-datafusion] tustvold commented on issue #2445: ObjectStore Directory Semantics

tustvold commented on issue #2445:
URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119739041

   > Are there examples of ObjectStore implementations
   
   I'm not sure what you mean by this, but object stores are really just key-value stores with a vaguely RESTful API, i.e.
   
   * PutObject - associate an object (set of bytes) with a string key, replacing any existing value
   * GetObject - get the object associated with a key
   * CopyObject - copy the object associated with one key, to another
   * ListObjects - list the keys with a given prefix
   * DeleteObject - delete the value with a given key
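
   To make the shape of that API concrete, here is a minimal in-memory sketch of the five operations. It is only an illustration: `InMemoryStore` and its method names are hypothetical, and this is not the DataFusion `ObjectStore` trait or any real client library.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory object store: a sorted map from string keys to bytes.
/// There are no directories here, only keys.
#[derive(Default)]
struct InMemoryStore {
    objects: BTreeMap<String, Vec<u8>>,
}

impl InMemoryStore {
    /// PutObject: associate bytes with a key, replacing any existing value.
    fn put(&mut self, key: &str, data: Vec<u8>) {
        self.objects.insert(key.to_string(), data);
    }

    /// GetObject: fetch the bytes stored under a key, if any.
    fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }

    /// CopyObject: copy the value at `from` to `to`.
    /// Returns false if `from` does not exist.
    fn copy(&mut self, from: &str, to: &str) -> bool {
        match self.objects.get(from).cloned() {
            Some(data) => {
                self.objects.insert(to.to_string(), data);
                true
            }
            None => false,
        }
    }

    /// ListObjects: all keys that start with `prefix`, in sorted order.
    fn list(&self, prefix: &str) -> Vec<String> {
        self.objects
            .range(prefix.to_string()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, _)| k.clone())
            .collect()
    }

    /// DeleteObject: remove the value for a key.
    /// Returns whether the key existed.
    fn delete(&mut self, key: &str) -> bool {
        self.objects.remove(key).is_some()
    }
}
```

   Note that `list("data/")` is a plain string-prefix match on keys; nothing in the store knows that `/` is special.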
   
   There are more complex APIs for things like multipart uploads, bucket creation, etc., but in terms of what a client would be interested in, that is the entirety of the API. To put it another way, **the interface of object storage is significantly less expressive than that of a filesystem**.
   
   Trying to make object storage behave exactly like a filesystem is impossible (e.g. S3 doesn't support CreateIfNotExists). However, my thesis is that no query engine actually wants filesystem semantics, and this is why these linked abstractions **kind of** work (https://github.com/apache/arrow-datafusion/issues/2205#issuecomment-1100069800).
   
   My suggestion is that by instead implementing the less expressive object storage semantics, we can avoid a whole host of funky edge-cases around directories, paths, etc.
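
   As a rough illustration of why directories stop being a special case, a "directory listing" can be derived purely from key prefixes plus a delimiter. The sketch below is an assumption on my part rather than a proposed API: `list_with_delimiter` is a hypothetical helper over a set of keys, loosely modelled on the prefix/delimiter listing that S3-style stores offer.

```rust
use std::collections::BTreeSet;

/// Split the keys under `prefix` into direct objects and "common prefixes"
/// (pseudo-directories), treating `delimiter` as the path separator.
/// A pseudo-directory exists only because some key happens to contain it.
fn list_with_delimiter(
    keys: &BTreeSet<String>,
    prefix: &str,
    delimiter: char,
) -> (Vec<String>, BTreeSet<String>) {
    let mut objects = Vec::new();
    let mut common_prefixes = BTreeSet::new();

    for key in keys.iter().filter(|k| k.starts_with(prefix)) {
        // The remainder of the key after the requested prefix.
        let rest = &key[prefix.len()..];
        match rest.find(delimiter) {
            // A further delimiter: report everything up to and including it.
            Some(idx) => {
                let end = prefix.len() + idx + delimiter.len_utf8();
                common_prefixes.insert(key[..end].to_string());
            }
            // No further delimiter: the key is an object directly under `prefix`.
            None => objects.push(key.clone()),
        }
    }

    (objects, common_prefixes)
}
```

   With this model there is no such thing as an empty directory, and no need to create or remove one: a prefix shows up in a listing only while at least one key sits under it.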
   
   > in a way compatible with other systems that may use a FileSystem approach
   
   Could you expand on what you mean by this? Do you mean being able to read data written by another system, which should be trivial, or are you talking about some sort of API-level integration like FFI?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
