Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/08 03:07:33 UTC

[GitHub] [arrow-datafusion] yahoNanJing opened a new pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

yahoNanJing opened a new pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779


   # Which issue does this PR close?
   
   
   Closes #1778.
   
   # What changes are included in this PR?
   
   There are two changes in this PR.
   - The object store key now takes the host and port into account rather than just the scheme.
   - The path returned by `get_by_uri` is self-describing: it carries the scheme, host, port and path within the object store.
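   
   As a rough illustration of the first change, a key of the form `scheme://host:port` could be derived from a URI along these lines. This is only a sketch: `store_key` is a hypothetical helper, not the code in this PR, and it assumes the authority ends at the first `/`.
   
   ```rust
   /// Hypothetical helper: derive a registry key of the form `scheme://host:port`
   /// from a URI, falling back to `file` when no scheme is present.
   fn store_key(uri: &str) -> String {
       match uri.split_once("://") {
           Some((scheme, rest)) => {
               // The authority (host[:port]) ends at the first `/`, if any.
               let authority = rest.split('/').next().unwrap_or("");
               format!("{}://{}", scheme.to_lowercase(), authority)
           }
           None => "file".to_string(),
       }
   }
   ```
   
   With such a key, two URIs pointing at different name nodes would map to different entries, while a bare local path would still map to the default local store.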
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#issuecomment-1032230025


   @yahoNanJing looks like there are some test failures that need to be addressed as well. The changes in `get_by_uri` look good to me.





[GitHub] [arrow-datafusion] houqp commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r801279141



##########
File path: datafusion/src/datasource/object_store/local.rs
##########
@@ -39,6 +39,11 @@ pub struct LocalFileSystem;
 #[async_trait]
 impl ObjectStore for LocalFileSystem {
     async fn list_file(&self, prefix: &str) -> Result<FileMetaStream> {
+        let prefix = if let Some((_scheme, path)) = prefix.split_once("://") {

Review comment:
       Got it. Shouldn't this vip host:port be managed as an attribute of the hdfs object store instance? For example, when we create the object store instance and register it in the registry, we specify the vip; then we only need to provide the object path in the list and get calls on the object store. Internally, when the object store handles the list operation, it passes the vip host to the hdfs client. Basically, my thinking is that if we have a one-to-one mapping between an object store instance and a vip, then we shouldn't need to pass the vip as part of the object key in these method calls.
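   
   A minimal sketch of that idea, assuming a hypothetical `HdfsStore` type (not DataFusion's actual API) that is given its vip `host:port` once, at registration time:
   
   ```rust
   /// Hypothetical store that remembers its endpoint; not DataFusion's API.
   struct HdfsStore {
       /// The vip `host:port` given once at registration time.
       authority: String,
   }
   
   impl HdfsStore {
       fn new(authority: &str) -> Self {
           Self { authority: authority.to_string() }
       }
   
       /// Callers pass only the object path; the store adds the endpoint itself.
       fn full_url(&self, path: &str) -> String {
           format!("hdfs://{}/{}", self.authority, path.trim_start_matches('/'))
       }
   }
   ```
   
   Under this design the list and get calls would only ever see paths such as `/data/a.parquet`, and the endpoint would never appear in the object key.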







[GitHub] [arrow-datafusion] yahoNanJing commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r802237474



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       @alamb, I totally agree with you, since the uri components may differ between object store instances. That's why I prefer to keep the whole uri info in the return value. However, this PR mainly focuses on the key of the hash map for object stores. The scheme alone is not enough. In general, a remote object store can be identified by scheme://host:port







[GitHub] [arrow-datafusion] houqp commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r801262929



##########
File path: datafusion/src/datasource/object_store/local.rs
##########
@@ -39,6 +39,11 @@ pub struct LocalFileSystem;
 #[async_trait]
 impl ObjectStore for LocalFileSystem {
     async fn list_file(&self, prefix: &str) -> Result<FileMetaStream> {
+        let prefix = if let Some((_scheme, path)) = prefix.split_once("://") {

Review comment:
       If object store instances are already sharded by host and port based on the changes in `get_by_uri`, shouldn't we only accept the path of the uri in this method? In other words, I think we can safely assume that for a given object store instance, all requests reference the same host and port, right?







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r801987584



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       I would personally prefer to keep any `uri` parsing in the datafusion crate as simple as possible and leave more sophisticated uri interpretation to the actual `ObjectStoreInstance`.
   
   Among other things, this makes it easier to implement arbitrary `ObjectStoreInstance`s.
   
   For the use case mentioned in https://github.com/apache/arrow-datafusion/issues/1778 I wonder if you could write a wrapping object store like:
   
   ```rust
   struct HostAwareHDFS {
   
   }
   
   #[async_trait]
   impl ObjectStore for HostAwareHDFS {
       async fn list_file(&self, prefix: &str) -> Result<FileMetaStream> {
           // Form a valid hdfs url
           let hdfs_url = format!("hdfs://{}", prefix);
           match Url::parse(&hdfs_url) {
               Ok(url) => {
                   // Dispatch to the store that serves this host
                   self.get_object_store_for_host(url.host()).list_file(prefix).await?
                   // ...
               }
               Err(..) => { /* ... */ }
           }
       }
       // ...
   }
   ```
   
   
   In other words, push the interpretation of urls into the ObjectStore
   
   
   If this won't work, I would suggest passing the entire parsed url into `store.get()` rather than some synthetic key that is datafusion specific

##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered

Review comment:
       URIs also allow for a `username` and `password` in front of the host
   
   e.g. `s3://user:pass@host:port`
   
   https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
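   
   To make the point concrete, here is a small sketch of separating the userinfo from the host part of an authority string. `split_userinfo` is a hypothetical helper written for illustration only, and it assumes the host itself contains no `@`.
   
   ```rust
   /// Hypothetical helper: split an authority such as `user:pass@host:port`
   /// into optional userinfo and the remaining host part.
   fn split_userinfo(authority: &str) -> (Option<&str>, &str) {
       // Split at the last `@` so that an `@` inside the password stays in the userinfo.
       match authority.rsplit_once('@') {
           Some((userinfo, host)) => (Some(userinfo), host),
           None => (None, authority),
       }
   }
   ```
   
   Any key derived from `host:port` alone would have to decide whether such userinfo is part of the key or is dropped.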
   







[GitHub] [arrow-datafusion] alamb commented on pull request #1779: The returned path value of get_by_uri should be self-described with entire path

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#issuecomment-1038977209


   Thank you @yahoNanJing  -- looks much better to me now





[GitHub] [arrow-datafusion] seddonm1 commented on pull request #1779: The returned path value of get_by_uri should be self-described with entire path

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#issuecomment-1039463660


   Thank you for the heads up @alamb. I had identified this issue last week but didn't have time to dig deeper.





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r803128504



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       I should point out that my reason for disliking adding `host` and `port` to the object store key is that it doesn't make sense for many types of object stores (such as `LocalFileSystem` or `S3` which have no notion of host/port). It seems like this change is too HDFS specific and can be accomplished in a different way







[GitHub] [arrow-datafusion] alamb merged pull request #1779: The returned path value of get_by_uri should be self-described with entire path

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779


   








[GitHub] [arrow-datafusion] yahoNanJing commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r801322675



##########
File path: datafusion/src/datasource/object_store/local.rs
##########
@@ -39,6 +39,11 @@ pub struct LocalFileSystem;
 #[async_trait]
 impl ObjectStore for LocalFileSystem {
     async fn list_file(&self, prefix: &str) -> Result<FileMetaStream> {
+        let prefix = if let Some((_scheme, path)) = prefix.split_once("://") {

Review comment:
       As for whether we need to include the vip info in the path, I prefer to include it, especially when running with Ballista for distributed execution. Then we will be able to do self-registration or self-detection based on the uri without transferring the object store. I also mentioned this in #1702.







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r803570443



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       > I still think the returned path should include the original scheme info rather than throw it away. Then for the object store instance, it will be able to deal with the path for different kinds of schemes. What do you think?
   
   I think returning the entire path, rather than stripping away the `scheme` makes a lot of sense 👍 
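   
   A simplified sketch of what that could look like, assuming a hypothetical `Registry` in which stores are plain labels keyed by scheme (the real registry holds `Arc<dyn ObjectStore>` values):
   
   ```rust
   use std::collections::HashMap;
   
   /// Hypothetical, simplified registry: stores are plain labels keyed by scheme.
   struct Registry {
       stores: HashMap<String, String>,
   }
   
   impl Registry {
       /// Look up the store by scheme, but hand back the whole URI so the
       /// store can still see the scheme, host and port.
       fn get_by_uri<'a>(&self, uri: &'a str) -> Result<(String, &'a str), String> {
           let scheme = match uri.split_once("://") {
               Some((scheme, _path)) => scheme.to_lowercase(),
               // No scheme: default to the local store.
               None => "file".to_string(),
           };
           let store = self
               .stores
               .get(&scheme)
               .cloned()
               .ok_or_else(|| format!("No suitable object store found for {}", scheme))?;
           Ok((store, uri))
       }
   }
   ```
   
   The lookup key stays the scheme, so nothing HDFS-specific leaks into the registry, while the store receives the untouched URI.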







[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #1779: The returned path value of get_by_uri should be self-described with entire path

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#issuecomment-1038815163


   Hi @alamb and @houqp, could you help recheck this PR? Now it's only for changing the returned path value of get_by_uri.





[GitHub] [arrow-datafusion] yahoNanJing commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r801275370



##########
File path: datafusion/src/datasource/object_store/local.rs
##########
@@ -39,6 +39,11 @@ pub struct LocalFileSystem;
 #[async_trait]
 impl ObjectStore for LocalFileSystem {
     async fn list_file(&self, prefix: &str) -> Result<FileMetaStream> {
+        let prefix = if let Some((_scheme, path)) = prefix.split_once("://") {

Review comment:
       For hdfs, there's an HA mechanism that provides a kind of vip host name. The hostname is not the name of a real host, so ping will not work against it. However, the underlying object store client is able to recognize that name and direct requests to the real host.
   
   I think this part should be a capability of the remote object store, either through HDFS itself or by providing a vip service.







[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#issuecomment-1032233623


   > @yahoNanJing looks like there are some test failures that need to be addressed as well. Changes in get_by_uri looks good to me.
   
   It's a Windows issue. Let me fix it.
   
   





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r803127610



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       I think the distinction I am trying to draw is that the current Object Store API is mapped by scheme and it would be up to the object store implementation to figure out how to handle host/port information
   
   So rather than having one `HDFSObjectStore` instance for `server1:8000` and a second `HDFSObjectStore` instance for `server2:8290`, there would be a single `HDFSObjectStore` that would need to know how to dispatch appropriately to the different server hosts / ports
   
   The same basic pattern holds for file systems (for example, there is a single `LocalFileSystem` instance even though the local file system might have different disks mounted to `/data` and `/data2`).
   
   I think it would also hold for S3 and other types of object stores (where, depending on the region, you need to send requests to a different endpoint)
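   
   A rough sketch of that dispatch pattern, with a hypothetical `HdfsObjectStore` whose "clients" are just string labels standing in for real connection handles:
   
   ```rust
   use std::collections::HashMap;
   
   /// Hypothetical single store for the `hdfs` scheme that dispatches to a
   /// per-endpoint client; the names here are illustrative only.
   struct HdfsObjectStore {
       /// Maps `host:port` to a label standing in for a real client handle.
       clients: HashMap<String, String>,
   }
   
   impl HdfsObjectStore {
       /// Pick the client for the authority found in `uri`, if one is known.
       fn client_for(&self, uri: &str) -> Option<&String> {
           let rest = uri.strip_prefix("hdfs://")?;
           let authority = rest.split('/').next()?;
           self.clients.get(authority)
       }
   }
   ```
   
   The registry would then hold one entry for `hdfs`, and the knowledge of hosts and ports would live entirely inside the store.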
   







[GitHub] [arrow-datafusion] yahoNanJing commented on a change in pull request #1779: Improve object store key with considering host and port in ObjectStoreRegistry

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r803258324



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's registered
+    /// Returns a tuple with the store and the self-described uri of the file in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       Thanks @alamb. Finally I get your point. To use the object store, there are three hierarchies: ObjectStoreRegistry -> ObjectStore -> ObjectStoreInstance. The ObjectStore is just for managing one kind of object store rather than being the instance. Previously I misunderstood it as a real instance.
   
   One more question: some object stores support many kinds of schemes. For example, HDFS supports:
   - file://
   - hdfs://
   - viewfs://
   
   I still think the returned path should include the original scheme info rather than throw it away. Then for the object store instance, it will be able to deal with the path for different kinds of schemes. What do you think?
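   
   For illustration, a store that receives the full URI could branch on the scheme roughly as follows. `HdfsTarget` and `classify` are hypothetical names used only for this sketch.
   
   ```rust
   /// Hypothetical classification of the schemes an HDFS-backed store might see.
   #[derive(Debug, PartialEq)]
   enum HdfsTarget<'a> {
       Local(&'a str),
       Hdfs(&'a str),
       ViewFs(&'a str),
   }
   
   /// Because the full URI is passed through, the store can branch on the scheme.
   fn classify(uri: &str) -> Option<HdfsTarget<'_>> {
       let (scheme, rest) = uri.split_once("://")?;
       match scheme {
           "file" => Some(HdfsTarget::Local(rest)),
           "hdfs" => Some(HdfsTarget::Hdfs(rest)),
           "viewfs" => Some(HdfsTarget::ViewFs(rest)),
           _ => None,
       }
   }
   ```
   
   If the scheme were stripped before the store saw the path, these three cases could not be told apart.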



