You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/14 13:04:38 UTC

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2906: Introduce ObjectStoreSelfDetector for detector an object store based on the url

tustvold commented on code in PR #2906:
URL: https://github.com/apache/arrow-datafusion/pull/2906#discussion_r921122262


##########
datafusion/core/src/datasource/object_store.rs:
##########
@@ -81,10 +81,19 @@ impl std::fmt::Display for ObjectStoreUrl {
     }
 }
 
+/// Object store self detector can detector an object store based on the url
+pub trait ObjectStoreSelfDetector: Send + Sync + 'static {

Review Comment:
   I wonder if we might call this `ObjectStoreSchemeProvider`, to make it more pluggable??



##########
datafusion/core/src/datasource/object_store.rs:
##########
@@ -81,10 +81,19 @@ impl std::fmt::Display for ObjectStoreUrl {
     }
 }
 
+/// Object store self detector can detector an object store based on the url
+pub trait ObjectStoreSelfDetector: Send + Sync + 'static {
+    /// Detector a suitable object store based on its url if possible
+    /// Return the key and object store
+    fn get_by_url(&self, url: &Url) -> Option<(String, Arc<dyn ObjectStore>)>;
+}
+
 /// Object store registry
+#[derive(Clone)]
 pub struct ObjectStoreRegistry {
     /// A map from scheme to object store that serve list / read operations for the store
-    object_stores: RwLock<HashMap<String, Arc<dyn ObjectStore>>>,
+    object_stores: Arc<RwLock<HashMap<String, Arc<dyn ObjectStore>>>>,
+    self_detector: Option<Arc<dyn ObjectStoreSelfDetector>>,

Review Comment:
   See above comment, I wonder if instead we make this a `HashMap<String, Arc<dyn ObjectStoreSchemeProvider>>` keyed by url scheme



##########
datafusion/core/src/datasource/object_store.rs:
##########
@@ -132,19 +149,51 @@ impl ObjectStoreRegistry {
     ///
     /// - URL with scheme `file:///` or no schema will return the default LocalFS store
     /// - URL with scheme `s3://bucket/` will return the S3 store if it's registered
+    /// - URL with scheme `hdfs://hostname:port` will return the hdfs store if it's registered
     ///
     pub fn get_by_url(&self, url: impl AsRef<Url>) -> Result<Arc<dyn ObjectStore>> {
         let url = url.as_ref();
-        let s = &url[url::Position::BeforeScheme..url::Position::AfterHost];
-        let stores = self.object_stores.read();
-        let store = stores.get(s).ok_or_else(|| {
-            DataFusionError::Internal(format!(
-                "No suitable object store found for {}",
-                url
-            ))
-        })?;
-
-        Ok(store.clone())
+        // First check whether can get object store from registry
+        let store = {
+            let stores = self.object_stores.read();
+            let s = &url[url::Position::BeforeScheme..url::Position::BeforeHost];

Review Comment:
   I'm not sure about this, the host should not be and is not part of the path passed to `ObjectStore` implementations



##########
datafusion/core/src/datasource/listing/helpers.rs:
##########
@@ -162,7 +162,7 @@ pub fn split_files(
 pub async fn pruned_partition_list<'a>(
     store: &'a dyn ObjectStore,
     table_path: &'a ListingTableUrl,
-    filters: &[Expr],
+    filters: &'a [Expr],

Review Comment:
   Why this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org