You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/04 20:55:05 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5477: Minor: Move TableProviderFactories up out of RuntimeEnv and into Sess…

alamb opened a new pull request, #5477:
URL: https://github.com/apache/arrow-datafusion/pull/5477

   # Which issue does this PR close?
   Part of https://github.com/apache/arrow-datafusion/issues/1754
   
   # Rationale for this change
   I am trying to extract the physical_plan code into its own crate; and to do so I need to remove the circular dependencies between core --> datasource --> execution --> datasource
   
   Table factories are used for planning, not for execution, but RuntimeEnv is used for execution (and I am trying to move it into the `datafusion_execution` crate)
   
   See more details in https://github.com/apache/arrow-datafusion/issues/1754#issuecomment-1452438453
   
   # What changes are included in this PR?
   1. Move table_factories off of RuntimeEnv and directly to SessionState
   
   # Are these changes tested?
   Covered by existing tests (and the test changes illustrate what happened to the API). I believe this will affect Ballista (@avantgardnerio ) and delta-rs (cc @roeap )
   
   # Are there any user-facing changes?
   
   Users who are specifying custom TableFactoryProviders have a slightly different API to register them. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5477: Minor: Move TableProviderFactories up out of `RuntimeEnv` and into `SessionState`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5477:
URL: https://github.com/apache/arrow-datafusion/pull/5477#issuecomment-1458941590

   Thank you for the review @avantgardnerio 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5477: Minor: Move TableProviderFactories up out of `RuntimeEnv` and into `SessionState`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5477:
URL: https://github.com/apache/arrow-datafusion/pull/5477


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5477: Minor: Move TableProviderFactories up out of RuntimeEnv and into Sess…

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5477:
URL: https://github.com/apache/arrow-datafusion/pull/5477#discussion_r1125553764


##########
datafusion/core/src/execution/context.rs:
##########
@@ -278,6 +282,15 @@ impl SessionContext {
         self.session_id.clone()
     }
 
+    /// Return the [`TableFactoryProvider`] that is registered for the

Review Comment:
   ```suggestion
       /// Return the [`TableProviderFactory`] that is registered for the
   ```



##########
datafusion/core/src/execution/runtime_env.rs:
##########
@@ -129,24 +114,12 @@ pub struct RuntimeConfig {
     pub memory_pool: Option<Arc<dyn MemoryPool>>,
     /// ObjectStoreRegistry to get object store based on url
     pub object_store_registry: Arc<ObjectStoreRegistry>,
-    /// Custom table factories for things like deltalake that are not part of core datafusion
-    pub table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,
 }
 
 impl RuntimeConfig {
     /// New with default values
     pub fn new() -> Self {
-        let mut table_factories: HashMap<String, Arc<dyn TableProviderFactory>> =
-            HashMap::new();
-        table_factories.insert("PARQUET".into(), Arc::new(ListingTableFactory::new()));

Review Comment:
   this construction now happens as part of creating SessionState



##########
datafusion/core/tests/sql/create_drop.rs:
##########
@@ -126,12 +129,14 @@ async fn create_custom_table() -> Result<()> {
 
 #[tokio::test]
 async fn create_external_table_with_ddl() -> Result<()> {
-    let mut cfg = RuntimeConfig::new();
-    cfg.table_factories
-        .insert("MOCKTABLE".to_string(), Arc::new(TestTableFactory {}));
+    let cfg = RuntimeConfig::new();
     let env = RuntimeEnv::new(cfg).unwrap();
     let ses = SessionConfig::new();
-    let ctx = SessionContext::with_config_rt(ses, Arc::new(env));
+    let mut state = SessionState::with_config_rt(ses, Arc::new(env));

Review Comment:
   Here is the new pattern of how to add a table provider factory



##########
datafusion/core/src/execution/runtime_env.rs:
##########
@@ -22,10 +22,7 @@ use crate::{
     error::Result,
     execution::disk_manager::{DiskManager, DiskManagerConfig},
 };
-use std::collections::HashMap;
 
-use crate::datasource::datasource::TableProviderFactory;

Review Comment:
   The whole point of the PR is to remove the `datasource` dependency from the `runtime_env` module (as I am trying to move the `runtime_env` module into a new crate, `datafusion-execution`, that does not depend on `datasource`)
   
   I plan to move object_store registry in a separate PR to keep the reviews smaller
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org